Working on new methods for understanding machine learning systems and entangled quantum systems.
sites.google.com/view/jordanten…
Brisbane
Joined December 2009
I'm learning the true Hanlon's razor is: never attribute to malice or incompetence that which is best explained by someone being a bit overstretched but intending to get around to it as soon as they possibly can.
It was great to be part of this statement. I wholeheartedly agree.
It is a wild lucky coincidence that models often express dangerous intentions aloud, and it would be foolish to waste this opportunity. It is crucial to keep chain of thought monitorable as long as possible.
A simple AGI safety technique: AI’s thoughts are in plain English, just read them
We know it works, with OK (not perfect) transparency!
The risk is fragility: RL training, new architectures, etc. threaten transparency
Experts from many orgs agree we should try to preserve it:…
Can we leverage an understanding of what’s happening inside AI models to stop them from causing harm?
At AISI, our dedicated White Box Control Team has been working on just this🧵
🧵 1/13 My new team at UK AISI - the White Box Control Team - has released progress updates!
We've been investigating whether AI systems could deliberately underperform on evaluations without us noticing.
Key findings below 👇
New paper:
We train LLMs on a particular behavior, e.g. always choosing risky options in economic decisions.
They can *describe* their new behavior, despite no explicit mentions in the training data.
So LLMs have a form of intuitive self-awareness 🧵
Observation 5: Technical alignment of AGI is the ballgame. With it, AI agents will pursue our goals and look out for our interests even as more and more of the economy begins to operate outside direct human oversight.
Without it, it is plausible that we fail to notice as the…
539K Followers 18K Following
The best from AI community | Ex-Microsoft, Rackspace, Fast Company | Wrote eight books about the future | Silicon Valley robots, holodecks, BCIs, and startups.
122 Followers 2K Following
Deeply interested in maximising the probability of the best possible future for all. Applied AI / Mech Interp / Pre-Training / Robotics
522 Followers 965 Following
#CRGSolutionsIND is #Services Provider. We are a #business consulting firm. #Certified Partner & Implementer of #Tableau #Alteryx #Engage #Datawatch #Atlassian
625 Followers 1K Following
I write about AI, global risk, abundance, and progress. Bulletin of Atomic Scientists AI Fellow. Emergent Ventures grantee. Ex-diplomat. Views are my own.
603 Followers 1K Following
AI, Econ, math, and a bit of art history as a treat. Formerly @Walmart's Economics Team; @BrookingsInst. Used to run Middlebury Effective Altruism
10K Followers 1K Following
kantian 🏳️‍🌈 / philosophy phd student / vegan for the animals / liberal / creatine evangelist / pause ai / 4 mutuals @formofthegood / my substack is rly good
21K Followers 3K Following
GMU econ PhD student, liberal, aspie, bi. I post interesting papers. Michael Kremer stan. I ❤️ optimal auction design. Spend more on drugs. Open borders now!
6K Followers 595 Following
Indie developer, now working on Simulario - a physics simulation based game, that runs all logic on GPU
Also senior dev in AAA
Unity3D, Unreal, C#, C++, HLSL
5K Followers 97 Following
Indie game developer, procedural generation enthusiast, Dane in Finland.
I barely use this account.
Find me at 🦋@runevision.bsky.social and on 🐘mastodon.
18K Followers 1K Following
Hanging out with Claude, improving its behavior, and building tools to support that @AnthropicAI 😁
prev: @open_phil @googlebrain @openai (@microcovid)
43K Followers 3K Following
We're in a race. It's not USA vs China but humans and AGIs vs ape power centralization.
@deepseek_ai stan #1, 2023–Deep Time
«C’est la guerre.» ®1
9K Followers 20 Following
Advancing humanity's understanding of AI through interpretability research. Building the future of safe and powerful AI systems.
58K Followers 11K Following
*hyperamerican* propane and propane accessories
replacing woke solar with propane flame photonic engine brighter than the sun
*portable* dyson spheres!
29K Followers 2K Following
Chief economist @joinFAI. Nonresident fellow @NiskanenCenter. Pluralist. 'The world is second best, at best.'
6K Followers 5K Following
AI is good & bad, actually.
Tweeting about AI/ML methods, software dev, research, tech and society, social impact.
20yrs in tech, 10 in ML/AI, PhD in comp sci