METR is a non-profit research organization, and we are actively fundraising!
We prioritise independence and trustworthiness, which shapes both our research process and our funding options; to date, we have not accepted funding from frontier AI labs.
Agent benchmarks lose *most* of their resolution because we throw out the logs and only look at accuracy.
I’m very excited that HAL is incorporating @TransluceAI’s Docent to analyze agent logs in depth.
Peter’s thread is a simple example of the type of analysis this enables,…
Very happy to see this! I hope other AI developers follow (Anthropic created a collective constitution a couple years ago, perhaps it needs updating), and that we as a community develop better rubrics & measurement tools for model behavior :)
Docent, our tool for analyzing complex AI behaviors, is now in public alpha!
It helps scalably answer questions about agent behavior, like “is my model reward hacking” or “where does it violate instructions.”
Today, anyone can get started with just a few lines of code!