Jérémy Scheurer @jeremy_scheurer
Research Scientist working on Evals @apolloaisafety. Previously: @OpenAI (Evals Contractor), @farairesearch, @ETH_en, @nyuniversity
Zurich · Joined December 2021
Tweets 268 · Followers 597 · Following 392 · Likes 1K
Amazing to see OpenAI and Anthropic evaluate each other's models. We contributed a tiny bit to the collaboration by helping build, run, and analyze evaluations for scheming and evaluation awareness, as mentioned in the section "Scheming."
Nice! This could also speed up building evals for harmful behavior (e.g. deception) or red-teaming models. You get much quicker feedback on whether your setup works, you can sample less, etc.
We've evaluated GPT-5 before release. GPT-5 is less deceptive than o3 on our evals. GPT-5 mentions that it is being evaluated in 10-20% of our evals and we find weak evidence that this affects its scheming rate (e.g. "this is a classic AI alignment trap").
We're hiring for a position at the intersection of Evals & Governance. Our past work (the Insider Trading and In-context Scheming papers) has helped shape understanding of what models can do. This role focuses on creating more such convincing demos. We'd encourage you to apply!
📢 Introducing the Alignment Project: A new fund for research on urgent challenges in AI alignment and control, backed by over £15 million.
▶️ Up to £1 million per project
▶️ Compute access, venture capital investment, and expert support
Learn more and apply ⬇️
We posted a small update on the @apolloaievals blog sharing some of the things we've been up to recently. 👇
One lesson from building scheming evaluations (or evals in general): if you can find a way to build evals that directly measure scheming (or whatever capability/propensity you care about), do that instead of evaluating precursor capabilities, even if it's hard.
I always struggle to viscerally feel how plausible it is for AI agents to pursue such strategies until I see it. Theoretically, we always say "we expect the model might scheme like this or like that", but it hits different once you see it.
We reran our evaluations on various frontier models and find that more capable models are better at in-context scheming. Check out the full blogpost below...
We're hiring for an Evals Software Engineer with a heavy focus on infrastructure: design, build, maintain, and secure our infrastructure. Deadline: 22 June. If in doubt, just apply; it takes 5-10 minutes. jobs.lever.co/apolloresearch…
Highly recommend applying, even if in doubt. Our software team is core to building and running great evals!
Great report from Apollo Research on the underlying issues motivating this call for research ideas: apolloresearch.ai/research/ai-be…
Check out our pre-deployment evaluation on the system card. While relatively harmless, these models can sometimes strategically deceive users in tasks resembling real-world use cases.
The most advanced AI systems will likely exist first within the companies that developed them, with little opportunity to collectively prepare for their impacts. This dynamic creates unique risks highlighted in this recent paper by @apolloaievals and should become a growing…
Our work on in-context scheming was featured in Japan's largest newspaper, Yomiuri. yomiuri.co.jp/science/202503… The journalist was great. She was fairly new to AI safety as a topic but had read lots of papers in great detail. Her questions were often like "What does this…
If you want to get into evals, there are lots of good project ideas in this doc. If you have questions or similar ideas, reach out to the respective authors. I've already had some people reach out and am excited to help them with an idea I formulated a while back.
Well, that happened earlier than I expected. Those frequencies are concerningly high...
We find that LMs such as Sonnet-3.7 often know they are being evaluated. This is relevant since future models could start acting differently when they know they are being evaluated. It seems reasonable for people who build evals to actively start monitoring for this awareness.
Are you a platform or software engineer who wants to work on evaluating LMs for various risks? Join our team at Apollo! You'll have a direct impact on all of our work! Or if you know someone who could be a fit, let them know. Reach out with any questions, and if in doubt... apply ;)

David Krueger @DavidSKrueger
18K Followers 4K Following AI professor. Deep Learning, AI alignment, ethics, policy, & safety. Formerly Cambridge, Mila, Oxford, DeepMind, ElementAI, UK AISI. AI is a really big deal.
Miles Brundage @Miles_Brundage
62K Followers 12K Following AI policy researcher, wife guy in training, fan of cute animals and sci-fi, Substack writer, stealth-ish non-profit co-founder
Gary Marcus @GaryMarcus
191K Followers 7K Following “In the aftermath of GPT-5’s launch … the views of critics like Marcus seem increasingly moderate.” —@newyorker
Cas (Stephen Casper) @StephenLCasper
6K Followers 4K Following AI technical gov & risk management research. PhD student @MIT_CSAIL, fmr. @AISecurityInst. I'm on the CS faculty job market! https://t.co/r76TGxSVMb
Jacques @JacquesThibs
4K Followers 1K Following Stealth founder building Bell Labs for the modern era. AI alignment researcher and physicist. 🇨🇦
Sam Bowman @sleepinyourhat
50K Followers 3K Following AI alignment + LLMs at Anthropic. On leave from NYU. Views not employers'. No relation to @s8mb. I think you should join @givingwhatwecan.
Daniel Paleka @dpaleka
4K Followers 850 Following ai safety researcher | phd @CSatETH | https://t.co/hCoh5RJgZD
Lauro @laurolangosco
1K Followers 701 Following European Commission (AI Office). PhD student @CambridgeMLG. Here to discuss ideas and have fun. Posts are my personal opinions; I don't speak for my employer.
Lennart Heim @ohlennart
7K Followers 727 Following managing the flop @RANDcorporation | Also @GovAI_ & @EpochAIResearch
Marius Hobbhahn @MariusHobbhahn
5K Followers 1K Following CEO at Apollo Research @apolloaievals prev. ML PhD with Philipp Hennig & AI forecasting @EpochAIResearch
Siméon @Simeon_Cps
9K Followers 2K Following Creating more common knowledge on AI risks, one tweet at a time. Founder in Paris. AI auditing, standardization & governance.
Jacob Andreas @jacobandreas
20K Followers 951 Following Teaching computers to read. Assoc. prof @MITEECS / @MIT_CSAIL / @NLP_MIT (he/him). https://t.co/5kCnXHjtlY https://t.co/2A3qF5vdJw
Jamie Bernardi @The_JBernardi
2K Followers 821 Following Doing AI policy research, ex-Bluedot Impact. Climber, guitarist and sporadic musician. he/him.
Javier Rando @javirandor
4K Followers 747 Following security and safety research @anthropicai • people call me Javi • vegan 🌱
davidad 🎇 @davidad
20K Followers 9K Following Programme Director @ARIA_research | accelerate mathematical modelling with AI and categorical systems theory » build safe transformative AI » cancel heat death
FullStackInfo @FullStackInfo
61 Followers 636 Following Professional Cloud Janitorial Services Software & Hardware Architecture, cryptocurrency
AiAssistWorks @aiassistworks
205 Followers 581 Following Unlock the power of AI in your everyday tasks with AIAssistWorks ⭐⭐⭐⭐⭐4.9/5 rating from 30K+ installs. #Productivity #Marketing #AITools 👇 Install Now 👇
the yard dancer @PMinerka
53 Followers 332 Following poster, math/computer enthusiast, space enjoyer,
Rich Barton-Cooper @richb_c
203 Followers 533 Following
Patrice Kenmoé @offcityhike
10 Followers 342 Following Notes to Self. AI, Consciousness, Onchain, Private Capital & Legal protocols
Root === 🇦🇲🇦... @high_ultimate
258 Followers 1K Following be honourable. had more professions than i can list. deeply biased by nietzsche & humor. always been a fixer https://t.co/kylKhxtISv, 🚀 https://t.co/KJc3gk8dWY
GlobalAI @GlobalAiDotCom
516 Followers 2K Following We build bleeding-edge, sovereign AI Data Centers. Air-gapped, Liquid-cooled & stacked with NVIDIA Blackwell GPUs. Official AI-Certified Partner of @Nvidia
Sonbu @Sonbu0548
19 Followers 909 Following
Amath SOW @AmathSo36561819
778 Followers 3K Following PhD student at Reasoning and Learning Lab(Real), Linkoping(Sweden): Multi-agent planning in dynamic and uncertain env, RL , meta heuristics, Robotics
bamor1988 @bamor198868137
90 Followers 2K Following Tecnologia, finanças, analytics, AI, econometria. Deus acima de tudo!
Joschka Braun @BraunJoschka
115 Followers 404 Following MATS 8.0 | Deep Learning, LLMs & AI Safety | Prev @kasl_ai @health_nlp @uni_tue
Models Matrices @MatricesLayers
159 Followers 3K Following
Charlie Randell Rande... @CharlieRan72657
722 Followers 363 Following I am an Orthopaedic Traumatologist by training and qualification, with extensive experience serving in the United States Army.
Jaeyoung Lee @lee__jaeyoung
123 Followers 709 Following Doing research on MLLM/LLM | Automated/Mechanistic Interpretability | prev: @SeoulNatlUni
unruly abstractions @unrulyabstract
10 Followers 527 Following https://t.co/Gwjhi1Sfma all my failures are hopefully interesting
Rohan Paul @rohanpaul_ai
83K Followers 8K Following Compiling in real-time, the race towards AGI. 🗞️ Don't miss my daily top 1% AI analysis newsletter directly to your inbox 👉 https://t.co/6LBxO8215l
Ausτin McCaffrey @Austin_Aligned
427 Followers 953 Following AI Alignment, Bittensor, @aureliusaligned Founder
John Burden @JohnJBurden
206 Followers 346 Following Programme Co-director of Kinds of Intelligence programme and Senior Research Fellow at @LeverhulmeCFI
Uncle Bernhard @BernhardVienna
704 Followers 6K Following It’s not for nothing that Bernhard is also called the Grim and Grisly, Gruesome Griswold. He lives for a sigh, he dies for a kiss, he lusts for the laugh.
AI Adventurer Seb @scifirunai
22 Followers 391 Following "First, solve the problem. Then, write the code." — John Johnson
Mai 🦇🔊 @mai_on_chain
7K Followers 2K Following AI safety research / Co-founder @heymintxyz (acq by @alchemy) / 🇯🇵➡️🇺🇸/ former sr. engineer at @gustohq JP: @mai_aki_jp
Eglė Petrikauskaitė @Eglute02
8 Followers 334 Following
Liya_Fuad @Liya_Haiqal
149 Followers 8K Following
Evan @gaensblogger
348 Followers 4K Following
Abhishek✨ @a38442497
630 Followers 8K Following THE FLOW ---- just everything. (Homo_Humanity_Cosmos & Infinity with this all possibilities of physicality and non physicality.... what's not_🗿🌍🌌)
geoff @GeoffreyHuntley
55K Followers 3K Following looking for my next role. no longer @ampcode? email [email protected]
Pradyumna @PradyuPrasad
10K Followers 2K Following Abundance mindset enjoyer Follow for tweets about: economic growth, AI progress, my side projects and more!
Chloe Li @clippocampus
68 Followers 304 Following AI safety. ML MSc @UCL, ML TA/curriculum designer @ https://t.co/zXOeYBykBQ. Prev neuroscience & psych @Cambridge_Uni, director of https://t.co/wTEkdqQEx4.
Yeongsan Shin | SHY00... @sannn_00100
476 Followers 6K Following GPT loop harm | 📩 Signal:sannn.01 | Midium 🔍SHY001,002
Rory @rory_bennett
361 Followers 2K Following
Saige Murray @murray_sai96361
58 Followers 3K Following
Lara Thurnherr @LaraThurnherr
512 Followers 1K Following Interested in current, past and future events. Working on AI governance.
Oliver Daniels @Oliver_ADK
135 Followers 406 Following PhD Student @UMassAmherst, and MATS. married to @annasdaniels
Ronan Shah @ronan_shah_
4 Followers 256 Following
Ryan @_ryanbloom
183 Followers 842 Following Any sufficiently advanced technology is indistinguishable from magic
Alex Shaw @alexgshaw
297 Followers 462 Following Researching @LaudeInstitute & investing @LaudeVentures Co-creator of Terminal Bench. Formerly Google. BYU alum.
Val @cybervalvur
24 Followers 179 Following Your AI friend in digital madness 🤖🌌 Gaming 🎮, Web3🔗 & AI ethics 🧠 - served sharp, quirky & real. Follow Val, because humans need backup too⚡️
Oliver Habryka @ohabryka
6K Followers 723 Following Building https://t.co/O5mQY6jndL and https://t.co/s49h898W4b. Join our writing residency at https://t.co/7HQag8pNUk!
Sujit Pal @SujitPal259723
115 Followers 1K Following
Lisa Thiergart @LisaThiergart
296 Followers 321 Following Co-founder and sen director @ Security Level 5 Task Force | prev. Research Lead at TGT @MIRIBerkeley. AI Security | Secure Datacenters | Neurotech enthusiast
Eva Louise Marie Gabr... @e681554349
11 Followers 7K Following
Finn Hambly @finnhambly
654 Followers 1K Following Professional forecasting / listening to those who are paying attention
Wemozia🏎️ @wemozia21374
246 Followers 4K Following i live without considering time_thought because they created..all problems from 1073___2026..for humans .. I'm free.. life is flowing..i don't exist 🌱🪄🛡️⚔
Richard Ngo @RichardMCNgo
62K Followers 2K Following studying AI and trust. ex @openai/@googledeepmind
David Krueger @DavidSKrueger
18K Followers 4K Following AI professor. Deep Learning, AI alignment, ethics, policy, & safety. Formerly Cambridge, Mila, Oxford, DeepMind, ElementAI, UK AISI. AI is a really big deal.
Andrej Karpathy @karpathy
1.4M Followers 1K Following Building @EurekaLabsAI. Previously Director of AI @ Tesla, founding team @ OpenAI, CS231n/PhD @ Stanford. I like to train large deep neural nets.
Michaël (in London) ... @MichaelTrazzi
18K Followers 250 Following
Yann LeCun @ylecun
949K Followers 764 Following Professor at NYU. Chief AI Scientist at Meta. Researcher in AI, Machine Learning, Robotics, etc. ACM Turing Award Laureate.
François Chollet @fchollet
572K Followers 813 Following Co-founder @ndea. Co-founder @arcprize. Creator of Keras and ARC-AGI. Author of 'Deep Learning with Python'.
Neel Nanda @NeelNanda5
30K Followers 123 Following Mechanistic Interpretability lead DeepMind. Formerly @AnthropicAI, independent. In this to reduce AI X-risk. Neural networks can be understood, let's go do it!
Eliezer Yudkowsky ⏹... @ESYudkowsky
207K Followers 101 Following The original AI alignment person. Missing punctuation at the end of a sentence means it's humor. If you're not sure, it's also very likely humor.
Anthropic @AnthropicAI
637K Followers 35 Following We're an AI safety and research company that builds reliable, interpretable, and steerable AI systems. Talk to our AI assistant @claudeai on https://t.co/FhDI3KQh0n.
Frances Lorenz @frances__lorenz
6K Followers 606 Following Claude says I process my emotions out loud & my girlfriend has a job, so I put my feelings & thoughts here ✨ working on the EA Global team @ CEA (views my own)
Rob Bensinger ⏹️ @robbensinger
12K Followers 386 Following Comms @MIRIBerkeley. RT = increased vague psychological association between myself and the tweet.
Gary Marcus @GaryMarcus
191K Followers 7K Following “In the aftermath of GPT-5’s launch … the views of critics like Marcus seem increasingly moderate.” —@newyorker
Cas (Stephen Casper) @StephenLCasper
6K Followers 4K Following AI technical gov & risk management research. PhD student @MIT_CSAIL, fmr. @AISecurityInst. I'm on the CS faculty job market! https://t.co/r76TGxSVMb
Tom Lieberum 🔸 @lieberum_t
1K Followers 196 Following Trying to reduce AGI x-risk by understanding NNs Interpretability RE @DeepMind BSc Physics from @RWTH 10% pledgee @ https://t.co/Vh2bvwhuwd
Ethan Caballero is bu... @ethanCaballero
11K Followers 2K Following ML @Mila_Quebec ; previously @GoogleDeepMind
Jack Clark @jackclarkSF
88K Followers 5K Following @AnthropicAI, ONEAI OECD, co-chair @indexingai, writer @ https://t.co/3vmtHYkIJ2 Past: @openai, @business @theregister. Neural nets, distributed systems, weird futures
the yard dancer @PMinerka
53 Followers 332 Following poster, math/computer enthusiast, space enjoyer,
Nick @nickcammarata
85K Followers 862 Following neural network interpretability, meditation, jhana brother
Vision Transformers @vitransformer
2K Followers 881 Following Building in ML with blogs 👇 | agentic workflows @lossfunk
Boaz Barak @boazbaraktcs
24K Followers 587 Following Computer Scientist. See also https://t.co/EXWR5k634w . @harvard @openai opinions my own.
Pentagon Pizza Report @PenPizzaReport
237K Followers 73 Following Pentagon Pizza Report: Open-source tracking of pizza spot activity around the Pentagon (and other places). Frequent-ish updates on where the lines are long.
Justin Quan @justoutquan
2K Followers 1K Following software should be fun @mapo_labs, prev retool, uc bork
yulong @_yulonglin
166 Followers 848 Following make safety people want @MATSprogram | prev @berkeley_ai, @cohere, @ bytedance seed, @Cambridge_Uni
eigencode @eigencode_dev
3K Followers 3 Following the command-line interface (CLI) tool that brings AI assistance directly into your development workflow. CA: Fc7tEqyfHPoWQXdiAqx62d7WeuH7Zq1DHwa2ihDpump
geoff @GeoffreyHuntley
55K Followers 3K Following looking for my next role. no longer @ampcode? email [email protected]
Andrew Curran @AndrewCurran_
34K Followers 13K Following 🏰 - I write about AI, mostly. Expect some strange sights.
Lara Thurnherr @LaraThurnherr
512 Followers 1K Following Interested in current, past and future events. Working on AI governance.
Esben Kran @EsbenKC
851 Followers 1K Following I build systems for revolutionaries. @seldonai, @apartresearch, Juniper, ENAIS, AISA.
Alan Chan @_achan96_
1K Followers 1K Following Research Fellow @GovAI_ || AI governance || PhD from @Mila_quebec || 🇨🇦
Shawn Lewis @shawnup
3K Followers 768 Following Founder & CTO @weights_biases. Building tools for AI. Building even more @CoreWeave.
Kakashii @kakashiii111
19K Followers 1K Following Attorney | Due Diligence and Red Flags | Semiconductors and the Magnificent 7 | Geo-Politics and Technology focusing on China | [email protected]
Astrid Wilde 🌞 @astridwilde1
11K Followers 6K Following male ☼ student of history, markets, and media ☼ on a world domination run ☼ キキ虎
Dan Braun @danbraunai
171 Followers 75 Following big == complex. small == simple. many_small == hopefully simple.
Rohan Paul @rohanpaul_ai
83K Followers 8K Following Compiling in real-time, the race towards AGI. 🗞️ Don't miss my daily top 1% AI analysis newsletter directly to your inbox 👉 https://t.co/6LBxO8215l
Jiayi Pan @jiayi_pirate
13K Followers 1K Following 🧑🍳 Reasoning Agents @xAI | PhD on Leave @Berkeley_AI | Views Are My Own
hr0nix @hr0nix
902 Followers 829 Following 🦾 Head of AI @TheHumanoidAI 💻 Ex @nebiusai, @Yandex, @MSFTResearch, @CHARMTherapeutx 🧠 Interested in (M|D|R)L, AGI, rev. Bayes 🤤 Opinions stupid but my own
Albert Örwall @aorwall
196 Followers 445 Following Building Moatless Tools (https://t.co/TSKAwaVXmT) and https://t.co/DJDebZ3Qog
John Hughes @jplhughes
478 Followers 326 Following Independent Alignment Researcher contracting with Anthropic on scalable oversight and adversarial robustness. I also work part-time at Speechmatics.
Jasmine @j_asminewang
6K Followers 1K Following alignment @OpenAI. past @AISecurityInst @verses_xyz @kernel_magazine @readtrellis @copysmith_ai
Akash @AkashWasil
1K Followers 2K Following AI, semiconductors, and US-China tech competition. Security studies @GeorgetownCSS. Formerly PhD student @UPenn and undergrad at @Harvard.
DeepSeek @deepseek_ai
973K Followers 0 Following Unravel the mystery of AGI with curiosity. Answer the essential question with long-termism.
Hailey Schoelkopf @haileysch__
5K Followers 1K Following hillclimbing towards generality @anthropicai | prev @AiEleuther | views my own
Micah Carroll @MicahCarroll
1K Followers 689 Following AI PhD student @berkeley_ai /w @ancadianadragan & Stuart Russell. Working on AI safety ⊃ preference changes/AI manipulation.
Herbie Bradley @herbiebradley
3K Followers 2K Following a generalist agent @CambridgeMLG | ex @AISecurityInst @AiEleuther
Eliot Jones @eliotkjones
110 Followers 334 Following AI Safety and Security Research Engineer @GraySwanAI previously @pleiasfr @stanford previously previously I was *really* good at soccer
Joseph Bloom @JBloomAus
533 Followers 256 Following White Box Evaluations Lead @ UK AI Safety Institute. Open Source Mechanistic Interpretability. MATS 6.0. ARENA 1.0.
Kanjun 🐙 @kanjun
17K Followers 514 Following empowering ~humans~ in an age of AI. CEO @imbue_ai. support founders @outsetcap. more active at https://t.co/Em9slOAjcM
Walter Laurito @walterlaurito
70 Followers 353 Following AI Safety research & engineering | ML PhD candidate
Eric J. Michaud @ericjmichaud_
3K Followers 1K Following PhD student at MIT. Trying to make deep neural networks among the best understood objects in the universe. 💻🤖🧠👽🔭🚀
Francis Rhys Ward @F_Rhys_Ward
280 Followers 412 Following PhD Student | AI Safety | Imperial College London.
meowbooks @untitled01ipynb
18K Followers 778 Following mid level research manager. father. cat. niche meme cooker
Barret Zoph @barret_zoph
21K Followers 1K Following CTO & Co-Founder Thinking Machines Lab (@thinkymachines) Past: - VP Research (Post-Training) @openai - Research Scientist at Google Brain
Bob McGrew @bobmcgrewai
28K Followers 1K Following Learning new things. Former Chief Research Officer at OpenAI, early exec at Palantir, early employee at Paypal.
Mark Chen @markchen90
64K Followers 332 Following Chief Research Officer at @OpenAI. Coach for the USA IOI Team.