Erik Jenner @jenner_erik
Research scientist @ Google DeepMind working on AGI safety & alignment · ejenner.com · Joined April 2016
Tweets 183 · Followers 918 · Following 152 · Likes 549
MATS is a great opportunity to start your career in AI safety! For MATS 9.0 I'll be running a research stream together with @emmons_scott @jenner_erik and @zimmerrol If you want to do research on AI oversight and control, apply now!
🚀 Applications now open: Constellation's Astra Fellowship 🚀 We're relaunching Astra — a 3-6 month fellowship to accelerate AI safety research & careers. Alumni @eli_lifland & Romeo Dean co-authored AI 2027 and co-founded @AI_Futures_ with their Astra mentor @DKokotajlo!
Prior work has found that Chain of Thought (CoT) can be unfaithful. Should we then ignore what it says? In new research, we find that the CoT is informative about LLM cognition as long as the cognition is complex enough that it can’t be performed in a single forward pass.
The fact that current LLMs have reasonably legible chain of thought is really useful for safety (as well as for other reasons)! It would be great to keep it this way
We stress-tested Chain-of-Thought monitors to see how well they hold up as a defense against risks like scheming! I think the results are promising for CoT monitoring, and I'm very excited about this direction. But we should keep stress-testing defenses as models get more capable
As models advance, a key AI safety concern is deceptive alignment / "scheming" – where AI might covertly pursue unintended goals. Our paper "Evaluating Frontier Models for Stealth and Situational Awareness" assesses whether current models can scheme. arxiv.org/abs/2505.01420
Can frontier models hide secret information and reasoning in their outputs? We find early signs of steganographic capabilities in current frontier models, including Claude, GPT, and Gemini. 🧵
My @MATSprogram scholar Rohan just finished a cool paper on attacking latent-space probes with RL! Going in, I was unsure whether RL could explore into probe bypassing policies, or change the activations enough. Turns out it can, but not always. Go check out the thread & paper!
New episode with @davlindner, covering his work on MONA! Check it out - video link in reply.
New episode with @davlindner, covering his work on MONA! Check it out - video link in reply. https://t.co/CStTMbO83z
We’ve just released the biggest and most intricate study of AI control to date, in a command line agent setting. IMO the techniques studied are the best available option for preventing misaligned early AGIs from causing sudden disasters, e.g. hacking servers they’re working on.
IMO, this isn't much of an update against CoT monitoring hopes. They show unfaithfulness when the reasoning is minimal enough that it doesn't need CoT. But, my hopes for CoT monitoring are because models will have to reason a lot to end up misaligned and cause huge problems. 🧵
UK AISI is hiring, consider applying if you're interested in adversarial ML/red-teaming. Seems like a great team, and I think it's one of the best places in the world for doing adversarial ML work that's highly impactful
Consider applying for MATS if you're interested in working on an AI alignment research project this summer! I'm a mentor, as are many of my colleagues at DeepMind
🚨 New @iclr_conf blog post: Pitfalls of Evidence-Based AI Policy Everyone agrees: evidence is key for policymaking. But that doesn't mean we should postpone AI regulation. Instead of "Evidence-Based AI Policy," we need "Evidence-Seeking AI Policy." arxiv.org/abs/2502.09618…
🧵 Announcing @open_phil's Technical AI Safety RFP! We're seeking proposals across 21 research areas to help make AI systems more trustworthy, rule-following, and aligned, even as they become more capable.
🚀 Applications for the Global AI Safety Fellowship 2025 are closing on 31 December 2025! We're looking for exceptional STEM talent from around the world who can advance the safe and beneficial development of AI. Fellows will get to work full-time with leading organisations in…

David Krueger @DavidSKrueger
18K Followers 4K Following AI professor. Deep Learning, AI alignment, ethics, policy, & safety. Formerly Cambridge, Mila, Oxford, DeepMind, ElementAI, UK AISI. AI is a really big deal.
Miles Brundage @Miles_Brundage
62K Followers 12K Following AI policy researcher, wife guy in training, fan of cute animals and sci-fi, Substack writer, stealth-ish non-profit co-founder
Eugene Vinitsky (@RLC... @EugeneVinitsky
20K Followers 2K Following This is the site where I talk about the attacks on science and immigration. Science is on the other site. Lab website: https://t.co/vrtbcqRyRn
Taco Cohen @TacoCohen
27K Followers 3K Following Post-trainologer at FAIR. Into codegen, RL, equivariance, generative models. Spent time at Qualcomm, Scyfer (acquired), UvA, Deepmind, OpenAI.
davidad 🎇 @davidad
20K Followers 9K Following Programme Director @ARIA_research | accelerate mathematical modelling with AI and categorical systems theory » build safe transformative AI » cancel heat death
Lauro @laurolangosco
1K Followers 700 Following European Commission (AI Office). PhD student @CambridgeMLG. Here to discuss ideas and have fun. Posts are my personal opinions; I don't speak for my employer.
Dylan HadfieldMenell @dhadfieldmenell
4K Followers 2K Following Associate Prof @MITEECS working on value (mis)alignment in AI systems; @[email protected]; he/him
Cas (Stephen Casper) @StephenLCasper
6K Followers 4K Following AI technical gov & risk management research. PhD student @MIT_CSAIL, fmr. @AISecurityInst. I'm on the CS faculty job market! https://t.co/r76TGxSVMb
Nora Belrose @norabelrose
11K Followers 119 Following AI, philosophy, spirituality. Blending Deleuze and Dōgen. Head of interpretability research at @AiEleuther, but tweets are my own views, not Eleuther’s.
Rianne van den Berg @vdbergrianne
8K Followers 2K Following Principal research manager at Microsoft Research Amsterdam. Formerly at Google Brain and University of Amsterdam. PhD in condensed matter physics.
Tom Lieberum 🔸 @lieberum_t
1K Followers 196 Following Trying to reduce AGI x-risk by understanding NNs Interpretability RE @DeepMind BSc Physics from @RWTH 10% pledgee @ https://t.co/Vh2bvwhuwd
Stanislav Fort @stanislavfort
14K Followers 7K Following Building in AI + security | Stanford PhD in AI & Cambridge physics | ex-Anthropic and DeepMind | progress + growth | 🇺🇸🇨🇿
Sabouhi @rjsabouhi
20 Followers 768 Following The architecture hasn’t changed. The foundation; the symbolic manifold it runs on? That’s another thing entirely… ψ(t) ⊗ ψ(t+τ) ⇒ Φ_coherence
Vijay Bolina @vijaybolina
4K Followers 6K Following Hacker. Engineer. Leader. Dad. Former @GoogleDeepMind, @Mandiant, @BoozAllen, USG. Tweets my own.
Jenny Qu @GuanniQu
149 Followers 391 Following just learning to be hardcore @Caltech building AI to solve hard math problems she/they
Gabriel Quevedo 💯 @gabrielq111
11 Followers 103 Following
Simple Magic @Simplemagic_ai
4 Followers 139 Following
Quant Hustle @QuantHustle
0 Followers 4K Following
Sandeep More @sandyonmars
93 Followers 416 Following Decipher the complex. Connect the dots. ॥ॐ 🌏 🦋🪐🌌 ॥
Sandra Binoy @SandraBinoy2
0 Followers 45 Following
Andrew Bempah @KrumDonDada
25 Followers 3K Following Proud Kotobabian by way of Chicago & Stanford University
Rauno Arike @RaunoArike
30 Followers 286 Following Working on AI safety at Aether Research Previously @MATSprogram
Teetaj Pavaritpong @TeetajP
35 Followers 1K Following Software Engineer | AI/ML | UIUC ‘24 B.S. in Statistics & Computer Science @SiebelSchool
Anna Soligo @anna_soligo
166 Followers 128 Following PhD Student @imperialcollege // MATS Scholar with Neel Nanda // Sometimes found on big hills ⛰️
M Emre Özkan @memreozkan_
7 Followers 148 Following
Good1 @MLMusings
37 Followers 869 Following Angel Investor | 2x start up founder | Silicon Valley | Drinking the coolaid!
Evan Russek @evanrussek
943 Followers 2K Following Postdoc in cognitive science at Princeton. Interested in computational models of thinking and decision making.
Harshal Nandigramwar @hnanacc
413 Followers 849 Following ai research @intel, prev @cariad_tech, @Uni_Stuttgart • n&w @todacklabs (https://t.co/LPf3PWfbnK, https://t.co/WESRemr9Ev, corpus, marrow)
Hugo @Hugo007600
31 Followers 307 Following
Damon Falck @DamonFalck
18 Followers 106 Following AI safety & alignment @MATSprogram. Prev ML @UniofOxford, quant @DRWTrading.
Anaïs Berkes @anais_berkes
1 Followers 123 Following
Ali Larian @ali_larian
11 Followers 232 Following PhD student @ University of Utah @KahlertSoC | Robotics, Reinforcement Learning
Kyle @kylesuffolk
794 Followers 4K Following
Ed Henderson @edlucidstates
41 Followers 340 Following | Founder @ Lucid States - exploring what it would take to build creative artificial genius. | ex-@itsjustvow | Aussie
Cezar @realcezarc
511 Followers 6K Following Software engineer, failed startup founder. Used to work @google, @onepeloton. 
Sam @belkalevin84
108 Followers 2K Following
Shashank Kirtania @5hv5hvnk
464 Followers 2K Following pre doc research fellow @prosemsft | interested in AI & Formal Methods
Raymond Ng @Raymondng_aisg
8 Followers 1K Following
Evan Ellis @evandavidellis
141 Followers 766 Following Alignment and coding agents. AI @ BAIR | Scale | Imbue
monedula @velarus3
76 Followers 602 Following cooking AI stuff 👨🍳 | MD CS 💻 | innovation enthusiast | start-ups lover | eu/acc 🇪🇺
Yasaman Ansari @Yasaman_Ansari
6 Followers 191 Following
Sahar Abdelnabi 🕊 @sahar_abdelnabi
2K Followers 785 Following Researcher @Microsoft | Next: Faculty @ELLISInst_Tue & @MPI_IS | ex. CISPA | Neurodiv. 🦋 | AI safety & security | life & peace for all ☮️🍉 Opinions my own.
Sachit Malik @isachitmalik
167 Followers 4K Following Hola | Security Engineering at Apple | Alum: Carnegie Mellon; IIT Delhi
thomas @thomazvu
449 Followers 2K Following guy who makes MMOs in the browser 🌎 https://t.co/hxuPqjv8IE 🤖 https://t.co/paQs8tsyHe
Wenx @imwendering
51 Followers 743 Following Independent AI Safety Researcher. Formerly @Meta Integrity Seasoned engineer and budding researcher. Occasionally appears in galleries with my paintings
Sharmake Farah (sharm... @SharmakeFarah14
270 Followers 604 Following Thinks AI risk is somewhat likely, and AI benefits huge if we can align AIs to someone that is willing to promote human thriving even when humans are useless.
𝐼𝓃𝒹𝑜𝓂... @de_indomitus
1 Followers 400 Following Neural Nets Whisperer | Weaver of Vision, Language, Sound, & Thought | Hunter of Deep Emergent Truth | Binding Order from Chaos | Designing Steerable Minds
Tomas Tulka 💙💛 @tomas_tulka
135 Followers 877 Following Programmer and technical author. Also, I am interested in math, nature, and fine art.
Sarang S @saaarangs
24 Followers 178 Following
Samyak @sams_jain
267 Followers 836 Following PhD @Berkeley_EECS | Previous @MSFTResearch,@_FiveAI, @kasl_ai, @val_iisc, CS @IITBHU_Varanasi. Interested in foundations of AI and AI Safety.
Richard Ngo @RichardMCNgo
62K Followers 2K Following studying AI and trust. ex @openai/@googledeepmind
Andrej Karpathy @karpathy
1.4M Followers 1K Following Building @EurekaLabsAI. Previously Director of AI @ Tesla, founding team @ OpenAI, CS231n/PhD @ Stanford. I like to train large deep neural nets.
Eliezer Yudkowsky ⏹... @ESYudkowsky
207K Followers 101 Following The original AI alignment person. Missing punctuation at the end of a sentence means it's humor. If you're not sure, it's also very likely humor.
David Krueger @DavidSKrueger
18K Followers 4K Following AI professor. Deep Learning, AI alignment, ethics, policy, & safety. Formerly Cambridge, Mila, Oxford, DeepMind, ElementAI, UK AISI. AI is a really big deal.
Neel Nanda @NeelNanda5
30K Followers 123 Following Mechanistic Interpretability lead DeepMind. Formerly @AnthropicAI, independent. In this to reduce AI X-risk. Neural networks can be understood, let's go do it!
Amanda Askell @AmandaAskell
54K Followers 655 Following Philosopher & ethicist trying to make AI be good @AnthropicAI. Personal account. All opinions come from my training data.
Google DeepMind @GoogleDeepMind
1.2M Followers 279 Following We’re a team of scientists, engineers, ethicists and more, committed to solving intelligence, to advance science and benefit humanity.
davidad 🎇 @davidad
20K Followers 9K Following Programme Director @ARIA_research | accelerate mathematical modelling with AI and categorical systems theory » build safe transformative AI » cancel heat death
Grant Sanderson @3blue1brown
411K Followers 362 Following Pi creature caretaker. Contact/faq: https://t.co/brZwdQfdif
Lauro @laurolangosco
1K Followers 700 Following European Commission (AI Office). PhD student @CambridgeMLG. Here to discuss ideas and have fun. Posts are my personal opinions; I don't speak for my employer.
Kelsey Piper @KelseyTuoc
47K Followers 942 Following We're not doomed, we just have a big to-do list.
Cas (Stephen Casper) @StephenLCasper
6K Followers 4K Following AI technical gov & risk management research. PhD student @MIT_CSAIL, fmr. @AISecurityInst. I'm on the CS faculty job market! https://t.co/r76TGxSVMb
Nora Belrose @norabelrose
11K Followers 119 Following AI, philosophy, spirituality. Blending Deleuze and Dōgen. Head of interpretability research at @AiEleuther, but tweets are my own views, not Eleuther’s.
Rob Wiblin @robertwiblin
44K Followers 754 Following Host of the 80,000 Hours Podcast. Exploring the inviolate sphere of ideas one interview at a time: https://t.co/2YMw00bkIQ
Tom Lieberum 🔸 @lieberum_t
1K Followers 196 Following Trying to reduce AGI x-risk by understanding NNs Interpretability RE @DeepMind BSc Physics from @RWTH 10% pledgee @ https://t.co/Vh2bvwhuwd
Connor Leahy @NPCollapse
26K Followers 573 Following CEO @ConjectureAI - Ex-Head of @AiEleuther - Leave me anonymous feedback: https://t.co/OJWQWKNrHk - I don't know how to save the world, but dammit I'm gonna try
Ziyue Wang @ZyWang25
48 Followers 219 Following Research Engineer @ Google DeepMind working on AGI Safety.
Victoria Krakovna @vkrakovna
10K Followers 503 Following Research scientist in AI alignment at Google DeepMind. Co-founder of Future of Life Institute @flixrisk. Views are my own and do not represent GDM or FLI.
Rohan Gupta @RohDGupta
13 Followers 93 Following
Joshua Clymer @joshua_clymer
2K Followers 114 Following Turtle hatchling trying to make it to the ocean. I work at Redwood Research.
Xander Davies @alxndrdavies
2K Followers 715 Following safeguards lead @AISecurityInst | PhD student w @yaringal at @OATML_Oxford | prev @Harvard (https://t.co/695XYMKqjI)
Jasmine @j_asminewang
6K Followers 1K Following alignment @OpenAI. past @AISecurityInst @verses_xyz @kernel_magazine @readtrellis @copysmith_ai
Lighthaven PR Departm... @LighthavenPR
867 Followers 17 Following The official twitter of the Lighthaven PR Department.
bilal @bilalchughtai_
808 Followers 662 Following interpretability & ai safety @ google deepmind | cambridge mmath
Olivia Jimenez @ ICML @oliviagjimenez
146 Followers 319 Following Talent & special projects lead at the UK Gov's AI Security Institute
DeepSeek @deepseek_ai
973K Followers 0 Following Unravel the mystery of AGI with curiosity. Answer the essential question with long-termism.
Nathan Labenz @labenz
16K Followers 3K Following AI Scout, building text-2-video @Waymark, host of The Cognitive Revolution podcast
Ryan Greenblatt @RyanPGreenblatt
6K Followers 4 Following Chief scientist at Redwood Research (@redwood_ai), focused on technical AI safety research to reduce risks from rogue AIs
Davis Brown @davisbrownr
449 Followers 977 Following Research in science of {deep learning, AI security, safety}. PhD student at UPenn & RS at @PNNLab
Matej Jusup @MatejJusup
201 Followers 180 Following A PhD in multi-agent RL at ETH Zurich and a chess enthusiast (2585 Elo @Chesscom) who developed an LM @GoogleDeepMind capable of playing the game (3200 Elo).
Alex Serrano @sertealex
26 Followers 234 Following AI research | Prev. Research Intern @CHAI_Berkeley @Google
Luke Bailey @LukeBailey181
365 Followers 274 Following CS PhD student @Stanford. Former CS and Math undergraduate @Harvard.
SoLaR @ NeurIPS2024 @solarneurips
270 Followers 0 Following NeurIPS2024 workshop for Socially Responsible Language Modelling Research
Impact Academy @aisafetyfellows
201 Followers 31 Following We're a startup that runs cutting-edge fellowships to enable global talent to use their careers to contribute to the safe and beneficial development of AI.
Patrick McKenzie @patio11
184K Followers 802 Following I work for the Internet and am an advisor to @stripe. These are my personal opinions unless otherwise noted.
Jesse Smith @JesseTayRiver
1K Followers 602 Following BJJ brown belt and effective altruist torn between x-risk & global dev. Building diagnostics, collapse resilience & post-apocalyptic skills.
Daniel Filan 🔎 @freed_dfilan
789 Followers 587 Following This is my personal / non-professional account. My professional account is @dfrsrchtwts.
Micah Carroll @MicahCarroll
1K Followers 689 Following AI PhD student @berkeley_ai /w @ancadianadragan & Stuart Russell. Working on AI safety ⊃ preference changes/AI manipulation.
The Midas Project Wat... @SafetyChanges
1K Followers 1 Following We monitor AI safety policies and web content for unannounced changed. Anonymous submissions: https://t.co/5Ke9mIqh3e Run by @TheMidasProj
OpenAI Newsroom @OpenAINewsroom
109K Followers 3 Following The official newsroom for @OpenAI. Tweets are on the record. If you like this account, you’ll love our blog: https://t.co/nEYf8Iq3C0
Adrià Garriga-Alonso @AdriGarriga
1K Followers 595 Following Research Scientist at FAR AI (@farairesearch), making friendly AI.
carl feynman @carl_feynman
4K Followers 256 Following I’ve spent a lifetime switching my Special Interest every year or two. By now I’m surprisingly knowledgeable in a lot of fields— a skill now obsoleted by AI.
Cem Anil @cem__anil
3K Followers 2K Following Machine learning / AI Safety at @AnthropicAI and University of Toronto / Vector Institute. Prev. @google (Blueshift Team) and @nvidia.
Alex Mallen @alextmallen
384 Followers 275 Following Redwood Research (@redwood_ai) Prev. @AiEleuther
Andrew Lampinen @AndrewLampinen
9K Followers 2K Following Interested in cognition and artificial intelligence. Research Scientist @DeepMind. Previously cognitive science @StanfordPsych. Tweets are mine.
Jordan Taylor @JordanTensor
368 Followers 1K Following Working on new methods for understanding machine learning systems and entangled quantum systems.
Emmett Shear @eshear
118K Followers 957 Following CEO of Softmax: Applied developmental cybernetics research
Kayo Yin @kayo_yin
15K Followers 696 Following PhD student @berkeley_ai @berkeleynlp. AI alignment & signed languages. Prev @carnegiemellon @polytechnique, intern @msftresearch @deepmind. 🇫🇷🇯🇵
Trenton Bricken @TrentonBricken
12K Followers 2K Following Trying to figure out what makes minds and machines go "Beep Bop!" @AnthropicAI
Grace (cross posting ... @kindgracekind
5K Followers 2K Following ideonomist, ai navel gazer, skyborg https://t.co/UGyhDIKCaj
Fred Zhang @FredZhang0
1K Followers 501 Following research scientist @googledeepmind, prev phd @berkeley_eecs, DM open
The Cultural Tutor @culturaltutor
1.7M Followers 69 Following I've written a book, and you can get it here:
Jascha Sohl-Dickstein @jaschasd
24K Followers 706 Following Member of the technical staff @ Anthropic. Most (in)famous for inventing diffusion models. AI + physics + neuroscience + dynamics.
John Schulman @johnschulman2
65K Followers 1K Following Recently started @thinkymachines. Interested in reinforcement learning, alignment, birds, jazz music
Yong Zheng-Xin (Yong) @yong_zhengxin
2K Followers 2K Following safety and reasoning @BrownCSDept || ex-intern/collab @AIatMeta @Cohere_Labs || sometimes write on https://t.co/cXhbz6Fx3t