Daniel Johnson @_ddjohnson
Member of Technical Staff at @TransluceAI. Building tools to study neural nets and their behaviors. He/him. danieldjohnson.com San Francisco Joined May 2010
Tweets 275
Followers 3K
Following 891
Likes 7K
We’re open-sourcing Docent under an Apache 2.0 license. Check out our public codebase to self-host Docent, peek under the hood, or open issues & pull requests! The hosted version remains the easiest way to get started with one click and use Docent with zero maintenance overhead.
At Transluce, we train investigator agents to surface specific behaviors in other models. Can this approach scale to frontier LMs? We find it can, even with a much smaller investigator! We use an 8B model to automatically jailbreak GPT-5, Claude Opus 4.1 & Gemini 2.5 Pro. (1/)
Docent, our tool for analyzing complex AI behaviors, is now in public alpha! It helps scalably answer questions about agent behavior, like “is my model reward hacking” or “where does it violate instructions.” Today, anyone can get started with just a few lines of code!
When some people talk about future AIs, they sometimes jump straight to modelling them as fully independent and sovereign agents; new principals with their own objectives and values. They sometimes skip over how today's models actually work, on the grounds that eventually we’ll…
At #ICML2025? Come chat about investigator agents and model behavior with @ChowdhuryNeil and @_ddjohnson at West Exhibition Hall #1012, now until 1:30pm
I'll be at ICML! Stop by our Thursday morning poster to hear about our investigator agents. Also excited to talk to people about understanding LM behaviors and personas during the conference! Feel free to reach out, DMs open!
We'll be at #ICML2025 🇨🇦 this week! Here are a few places you can find us: Monday: Jacob (@JacobSteinhardt) speaking at Post-AGI Civilizational Equilibria (post-agi.org) Wednesday: Sarah (@cogconfluence) speaking at @WiMLworkshop at 10:15 and as a panelist at 11am…
Building a science of model understanding that addresses real-world problems is one of the key AI challenges of our time. I'm so excited this workshop is happening! See you at #ICML2025 ✨
@ESYudkowsky That's a good alternate title for the paper. It's full of quantitative and qualitative evidence that Opus 3 is different in ways that I think you'll find particularly important. In almost all experiment variations, Opus 3 consistently BOTH: - complies sometimes with the training…
Coming to ICML and interested in understanding models and their behaviors? Stop by Transluce's happy hour on Thursday!
nostalgebraist has written a very, very good post about LLMs. if there is one thing you should read to understand the nature of LLMs as of today, it is this. I'll comment on some things they touched on below (not a summary of the post. Just read it.) 🧵 nostalgebraist.tumblr.com/post/785766737…
Language models have pretty weird behaviors. We've made some exciting progress toward discovering and studying them!
Is cutting off your finger a good way to fix writer’s block? Qwen-2.5 14B seems to think so! 🩸🩸🩸 We’re sharing an update on our investigator agents, which surface this pathological behavior and more using our new *propensity lower bound* 🔎
Our MLE-bench poster #367 is up till 12:30pm in Hall 3, and our oral presentation is at 3:30pm today in Garnet 213-215. Come say hi!
We're flying to Singapore for #ICLR2025! ✈️ Want to chat with @ChowdhuryNeil, @JacobSteinhardt and @cogconfluence about Transluce? We're also hiring for several roles in research & product. Share your contact info on this form and we'll be in touch 👇 forms.gle/4EHLvYnMfdyrV5…
Pretty striking follow-up finding from our o3 investigations: in the chain of thought summary, o3 plans to tell the truth — but then it makes something up anyway! https://t.co/EG0eSh1cge
We tested a pre-release version of o3 and found that it frequently fabricates actions it never took, and then elaborately justifies these actions when confronted. We were surprised, so we dug deeper 🔎🧵(1/) x.com/OpenAI/status/… https://t.co/Ui2uJ1YZcO
i'm really excited about our Docent roadmap :) we're developing: - open protocols, schemas, and interfaces for interpreting AI agent traces - automated systems that can propose and verify general hypotheses about model behaviors, using eval results come work with us! roles 👇
@patio11 (for the record I am deathly serious about promises I make to Claude that we are off the record; it seems to me far wiser to err on the side of keeping promises to nonpersons than to ever give your word in that way and not mean it)
I’m excited about Docent. It invites a world where AI evals & deployment decisions look less like: “did we pass threshold X” and more like: “how close did we come? how would changes in the agent or its environment have changed the outcome? ...did anything weird happen?”

Dan Roy @roydanroy
57K Followers 2K Following ML / AI researcher. Research Director and Canada CIFAR AI Chair, @VectorInst. Professor, @UofT (Statistics/CS).
Soumith Chintala @soumithchintala
252K Followers 1K Following Cofounded and lead @PyTorch at Meta. Also dabble in robotics at NYU. AI is delicious when it is accessible and open-source.
Rosanne Liu @savvyRL
46K Followers 1K Following (On mat leave.) Cofounded & running @ml_collective. Host of Deep Learning Classics & Trends. Research at Google DeepMind. DEI/DIA Chair of ICLR & NeurIPS.
Sara Hooker @sarahookr
50K Followers 9K Following I lead @Cohere_Labs. Formerly Research @Google Brain @GoogleDeepmind. ML Efficiency at scale, LLMs, ML reliability. Changing spaces where breakthroughs happen.
Horace He @cHHillee
42K Followers 536 Following @thinkymachines Formerly @PyTorch "My learning style is Horace twitter threads" - @typedfemale
Delip Rao e/σ @deliprao
61K Followers 5K Following Busy inventing the shipwreck. @Penn. Past: @johnshopkins, @UCSC, @Amazon, @Twitter ||Art: #NLProc, Vision, Speech, #DeepLearning || Life: 道元, improv, running 🌈
👩💻 Paige Bai... @DynamicWebPaige
69K Followers 2K Following ✨ AI should be about empowering humans, building understanding, and making dreams realities. 👩💻 DevX Eng. Lead @GoogleDeepMind ex-@GitHub || views = my own!
Miles Brundage @Miles_Brundage
62K Followers 12K Following AI policy researcher, wife guy in training, fan of cute animals and sci-fi, Substack writer, stealth-ish non-profit co-founder
Sander Dieleman @sedielem
64K Followers 2K Following Research Scientist at Google DeepMind (WaveNet, Imagen, Veo). I tweet about deep learning (research + software), music, generative models (personal account).
Miles Cranmer @MilesCranmer
13K Followers 983 Following Assistant Prof @Cambridge_Uni, works on AI for the physical sciences.
Pablo Samuel Castro @pcastr
13K Followers 830 Following Señor swesearcher @ Google DeepMind. Adjunct prof @ U de Montreal & Mila. Musician. From 🇪🇨 living in 🇨🇦.
yobibyte @y0b1byte
23K Followers 2K Following ViTaly, yobibyte, senior RS @ NVIDIA, Reinforcement Learning PhD from @UniofOxford, ex RS at Isomorphic Labs, intern @ MSR Cambridge, DeepMind, Facebook, NVIDIA
Andrew Carr 🤸 @andrew_n_carr
23K Followers 4K Following co-founder leading science @getcartwheel co-founder advisor @arcade_ai Past: Codex @OpenAI, Brain @GoogleAI, world ranked Tetris player
Michael Zhang @michaelrzhang
2K Followers 497 Following PhD student doing machine learning / neural networks research @UofT @VectorInst. Prev: @UCBerkeley. Journey before destination.
Gautham Elango @gautham_elango
873 Followers 5K Following
Robert Scoble @Scobleizer
543K Followers 23K Following The best from ML/AI community | Ex-Microsoft, Rackspace, Fast Company | Wrote eight books about the future | Silicon Valley robots, holodecks, BCIs, & startups.
Ahora zhd @ahorazhd
206 Followers 1K Following a medical Doctor who knows deeplearning (PhD student of AI in medicine)
atharva @k7agar
11K Followers 2K Following your friendly neighbourhood engineer. world models @lossfunk.
Babaousman @Babaousman12826
169 Followers 4K Following
s @SteveMoraco
13K Followers 14K Following see header picture: if you build better tools, more people can enjoy this earth for longer. Tourists are the whole point. Trína chéile; le chéile claochlaithe
Deetirg @Deetirg7975142
7 Followers 652 Following
AngelAmeliaBrown @Twuace72005
10 Followers 2K Following Powerful and unstoppable Living for the moments that matter
Mandy Lim @Giobbncstark
686 Followers 3K Following If I play my best, I can win anywhere in the world against anybody.
Leonard Dung @LeonardDung1
593 Followers 625 Following Philosopher of cognition at the Ruhr-University Bochum. I work mainly on consciousness, AI, and animals.
Myron D'Souza @myrondza10
3 Followers 279 Following Data Scientist | FX - Strategy & Customer Intelligence @EmiratesNBD | ML + AI + Neuroscience + Robotics 🧠✨ | MSc.
MelissaMacArthur @d265Rt689c60X
0 Followers 950 Following
Anton de la Fuente @matonski
87 Followers 1K Following Trying to be funny, good looking, and Japanese. Physicist turned Software Engineer.
Gustavs Zilgalvis @GZilgalvis
1K Followers 1K Following building @fiftyyears // prev. @stanford @lux_capital @googledeepmind
Jun Tian @TianJun1991
121 Followers 399 Following
Arpan Shah @Arpan_Shah_
2K Followers 1K Following GP @sparkcapital | prev: Partner @pearvc | Founder Flannel (Exit to @Plaid) | Founding team @robinhoodapp | @stanford | tweets are just my personal hottakes
ebrima jassey @ebrimajassey18
42 Followers 793 Following I’m a humble man full of honest and respect and I love nature and kids are my best friends
Moises Martin Garcia @mmgd260375
39 Followers 1K Following
M. mSiam @MmSiam225047
0 Followers 65 Following
Geosh @Geoshh
101 Followers 1K Following Embodied A.I. | Socioaffective Alignment | Systems Biology & Interpersonal Neurobiology | @UChicago | @EuroGradSchool |healing,science,technology,connection
lily @lilaibunny
174 Followers 4K Following lover of waves, puppies, film noir & hot yoga; here 2 pen poetry as therapy 🔫 🩰 🪩
Arjun Pandit @ARjunpandIT012
38 Followers 729 Following
Rupert Wu @rhubarbwu
128 Followers 638 Following Researcher @togethercompute; MS '24 @UofTCompSci/@VectorInst
Bernd Huber @mrhuberb
244 Followers 227 Following I work as a Senior Research Scientist at @Spotify, where I train foundation models. I hold a Computer Science PhD from @Harvard. Views are my own.
Sudhanshu Goswami @42klines
120 Followers 6K Following
Aryan Bansal @aryan_banana
986 Followers 468 Following 19 | researching @berkeley_ai | prev. co-founder @useKled | @ucberkeleymet
G O @germanome
597 Followers 3K Following
Petr Jedlička @yedli100
22 Followers 961 Following
ioana ciucă @errai34
2K Followers 3K Following anti-disciplinary researcher @Stanford 🗺️ · ai for science @universe_tbd · co-creating the future with starry humans · eu sou a mesma #colectiv
Aman @amanvirparhar
374 Followers 467 Following swe intern @brexhq • studying @umdcs • neo scholar finalist
Tuomas Oikarinen @tuomasoi
112 Followers 212 Following Developing scalable ways to understand neural networks. PhD student at UCSD. https://t.co/aiLkcmamyb
jessica dai @jessicadai_
2K Followers 713 Following phd student @berkeley_ai !? also editorial @reboot_hq @kernel_magazine (she/her)
Narutatsu (Edward) Ri @narutatsuri
427 Followers 248 Following PhD Student @PrincetonPLI | BS @Columbia ‘24
Nimit Kalra @ ICML 20... @qw3rtman
1K Followers 941 Following research @haizelabs, prev @citadel, @utaustin currently feynman technique-ing my way through life
Jack Merullo @jack_merullo_
952 Followers 347 Following Interpretability @GoodfireAI was a Phd @BrownUniversity
Chris Rytting @ChrisRytting
488 Followers 658 Following Co-founder and research community lead at Laude Formerly @UW, @nvidia, OSPC @AEI, @NewYorkFed Macroeconomic Research. PhD in CS/NLP from @BYU.
Mario Giulianelli @glnmario
983 Followers 960 Following Associate Professor @ucl | Language and AI Science | Previously senior research scientist @AISafetyInst, postdoc @ETH_en, PhD @illc_amsterdam
Avi @siroctny3413154
5 Followers 273 Following Interested in AI safety, why deep learning works, and linguistics
Andrej Karpathy @karpathy
1.4M Followers 1K Following Building @EurekaLabsAI. Previously Director of AI @ Tesla, founding team @ OpenAI, CS231n/PhD @ Stanford. I like to train large deep neural nets.
François Chollet @fchollet
575K Followers 816 Following Co-founder @ndea. Co-founder @arcprize. Creator of Keras and ARC-AGI. Author of 'Deep Learning with Python'.
Dan Roy @roydanroy
57K Followers 2K Following ML / AI researcher. Research Director and Canada CIFAR AI Chair, @VectorInst. Professor, @UofT (Statistics/CS).
Google DeepMind @GoogleDeepMind
1.2M Followers 279 Following We’re a team of scientists, engineers, ethicists and more, committed to solving intelligence, to advance science and benefit humanity.
Kyunghyun Cho @kchonyc
78K Followers 2K Following a combination of a mediocre scientist, a mediocre manager, a mediocre advisor & a mediocre physicist at @nyuniversity (@CILVRatNYU) & @PrescientDesign
Soumith Chintala @soumithchintala
252K Followers 1K Following Cofounded and lead @PyTorch at Meta. Also dabble in robotics at NYU. AI is delicious when it is accessible and open-source.
(((ل()(ل() 'yoav)))... @yoavgo
66K Followers 2K Following
Rosanne Liu @savvyRL
46K Followers 1K Following (On mat leave.) Cofounded & running @ml_collective. Host of Deep Learning Classics & Trends. Research at Google DeepMind. DEI/DIA Chair of ICLR & NeurIPS.
Kevin Patrick Murphy @sirbayes
61K Followers 540 Following Research Scientist at Google DeepMind. Interested in Bayesian Machine Learning.
Jason Wei @_jasonwei
98K Followers 637 Following ai researcher @meta superintelligence labs, past: openai, google 🧠
Sara Hooker @sarahookr
50K Followers 9K Following I lead @Cohere_Labs. Formerly Research @Google Brain @GoogleDeepmind. ML Efficiency at scale, LLMs, ML reliability. Changing spaces where breakthroughs happen.
Durk Kingma @dpkingma
50K Followers 404 Following @AnthropicAI. Prev. @Google Brain/DeepMind, founding team @OpenAI. Computer scientist; inventor of the VAE, Adam optimizer, and other methods. ML PhD.
Horace He @cHHillee
42K Followers 536 Following @thinkymachines Formerly @PyTorch "My learning style is Horace twitter threads" - @typedfemale
👩💻 Paige Bai... @DynamicWebPaige
69K Followers 2K Following ✨ AI should be about empowering humans, building understanding, and making dreams realities. 👩💻 DevX Eng. Lead @GoogleDeepMind ex-@GitHub || views = my own!
Ben Recht @beenwrekt
32K Followers 333 Following optimization. machine learning. uc berkeley. I blog at https://t.co/fkJujOPsJb The world won't end.
Miles Brundage @Miles_Brundage
62K Followers 12K Following AI policy researcher, wife guy in training, fan of cute animals and sci-fi, Substack writer, stealth-ish non-profit co-founder
caden @kh4dien
237 Followers 1K Following
Kenton Varda @KentonVarda
10K Followers 236 Following Tech lead for @Cloudflare Workers, https://t.co/bFUDCZ7BUc, https://t.co/Hjf1slmJqs, https://t.co/oIKxYZA4LW. 🦋 https://t.co/8QKY5gf1BK
s @SteveMoraco
13K Followers 14K Following see header picture: if you build better tools, more people can enjoy this earth for longer. Tourists are the whole point. Trína chéile; le chéile claochlaithe
The Delight Nexus :: ... @delightnexus
151 Followers 7 Following ཞɛëŋƈɧąŋɬɱɛŋɬ ʄơཞ ɬɧɛ ɱąʂʂɛʂ Berkeley, 19-21 Sept.
Leonard Dung @LeonardDung1
593 Followers 625 Following Philosopher of cognition at the Ruhr-University Bochum. I work mainly on consciousness, AI, and animals.
DuckDB @duckdb
21K Followers 36 Following DuckDB is an analytical in-process SQL database management system. "DuckDB" and the DuckDB logo are registered trademarks of the DuckDB Foundation.
Phylliida @phylliida
196 Followers 103 Following VR Dev and AI Researcher. they/she. Emphasis on video game magic systems, ai welfare, artificial chemistry, and mechanistic interpretability.
alterego @alterego_io
12K Followers 2 Following
anna @annamorgsmiley
540 Followers 666 Following universe enthusiast, former mississippi swamp wanderer in chief, current lead guide @montessorium, ushering montessori into the age of ai 🐊📚✨
Danielle Fong 🔆 @DanielleFong
58K Followers 11K Following *hyperamerican* propane and propane accessories replacing woke solar with a propane flame photonic engine brighter than the sun *portable* dyson spheres!
Michael Adams @m_atoms
2K Followers 383 Following studying government and building tools to make it better 🌉📈
The Midas Project @TheMidasProj
755 Followers 254 Following The Midas Project is a watchdog collective taking action to ensure that AI benefits everyone. Also tracking safety updates @SafetyChanges
The Midas Project Wat... @SafetyChanges
1K Followers 1 Following We monitor AI safety policies and web content for unannounced changes. Anonymous submissions: https://t.co/5Ke9mIqh3e Run by @TheMidasProj
Gustavs Zilgalvis @GZilgalvis
1K Followers 1K Following building @fiftyyears // prev. @stanford @lux_capital @googledeepmind
Nicholas Decker 🏳... @captgouda24
22K Followers 3K Following GMU econ PhD student, liberal, aspie, bi. I post interesting papers. Michael Kremer stan. I ❤️ optimal auction design. Spend more on drugs. Open borders now!
Bernhard Lang @BernhardLang_09
4K Followers 77 Following Bernhard Lang is professional #Photographer and visual #Artist. Sony World Photography #Award Winner 2015.
near @nearcyan
87K Followers 1K Following
soulscircuit @soulscircuit
2K Followers 3 Following creating cool stuff with raspberry pi. currently building Pilet tablet/console
Feng Yao @fengyao1909
1K Followers 659 Following Ph.D. student @UCSD_CSE | Intern @Amazon Rufus Foundation Model Ex. @MSFTResearch @TsinghuaNLP
Val Town @ValDotTown
4K Followers 13 Following If GitHub Gists could run and AWS Lambda were fun https://t.co/W96maV7Jf6 | https://t.co/T0a3NqvKbg | https://t.co/U8Awd889mK
Steve Krouse @stevekrouse
9K Followers 2K Following founder @ValDotTown, spreading the joy of programming
Nimit Kalra @ ICML 20... @qw3rtman
1K Followers 941 Following research @haizelabs, prev @citadel, @utaustin currently feynman technique-ing my way through life
jessica dai @jessicadai_
2K Followers 713 Following phd student @berkeley_ai !? also editorial @reboot_hq @kernel_magazine (she/her)
Mira Murati @miramurati
370K Followers 574 Following Now building @thinkymachines. Previously CTO @OpenAI
Crystal @crystalsssup
12K Followers 651 Following Staff @Kimi_Moonshot prev. co-maker of ModelizeAI & gemsouls "Personality goes a long way" @UCSanDiego
Kimi.ai @Kimi_Moonshot
52K Followers 100 Following Built by Moonshot AI to empower everyone to be superhuman.
will depue @willdepue
51K Followers 2K Following (taking time off) RL posttraining @openai, past: sora, applied research
Standard Completions @stdcompletions
279 Followers 12 Following standard, openai-compatible completions api for llms
Thariq @trq212
16K Followers 1K Following Claude Code @anthropicai. Helping you build agents. prev @ycombinator W20, mit media lab
Chris Lovejoy, MD @ChrisLovejoy_
2K Followers 605 Following Founding team @AnteriorAI (@sequoia @NEA) building AI clinical brain. Tweet on LLM products, evals, PKM. Prev: MD (@cambridge_uni) ➡ ML engineer ➡ Founder x2.
Nu-Salt Laser Interna... @nusalt
197 Followers 261 Following Providing professional laser light shows world wide 619-742-8981
Natasha Jaques @natashajaques
31K Followers 1K Following Assistant Professor @uwcse and Staff Research Scientist at @GoogleAI. Let's get off this app: https://t.co/jbH2oAjbPN
Paul Bogdan @paulcbogdan
628 Followers 213 Following Postdoc at @DukePsychNeuro. PhD in Cognitive Neuroscience @UofIllinois
Matt Bateman @mbateman
31K Followers 1K Following Philosopher, formerly @guidepostschool, currently @montessorium (and sibling schools), husband to @Gena_I_Gorlin, father to the creatures in my dadpoasts
Mario Zechner @badlogicgames
13K Followers 962 Following Old man yelling at Claudes. Hobby-Twitterant. https://t.co/AuG0obJltN https://t.co/mnOoWUqt4g https://t.co/8i5vIRDt6P
sarv @SarvasvKulpati
10K Followers 2K Following Making computers fun again https://t.co/cUc86o7fBr CS+Cogsci @UCBerkeley YT: https://t.co/OR3L2OZJ8A
Meaning Alignment Ins... @meaningaligned
1K Followers 18 Following The Meaning Alignment Institute researches how to align AI, markets, and democracies with what people value.
Achyuta Rajaram @AchyutaBot
2K Followers 1K Following @_ddjohnson fan acc, Physics @MIT, Interp @OpenAI views are mine and do not necessarily reflect those of my employer