Chris Olah @ch402
Reverse engineering neural networks at @AnthropicAI. DMs open! Previously @distillpub, OpenAI Clarity Team, Google Brain. Personal account. colah.github.io San Francisco, CA Joined June 2010
Tweets 5K
Followers 90K
Following 173
Likes 10K
New Anthropic research: we find that probing, a simple interpretability technique, can detect when backdoored "sleeper agent" models are about to behave dangerously, after they pretend to be safe in training. Check out our first alignment blog post here: anthropic.com/research/probe…
Announcing a progress update from the @GoogleDeepMind mech interp team! Inspired by @AnthropicAI's excellent monthly updates, we share a range of updates on our work on Sparse Autoencoders, from signs of life on interpreting steering vectors with SAEs to improving ghost grads.
Great visualisation library for Sparse Autoencoder features from @calsmcdougall! My team has already been finding it super useful, go check it out: lesswrong.com/posts/nAhy6Zqu…
I'm incredibly excited to have Craig joining us on the Anthropic Interpretability team! I've been a huge fan of @GoogleColab for nearly a decade (I used it internally at Google!) and have really admired Craig's work on it.
big news for me: after 5000+ days and too many excellent colleagues to mention, I'm leaving Google. it's been a fantastic ride, and the hardest part about leaving is saying goodbye to my teammates and colleagues.
Next in our series of small monthly updates from the interpretability team, including a few fun things: 1. We use feature attribution to find features related to specific completions (following the athlete-sport association example of @NeelNanda5 )
Another small update from us, including some fun results about circuit analysis with SAEs.
We’re hiring for the adversarial robustness team @AnthropicAI! As an Alignment subteam, we're making a big effort on red-teaming, test-time monitoring, and adversarial training. If you’re interested in these areas, let us know! (emails in 🧵)
I continue to be impressed by the work of Neel's scholars -- very excited to see what the next group will do!
Reflections on Qualitative Research: transformer-circuits.pub/2024/qualitati… [h/t to @ch402 for originating & driving this!]
Today, we're announcing Claude 3, our next generation of AI models. The three state-of-the-art models—Claude 3 Opus, Claude 3 Sonnet, and Claude 3 Haiku—set new industry benchmarks across reasoning, math, coding, multilingual understanding, and vision.
Soumith Chintala @soumithchintala
185K Followers 877 Following Cofounded and lead @PyTorch at Meta. Also dabble in robotics at NYU. AI is delicious when it is accessible and open-source.
(((ل()(ل() 'yoav))).. @yoavgo
46K Followers 2K Following
Alfredo Canziani @alfcnz
86K Followers 269 Following Musician, math lover, cook, dancer, 🏳️‍🌈, and an ass prof of Computer Science at New York University
Jeremy Howard @jeremyphoward
221K Followers 5K Following 🇦🇺 Co-founder: @AnswerDotAI & @FastDotAI ; Hon Professor: @UQSchoolITEE ; Digital Fellow: @Stanford
Richard Ngo @RichardMCNgo
35K Followers 1K Following What would we need to understand in order to design an amazing future? Figuring that out @openai
Delip Rao e/σ @deliprao
46K Followers 5K Following Busy inventing the shipwreck. @Penn. Past: @johnshopkins, @UCSC, @Amazon, @Twitter ||Art: #NLProc, Vision, Speech, #DeepLearning || Life: 道元, improv, running 🌈
Eric Jang @ericjang11
69K Followers 3K Following physical AGI at 1X. Author of "AI is Good for You" https://t.co/eFg4WXhg0p
Anthropic @AnthropicAI
261K Followers 26 Following We're an AI safety and research company that builds reliable, interpretable, and steerable AI systems. Talk to our AI assistant Claude at https://t.co/aRbQ97uk4d.
Richard Socher @RichardSocher
101K Followers 971 Following CEO @youSearchEngine Investing at @aixventuresHQ Before: Stanford Adj Prof in AI/NLP, Chief Scientist at Salesforce, MetaMind
Miles Brundage @Miles_Brundage
43K Followers 10K Following Policy research at @openai. I mostly tweet about AI, animals, and sci-fi. He/him. Views my own.
Sasha Rush @srush_nlp
52K Followers 464 Following Professor, Programmer in NYC. Cornell Tech, Hugging Face 🤗 https://t.co/cZl0wTfqGz
Kyunghyun Cho @kchonyc
61K Followers 2K Following a combination of a mediocre scientist, a mediocre manager, a mediocre advisor & a mediocre PC at @nyuniversity (@CILVRatNYU) & @genentech (@PrescientDesign).
Horace He @cHHillee
23K Followers 448 Following Working at the intersection of ML and Systems @ PyTorch "My learning style is Horace twitter threads" - @typedfemale
Percy Liang @percyliang
49K Followers 408 Following Associate Professor in computer science @Stanford @StanfordHAI @StanfordCRFM @StanfordAILab @stanfordnlp | cofounder @togethercompute | Pianist
Rosanne Liu @savvyRL
33K Followers 966 Following Cofounded & running @ml_collective. Host of Deep Learning Classics & Trends. Research at Google DeepMind. DEI/DIA Chair of ICLR & NeurIPS. Writing https://t.co/IbycyGfnDR
Jack Clark @jackclarkSF
67K Followers 5K Following @AnthropicAI, ONEAI OECD, co-chair @indexingai, writer @ https://t.co/3vmtHYkaTu Past: @openai, @business @theregister. Neural nets, distributed systems, weird futures
Neel Nanda @NeelNanda5
13K Followers 89 Following Mechanistic Interpretability lead @DeepMind. Formerly @AnthropicAI, independent. In this to reduce AI X-risk. Neural networks can be understood, let's go do it!
Amanda Askell @AmandaAskell
26K Followers 653 Following Philosopher & ethicist teaching models to be good @AnthropicAI. Personal account. All opinions come from my training data.
Ferenc Huszár @fhuszar
40K Followers 1K Following Secular Bayesian. Associate Professor in Machine Learning @Cambridge_CL. Talent aficionado at https://t.co/RbJkoLguey Alum of @Twitter, Magic Pony and @Balderton
Jessie Wright @JAWrightNY
243 Followers 586 Following “To eat is a necessity, but to eat intelligently is an art.” Ops @ https://t.co/eQcLCtqkEv
Guy Swann ⚡️| Act.. @TheGuySwann
80K Followers 3K Following Liberty is a technology problem • Host of @BitcoinAudible, @Ai_Unchained • Pro Memecraft • Audiobook Narrator
Elon Musk @elon_1_1musk
89 Followers 773 Following Entrepreneur 🚀| Spacex • CEO & CTO 🚔| Tesla • CEO and Product architect 🚄| Hyperloop • Founder 🧩| OpenAI • Co-founder 👇🏻| Build A 7-fig IG Business
link @0xdb0bc518
0 Followers 26 Following
Valeriia Blinova @BlinovaValeriia
38 Followers 59 Following IR & NLP Researcher, PhD candidate at RMIT, Australia
Janek Zimoch @janekzimoch
10 Followers 27 Following
NC @NC8089603155151
13 Followers 35 Following
Wen Ni Chew @wennichew2
0 Followers 16 Following
Razvan @NickHunt477
15 Followers 230 Following A Romanian around the world. Never seen a Kardashian's show episode. Loves cheese and video games. Fresh account. #NAFO ftw
... @konsistenzz
1 Followers 108 Following
Abas @Aylardarma2024
45 Followers 652 Following
Arabinda Dutta @arbyi
2 Followers 35 Following
alphabiz-test1 @AlphabizT57517
1 Followers 44 Following
Discipline Sans taboo @blockchain_226
18 Followers 290 Following
Lucas @xlucsam
14 Followers 199 Following
DaddyXXXmas @AthomsG
12 Followers 173 Following
GPT.Biz @gpt_biz
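251 Followers 49 Following Spent a year building multimodal AI for users in China: a [non-profit] alternative to leading global AI technologies such as [GPT-4], [DALL·E], and [Midjourney]. No VPN needed, fast direct access through domestic servers, ultra-low cost or even free. Top-tier AI covering [text], [images], and [video], with full support for [web search], [code execution and data analysis], and [analysis of PDF, image, ZIP, and other attachments].
tuan pho @tuanpho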
1 Followers 125 Following
₿rain$layer @bryansayler
478 Followers 3K Following
DigitalDrip AI Newsle.. @getdigitaldrip
15 Followers 49 Following We curate, summarize, and bring impactful and interesting AI blogs directly to your inbox. https://t.co/GoBsdQIPM3
hua @jaKehua117
64 Followers 2K Following
Anita Kay @domevampire11
4 Followers 149 Following
Bob hu @HuBob44340
2 Followers 31 Following
김동현 @GguVK7y5wlgfgqP
2 Followers 138 Following
Raghu Sharma @Raghu81403177
27 Followers 30 Following
Manjunath G @gManjuRam
1 Followers 22 Following
Yashwanth Alapati @Yash_alap
3 Followers 59 Following Grad student at NYU. MongoDB User Group(MUG leader), NYC. If you are looking to speak at a MUG event in New York, feel free to reach out.
Niranjan Damera-Venka.. @ndvenkata
0 Followers 16 Following
Pontoon⑰▄⇐Surto.. @nekomata1440
181 Followers 962 Following
Arqam @arq6m
27 Followers 1K Following
Norb Urban @norb_urban
0 Followers 122 Following
Daniel Guppy @DanielJGuppy
1 Followers 34 Following
Daniel @danlugogmailco1
74 Followers 176 Following
Abdulloh @abdullohthegr8
55 Followers 906 Following
Alec McLean @AlecJMcLean
0 Followers 17 Following
brunsuvihyd1986 @brunsuvihy81791
10 Followers 20 Following
Arjun Majumdar @ArjunMajum88583
4 Followers 62 Following
Anthropic @AnthropicAI
261K Followers 26 Following We're an AI safety and research company that builds reliable, interpretable, and steerable AI systems. Talk to our AI assistant Claude at https://t.co/aRbQ97uk4d.
Jack Clark @jackclarkSF
67K Followers 5K Following @AnthropicAI, ONEAI OECD, co-chair @indexingai, writer @ https://t.co/3vmtHYkaTu Past: @openai, @business @theregister. Neural nets, distributed systems, weird futures
Oriol Vinyals @OriolVinyalsML
166K Followers 82 Following VP of Research & Deep Learning Lead, Google DeepMind. Gemini co-lead. Past: AlphaStar, AlphaFold, AlphaCode, WaveNet, seq2seq, distillation, TF.
Amanda Askell @AmandaAskell
26K Followers 653 Following Philosopher & ethicist teaching models to be good @AnthropicAI. Personal account. All opinions come from my training data.
Ferenc Huszár @fhuszar
40K Followers 1K Following Secular Bayesian. Associate Professor in Machine Learning @Cambridge_CL. Talent aficionado at https://t.co/RbJkoLguey Alum of @Twitter, Magic Pony and @Balderton
Zachary Lipton @zacharylipton
59K Followers 2K Following Professor: CMU/@acmi_lab, CTO / CSO: @AbridgeHQ, Creator: @d2l_ai & https://t.co/QQt98VNLUp, Relapsing 🎷
Lilian Weng @lilianweng
94K Followers 148 Following Working on AI safety, past on robotics, applied research @OpenAI; Writing ML blogs to help myself & others to learn; Ideas my own.
Irina Rish @irinarish
9K Followers 994 Following prof UdeM/Mila; Canada Excellence Research Chair; AAI Lab head https://t.co/UzlrC7ZrGF; INCITE project PI https://t.co/0rV7szd7rH; CSO https://t.co/XDhj6MEtUj
Nando de Freitas 🏳.. @NandoDF
97K Followers 658 Following I research intelligence to understand what we are, and to harness it wisely. I lead a wonderfully creative AI team at @GoogleDeepMind who inspire me everyday.
Catherine Olsson @catherineols
15K Followers 1K Following Hanging out with Claude, improving its behavior, and building tools to support that @AnthropicAI 😁 prev: @open_phil @googlebrain @openai (@microcovid)
Joshua Batson @thebasepoint
2K Followers 707 Following trying to understand evolved systems (🖥 and 🧬) interpretability research @anthropicai formerly @czbiohub, @mit math
Adam Jermyn @AdamSJermyn
1K Followers 188 Following AI Interpretability & Safety @AnthropicAI. Previously at @FlatironInst @FlatironCCA, @KITP_UCSB, PhD @Cambridge_Uni, BS @Caltech.
Trenton Bricken @TrentonBricken
6K Followers 2K Following Trying to figure out what makes minds and machines go "Beep Bop!" @AnthropicAI
derek guy @dieworkwear
811K Followers 963 Following Menswear writer. Editor at @putthison. Creator of @RLGoesHard. Bylines at The New York Times, The Washington Post, The Financial Times, Esquire, and Mr. Porter
Coleman Hughes @coldxman
350K Followers 904 Following Conversations w/Coleman Podcast | Forbes 30 Under 30 | Contributor at @theFP | Analyst at @CNN https://t.co/cwLQsfPK19
Michael Sellitto @MPSellitto
1K Followers 2K Following @AnthropicAI, @CNASdc, @StanfordHAI Formerly: @WhiteHouse NSC 2015-2018, @NSC44, @StateDept, @ODNIgov Personal views
The Folio Society @foliosociety
59K Followers 3K Following We are a unique and proudly independent publisher, creating beautiful books for 75 years. Made for book lovers, by booklovers. Here to help Mon-Fri 9:30-17:30.
Forecasting Research .. @Research_FRI
579 Followers 21 Following Research institute focused on developing forecasting methods to improve decision-making on high-stakes issues, led by chief scientist Philip Tetlock.
Andreas Tolias Lab @AToliasLab
4K Followers 683 Following to understand intelligence and develop technologies by combining neuroscience and AI
Keith Frankish @keithfrankish
30K Followers 2K Following Philosopher, writer, Greek-British. Hon Professor @sheffielduni. Mind, consciousness, illusionism, cog-sci, Greece. Podcast: https://t.co/kyMR0mRBqm
Eric Reinholdt @EricReinholdt
2K Followers 376 Following architect . entrepreneur . dad . guitar player . metal head . hiker .
Tristan Hume @trishume
6K Followers 330 Following Performance optimization lead @AnthropicAI. Profiling, distributed systems, dev tools, interpretability.
Jason Matheny @JasonGMatheny
8K Followers 373 Following President and CEO of the @RANDCorporation, a nonprofit, nonpartisan research org that helps improve policy and decisionmaking through research and analysis.
Emily Oster @ProfEmilyOster
109K Followers 283 Following Data-Driven Pregnancy and Parenting Economist @BrownUniversity Author #ExpectingBetter, #Cribsheet, #FamilyFirm, #TheUnexpected CEO of https://t.co/Q4hHIERBD5 👇
Jacob Steinhardt @JacobSteinhardt
7K Followers 67 Following Assistant Professor of Statistics, UC Berkeley
Tina White @CristinaRWhite
91 Followers 163 Following Researcher, nonprofit founder @CovidWatch. AI alignment, privacy-preserving technology, machine learning, aerodynamics. Emergent ventures grantee.
John Nerst @everytstudies
4K Followers 865 Following big picture-fetishist | aspiring erisologist ("the study of disagreement and intellectual difference") | lover and hater of words/philosophy/art
Simon Sarris @simonsarris
57K Followers 980 Following 🕯 In labouring to be concise, I become obscure. 🕯 Alchemist, sacred things, making things 🕯 The map is mostly water. 🌜 I make GoJS: https://t.co/7yYIMFfAtd
andy jones @andy_l_jones
4K Followers 326 Following engineering & research at @AnthropicAI. DC, SF, London
Dario Amodei @Dario_Amodei
2K Followers 15 Following
Tom Brown @nottombrown
5K Followers 524 Following @AnthropicAI, GPT-3, AI alignment, robustness, etc. Cautiously optimistic.
Kamal Ndousse @kandouss
2K Followers 496 Following AI @AnthropicAI Social learning enthusiast. Opinions and dumb jokes my own.
Daniela Amodei @DanielaAmodei
6K Followers 300 Following President @AnthropicAI. Formerly @OpenAI, @Stripe, congressional staffer, global development
@nelhage @nelhage
4K Followers 765 Following I've quit Twitter. Find me: https://t.co/e9ivqRR9JA https://t.co/oTZrAyGRU6 https://t.co/9fFULpcdVa
Social ch402 @ColahSocial
34 Followers 16 Following Social account of @ch402. Pushing myself to be genuine and vulnerable.
Lulie @reasonisfun
16K Followers 486 Following Epistemology applied to everything. 💫 Host of Reason Is Fun podcast w/ @DavidDeutschOxf 🎙️ Taking critical rationalism into life – how to improve both.
The White House @WhiteHouse
8.8M Followers 6 Following Welcome to the Biden-Harris White House! Tweets may be archived: https://t.co/UbZQo0sWVf
Laurens Gunnarsen @MathPrinceps
1K Followers 235 Following Mathematical physicist and mentor to mathematically talented youth. Talent is that which bridges the gap between what can be taught and what must be learned.
Eli Tyre @EpistemicHope
2K Followers 138 Following Trying to understand the world (my relationship to twitter: https://t.co/7UrZIBBeKS…)
jennifer daniel @jenniferdaniel
15K Followers 1K Following Unicode ESC, Chair: 🥹🫠🫥🥲🫡🫢🫣😮‍💨😵‍💫😶‍🌫️❤️‍🔥❤️‍🩹🫦🫧🫗🪬 | Emoji Kitchen Chef 🧑‍🍳 | https://t.co/EYn9XPVsOC
uncatherio @uncatherio
2K Followers 1K Following wholesomeness practitioner; user of words // profile pic used to look like @catherineols upside-down 🙃
Kanjun 🐙🏡 @kanjun
17K Followers 487 Following understanding human & machine minds to build a creative abundant future. CEO @imbue_ai. support founders @outsetcap. co-organize https://t.co/H1aXYk96ja.
David Chapman @Meaningness
31K Followers 135 Following Better ways of thinking, feeling, and acting—around problems of meaning and meaninglessness; self and society; ethics, purpose, and value.
David Luan @jluan
9K Followers 1K Following led Google’s large models effort, director @googleai. former vp engineering @openai. interested in ML + society. all about type II fun.
Nat.Sec.L. Podcast @NSLpodcast
7K Followers 21 Following The latest national security law debates w/ Professors @BobbyChesney & @steve_vladeck, at https://t.co/pFWVjZ6BOF
Bill Hilton 🇺🇦 @billhilton
2K Followers 1K Following I create stuff for adult piano learners. I'm especially interested in finding effective, low-cost ways in which older adults can develop their musical skills.
Jim O’Neill @regardthefrost
5K Followers 2K Following Longevity, science, health, and peace. e/acc. Co-founder of the Thiel Fellowship
Sophia Sanborn @naturecomputes
4K Followers 3K Following Theory, ML, neurotechnology @ https://t.co/OmhC0RyxZp | Organizer @neur_reps | Prev: @geometric_intel @berkeley_ai @redwood_neuro @intelai @harvard
Nicolas Papernot @NicolasPapernot
10K Followers 665 Following Security and Privacy of Machine Learning @Uoft @VectorInst @Google 🇫🇷🇪🇺🇨🇦 Co-author https://t.co/VJF39DQPCu; @CentraleLyon + @PSUEngineering alumnus. Opinions mine
Laura 🌲 ⛰️ @LauraDeming
44K Followers 194 Following I like molecules and thought experiments! personal: (notes on research + longevity) https://t.co/SoImhWr11i work: https://t.co/iVWTaFOzt2 and https://t.co/YBd4qE6jR6
Danny Hernandez @Hernandez_Danny
3K Followers 545 Following Measuring and forecasting AI progress @AnthropicAI.
We have a, uh, tradition of “bribing” 4yo with candy when she’s ready for school on Mondays (see QT). @Gena_I_Gorlin and I are traveling, sitters are living with kids. Sitters thought that it was candy *every* morning. Today 4yo *declined candy*, corrected and chided sitters.
The hardest part of 3yo’s week was Monday school drop off. She loves school but hates transitions, and this is a big one. After trying everything else, I resorted to bribing her with candy. I even branded it explicitly to her: the Monday Morning Bribe. Well, it, uh, “worked”…
@uncatherio FWIW I've had a lot of luck getting interior design advice from Claude. Give it a photo of my space and ask about alternatives/furnishing options.
@ArtirKel @shae_mcl like reading a book as the sun sets in your little nook pin.it/34fPVN20k
@ArtirKel @shae_mcl it just can’t compare to cozy maximalist plant-filled cabin pin.it/2ou3PjMpI
Just wanted to share some good news: After really truly almost dying of cancer, beyond the point we’d all accepted it & she was stopping treatment, my mom’s scans in Dec. showed sudden & miraculous improvement after a Hail Mary & today she heard she doesn’t have cancer anymore.
@ben_mathes @Noahpinion (or at least similarly right to restricting the overall housing supply, which I tend to fixate on as the cause of the high housing prices)
@ben_mathes @Noahpinion of course! I don't endorse that view and I agree it's reprehensible. but before reading that post, I thought "blaming the techies" was both descriptively and normatively wrong. the post suggests it's descriptively sorta right
google cloud insisting i do a sales call in order to get more than a single GPU is going to be the reason i go with another cloud provider 💀💀 another provider let me have my nice little A100 cluster just because. didn’t have to talk to anyone.
@flawedaxioms @anveio idk if not doing this requires galaxy brain neurotypical social skills tbf.
i worry a lot that sometimes when people make these arguments it’s because they want to allow themselves to indulge one of the things they most deeply want for themselves but they can’t without feeling guilty. it’s okay to sometimes do things just because we really want to.
this fundamentally isn’t a well-reasoned position. that’s kind of the point. it is such a beautiful part of life and i want it to feel untouched by the responsibility to others that permeates most of my other choices. i want to enjoy it without any guilt.
one of the most not consequentialist takes i have is that it makes me sad when people feel the need to justify having kids as an effective choice at all. i really want a family. maybe it isn’t quantitatively justifiable. maybe by doing so, the world is somehow a net worse place
Every ethical argument for having children is dominated by other options that are more effective. 1. If you’re worried about population issues, just donate $10k to bednets That’s about the equivalent of two extra children existing in the world. It also does more good 🧵 1/
today someone asked me if i was EA and instead of explaining “well… you know, i’m a little bit adjacent but… blah blah” i actually said yes?? what does this mean??
Extremely cool work from @saprmarks! I think this is one of my favourite SAE papers since Towards Monosemanticity. I'm particularly excited about the use of error nodes, without which SAEs are a bit too janky to do reliable circuit analysis with
Can we understand & edit unanticipated mechanisms in LMs? We introduce sparse feature circuits, & use them to explain LM behaviors, discover & fix LM bugs, & build an automated interpretability pipeline! Preprint w/ @can_rager, @ericjmichaud_, @boknilev, @davidbau, @amuuueller
Use dictionary learning to find circuits that actually explain network behavior. Eg they’re able to ablate away gender bias! The whole process can also be made scalable and unsupervised. Awesome work @saprmarks et al.
New Anthropic research paper: Many-shot jailbreaking. We study a long-context jailbreaking technique that is effective on most large language models, including those developed by Anthropic and many of our peers. Read our blog post and the paper here: anthropic.com/research/many-…