Rohin Shah @rohinmshah

Research Scientist at DeepMind. I publish the Alignment Newsletter.
rohinshah.com
London, UK
Joined October 2017

312 Tweets 5K Followers 89 Following 303 Likes

Rohin Shah @rohinmshah

4 days ago

Rose: The idea is extremely simple and well-motivated, and the effect sizes are large. Thorn: p=0.05 :( (Tbc, I am very confident we would have reached statistical significance for Gated SAEs being more interpretable, if we had a large enough N.) x.com/sen_r/status/1…

Senthooran Rajamanoharan @sen_r

5 days ago

5 24 158 20K 86

0 0 11 2K 2

Rohin Shah @rohinmshah

a month ago

I loved working with Anca during my PhD, and now I get to do it again! Though there is one downside, she's going to notice how much more lax I've become about really nailing my talks and figures 😅 x.com/ancadianadraga…

Anca Dragan @ancadianadragan

a month ago

31 40 607 160K 44

1 2 67 8K 6

Rohin Shah @rohinmshah

a month ago

Despite the constant arguments on p(doom), many agree that *if* AI systems become highly capable in risky domains, *then* we ought to mitigate those risks. So we built an eval suite to see whether AI systems are highly capable in risky domains. x.com/tshevl/status/…

Toby Shevlane @tshevl

a month ago

7 45 225 53K 123

0 12 87 8K 22

Rohin Shah @rohinmshah

2 months ago

To estimate impact of various parts of a network on observed behavior, by default you need a few forward passes *per part* -- very expensive. But it turns out you can efficiently approximate this with a few forward passes in total! x.com/janoskramar/st…

János Kramár @JanosKramar

2 months ago

3 32 155 31K 95

1 2 26 3K 8

Neel Nanda @NeelNanda5

4 months ago

My first @GoogleDeepMind project: How do LLMs recall facts? Early MLP layers act as a lookup table, with significant superposition! They recognise entities and produce their attributes as directions. We suggest viewing fact recall as a black box making "multi-token embeddings"

7 154 1K 128K 868

Zac Kenton @ZacKenton1

4 months ago

In our new @GoogleDeepMind paper, we redteam methods that aim to discover latent knowledge through unsupervised learning from LLM activation data. TL;DR: Existing methods can be easily distracted by other salient features in the prompt. arxiv.org/abs/2312.10029 🧵👇

5 35 237 48K 145

Rohin Shah @rohinmshah

8 months ago

Really excited that this work is finally out! x.com/vikrantvarma_/…

Vikrant Varma @VikrantVarma_

8 months ago

14 201 1K 124K 663

1 10 29 8K 10

Rohin Shah @rohinmshah

10 months ago

There's nothing like delving deep into model internals for a specific behavior for understanding how neural nets are simultaneously extremely structured and extremely messy. x.com/lieberum_t/sta…

Tom Lieberum @lieberum_t

10 months ago

3 43 225 72K 97

0 1 15 3K 5

Rohin Shah @rohinmshah

11 months ago

I really enjoyed recording this podcast -- it's very different from my previous podcasts, much more focused on opinions and impressions, rather than specific technical points. x.com/robertwiblin/s…

Robert Wiblin @robertwiblin

11 months ago

0 15 93 24K 25

2 2 38 10K 3

Rohin Shah @rohinmshah

11 months ago

We're hiring again, just like last year! Apply here: Research Scientist: boards.greenhouse.io/deepmind/jobs/… Research Engineer: boards.greenhouse.io/deepmind/jobs/… x.com/rohinmshah/sta…

Rohin Shah @rohinmshah

2 years ago

6 50 482 0 87

2 17 93 26K 22

Andreea Bobu @andreea7b

a year ago

How can we learn one foundation model for HRI that generalizes across different human rewards as the task, preference, or context changes? Come see at #HRI2023 in the Thursday 13:30 session! Paper: dl.acm.org/doi/10.1145/35… w/ Yi Liu, @rohinmshah, @daniel_s_brown, @ancadianadragan

0 4 38 13K 10

Rohin Shah @rohinmshah

2 years ago

BASALT is running again -- and this time there's a pretrained Minecraft model for you to finetune!

MineRL Project @minerl_official

2 years ago

1 13 65 0 6

0 1 18 0 1

Rohin Shah @rohinmshah

2 years ago

[AN #173] Recent language model results from DeepMind - mailchi.mp/c7bf1e091608/a…

0 1 15 0 1

Rohin Shah @rohinmshah

2 years ago

[AN #172] Sorry for the long hiatus! I'll restart in the near future; for now have a bunch of news - mailchi.mp/56689cc2223c/a…

0 2 18 0 1

Richard Ngo @RichardMCNgo

35K Followers 1K Following What would we need to understand in order to design an amazing future? Figuring that out @openai

Julian @mealreplacer

16K Followers 1K Following AI safety

Rob Miles (✈️ Tok.. @robertskmiles

18K Followers 789 Following Explaining AI Alignment to anyone who'll stand still for long enough, on YouTube and Discord. Music, movies, microcode, and high-speed pizza delivery

Nathan 🔍 @NathanpmYoung

15K Followers 3K Following Will bet $10 on any statement I make.

Rob Bensinger ⏹️ @robbensinger

8K Followers 302 Following Comms @MIRIBerkeley. RT = increased vague psychological association between myself and the tweet.

Stefan Schubert @StefanFSchubert

28K Followers 2K Following Philosophy, psychology, and effective altruism.

Peter Wildeford @peterwildeford

10K Followers 367 Following Pro forecaster w/ good track record. Seeking to understand + manage risks from advanced AI systems. - Co-CEO @RethinkPriors - Chief Advisory Executive @iapsAI

Miles Brundage @Miles_Brundage

43K Followers 10K Following Policy research at @openai. I mostly tweet about AI, animals, and sci-fi. He/him. Views my own.

Amanda Askell @AmandaAskell

26K Followers 653 Following Philosopher & ethicist teaching models to be good @AnthropicAI. Personal account. All opinions come from my training data.

David Krueger @DavidSKrueger

13K Followers 4K Following Cambridge faculty - AI alignment, deep learning, and existential safety. Formerly Mila, FHI, DeepMind, ElementAI, AISI.

Jack Clark @jackclarkSF

68K Followers 5K Following @AnthropicAI, ONEAI OECD, co-chair @indexingai, writer @ https://t.co/3vmtHYkaTu Past: @openai, @business @theregister. Neural nets, distributed systems, weird futures

Robert Wiblin @robertwiblin

34K Followers 643 Following Exploring the inviolate sphere of ideas one interview at a time: https://t.co/2YMw00bkIQ

Jan Leike @janleike

44K Followers 322 Following ML Researcher, co-leading Superalignment @OpenAI. Optimizing for a post-AGI future where humanity flourishes.

Robert Long @rgblong

6K Followers 974 Following AI consciousness

Sam Bowman @sleepinyourhat

35K Followers 3K Following AI alignment + LLMs at NYU & Anthropic. Views not employers'. No relation to @s8mb. I think you should join @givingwhatwecan.

Daniel Eth (yes, Eth .. @daniel_271828

7K Followers 788 Following AI alignment & memes | "known for his humorous and insightful tweets" - Bing/GPT-4 | prev: @FHIOxford

davidad 🎇 @davidad

13K Followers 7K Following Programme Director @ARIA_research | accelerate mathematical modelling with AI and categorical systems theory » build safe transformative AI » cancel heat death

Matthew Barnett @MatthewJBar

4K Followers 285 Following I share things. Married to @natalia__coelho

Joshua Achiam ⚗️ @jachiam0

14K Followers 948 Following Human. Trying to make safe alchemy machines. Thinking about humanist alchemism (h/alc ⚗️, maybe). Main author of https://t.co/cKuSh210l1

near @nearcyan

45K Followers 882 Following https://t.co/IdaJwZJCXm partner @ https://t.co/9g1MIgjiqc dms open

I07XNbUI4 @DeepFeed2

48 Followers 3K Following

deepspaceblack @deepspaceblack

55 Followers 186 Following well-being maximizer, curious about the AI era. remember that you are not this next thought

Maheep Chaudhary | .. @ChaudharyMaheep

41 Followers 508 Following MS @NTU || Collab w/ Stanford || Ex-MIT Driverless, UIUC.

beeple @beeple33

40 Followers 4K Following 123456789

Max Beverton-Palmer @Maxjb

1K Followers 3K Following Thinking about the future of innovation, cyber, AI, compute and the internet l Responsible tech, online safety, social justice & geopolitics l him/he💜💙🏳️‍🌈

Monte @montemacd

4 Followers 13 Following Alignment researcher at @AnthropicAI

Joel Burget @joel_burget

911 Followers 758 Following Boundedly rational loss minimizer

Ian @ InfoHunt.ai @Ianyan2023

33 Followers 231 Following Builder@Infohunt.ai, Your Most Reliable Discovery AI Engine 👉 Click to explore: https://t.co/WkjTFNHdCr

Chris Liu @chrisliu298

2 Followers 214 Following PhD student @BaskinEng @ucsc

Zhiyong Wang @Zhiyong16403503

409 Followers 2K Following Visiting Ph.D. student at Cornell University. Ph.D. candidate at CUHK. Working on bandits and reinforcement learning theory.

Sonakshi Chauhan @ChauhanSon8200

12 Followers 36 Following

Mateusz @Mantos77

37 Followers 413 Following

Connor Kissane @Connor_Kissane

49 Followers 40 Following Mechanistic Interpretability research / software engineering

Nikita Agarwal @niki__agarwal

58 Followers 762 Following

Rais Latif @RaisLatif_Study

39 Followers 5K Following Hi I'm Rais. I'm mainly focussing on Math and Science lifelong. There is a lot to discover in these fields and my mind is always blown by all the cool things.

. @j_2789

51 Followers 93 Following .

Dana Mahmood @deordered

22 Followers 720 Following Fine-tuning AI models oftentimes & practicing philosopher at other times.

𝕋𝕒𝕥𝕤𝕦.. @tatsuru_kikuchi

365 Followers 3K Following Research Officer at Faculty of Economics, The University of Tokyo. Keywords: Entrepreneur/OpenAI/Quantum/Crypto/Analytics/Consulting. Views are my own.

Ethan @EthanKosakHine

297 Followers 373 Following

Tweeting about #machinelearning in #genomics | interpretable ML models | transcription factors | cis-regulatory elements | motifs | gene regulation

black_box @blackbox_pi

YuvaRaj @YuvaAnandan

2 Followers 37 Following Enthusiast, angle investor, worked as Change agent in Finance services and currently working with Payments fintech as Product Lead

Sri Mahaguhan @SriMahaguhan

32 Followers 188 Following

Evander Hammer @evander_hammer

2 Followers 32 Following

Caleb Talley @calebtalley2024

2 Followers 483 Following

Gabriel Antunes @antunesrgabriel

11 Followers 264 Following AI alignment | EA | 🌱

Utkarsh Rai @Utkarsh50755661

155 Followers 5K Following Normal

tuan pho @tuanpho

1 Followers 127 Following

Chris Keesey @c2keesey

34 Followers 117 Following

Eric Aboussouan @eric3532

99 Followers 577 Following Predicting next world at @GoogleAI Digital nomad, sailor, scientist, inventor

Henrietta.SolidGoldMa.. @Bearly_Present

180 Followers 3K Following All integers between 0 and 1 exclusive.

Sami Jawhar @CyberMonkSam

48 Followers 54 Following Neurotechnologist, serial entrepreneur, digital nomad, builder of things, wannabe philosopher

Seliem @seliemels

2 Followers 29 Following Ethics Foresight & Policy @googledeepmind, PhDing at @univienna

Deetah @Deetah71

1K Followers 336 Following midtable visionary Steve Parish fan account

Daniel Guppy @DanielJGuppy

3 Followers 34 Following

@cflorenzano @cflorenzano

2K Followers 7K Following Me arrastró el río

Dylance @pretty14130158

7 Followers 169 Following

wo1v @iamwo1v

101 Followers 845 Following EE / CS / Econ

Mandark @FutureMandark

3 Followers 13 Following Ahaha Ahahhahha hahah

Md Zunaid @MdZunaid382783

58 Followers 632 Following

teoyjz @teoyjz

27 Followers 72 Following

DailyHealthcareAI @aipulserx

43 Followers 333 Following 🚀 Daily AI healthcare updates compiled from 100+ sources (and growing)

rathink1 @rathink11

0 Followers 152 Following

Tom Bouthillet 🇺.. @tbouthillet

8K Followers 4K Following Co-Survivor • Business Development Manager • Battalion Chief of EMS (Retired) • Aspiring Screenwriter • Citizen of U.S., Canada, Ireland • ECGs • YouTube 👇🏻

Junjie (Jorji) Chen @coderchen01

2 Followers 282 Following

Aaditya ; @Aaditya26082004

531 Followers 7K Following CS'26 • Machine Learning • Open-Source • Web Dev. • Algorithms • Jai Shree Krishna 🦚🪈

Cerdwin @CerdwinG

5 Followers 266 Following

Ittseta @IssEossda

80 Followers 674 Following

Joel Becker @joel_bkr

2K Followers 2K Following move fast and fix things. 'soccer'-me @MessiSeconds.

hell0garten @MGaseltine

98 Followers 385 Following falling through forever

natalie @nnatalieho

20 Followers 129 Following Math @Wellesley

Richard Ngo @RichardMCNgo

35K Followers 1K Following What would we need to understand in order to design an amazing future? Figuring that out @openai

Miles Brundage @Miles_Brundage

43K Followers 10K Following Policy research at @openai. I mostly tweet about AI, animals, and sci-fi. He/him. Views my own.

Jack Clark @jackclarkSF

68K Followers 5K Following @AnthropicAI, ONEAI OECD, co-chair @indexingai, writer @ https://t.co/3vmtHYkaTu Past: @openai, @business @theregister. Neural nets, distributed systems, weird futures

Jan Leike @janleike

44K Followers 322 Following ML Researcher, co-leading Superalignment @OpenAI. Optimizing for a post-AGI future where humanity flourishes.

Joshua Achiam ⚗️ @jachiam0

14K Followers 948 Following Human. Trying to make safe alchemy machines. Thinking about humanist alchemism (h/alc ⚗️, maybe). Main author of https://t.co/cKuSh210l1

Catherine Olsson @catherineols

15K Followers 1K Following Hanging out with Claude, improving its behavior, and building tools to support that @AnthropicAI 😁 prev: @open_phil @googlebrain @openai (@microcovid)

Katja Grace 🔍 @KatjaGrace

8K Followers 798 Following Thinking about whether AI will destroy the world at https://t.co/pMilDvd4ya. DM or email for media requests. Feedback: https://t.co/zGAm1i7SKH

Chris Olah @ch402

91K Followers 173 Following Reverse engineering neural networks at @AnthropicAI. DMs open! Previously @distillpub, OpenAI Clarity Team, Google Brain. Personal account.

Vikrant Varma @VikrantVarma_

566 Followers 22 Following Research Engineer working on AI alignment at DeepMind.

Lynette Bye @lynette_bye

36 Followers 87 Following

Rachel Freedman @FreedmanRach

955 Followers 225 Following RLHF, LLMS, interpretability & safety | PhD researcher @berkeley_ai | Visiting researcher @Cambridge_Uni

MineRL Project @minerl_official

932 Followers 29 Following Official Twitter page for the MineRL Project. Account run by @steph_milani. The BASALT 2022 competition has ended!

Brian Christian @brianchristian

4K Followers 411 Following Author of The Alignment Problem, Algorithms to Live By (w. Tom Griffiths), and The Most Human Human. Researcher at UC Berkeley & the University of Oxford.

Daniel Filan research.. @dfrsrchtwts

1K Followers 115 Following PhD candidate at @CHAI_Berkeley. Interested in making neural networks transparent and ushering in an era of human-friendly superintelligence. Podcast: https://t.co/gM752xZcF5

Cody Wild @decodyng

2K Followers 131 Following machine learning research engineer; lover of cats, languages, and elegant systems; explorer & explainer at https://t.co/HQSQCz3Tg3…

𝕏iaoHu Zhu⏹️cs.. @neil_csagi

756 Followers 2K Following eXistential Hope native. https://t.co/pX9vqSzWEq | @Foresightinst Fellow in Safe AGI | @FLIxrisk Affiliate

Sergey Levine @svlevine

80K Followers 122 Following Associate Professor at UC Berkeley Co-founder, Physical Intelligence

Alex Irpan @AlexIrpan

2K Followers 31 Following Research Scientist @ Google DeepMind. Working on Robotics. Has a blog. Views are my own. "Adversarially disengaging Twitter profile"

Jakob Foerster @j_foerst

14K Followers 820 Following Assoc. Prof in ML @UniofOxford @StAnnesCollege @FLAIR_Ox, dad. Ex: {RS @MetaAI, (A)PM @Google, DivStrat @GS}, ex intern {@GoogleDeepmind, @GoogleBrain, @OpenAI}

Geoffrey Hinton @geoffreyhinton

338K Followers 28 Following deep learning

Yolanda Lannquist @YolandaLannqist

892 Followers 567 Following Director, Global AI Governance, TFS // AI Policy Expert @ The World Bank & OECD // Columbia + Harvard // Poetess // #twinsintech + cat mom. Views are my own.

Gillian Hadfield @ghadfield

5K Followers 710 Following Author of Rules for a Flat World. Law and economics professor exploring the legal innovation needed to keep up with 21st century technology and globalization.

Center for Human-Comp.. @CHAI_Berkeley

3K Followers 108 Following CHAI is a multi-institute research organization based out of UC Berkeley that focuses on foundational research for AI technical safety.

tanaya @tttttanaya

44 Followers 209 Following she/her

Rosie @RosieCampbell

6K Followers 870 Following Forever expanding my nerd/bimbo Pareto frontier. Policy Frontiers team lead @OpenAI.

Adam Gleave @ARGleave

2K Followers 322 Following CEO @FARAIResearch non-profit | PhD from @berkeley_ai | Value learning, adversarial examples & robustness for deep RL | @AdamGleave@sigmoid.social

Anca Dragan @ancadianadragan

8K Followers 178 Following AI safety & alignment at Google DeepMind • associate professor at UC Berkeley EECS • proud mom of an amazing 2yr old

Jeffrey Ding @jjding99

8K Followers 455 Following Assistant Professor at George Washington University @GWtweets | technology and int'l politics | newsletter on China's AI landscape: https://t.co/ciqWZF1jiV

AI Safety @AI_Safety

99 Followers 32 Following How do we keep advanced artificial agents from forcefully intervening in the protocols by which we attempt to communicate what they should accomplish?

Sumith Kulal @sumith1896

833 Followers 392 Following

noah gundotra @ngundotra

4K Followers 4K Following eng at @SolanaLabs | ex-@goldmansachs | UC Berkeley alumni | opinions my own

Good Ventures @GoodVentures

6K Followers 67 Following Good Ventures is a philanthropic foundation whose mission is to help humanity thrive.

Nate Soares ⏹️ @So8res

7K Followers 72 Following

Open Philanthropy @open_phil

15K Followers 17 Following Open Philanthropy's mission is to help others as much as we can with the resources available to us.

Katyanna Quach @katyanna_q

2K Followers 822 Following Tech reporter @semafor, interested in AI and science 🤖 | previously @theregister

Jeff Dean (@🏡) @JeffDean

296K Followers 6K Following Chief Scientist, Google DeepMind and Google Research. Co-designer/implementor of things like @TensorFlow, MapReduce, Bigtable, Spanner, Gemini .. (he/him)

Ilya Sutskever @ilyasut

370K Followers 2 Following towards a plurality of humanity loving AGIs @openai

Greg Brockman @gdb

667K Followers 51 Following President & Co-Founder @OpenAI

Terah Lyons @terahlyons

6K Followers 3K Following Not really active here. Affiliate Fellow @StanfordHAI. Former @PartnershipAI, Obama White House @WHOSTP44/@USCTO44.

Tara Mac Aulay @Tara_MacAulay

5K Followers 570 Following Cynical ex-aid worker turned crypto trader. Idealist at heart, realist by profession.

Brundage Bot @BrundageBot

4K Followers 1 Following Tweeting interesting deep learning papers from each arXiv release. Powered by a neural network trained on @Miles_Brundage tweets. Created by @amaub.

David Roodman @davidroodman

9K Followers 354 Following Senior advisor @open_phil. Formerly @GiveWell, @CGDev. Views expressed solely my own.

Animal Charity Evalua.. @AnimalCharityEv

5K Followers 484 Following At Animal Charity Evaluators, we find and promote the most effective ways to #helpanimals. We use #effectivealtruism principles to evaluate causes and research.

Ben Kuhn @benskuhn

7K Followers 289 Following Care a lot and try hard • making language models safer @AnthropicAI • prev CTO @WaveSenegal 🐧❤️

Lewis Bollard @Lewis_Bollard

7K Followers 844 Following Farm Animal Welfare Program Officer @open_phil. Views are my own. For more, sign up to my research newsletter: https://t.co/5DX9QNKo3R

Roxanne Heston @RoxanneHeston

1K Followers 543 Following Co-Founder & Director of Programs and Strategy https://t.co/n0WCPATLNb

Jacy Reese Anthis @jacyanthis

29K Followers 721 Following Sociologist and statistician @UChicago @SentienceInst researching human-AI interaction, ML, etc. Trying to build a better future for all sentient life.

William MacAskill @willmacaskill

64K Followers 1K Following Moral philosopher at Oxford. Author of Doing Good Better and What We Owe The Future.

Helen Toner @hlntnr

21K Followers 1K Following Interests: China+ML, natsec+tech, brains+words+absurdity | Current: @CSETGeorgetown (opinions my own) | Former: @open_phil

Ajeya Cotra @ajeya_cotra

6K Followers 286 Following AI could get really powerful soon and I worry we're underprepared. Analysis+grantmaking in AI alignment @open_phil (views my own), editor+writer @plannedobs.

Chris Cundy @ChrisCundy

1K Followers 194 Following PhD student at Stanford AI Lab, supervised by Stefano Ermon. Hopefully making AI benefit humanity. Anonymous feedback: https://t.co/Wh3rHMsRnm

Sam Altman @sama

2.8M Followers 892 Following AI is cool i guess

GiveWell @GiveWell

27K Followers 137 Following We find outstanding charities and publish the full details of our analysis to help donors decide where to give.

Yann LeCun @ylecun

712K Followers 719 Following Professor at NYU. Chief AI Scientist at Meta. Researcher in AI, Machine Learning, Robotics, etc. ACM Turing Award Laureate.

Senthooran Rajamanoharan @sen_r

5 days ago

New @GoogleDeepMind MechInterp work! We introduce Gated SAEs, a Pareto improvement over existing sparse autoencoders. They find equally good reconstructions with around half as many firing features, while maintaining interpretability (CI 0-13% improvement). Joint w/ @ArthurConmy

5 24 158 20K 86

Neel Nanda @NeelNanda5

2 weeks ago

Announcing a progress update from the @GoogleDeepMind mech interp team! Inspired by @AnthropicAI's excellent monthly updates, we share a range of updates on our work on Sparse Autoencoders, from signs of life on interpreting steering vectors with SAEs to improving ghost grads.

4 40 378 31K 200

Iason Gabriel @IasonGabriel

2 weeks ago

1. What are the ethical and societal implications of advanced AI assistants? What might change in a world with more agentic AI? Our new paper explores these questions: storage.googleapis.com/deepmind-media… It’s the result of a one year research collaboration involving 50+ researchers… a🧵

24 196 588 179K 484

Anca Dragan @ancadianadragan

a month ago

we're hiring: boards.greenhouse.io/deepmind/jobs/…

2 5 29 22K 24

Anca Dragan @ancadianadragan

a month ago

@GoogleDeepMind I am so happy I get to work with @rohinmshah again!

2 1 37 6K 0

Séb Krier @sebkrier

a month ago

Great new Google DeepMind paper on evaluating frontier models for dangerous capabilities. The evals cover four areas: persuasion and deception; cyber-security; self-proliferation; and self-reasoning. They test agents, comprised of the model + scaffolding. arxiv.org/abs/2403.13793

9 42 273 27K 156

Toby Shevlane @tshevl

a month ago

In 2024, the AI community will develop more capable AI systems than ever before. How do we know what new risks to protect against, and what the stakes are? Our research team at @GoogleDeepMind built a set of evaluations to measure potentially dangerous capabilities: 🧵

7 45 225 53K 123

Lynette Bye @lynette_bye

a month ago

Imagine if employees expected to get all their work for the week done in the one hour weekly meeting with their supervisor? Ridiculous, of course. Most of the work gets done outside of that meeting. Same deal with coaching or therapy.

0 0 2 33 0

Neel Nanda @NeelNanda5

2 months ago

@Algon_33 @rohinmshah Arbitrarily small, all of them, two forwards and one backwards neelnanda.io/mechanistic-in…

0 0 2 86 0

sanjana @sanjanamusic

3 months ago

Where Do I Go Live, from my EP, From Afar, is out now! I’ve always wanted to play this song stripped back with just guitars to highlight the intention of the song. Who do we become in the face of loss? 🫂 youtu.be/WFm15yHCZQ4?fe…

0 0 4 112 0

Neel Nanda @NeelNanda5

4 months ago

7 154 1K 128K 868

Zac Kenton @ZacKenton1

4 months ago

5 35 237 48K 145

Neel Nanda @NeelNanda5

4 months ago

Cool work from @GoogleDeepMind alignment on limitations of methods for eliciting a model's beliefs! My key takeaway is that unsupervised methods (eg CCS) rely on "proxy properties" of true beliefs, but other features share these proxies! Eg "agrees with the user" vs "is true"

Zac Kenton @ZacKenton1

4 months ago

5 35 237 48K 145

2 10 92 16K 38

Remi Cadene @RemiCadene

8 months ago

Very nice study of "grokking" from @vkrntv @rohinmshah and colleagues at @GoogleDeepMind "Grokking" is a rapid switch of strategy from perfect memorization (low training loss) to accurate generalization (low testing loss) 🤯 After decades of advancements in the field of neural…

Vikrant Varma @VikrantVarma_

8 months ago

Our latest paper (arxiv.org/abs/2309.02390) provides a general theory explaining when and why grokking (aka delayed generalisation) occurs – a theory so precise that we can predict hyperparameters that lead to partial grokking, and design interventions that reverse grokking! 🧵👇

14 201 1K 124K 663

0 0 7 2K 1

Zac Kenton @ZacKenton1

8 months ago

We come up with a theory of why grokking (delayed generalisation) occurs, leading to new empirical predictions of semi-grokking and ungrokking phenomena! Great work led by @VikrantVarma_ and @rohinmshah, which I'm proud to have contributed to! 🤩

Vikrant Varma @VikrantVarma_

8 months ago

14 201 1K 124K 663

0 3 35 2K 5

Vikrant Varma @VikrantVarma_

8 months ago

14 201 1K 124K 663

Lynette Bye @lynette_bye

8 months ago

What was I getting wrong about deliberate practice? lynettebye.com/blog/2023/7/27…

0 0 1 36 0

Lynette Bye @lynette_bye

9 months ago

I call for a trial by Google!

0 0 1 27 0

Robert Wiblin @robertwiblin

9 months ago

'The 80k Podcast on AI' is a compilation of interviews that would teach someone an insane amount about AI's promise and risks and what to do about it. (Yes I'm biased, but still it's true.) Here's the 11 episodes that made the cut: 🧵 80000hours.org/podcast/on-art…

1 8 19 4K 6

Nova DasSarma @dropbella

9 months ago

A taster's selection on AI. Especially appreciate @rohinmshah 's insight in episode 4 on the messiness of bringing technical solutions to a broad multilateral community. 80000hours.org/podcast/on-art…