Jan Leike @janleike
ML Researcher, co-leading Superalignment @OpenAI. Optimizing for a post-AGI future where humanity flourishes. jan.leike.name San Francisco, USA Joined March 2016
Tweets 532
Followers 44K
Following 321
Likes 3K
Reminder: applications for the $10M Superalignment grants close Sunday night! Grad students, academics, researchers: we’d love to work with you, we think there’s a ton of interesting research to do on generalization, scalable oversight, interpretability, and more.
This is a reminder that the application deadline is in less than 2 weeks!
latest from preparedness @ openai: gpt4 at most mildly helps with biothreat creation. method: get bio PhDs in a secure monitored facility. half try biothreat creation w/ (experimental) unsafe gpt4. other half can only use the internet. so far, gpt4 ≈ internet… but we’ll…
I'm hiring! I'm building 4 research groups under me at AISI (formerly the UK's Taskforce on Frontier AI) to work on foundational AI safety research. [1/5] gov.uk/government/pub…
humans built machines that talk to us like people do and everyone acts like this is normal now. it's pretty nuts
New @GoogleDeepMind MechInterp work! We introduce Gated SAEs, a Pareto improvement over existing sparse autoencoders. They find equally good reconstructions with around half as many firing features, while maintaining interpretability (CI 0-13% improvement). Joint w/ @ArthurConmy
This result is pretty clearly specific to the style of backdoor we're working with, and doesn't support broad claims like 'interpretability solves misalignment', but it's still surprisingly strong. Worth a look!
New Anthropic research: we find that probing, a simple interpretability technique, can detect when backdoored "sleeper agent" models are about to behave dangerously, after they pretend to be safe in training. Check out our first alignment blog post here: anthropic.com/research/probe…
We are looking for an AGI Safety Manager to support @GoogleDeepMind 's AGI Safety Council: please encourage excellent people to apply! This role will work closely with my team, Scalable Alignment and Safety, and Responsible Development and Innovation. boards.greenhouse.io/deepmind/jobs/…
Some of our first steps on developing mitigations for sleeper agents
factorio 2 is coming out soon. if you work in frontier model research at open ai, anthropic, or deepmind and would like a free copy, I would be very happy to buy you one! please feel free to reach out. people don't do enough for you guys
@ilex_ulmus if we can align it, then building ASI is good; if we can't align it, then building ASI is bad
🤖🥇🤖
Are LLMs biased toward themselves? Frontier LLMs give higher scores to their own outputs in self-eval. We find evidence that this bias is caused by LLMs' ability to recognize their own outputs. This could interfere with safety techniques like reward modeling & constitutional AI.
@janleike It's been nearly 4 months since the release of the "Weak-to-strong generalization" paper. Could your team please release some recent findings for controlling ASI? Research papers with statistics and results would be much appreciated.
I got ~75% on a subset of MATH so it's basically as good as me at math.
Our new GPT-4 Turbo is now available to paid ChatGPT users. We’ve improved capabilities in writing, math, logical reasoning, and coding. Source: github.com/openai/simple-…
OpenAI called for ‘the best researchers and engineers in the world to meet the [superalignment] challenge’, very proud that my spouse Kristen Menou’s ideas got funded (1 of the 50 out of 2700!) #AIsafety.
The superalignment fast grants are now decided! We got a *ton* of really strong applications, so unfortunately we had to say no to many we're very excited about. There is still so much good research waiting to be funded. Congrats to all recipients!
Our research on easy-to-hard generalization will be supported by the OpenAI Superalignment Fast Grant. Congratulations to the team and stay tuned!😎
🌟Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision 🌟 arxiv.org/abs/2403.09472 How can we keep improving AI systems when their capabilities surpass those of human supervisors? (1/n)
twitter just told me that they've literally shadow banned me (reducing exposure of my posts) as punishment for not engaging enough with the platform. I don't expect many people to see this...
Just issued ~$10M in superalignment fast grants:
Some statistics on the superalignment fast grants: We funded 50 out of ~2,700 applications, awarding a total of $9,895,000.
Median grant size: $150k
Average grant size: $198k
Smallest grant size: $50k
Largest grant size: $500k
Grantees: Universities: $5.7m (22) Graduate…
Some cool stuff is coming, stay tuned =)
Sometimes when I’m mildly stressed, my mom helps me schedule doctor’s appointments that I'd otherwise drop to keep up w my health, and I feel like it’s one of the kindest things / most thoughtful ways to show care I’ve received Love you mom <3
“What are human values, and how do we align to them?” Very excited to release our new paper on values alignment, co-authored with @ryan_t_lowe and funded by @OpenAI. 📝: meaningalignment.org/values-and-ali…
I've left OpenAI. I'm mostly taking some time to rest. But I also have a few projects in the oven 🧑🍳 Here's one that I'm really excited about: we have a 🚨new paper🚨 out on aligning AI with human values, with the folk at @meaningaligned!! 😊✨🎉 Why I think it's cool: 🧵
The @GoogleDeepMind alignment team is hiring, apply now! Deadline in 2 days (12pm PST Wed). London & the Bay. I think AGI alignment is one of the most important problems today, and I'm privileged to work with brilliant colleagues every day on it. I'm excited to meet my new ones!