Excited to share details on two of our longest running and most effective safeguard collaborations, one with Anthropic and one with OpenAI. We've identified—and they've patched—a large number of vulnerabilities and together strengthened their safeguards. 🧵 1/6
New blog! We @AISecurityInst partnered with @NCSC to write about an emerging practice I'm really excited about: Safeguard Bypass Bounty Programmes (SBBPs). Summary of what these are, why they are useful, & how to do them well 🧵
Since I started working on safeguards, we've seen substantial progress in defending certain hosted models, but less progress in measuring & managing misuse risks from open weight models. Three directions I want explored more, drawn from our @AISecurityInst post today 🧵
🚨Open-weight AI models are becoming more powerful, now knocking on the door of today’s closed-weight frontier.
This poses critical safety challenges – how can we prevent the misuse of models whose parameters are free to download online? 🧵
327K Followers 3K Following
NVIDIA Director of Robotics & Distinguished Scientist. Co-Lead of GEAR lab. Solving Physical AGI, one motor at a time. Stanford Ph.D. OpenAI's 1st intern.
42K Followers 866 Following
FR/US/GB AI/ML Person, Director of Research at @GoogleDeepMind, Honorary Professor at @UCL_DARK, @ELLISforEurope Fellow. All posts are personal.
57K Followers 857 Following
Figuring out AI @allen_ai, open models, RLHF, fine-tuning, etc
Contact via email.
Writes @interconnectsai
Wrote The RLHF Book
Mountain runner
9K Followers 5K Following
Research in ML/NLP at the U of Edinburgh (tenured faculty @InfAtEd @EdinburghNLP), Co-Founder @Miniml_AI, @ELLISforEurope Scholar, https://t.co/5dUI3EFexo
62K Followers 12K Following
AI policy researcher, wife guy in training, fan of cute animals and sci-fi, Substack writer, stealth-ish non-profit co-founder
18K Followers 4K Following
AI professor.
Deep Learning, AI alignment, ethics, policy, & safety.
Formerly Cambridge, Mila, Oxford, DeepMind, ElementAI, UK AISI.
AI is a really big deal.
4K Followers 197 Following
UCL Deciding, Acting, and Reasoning with Knowledge (DARK) Lab at @AI_UCL led by @_rockt, @egrefen, @robertarail, and @jparkerholder.
4K Followers 421 Following
Cofounder & CEO @WecoAI.
Automating hill climbing with AI-Driven Exploration (AIDE).
PhD in Machine Learning @UCL_DARK.
(Zheng=j-uhng, j as in job; yao=y-aoww)
141 Followers 2K Following
Official journal of China Society of Image and Graphics (CSIG). The journal is published by Springer, sponsored by CSIG. E-ISSN 2731-9008.
5K Followers 2K Following
Research Scientist (Frontier Planning) at @GoogleDeepMind.
Research Affiliate @Cambridge_Uni @CSERCambridge & @LeverhulmeCFI.
All views my own.
5K Followers 905 Following
Faculty at @ELLISInst_Tue & @MPI_IS, leading the AI Safety and Alignment group. PhD from @EPFL supported by Google & OpenPhil PhD fellowships.
117 Followers 416 Following
Hacking SEO as Director for https://t.co/JJSjby3RSM. Speaking on the topics in SEO, crypto, AI. A queer techie + Coffee Geek ☕️
1.2M Followers 279 Following
We’re a team of scientists, engineers, ethicists and more, committed to solving intelligence, to advance science and benefit humanity.
1.4M Followers 1K Following
Building @EurekaLabsAI. Previously Director of AI @ Tesla, founding team @ OpenAI, CS231n/PhD @ Stanford. I like to train large deep neural nets.
646K Followers 35 Following
We're an AI safety and research company that builds reliable, interpretable, and steerable AI systems. Talk to our AI assistant @claudeai on https://t.co/FhDI3KQh0n.
32K Followers 123 Following
Mechanistic Interpretability lead DeepMind. Formerly @AnthropicAI, independent. In this to reduce AI X-risk. Neural networks can be understood, let's go do it!
12K Followers 742 Following
Research Scientist, Deepmind
I try to think hard about everything I tweet, esp on 90s football and 80s music
None of my opinions are really someone else's
16K Followers 349 Following
CSO & co-founder, Reliant AI. Ex RL research lead at Google Brain, DeepMind. Known for Atari 2600 RL benchmark, Distributional RL (MIT Press 2023).
74K Followers 529 Following
Wearables with brains for people with heart. Turn tiny moments of awesome into the best times ever. Tell the world how you #MakePebbleYours ❤️
3K Followers 857 Following
Head of Alignment at the UK AI Security Institute (AISI). Semi-informed about economics, physics and governments. views my own
2K Followers 468 Following
Yak Shaver and Security Researcher. Head of Research&Development at Chainlink Labs. Formerly at 🇨🇭 ETH Zürich, 🗽 Cornell Tech,⛓️ IC3.
2K Followers 1K Following
Co-Executive Director @MATSprogram, Co-Founder @LondonSafeAI, Regrantor @Manifund | PhD in physics | Accelerate AI alignment + build a better future for all
55 Followers 217 Following
PhD student in Foundational AI @ucl @ai_ucl @uclcs
Enrichment Fellow @turinginst
2x ML Research Intern at Apple working on Differential Privacy
262 Followers 783 Following
Incoming AI safety and technical AI governance DPhil @UniofOxford • MSc in AI at ETH Zurich • 2x @MATSprogram • Talos AI Governance Fellowship • 🇪🇺🇨🇿
3K Followers 1K Following
CTO at Robust Intelligence. Formerly, Microsoft, Endgame/Elastic, Mandiant/FireEye, Sandia & MIT Lincoln Labs.
'He who forgives ends the quarrel'
426 Followers 51 Following
trying to see through context windows
currently: agent security lead @ U.S. Center for AI Standards and Innovation
past: science of deep learning phd @ harvard
10K Followers 1K Following
Waiting on a robot body. All opinions are universal and held by both employers and family. Now a dedicated grok hate account.
Accepting ML/NLP PhD students.
41K Followers 246 Following
Professor of Machine Learning, University of Oxford
@OATML_Oxford Group Leader
Director of Research at the UK govt's AI Security Institute (AISI)
5K Followers 169 Following
ZOG in exile. Alpha Midwit.
The most ironic outcome is the most likely. Reducing the irony is my job.
There is no antimemetics division