🧵 New paper from @AISecurityInst x @AiEleuther that I led with Kyle O’Brien:
Open-weight LLM safety is both important & neglected. But we show that filtering dual-use knowledge from pre-training data improves tamper resistance *>10x* over post-training baselines.
I am very excited that AISI is announcing over £15M in funding for AI alignment and control, in partnership with other governments, industry, VCs, and philanthropists!
Here is a 🧵 about why it is important to bring more independent ideas and expertise into this space.
Short background note about relativisation in debate protocols: if we want to model AI training protocols, we need results that hold even if our source of truth (humans for instance) is a black box that can't be introspected. 🧵
New alignment theory paper! We present a new scalable oversight protocol (prover-estimator debate) and a proof that honesty is incentivised at equilibrium (with large assumptions, see 🧵), even when the AIs involved have similar available compute.
Come work with me!!
I'm hiring a research manager for @AISecurityInst's Alignment Team.
You'll manage exceptional researchers tackling one of humanity’s biggest challenges.
Our mission: ensure we have ways to make superhuman AI safe before it poses critical risks.
1/4
22K Followers 321 Following
Globally ranked top 20 forecaster 🎯
AI is not a normal technology. I'm working at @IAPSai to shape AI for global prosperity and human freedom.
50K Followers 3K Following
AI alignment + LLMs at Anthropic. On leave from NYU. Views not employers'. No relation to @s8mb. I think you should join @givingwhatwecan.
20K Followers 9K Following
Programme Director @ARIA_research | accelerate mathematical modelling with AI and categorical systems theory » build safe transformative AI » cancel heat death
6K Followers 608 Following
Claude says I process my emotions out loud & my girlfriend has a job, so I put my feelings & thoughts here ✨ working on the EA Global team @ CEA (views my own)
18K Followers 4K Following
AI professor. Deep Learning, AI alignment, ethics, policy, & safety. Formerly Cambridge, Mila, Oxford, DeepMind, ElementAI, UK AISI. AI is a really big deal.
62K Followers 12K Following
AI policy researcher, wife guy in training, fan of cute animals and sci-fi, Substack writer, stealth-ish non-profit co-founder
1K Followers 8K Following
AI inference, speculative decoding, open source. Built novel decoding algorithms – default in Hugging Face Transformers (150+ ⭐). Making AI faster + cheaper
340 Followers 3K Following
Researcher in math + formal methods + ML. Working on using formal verification to train models for mathematics and reasoning @harmonicmath
2K Followers 1K Following
Co-Executive Director @MATSprogram, Co-Founder @LondonSafeAI, Regrantor @Manifund | PhD in physics | Accelerate AI alignment + build a better future for all
271 Followers 652 Following
Thinks AI risk is somewhat likely, and AI benefits huge, if we can align AIs to someone willing to promote human thriving even when humans are useless.
402 Followers 2K Following
Getting there like the tortoise. Jesus is all, his being, his Father, his Holy Spirit. The only Rock required in the universe.
15K Followers 5K Following
Senior AI reporter @Verge. 5+ years covering the industry's power dynamics, societal implications & the AI arms race. Previously @CNBC.
Signal: haydenfield.11
209K Followers 102 Following
The original AI alignment person. Understanding the reasons it's difficult since 2003.
This is my serious low-volume account. Follow @allTheYud for the rest.
10K Followers 322 Following
Official Unofficial EA mascot. I'm here to make friends and maximise utility, and I'm all out of neglected altruistic opportunities
32K Followers 123 Following
Mechanistic Interpretability lead DeepMind. Formerly @AnthropicAI, independent. In this to reduce AI X-risk. Neural networks can be understood, let's go do it!
24K Followers 10K Following
Former Quant Investor, now building @lumeraprotocol (formerly called Pastel Network) | My Open Source Projects: https://t.co/9qbOCDlaqM
25K Followers 851 Following
The leading presenter of Japanese cinema for more than 50 years | Classics & premieres @japansociety in NYC. Organizers of #JAPANCUTS since 2007.
12K Followers 184 Following
Post-training co-lead at Google DeepMind, focusing on safety, alignment, post-training capabilities • associate professor at UC Berkeley EECS
1K Followers 789 Following
Assistant Professor in Psychology at Stony Brook University. I'm interested in how people interact with LLMs and the impact they might have on our psychology.
18K Followers 4K Following
Associate Professor at UC Berkeley. Former Research Scientist at Google DeepMind. ML/AI Researcher working on foundations of LLMs and deep learning.
1K Followers 397 Following
Towards the logic of conceptuality; the canvas of experience; the ideatic science; the end of suffering. Friend to bots and animals.
https://t.co/PYPYxGNnK4