🌀New work: Era of Real-World Human Interaction 🌀
📝: arxiv.org/abs/2509.25137
- RL *directly* from User Conversations
- Organic replies + long-term history are learning signal
- Trained on WildChat, beats RLHF at *user* level
-> the future for personal Super Intelligence?
🧵1/6
Finally had a chance to listen through this pod with Sutton, which was interesting and amusing.
As background, Sutton's "The Bitter Lesson" has become a bit of biblical text in frontier LLM circles. Researchers routinely talk about and ask whether this or that approach or idea…
It was great to have @kchonyc Kyunghyun Cho visit us at Columbia University and talk about metrics and evaluation. We should rethink our benchmarks and eval metrics rather than just following other popular papers!
We offered a minimal working explanation on a similar phenomenon: low-recall reference models structurally limit alignment, no matter how strong the preference signal!
Check details here: x.com/_sungmin_cha/s…
This resonates with our findings! Once a model’s recall is limited, alignment signals can’t fully align it. Larger, better-pretrained models preserve recall — making downstream RL much more effective!
Check more details here: x.com/_sungmin_cha/s…
"XYZ Last Exam" becoming a recurring benchmark name was not on my bingo card, man.
Progress is going to go right past these in a few years and we'll be onto the next evals before we know it.
🚨 New preprint out! 🚨
Our paper, “Why Alignment Must Precede Distillation: A Minimal Working Explanation” (with @kchonyc ), challenges a common workflow in aligning #LLMs.
#LLM #Alignment #RLHF #DPO
We show why the common #KD → #Align workflow for LLMs fails: low-recall references create a trap that blocks rare but important behaviors.
✅ Solution: #Align → #KD. Align first, then distill.
Better rewards, precision, and stability!😀
2K Followers 5K Following: Teaches and does Stats, ML and AI. Co-Founder and Chief Scientist https://t.co/EygMgQHg07. Former Lecturer at Harvard and Astrophysicist at Penn. Bayesian.
467 Followers 286 Following: PhD Student @UIUC_NLP. Interested in *semantics of reasoning*, from neuro-symbolic methods to reasoning evaluation/improvement in LLMs. Ex-Intern @MSFTResearch
1K Followers 6K Following: Pediatric pulmonologist and bioinformaticist, now in drug development. He/him. https://t.co/61YX7H46Pc @PioneeringMeds https://t.co/LqrSTRsx0C too
854 Followers 8K Following: Product of progressive public policy; raised by public libraries and public education that produced a passion for politics. and apparently alliteration
79K Followers 1K Following: i teach AI on X
leader @openminedorg, research scientist @GoogleDeepMind, ABD PhD @OxfordUni, @UN @GovAI_ @CFR_org GrokkingDL
543K Followers 24K Following: The best from ML/AI community | Ex-Microsoft, Rackspace, Fast Company | Wrote eight books about the future | Silicon Valley robots, holodecks, BCIs, & startups.
50K Followers 9K Following: I lead @Cohere_Labs. Formerly Research @Google Brain @GoogleDeepmind. ML Efficiency at scale, LLMs, ML reliability. Changing spaces where breakthroughs happen.
26K Followers 211 Following: Working towards the safe development of AI for the benefit of all @UMontreal, @LawZero_ & @Mila_Quebec
A.M. Turing Award Recipient and most-cited AI researcher.
52K Followers 64 Following: Student of mind and nature, libertarian, chess player, cancer survivor. @ Keen, UAlberta, Amii, https://t.co/u8za2Kod54, The Royal Society, Turing Award
57K Followers 859 Following: Figuring out AI @allen_ai, open models, RLHF, fine-tuning, etc
Contact via email.
Writes @interconnectsai
Wrote The RLHF Book
Mountain runner
7K Followers 658 Following: Research Scientist @AIatMeta
Previously Researcher @ Samsung AI
Outstanding Paper Award @icmlconf 2023
Action Editor @TmlrOrg
I tweet about ML papers and math
97K Followers 8K Following: Compiling in real-time, the race towards AGI.
The Largest Show on X for AI.
🗞️ Get my daily AI analysis newsletter to your email 👉 https://t.co/6LBxO8215l
20K Followers 1K Following: @OpenAI Language agents (ReAct, Reflexion, Tree of Thoughts, SWE-agent, CoALA) for digital automation (WebShop, SWE-bench, tau-bench)
1.4M Followers 1K Following: Building @EurekaLabsAI. Previously Director of AI @ Tesla, founding team @ OpenAI, CS231n/PhD @ Stanford. I like to train large deep neural nets.
327 Followers 70 Following: Embodied lifelong learning (compositionality, RL, TAMP, robotics). Assistant Professor at @SBU_ECE. Postdoc at @MIT_LISLab. PhD from @GRASPlab.
3K Followers 974 Following: Chief Science Officer, Co-Founder @datologyai. Former: Head of Data Research @MosaicML; FAIR. 🧠 and 🤖 intelligence // views are from nowhere
8K Followers 248 Following: Research @Meta Superintelligence Labs, I led Llama 2, built post-training from scratch. Also Toolformer, GAIA, Llama-3.0, CodeLlama, Galactica
24K Followers 688 Following: Professor and Head of Machine Learning Department at @CarnegieMellon. Board member @OpenAI and @Qualcomm. Chief Technical Advisor @GraySwanAI.
267K Followers 681 Following: Building with AI agents @dair_ai • Prev: Meta AI, Galactica LLM, Elastic, PaperswithCode, PhD • I share insights on how to build with AI Agents ↓
41K Followers 246 Following: Professor of Machine Learning, University of Oxford
@OATML_Oxford Group Leader
Director of Research at the UK govt's AI Security Institute (AISI)
38K Followers 564 Following: Assistant professor at Stanford; Co-founder of Voyage AI (https://t.co/wpIITHLgF0);
Working on ML, DL, RL, LLMs, and their theory.