Alex Wettig @_awettig
PhD @Princeton trying to make sense of language models and their training data; trying to train agents @cursor_ai
cs.princeton.edu/~awettig/
Joined July 2022
Tweets: 186
Followers: 2K
Following: 584
Likes: 2K
yolo run summer is over scaling laws fall has arrived
🔍 How do we teach an LLM to 𝘮𝘢𝘴𝘵𝘦𝘳 a body of knowledge? In new work with @AIatMeta, we propose Active Reading 📙: a way for models to teach themselves new things by self-studying their training data. Results: * 𝟔𝟔% on SimpleQA w/ an 8B model by studying the wikipedia…
MoE layers can be really slow. When training our coding models @cursor_ai, they ate up 27–53% of training time. So we completely rebuilt it at the kernel level and transitioned to MXFP8. The result: 3.5x faster MoE layer and 1.5x end-to-end training speedup. We believe our…
Presenting two posters at ICML over the next two days:
- Both at 11am - 1:30pm
- Both about how to improve pre-training with domains
- Both at stall # E-2600 in East Exhibition Hall A-B (!)
Tomorrow: WebOrganizer w/ @soldni & @kylelostat
Thursday: MeCo by @gaotianyu1350
Tokenization is just a special case of "chunking" - building low-level data into high-level abstractions - which is in turn fundamental to intelligence. Our new architecture, which enables hierarchical *dynamic chunking*, is not only tokenizer-free, but simply scales better. https://t.co/SsZloRQR24
Anthropic staff realized they could ask Claude to buy things that weren’t just food & drink. After someone randomly decided to ask it to order a tungsten cube, Claude ended up with an inventory full of (as it put it) “specialty metal items” that it ended up selling at a loss.
New paper cutting through the thicket of KV cache eviction methods!
Can GPT, Claude, and Gemini play video games like Zelda, Civ, and Doom II? 𝗩𝗶𝗱𝗲𝗼𝗚𝗮𝗺𝗲𝗕𝗲𝗻𝗰𝗵 evaluates VLMs on Game Boy & MS-DOS games given only raw screen input, just like how a human would play. The best model (Gemini) completes just 0.48% of the benchmark! 🧵👇
Claude Sonnet 4 is much better at codebase understanding. Paired with recent improvements in Cursor, it's SOTA on large codebases
Massive gains with Sonnet 4 on SWE-agent: Single-attempt pass@1 rises to 69% on SWE-bench Verified! Sonnet 4 iterates longer (making it slightly more expensive) but almost never gets stuck. Localization ability appears unchanged, but quality of edits improves.
Great results from the Claude team- the 80% result is pass@1!! They ran the model in parallel multiple times and had an LM judge pick the best patch to submit.
Big arrow time! We can make huge progress on open-source SWE agents by scaling up the creation of virtual coding environments 🚀
Cursor is now free for students. Enjoy!
Introducing COMPACT: COMPositional Atomic-to-complex Visual Capability Tuning, a data-efficient approach to improve multimodal models on complex visual tasks without scaling data volume. 📦 arxiv.org/abs/2504.21850 1/10
@ weekend warriors - DM me a GitHub repo that you like / maintain, and I'll train you a 7B coding agent that's an expert for that repo. Main constraints - it's predominantly Python, and has a testing suite w/ good coverage. (example of good repo = sympy, pandas, sqlfluff)
Training with more data = better LLMs, right? 🚨 False! Scaling language models by adding more pre-training data can decrease your performance after post-training! Introducing "catastrophic overtraining." 🥁🧵+arXiv 👇 1/9
We created SuperBPE🚀, a *superword* tokenizer that includes tokens spanning multiple words. When pretraining at 8B scale, SuperBPE models consistently outperform the BPE baseline on 30 downstream tasks (+8% MMLU), while also being 27% more efficient at inference time.🧵
Want state-of-the-art data curation, data poisoning & more? Just do gradient descent! w/ @andrew_ilyas Ben Chen @axel_s_feldmann @wsmoses @aleks_madry: we show how to optimize final model loss wrt any continuous variable. Key idea: Metagradients (grads through model training)
Is a single accuracy number all we can get from model evals?🤔 🚨Does NOT tell where the model fails 🚨Does NOT tell how to improve it Introducing EvalTree🌳 🔍identifying LM weaknesses in natural language 🚀weaknesses serve as actionable guidance (paper&demo 🔗in🧵) [1/n]
I just wrote my first blog post in four years! It is called "Deriving Muon". It covers the theory that led to Muon and how, for me, Muon is a meaningful example of theory leading practice in deep learning (1/11)

Vadim Liventsev @vadimdotme
38 Followers 750 Following lead hip hop engineer @ https://t.co/y80r2iuMsN
mhmd hani @mhmdoutofkarak
11 Followers 415 Following i'm not that accomplished yet, to have a bio i mean.
Vicente @Vicente2002_01
8 Followers 559 Following
HaMeedo ReFa'i @hameedorefai
8 Followers 186 Following
kevinz000 @kevinz00025511
0 Followers 57 Following
Urban Intent @urbanintent
169 Followers 1K Following Fix zoning code. Build mixed-use. End absolute car dependency. End sprawl. Create walkable streets. Then watch human connection thrive! #newurbanism | Active!
ʟɪsᴀ @lisacheng
8K Followers 4K Following 10+ yrs in crypto. Blockchain Architect @ AI Co. Ethereum & Mastercoin alum. 2 exits. Burned, rebuilt, still here.
Eamomarc @Eamomarc748
39 Followers 1K Following
mycontext_ai @mycontext_ai
3 Followers 36 Following
Dima Sabanin @DmitrySabanin
527 Followers 665 Following CTO at Elara. Making programs that speak human for the benefit of humanity with some of my favorite people. Code archeologist. Complete nerd. A family man.
無 @xwuxwux
1 Followers 5K Following
Cian Vance @vance_cian
71 Followers 2K Following Equipping sales teams with the tools to succeed. | 40+ sales playbooks created.
RuggedRonin @rugged_ronin
117 Followers 1K Following Survivor of rugs. I buy dips, and build things. Self Proclaimed Solana Maxi
Shivam Singh @er_shivamsingh0
726 Followers 6K Following Engineer| koinophobic | 22 | AI | GPU POOR | Building Neo clouds https://t.co/qGAknj71kz
Jonathan Hayase @JonathanHayase
219 Followers 143 Following 5th year Machine Learning PhD student at UW CSE
Saber Darabi @SADarabi
314 Followers 7K Following
getCream.AI @getcreamai
87 Followers 616 Following
LogDog @InvestorPgh
305 Followers 2K Following
Lichang Chen @LichangChen2
772 Followers 660 Following Context Engineer & Agents | ex GenAI & Science Unit Intern @GoogleDeepmind | PhD’25 @umdcs and BS @ZJU_China
Stuart Sul @stuart_sul
1K Followers 123 Following ml research @cursor_ai, cs @Stanford, mlsys @HazyResearch
ChrisUniverse 🗽 @ChrisUniverseB
41K Followers 13K Following Quantum Physicist — AI Research Scientist— Spiritual — Firstly, a visionary. 🌪️Creative Partner: @Freepik • @LumaLabsAI • @Hailuo_AI • @Pika_Labs
Coupons @coupons24601
36 Followers 525 Following
Domains For Sale @Webs_ForSale
2K Followers 1K Following #Web #Forsale #Domain #DomainsForSale #PremiumDomain #Brand #Startup #Biz #ForSale #BuyNow #MakeOffer 👉 DM If You Need Any.
Yifeng Ding @YifengDing_
830 Followers 2K Following CS PhD student @illinoisCDS. Research intern at AWS AI Labs @AmazonScience. Towards building advanced code LLMs with better reasoning and planning.
Joshua @JoshuaWorth
373 Followers 984 Following Founder @IntentMesh ⚡ @PullSheetapp • Narrative Hacker • MCPMaps3D. Designing intent driven tools that move industries forward.
Gerald Tomson @TomsonGera
15 Followers 649 Following
baozheliu @baozheliu
22 Followers 568 Following
Tachi @tachisarc
15K Followers 433 Following
David Evans @davidevans__
30 Followers 779 Following
shubham yadav @theshubham3777
183 Followers 551 Following
metehan @nothimhuman
9 Followers 97 Following 0x9π Founder of Hardwey Music Group Founder of SCAR REC
Buba demba @bubademba39
30 Followers 2K Following
Halil İbrahim Kutmur @HalilKutmur07
14 Followers 522 Following Artificial Intelligence and Machine Learning Student
Valentina Tardelli @ValentinaT32922
90 Followers 6K Following
Khalad Albadi | خل... @khaladalbadi
19K Followers 3K Following Fintech | SaaS | AI | PropTech | Drones | Cybersecurity | IoT | Data | HealthTech | Robots | Marketplaces | Blockchain | Gaming | Web3.0 | Semiconductors | VC and M&A 📊
Fangcong Yin @fangcong_y10593
272 Followers 665 Following CS PhD Student @UTAustin studying NLP. Prev: @CornellCIS
一路孙 Luyi Sun @luyisun_
7 Followers 305 Following Working smarter with AI — exploring productivity, health, e-com & marketing. IT Master’s #MonashUni
sharpalt @sharpalt1
21 Followers 187 Following
Gabor P @GaborP479291
3 Followers 103 Following
Charles 🎉 Frye @charles_irl
14K Followers 3K Following gpu enjoyer at @modal. he/him. ex @full_stack_dl, @weights_biases (acq. @CoreWeave), phd Berkeley @Redwood_Neuro. try https://t.co/SYWVMCazZ3
Jonathan Hayase @JonathanHayase
219 Followers 143 Following 5th year Machine Learning PhD student at UW CSE
Kimura Hinami @hinamin012
35K Followers 521 Following 神戸→神奈川🇯🇵 理学療法士から写真家へ INFJ🌿 雑誌「GENIC4月号,7月号」掲載 ,企業タイアップなど 仕事依頼はMailからお願いします✉️ フリーランスになりました。出張いけます◎
Lichang Chen @LichangChen2
772 Followers 660 Following Context Engineer & Agents | ex GenAI & Science Unit Intern @GoogleDeepmind | PhD’25 @umdcs and BS @ZJU_China
Stuart Sul @stuart_sul
1K Followers 123 Following ml research @cursor_ai, cs @Stanford, mlsys @HazyResearch
Omar Shaikh @oshaikh13
1K Followers 839 Following member of sociotechnical staff @Stanford - previously @GeorgiaTech
Scott Swingle @bio_bootloader
11K Followers 3K Following Father of 3, building Mentat (the github native coding agent!) @AbanteAI, prev @DeepMind
Yifeng Ding @YifengDing_
830 Followers 2K Following CS PhD student @illinoisCDS. Research intern at AWS AI Labs @AmazonScience. Towards building advanced code LLMs with better reasoning and planning.
Jack Cai @jackcai1206
141 Followers 443 Following CS PhD student at Princeton PLI, Research Intern at Microsoft. Previously Masters at UW-Madison. Working towards goal generating long horizon agents.
Sholto Douglas @_sholtodouglas
25K Followers 1K Following Scaling RL @AnthropicAI, ex @DeepMind - working towards intelligence too cheap to meter
Gabriele Berton @gabriberton
6K Followers 1K Following Postdoc @Amazon working on VLM - ex @CarnegieMellon @PoliTOnews @IITalk
Helen Jin @helenj1n
80 Followers 324 Following PhD Student @CIS_Penn @Penn | Intern @awscloud | Trustworthy ML + NLP 🌟 | Previously @CC_Columbia @Columbia
Lindia Tjuatja @lltjuatja
1K Followers 615 Following a natural language processor and “sensible linguist”. PhD-ing @LTIatCMU, previously BS-ing @UT_linguistics + @utexasece 🤠🤖📖 she/her
Jifan Zhang @jifan_zhang
374 Followers 459 Following Research Fellow @AnthropicAI | Previously Ph.D. @WisconsinCS @WIDiscovery, BS/MS @uwcse, @Meta @Google @Amazon
Kylie Robison @kyliebytes
46K Followers 2K Following Senior correspondent covering AI @WIRED • Subscribe to my newsletter https://t.co/jxLAFHz8UP • Robison (rah-beh-son) not Robinson • Send tips on Signal @ kylie.01
Jacob Buckman @jacobmbuckman
5K Followers 375 Following Founder @manifest__ai. PhD candidate @MILAMontreal. Formerly @jhuclsp, @GoogleAI, @SCSatCMU.
Amirhossein Kazemneja... @a_kazemnejad
2K Followers 584 Following Working on RL training of LLMs @Mila_Quebec. Prev: @mcgillu
Hoyeon Chang @hoyeon_chang
919 Followers 2K Following PhD student at @kaist_ai Language & Knowledge Lab Passionate about understanding intelligent systems Also a jazz pianist
Dylan Sam @dylanjsam
802 Followers 465 Following phd student @mldcmu | past: student researcher @GoogleAI, intern @AmazonScience, undergrad @BrownUniversity
Vincent Abbott @vtabbott_
7K Followers 335 Following Maker of *those* diagrams for deep learning algorithms | @mit @mitlids incoming PhD
shira @shiraeis
14K Followers 2K Following ai startup. prev: ai @uchicago @mit @intel @cdcgov and a few other places. I personally think I’m quite funny.
Psyho @FakePsyho
25K Followers 366 Following Game Designer; Problem Solver; past: OpenAI (Dota), Pro Competitive Programmer, Poker
Savvy is 🎃 𓅰 @savvyinwndrland
168 Followers 1K Following mech interp enthusiast, cs+math @harvard '22 "Live simply and do serious things." ~DCH
Ahmad Beirami @abeirami
10K Followers 4K Following sth new // ex Gemini RL+Inference @GoogleDeepMind // Chat AI @Meta // RL Agents @EA // ML+Information Theory @MIT+@Harvard+@GeorgiaTech // زن زندگی آزادی
Yoonsang Lee @yoonsang_
228 Followers 609 Following CS PhD @princeton_nlp @princetonPLI; prev @SeoulNatlUni
Teknium (e/λ) @Teknium1
50K Followers 5K Following Cofounder and Head of Post Training @NousResearch, prev @StabilityAI Github: https://t.co/LZwHTUFwPq HuggingFace: https://t.co/sN2FFU8PVE
Calvin French-Owen @calvinfo
15K Followers 492 Following Making things, trail running. Prev: Codex @OpenAI, https://t.co/4qWGncHOAX, co-founder @Segment, @MIT
Nick Miller @nickwm
2K Followers 1K Following 25+ years building software products. Now building @cursor_ai. DMs open.
Peter Zakin @pzakin
5K Followers 5K Following Partner @upfrontvc. Sold Hyper Travel to @tradeshift. ex @venmo, @usemacro, YC S2010. Latest post: https://t.co/pBTXxEURWz
Dan Biderman @dan_biderman
2K Followers 1K Following systems that are nervous postdoc @Stanford w/ @HazyResearch & @scott_linderman. prev: neuro phd @cu_neurotheory, post training @DbrxMosaicAI
Simon Guo @simonguozirui
3K Followers 5K Following CS PhD student @Stanford | 🎓 @Berkeley_EECS | prev pre-training @cohere & built things at @ @anyscalecompute @nvidia
Alex Graveley @alexgraveley
38K Followers 1K Following Co-creator of GitHub Copilot, Dropbox Paper, AI Tinkerers, Hackpad, MobileCoin, Minion AI, etc. Working on @PerplexityComet. Survivor 🎗️
Rylan Schaeffer @RylanSchaeffer
6K Followers 2K Following CS PhD Student at Stanford Trustworthy AI Research with @sanmikoyejo. Prev interned/worked @ Meta, Google, MIT, Harvard, Uber, UCL, UC Davis
Xuheng Li @xuhengli_
950 Followers 2K Following CS PhD candidate @UCLA, supervised by @QuanquanGu | RL, deep learning theory, diffusion model | Previously BSc @PKU1898 | Stargazer
Oliver Li @oliveraochongli
243 Followers 1K Following phd in language models at @Cornell_CS; prev @columbianlp
Rapael Kalandadze @RaphaelKalan
330 Followers 986 Following CTO & Co-Founder of https://t.co/YbSs6udjfb • LLM • SLM • Agentic Systems • Built Georgia’s first LLM https://t.co/avPOYB7VuG • Synthetic data
Edoardo Ponti @PontiEdoardo
3K Followers 476 Following Assistant Professor in #NLP at @EdinburghUni | A Kleene star shines on the hour of our meeting
Sharon Goldman @sharongoldman
7K Followers 3K Following Reporter on the AI beat, @FortuneMagazine