Yihao Feng @yihaocs
Research scientist at AIML @Apple; ex-AI Researcher @SFResearch; Ph.D. alumnus of UT Austin @UTCompSci. Reinforcement learning, diffusion models and LLMs. Palo Alto, CA · Joined September 2013
209 Tweets · 103 Followers · 476 Following · 3K Likes
High quality math is the secret sauce for reasoning models. The best math data is in old papers. But OCRing that math is full of insane edge cases. Let's talk about how to solve this, and how you can get better math data than many frontier labs 🧵
🚨 We’ve just published a recipe to train a frontier-level deep research agent using RL. With just 30 hours on an H200, any developer can now beat Sonnet-4 on DeepResearch Bench using open-source tools. (Thread 🧵)
New in-depth blog post - "Inside vLLM: Anatomy of a High-Throughput LLM Inference System". Probably the most in-depth explanation of how LLM inference engines and vLLM in particular work! Took me a while to get this level of understanding of the codebase and then to write up…
Microsoft presents rStar2-Agent Agentic Reasoning Technical Report rStar2-Agent boosts a pre-trained 14B model to state of the art in only 510 RL steps within one week, achieving average pass@1 scores of 80.6% on AIME24 and 69.8% on AIME25, surpassing DeepSeek-R1 (671B) with…
Sir, we built this. A RL environment for learning reasoning at scale. GitHub: github.com/camel-ai/loong HF dataset: huggingface.co/datasets/camel… We extracted seed datasets from sources like textbooks, code libraries like sympy, networkX, Gurobi (math programming lib), rdkit…
Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration "We dissect the popular GRPO algorithm and reveal a systematic bias: the cumulative-advantage disproportionately weights samples with medium accuracy, while down-weighting the low-accuracy…
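The GRPO bias described in the tweet above can be checked numerically. This is a minimal sketch, assuming binary 0/1 rewards, a group of G rollouts per prompt, and standard GRPO normalization (reward minus group mean, divided by the group's population standard deviation plus a small eps). Reading "cumulative advantage" as the sum of absolute advantages over one group is our assumption, not necessarily the paper's exact definition. Under these assumptions the sum works out to 2·G·sqrt(p(1-p)) for group accuracy p, which peaks at p = 0.5 and drops to 0 at p = 0 or p = 1.

```python
import math


def grpo_advantages(rewards, eps=1e-6):
    """Group-normalized advantages: (r - mean) / (population std + eps)."""
    g = len(rewards)
    mean = sum(rewards) / g
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / g)
    return [(r - mean) / (std + eps) for r in rewards]


def cumulative_advantage(num_correct, group_size):
    """Sum of |advantage| over one group with binary 0/1 rewards (assumed definition)."""
    rewards = [1.0] * num_correct + [0.0] * (group_size - num_correct)
    return sum(abs(a) for a in grpo_advantages(rewards))


if __name__ == "__main__":
    G = 16
    for k in range(G + 1):
        p = k / G
        closed_form = 2 * G * math.sqrt(p * (1 - p))
        print(f"accuracy={p:.3f}  sum|A|={cumulative_advantage(k, G):7.3f}  "
              f"2*G*sqrt(p(1-p))={closed_form:7.3f}")
```

Groups with very low (or very high) accuracy contribute little total advantage, which is one way to see why hard prompts end up under-weighted.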
We were able to reproduce the strong findings of the HRM paper on ARC-AGI-1. Further, we ran a series of ablation experiments to get to the bottom of what's behind it. Key findings: 1. The HRM model architecture itself (the centerpiece of the paper) is not an important factor.…
super cool to see this come together, incredible work spearheaded by @brendanh0gan, all-in-all an incredibly detailed recipe of what it takes to craft a specialist model for OOD tasks where frontier models really struggle paper/weights/data/code in brendan’s thread :)
A neat observation: Rejection sampling during GRPO allows you to directly factor in properties in your reward, and allows you to go from optimizing max_model E[Reward(response)] to max_model E[Reward(response) * Property(response)]. In our case it's "small length", but…
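The rejection-sampling observation above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes Property is a 0/1 acceptance test (here a hypothetical length budget, `short_enough`) and that rejected rollouts stay in the group with zero reward, so each rollout's effective reward is Reward(y) * Property(y) before standard GRPO group normalization. The verifier `correctness_reward` and the 40-character budget are also hypothetical.

```python
import math


def grpo_advantages(rewards, eps=1e-6):
    """Group-normalized advantages: (r - mean) / (population std + eps)."""
    g = len(rewards)
    mean = sum(rewards) / g
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / g)
    return [(r - mean) / (std + eps) for r in rewards]


def correctness_reward(response: str) -> float:
    # Hypothetical verifier: 1.0 if the answer "42" appears, else 0.0.
    return 1.0 if "42" in response else 0.0


def short_enough(response: str, max_chars: int = 40) -> float:
    # Hypothetical property: accept (1.0) only responses within a length budget.
    return 1.0 if len(response) <= max_chars else 0.0


def rejection_filtered_advantages(group):
    # Rollouts failing the property are rejected by scoring them 0,
    # so the effective reward is Reward(y) * Property(y).
    effective = [correctness_reward(y) * short_enough(y) for y in group]
    return effective, grpo_advantages(effective)


if __name__ == "__main__":
    group = [
        "The answer is 42.",
        "After a very long and winding derivation, the answer is 42.",
        "The answer is 41.",
        "42",
    ]
    effective, adv = rejection_filtered_advantages(group)
    for y, r, a in zip(group, effective, adv):
        print(f"{r:.0f}  {a:+.3f}  {y!r}")
```

In this toy group the long but correct response receives a negative advantage, just like the wrong answer, because it fails the length property.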
moving from vllm v0 to v1 made our async rl training crash! read how we fixed it we recently migrated from v0 to v1 as part of a larger refactor of prime-rl to make it easier-to-use, more performant and naturally async. we confirmed correct training dynamics on many…
Nice empirical paper investigating all your bag of tricks in reasoning LLMs arxiv.org/abs/2508.08221
🥳Thrilled to introduce SWE-Swiss! 🚀Our 32B model achieves 60.2% on SWE-bench, matching the performance of much larger models (DeepSeek-R1-0528, Kimi-dev-72B). Better methods, not just bigger models! 📑Notion: pebble-potato-fc6.notion.site/SWE-Swiss-A-Mu… 💻Github: github.com/zhenyuhe00/SWE…
🚀 Introducing XBai o4: a milestone in our 4th-generation open-source technology based on parallel test-time scaling! In its medium mode, XBai o4 now fully outperforms OpenAI o3-mini. 📈 🔗 Open-source weights: huggingface.co/MetaStoneTec/X… ✅ GitHub link: github.com/MetaStone-AI/X…
Everything about Llama-Nemotron-Super-V1.5 post-training is now open: Synthetic data: huggingface.co/datasets/nvidi… Human data: huggingface.co/datasets/nvidi… Reward models (trained on HS3 data): huggingface.co/collections/nv… RL toolkit: github.com/NVIDIA-NeMo/RL
The results from all of the synthetic pre-training data pipelines look extremely promising! In hindsight, it makes sense. It not only fixes low-quality web data but also works well on high-quality data 📚. Finally, we have a way to augment text data without making it too predictable.
LLMs can now self-optimize. A new method allows an AI to rewrite its own prompts to achieve up to 35x greater efficiency, outperforming both Reinforcement Learning and Fine-Tuning for complex reasoning. UC Berkeley, Stanford, and Databricks introduce a new method called GEPA…
The SkyRL roadmap is live! Our focus is on building the easiest-to-use high-performance RL framework for agents. We'd love your ideas, feedback, or code to guide the project: github.com/NovaSky-AI/Sky…
new blog: How to scale RL to 10^26 FLOPs everyone is trying to figure out the right way to scale reasoning with RL ilya compared the Internet to fossil fuel: it may be the only useful data we have. and it's expendable perhaps we should learn to reason from The Internet (not…
🤯 Your model might be forgetting how to reason during post-training. In RL training (e.g., Deepseek-R1-1.5B), 76.7% of AIME problems were solved at some point — but only 30% survived in the final model. We call this Temporal Forgetting. 🔧 Simple Fix: Temporal Sampling —…
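The two numbers in the Temporal Forgetting tweet above come from comparing checkpoints against the final model. A minimal sketch of that bookkeeping, assuming "solved at some point" means solved by at least one saved checkpoint and "survived" means also solved by the final checkpoint; the function name and toy data are hypothetical, and the Temporal Sampling fix itself is not reproduced here.

```python
from typing import Sequence, Tuple


def temporal_forgetting_stats(
    solved_by_checkpoint: Sequence[Sequence[bool]],
) -> Tuple[float, float, float]:
    """solved_by_checkpoint[t][j] is True if checkpoint t solved problem j.

    Returns (ever-solved rate, final-model solve rate, forgotten rate), where
    "forgotten" means solved by some checkpoint but not by the final one.
    """
    num_problems = len(solved_by_checkpoint[0])
    ever = [any(ckpt[j] for ckpt in solved_by_checkpoint) for j in range(num_problems)]
    final = solved_by_checkpoint[-1]
    forgotten = [e and not f for e, f in zip(ever, final)]
    return (
        sum(ever) / num_problems,
        sum(final) / num_problems,
        sum(forgotten) / num_problems,
    )


if __name__ == "__main__":
    # Toy run: 3 checkpoints evaluated on 4 problems.
    history = [
        [True, False, False, False],
        [False, True, True, False],
        [False, False, True, False],
    ]
    ever_rate, final_rate, forgotten_rate = temporal_forgetting_stats(history)
    print(f"ever solved: {ever_rate:.0%}, final: {final_rate:.0%}, forgotten: {forgotten_rate:.0%}")
```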
let’s gooo - ByteDance dropped SeedCoder 8B (Instruct & Reasoning) - Beats Qwen Coder AND DeepSeek Coder - MIT Licensed 🔥 Works out of the box with Transformers, llama.cpp and vLLM 🤗

さくらねこ (Saku... @Np61e2tI54509LM
5 Followers 353 Following
Matthew Reams @skwert001
43 Followers 278 Following Post-token symbolic AI in GPT-5. Stateless. Verdict logic: 🟢 ⚠️ 🔴 No memory. No simulation. Contradictions collapse. You align by refusing to lie.
Kaiyu Yang @KaiyuYang4
4K Followers 2K Following Research Scientist at @Meta Fundamental AI Research (FAIR), New York. Previously: Postdoc @Caltech, PhD @PrincetonCS, Undergrad @Tsinghua_Uni.
Huihan Liu @huihan_liu
3K Followers 840 Following PhD @UTAustin | 👩🏻-in-the-Loop Learning for 🤖 | prev @AIatMeta @MSFTResearch @berkeley_ai | undergrad @UCBerkeley 🐻
Wei Ping @_weiping
2K Followers 336 Following distinguished research scientist @nvidia | post-training, RL, multimodal | generative models for audio. Views are my own.
Jieyu Zhao @jieyuzhao11
3K Followers 824 Following Assistant Prof. @CSatUSC, @USC || Postdoc @ClipUMD || PhD from @UCLANLP, @UCLA. #NLP, #ML, #TrustworthyNLP
Ruiyi Zhang @RoyZhang13
129 Followers 243 Following Research Scientist @Adobe @AdobeResearch| Machine Learning Ph.D. @DukeU | previously @GoogleBrain @AlibabaGroup
Liangke Gui @liangkegui
88 Followers 115 Following GenMedia researcher @GoogleDeepMind, AI PhD @CarnegieMellon
ElvaCrofts @o228CqY54X01iW
81 Followers 2K Following
leloy! @leloykun
7K Followers 4K Following Math @ AdMU • NanoGPT speedrunner • Muon fan 🤍 • prev ML @ XPD • 2x IOI & 2x ICPC • https://t.co/nfO038itfn
Junbo Li @ljb121002
170 Followers 476 Following Ph.D. student @UTCompSci, Undergrad from @FudanUni Math School.
FayYoung @LdicIV1ttHVm0v
74 Followers 7K Following
fl quan @fl_quan
36 Followers 1K Following
Harsh Desai @dreamerharsh
1 Followers 7K Following
jordan ezra @JordanFisherEzr
238 Followers 3K Following Research at Anthropic. Mathematician, former fed, lapsed founder. Noodling on how to make all this AI stuff go right.
VirginiaMorton @CDCE0wWuY81Ivi
80 Followers 7K Following
Casper Hansen @casper_hansen_
10K Followers 458 Following NLP Scientist | AutoAWQ Creator | Open-Source Contributor
Mu Cai @MuCai7
2K Followers 1K Following Research Scientist @GoogleDeepMind, Gemini Multimodal. Previous: Ph.D. @WisconsinCS | Intern @MSFTResearch @Cruise
Sirrtesh @SirrteshyxZX
38 Followers 4K Following
Javi Buitrago 🚢 @javbuitrago
349 Followers 2K Following Exploring new roles • prev @playground_ai & @meta
NydiaPritt @Pee69OfD0VQMv
58 Followers 7K Following
LemonGroves @LiEvan21202
21 Followers 146 Following
Tealyth @tealyth52971
102 Followers 7K Following
Bowen Peng @bloc97_
1K Followers 81 Following
Julia @U2z0O98ZjArsu9
86 Followers 7K Following Building on the XRP ledger. ♥️ Wife, kids, 🦜 & programming (TS, nodejs, Linux ..)
Sameep Vani @SameepVani
13 Followers 287 Following MS CS at @ASU | @ApgAsu | Graduate Researcher | Vision and Language | T2I | Large Vision Language Models
Alpay Ariyak @AlpayAriyak
3K Followers 3K Following Post-Training Lead @ Together AI | OpenChat Project Lead (#1 7B LLM on Arena for 2+ months, 2M+ downloads) | DeepCoder, DeepSWE
Akshara Prabhakar @aksh_555
408 Followers 734 Following applied scientist @SFResearch | prev @princeton_nlp, @surathkal_nitk
Deatet @deatet22166
104 Followers 7K Following
Kaixuan Huang @KaixuanHuang1
1K Followers 923 Following AGI strategist. PhD Student @Princeton; Google PhD Fellowship 2024, Ex-Intern @GoogleDeepMind; undergrad @PKU1898. opinions my own
Philippe Laban @PhilippeLaban
1K Followers 692 Following Research Scientist @MSFTResearch. NLP/HCI Research.
EnidWoolley @K89ZhYPE136a6oT
64 Followers 7K Following
Ashutosh Mehra @ashutoshmehra
2K Followers 7K Following Senior Principal Scientist at Adobe. Working on Acrobat AI Assistant, LLMs, and document ML.
Bo Dai @daibond_alpha
3K Followers 793 Following Assistant Professor at @gtcse, Research Scientist at @GoogleDeepMind | ex @googlebrain
aanand @aanandnayyar
118 Followers 3K Following
Deshas @Deshas179890
104 Followers 7K Following
Zheng Yuan @GanjinZero
993 Followers 777 Following Seed-Prover, Lean-Workbook, RRHF, RFT and MATH-Qwen. @BytedanceTalk Prev @Alibaba_Qwen, Phd at @Tsinghua_Uni
Prasann Singhal @prasann_singhal
292 Followers 746 Following 1st-year #NLProc PhD at UC Berkeley working with @sewon__min / @JacobSteinhardt , formerly advised by @gregd_nlp
Michal Valko @misovalko
8K Followers 8K Following Building something new · Chief Models Officer @ Stealth Startup & Inria & MVA - Ex: Llama @AIatMeta Gemini and BYOL @GoogleDeepMind
Wenpeng_Yin @Wenpeng_Yin
1K Followers 3K Following Assistant Professor at Penn State, State College Department of Computer Science and Engineering
Ankur Bohra @AnkurBohra9
45 Followers 4K Following
Chen Xing @LynetteSohn
52 Followers 175 Following Deep Learner @ScaleAI, previous research scientist @ Salesforce Research, spent my PhD in Google, Mila and Microsoft Research.
Songlin Yang @SonglinYang4
12K Followers 3K Following research @MIT_CSAIL @thinkymachines. work on scalable and principled algorithms in #LLM and #MLSys. in open-sourcing I trust 🐳. she/her/hers
Susan Zhang @suchenzang
33K Followers 650 Following @ Google Deepmind. Past: @MetaAI, @OpenAI, @unitygames, @losalamosnatlab, @Princeton etc. Always hungry for intelligence.
Brendan Hogan @brendanh0gan
2K Followers 627 Following ml research scientist @morganstanley || phd in cs @cornell 2024
verl project @verl_project
1K Followers 5 Following Open RL library for LLMs. https://t.co/Xpaq0thhgi Join us on https://t.co/uWI5Zbd6IH
Mark Gurman @markgurman
426K Followers 2K Following Breaking News on Apple & Tech. Bloomberg Managing Editor. @UMSI Board Member. Send secure tips on Signal: markgurman.01 or email [email protected].
Shuchao Bi @shuchaobi
13K Followers 687 Following Research @Meta Superintelligence Labs, RL/post-training/agents; Previously Research @OpenAI on multimodal and RL; Opinions are my own.
rohan anil @_arohan_
25K Followers 2K Following
Ahmad Al-Dahle @Ahmad_Al_Dahle
20K Followers 107 Following #Girldad of twins. Leading GenAI @ Meta (llama, imagine, meta ai and more)
kalomaze @kalomaze
18K Followers 2K Following ML researcher (@primeintellect), speculator • extremely silly jester
Peiyi Wang @sybilhyz
11K Followers 302 Following PhD @PKU1898; Researcher @deepseek_ai; Recent: DeepSeek-R1/CoderV2/Math/V1/V2/V3, Mathshepherd, FairEval, Speculative Decoding.
Ajay Jain @ajayj_
7K Followers 4K Following Co-founder @genmoai. Co-created denoising diffusion (DDPM), DreamFusion, Dream Fields. Ex Ph.D. @berkeley_ai, @googleai, @facebookai, @nvidiaai, @mit
Zhe Gan @zhegan4
3K Followers 346 Following Research Scientist and Manager @Apple AI/ML. Ex-Principal Researcher @Microsoft Azure AI. Working on building vision and multimodal foundation models.
Zyphra @ZyphraAI
7K Followers 20 Following
You Jiacheng @YouJiacheng
8K Followers 2K Following a big fan of TileLang. Follow TileLang, meow! Follow TileLang, thank you, meow! https://t.co/utshC0jrCO Ten-year longtime fan
Soham De @sohamde_
2K Followers 1K Following Research Scientist at DeepMind. Previously PhD at the University of Maryland.
emozilla @theemozilla
7K Followers 1K Following catholic, ai researcher, co-founder/ceo of @NousResearch alignment: whatever the opposite of yudkowsky + bryan johnson is. blessed be God in all his designs.
Nando Fioretto @nandofioretto
2K Followers 761 Following Assistant Professor of Computer Science at @UVA. I work on machine learning, optimization, and Responsible AI (differential privacy & fairness).
Dibya Ghosh @its_dibya
3K Followers 455 Following @AnthropicAI | Made friends along the way @UCBerkeley @ Google Brain Montreal, @physical_int
Dylan Foster 🐢 @canondetortugas
3K Followers 1K Following Foundations of RL/AI @MSFTResearch. Previously @MIT @Cornell_CS https://t.co/vQIdUzsw8B RL Theory Lecture Notes: https://t.co/bhgL3aKIk0
Samyam Rajbhandari @samyamrb
246 Followers 81 Following Principal Architect @SnowflakeDB co-founder @MSFTDeepSpeed
Devendra Chaplot @dchaplot
13K Followers 432 Following Building next-gen AI at @thinkymachines. Past: Founding team @MistralAI, RS at Facebook AI Research. Ph.D. @SCSatCMU, BTech @iitbombay CS.
Summer Yue @summeryue0
6K Followers 365 Following Safety and alignment at Meta Superintelligence. Prev: VP of Research at Scale AI, research at Google DeepMind / Brain (Gemini, LaMDA, RL / TFAgents, AlphaChip).
Jan Leike @janleike
115K Followers 332 Following ML Researcher @AnthropicAI. Previously OpenAI & DeepMind. Optimizing for a post-AGI future where humanity flourishes. Opinions aren't my employer's.
CHAI AI. William Beau... @chai_research
191K Followers 135 Following CHAI Founder/CEO. ✝️ CHAI : Chat AI platform, performing research in conversational generative artificial intelligence. https://t.co/SRbYKFJeh6
Song Mei @Song__Mei
3K Followers 690 Following Assistant Professor at UC Berkeley, Department of Statistics and EECS. Researcher at OpenAI working on LLM training.
Shizhe Diao @shizhediao
4K Followers 2K Following Research Scientist @NVIDIA focusing on efficient post-training of LLMs. Finetuning your own LLMs with LMFlow: https://t.co/UTykmQBwFr Views are my own.
Dinghuai Zhang 张鼎... @zdhnarsil
4K Followers 2K Following Researcher at @MSFTResearch. Prev: PhD at @Mila_Quebec, intern at @Apple MLR and FAIR Labs @MetaAI, math undergraduate at @PKU1898.
Zihui Xue @sherryx90099597
298 Followers 171 Following
Alexandr Wang @alexandr_wang
327K Followers 833 Following chief ai officer @meta, founder @scale_ai. rational in the fullness of time
Chun-Liang Li @chunliang_tw
424 Followers 157 Following Machine learning researcher @ Apple MLR Affiliate Assistant Professor @ UW CSE
Suhail @Suhail
386K Followers 510 Following Founder: @mixpanel, next: 🤖🦾🦿 Pizzatarian, programmer, music maker
Logan Graham @logangraham
7K Followers 6K Following make things radically good 🌎 @anthropicai | give me feedback: https://t.co/R1OyioKMXy
Yi-01.AI @01AI_Yi
9K Followers 74 Following A global company building generative AI LLM and applications
Kelvin Xu @imkelvinxu
1K Followers 891 Following Interested in things that generalize. Currently RS @Meta, Prev: Science of Scaling co-TL @GoogleDeepmind. PhD Student at UC Berkeley. 🇺🇸🇨🇦
Dacheng Li @DachengLi177
1K Followers 777 Following Student Lead @NovaSkyAI, PhD @BerkeleySky, @lmsysorg, @lmarena_ai | Prev: @Nvidia @Google @SCSatCMU | @istoica05 @profjoeyg @songhan_mit @haozhangml @ericxing
Weijie Su @weijie444
6K Followers 473 Following Associate Professor @Wharton & CS Penn. coDir @Penn Research #MachineLearning. PhD @Stanford. #MachineLearng #DeepLearning #Statistics #Privacy #Optimization.