Glad that our paper has been accepted to NeurIPS 2025! With gradient variance minimization (GVM), we balance the training data by difficulty and by each example's contribution to the model, and we achieve improvements on math reasoning. Please check the original post for more details.
(1/5) Super excited to release our new paper on Reinforcement Learning:
"Self-Aligned Reward: Towards Effective and Efficient Reasoners"!
Preprint: arxiv.org/pdf/2509.05489
🤝 Can LLM agents really understand us?
We introduce UserBench: a user-centric gym environment for benchmarking how well agents align with nuanced human intent, not just follow commands.
📄 arxiv.org/pdf/2507.22034
💻 github.com/SalesforceAIRe…
(1/4)🚨 Introducing Goedel-Prover V2 🚨
🔥🔥🔥 The strongest open-source theorem prover to date.
🥇 #1 on PutnamBench: Solves 64 problems with far less compute.
🧠 New SOTA on MiniF2F:
* 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B’s 82.4%.
* 8B > 671B: Our 8B…
Reward models (RMs) are key to language model post-training and inference pipelines. But little is known about the relative pros and cons of different RM types.
📰 We investigate why RMs implicitly defined by language models (LMs) often generalize worse than explicit RMs
🧵
1/6
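For intuition, the "RMs implicitly defined by language models" above are DPO-style rewards read off the LM's own log-probabilities, as opposed to an explicit scalar reward head. A minimal sketch with made-up numbers (`BETA` and all values below are illustrative, not from the paper):

```python
# Toy contrast between an implicit RM (reward defined by the LM's own
# log-probabilities, DPO-style) and an explicit RM (a separate scalar head).
# BETA and every number below are illustrative, not values from the paper.

BETA = 0.1  # hypothetical scaling coefficient

def implicit_reward(logp_policy: float, logp_ref: float, beta: float = BETA) -> float:
    """Implicit reward: beta times the policy-vs-reference log-prob ratio."""
    return beta * (logp_policy - logp_ref)

def explicit_reward(features: list[float], weights: list[float]) -> float:
    """Explicit RM: a learned scalar head over response features."""
    return sum(f * w for f, w in zip(features, weights))

# A response the policy likes more than the reference gets positive reward.
print(implicit_reward(logp_policy=-10.0, logp_ref=-12.0))  # ≈ 0.2
```

The implicit reward needs no extra parameters, which is convenient, but it is tied to the policy's own likelihoods; the thread's question is why that tends to generalize worse than a dedicated head.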
🎥 Video is already a tough modality for reasoning. Egocentric video? Even tougher! It is longer, messier, and harder.
💡 How do we tackle these extremely long, information-dense sequences without exhausting GPU memory or hitting API limits?
We introduce 👓Ego-R1: A framework…
Can LLMs make rational decisions like human experts?
📖Introducing DecisionFlow: Advancing Large Language Model as Principled Decision Maker
We introduce a novel framework that constructs a semantically grounded decision space to evaluate trade-offs in hard decision-making…
(1/5) Want to make your LLM a skilled persuader?
Check out our latest paper: "ToMAP: Training Opponent-Aware LLM Persuaders with Theory of Mind"!
For details:
📄arXiv: arxiv.org/pdf/2505.22961
🛠️GitHub: github.com/ulab-uiuc/ToMAP
📢 New Paper Drop: From Solving to Modeling!
LLMs can solve math problems — but can they model the real world? 🌍
📄 arXiv: arxiv.org/pdf/2505.15068
💻 Code: github.com/qiancheng0/Mod…
Introducing ModelingAgent, a breakthrough system for real-world mathematical modeling with LLMs.
How can we improve test-time scalability?
- Separate thinking & solution phases to control performance under budget constraints
- Budget-Constrained Rollout + GRPO
- Outperforms baselines on math/code
- Cuts token usage by 30% without hurting performance
huggingface.co/papers/2505.05…
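As a rough illustration of the two-phase idea, here is a toy decode loop that hard-caps the thinking phase and then forces the solution phase to begin. The `generate` helper, tag format, and budgets are hypothetical stand-ins, not the paper's implementation:

```python
# Toy two-phase decode loop: thinking is hard-capped, then the solution
# phase is forced to start. `generate` is a hypothetical stand-in for a
# real LLM call; the tags and budgets are illustrative, not the paper's.

THINK_BUDGET = 256  # hypothetical cap on thinking tokens
ANSWER_BUDGET = 64

def generate(prompt: str, stop: str, max_tokens: int) -> str:
    # Stand-in: a real implementation would call the model here.
    return "...reasoning..."[:max_tokens]

def budgeted_solve(question: str) -> str:
    # Phase 1: thinking, truncated at THINK_BUDGET no matter what.
    thinking = generate(question + "\n<think>", stop="</think>",
                        max_tokens=THINK_BUDGET)
    # Phase 2: the solution starts even if thinking was cut off, so the
    # final answer always fits within the overall budget.
    prompt = f"{question}\n<think>{thinking}</think>\n<answer>"
    return generate(prompt, stop="</answer>", max_tokens=ANSWER_BUDGET)

print(budgeted_solve("What is 2 + 2?"))
```

Separating the phases is what makes the budget controllable: the answer phase is never starved by an overlong chain of thought.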
🚀 Can we cast reward modeling as a reasoning task?
📖 Introducing our new paper:
RM-R1: Reward Modeling as Reasoning
📑 Paper: arxiv.org/pdf/2505.02387
💻 Code: github.com/RM-R1-UIUC/RM-…
Inspired by recent advances in long chain-of-thought (CoT) approaches to reasoning-intensive tasks, we…
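One way to picture "reward modeling as reasoning" is a judge model that writes a critique before committing to a final verdict, which the pipeline then parses out. The `<verdict>` tag format below is an illustrative assumption, not the paper's exact protocol:

```python
import re

# Sketch of a reasoning-style reward model: the judge emits a free-form
# critique followed by a final verdict tag, and we parse the verdict out.
# The <verdict> tag format is an illustrative assumption.

def parse_verdict(judge_output: str) -> str:
    """Extract the final preference from a reasoning-style judge output."""
    m = re.search(r"<verdict>\s*([AB])\s*</verdict>", judge_output)
    if m is None:
        raise ValueError("no verdict found in judge output")
    return m.group(1)

sample = (
    "Response A cites the theorem correctly; Response B drops a sign "
    "in step 3, so its final answer is wrong.\n<verdict>A</verdict>"
)
print(parse_verdict(sample))  # A
```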
We introduce Gradient Variance Minimization (GVM)-RAFT, a principled dynamic sampling strategy that minimizes gradient variance to improve the efficiency of chain-of-thought (CoT) training in LLMs.
– Achieves 2–4× faster convergence than RAFT
– Improves accuracy on math…
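A toy sketch of the dynamic-sampling idea, assuming per-prompt reward variance as the gradient-variance proxy; the allocation rule below is illustrative, not the paper's exact estimator:

```python
# Toy GVM-style dynamic sampling: split a fixed rollout budget across
# prompts in proportion to an estimated gradient-variance proxy (here,
# observed reward variance per prompt). All numbers are illustrative.

def allocate_rollouts(variance_estimates: list[float], total_budget: int,
                      min_per_prompt: int = 1) -> list[int]:
    """Give high-variance prompts more samples, keeping the total fixed."""
    n = len(variance_estimates)
    total_var = sum(variance_estimates) or 1.0
    spare = total_budget - min_per_prompt * n
    alloc = [min_per_prompt + int(spare * v / total_var)
             for v in variance_estimates]
    # Hand any rounding leftovers to the highest-variance prompt.
    alloc[max(range(n), key=lambda i: variance_estimates[i])] += total_budget - sum(alloc)
    return alloc

print(allocate_rollouts([0.9, 0.1, 0.0], total_budget=16))  # [13, 2, 1]
```

Prompts whose gradients are already low-variance (too easy or fully solved) get the minimum, so sampling effort concentrates where it reduces variance most.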
Thrilled to announce that our paper Sparse VideoGen got into #ICML2025! 🎉
Our new approach speeds up video generation by 2×. Details in the thread/paper.
Huge thanks to my collaborators!
Blog: svg-project.github.io
Paper: arxiv.org/abs/2502.01776
Code:…
Thrilled to share my first project at NVIDIA! ✨
Today’s language models are pre-trained on vast and chaotic Internet texts, but these texts are unstructured and poorly understood. We propose CLIMB — Clustering-based Iterative Data Mixture Bootstrapping — a fully automated…
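The clustering step behind a CLIMB-style pipeline can be pictured as k-means over document embeddings, after which a mixture over the resulting clusters is searched iteratively. The 2-D "embeddings" below are fake and the iterative search loop is omitted:

```python
import random

# Rough sketch of the clustering step: embed documents, k-means them into
# topical clusters, then search a mixture over clusters. The 2-D points
# stand in for real embeddings; the search loop is not shown.

def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2
                                  + (p[1] - centers[c][1]) ** 2)
            groups[j].append(p)
        # Recompute each center as the mean of its group.
        centers = [(sum(x for x, _ in g) / len(g),
                    sum(y for _, y in g) / len(g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, groups

docs = [(0.1, 0.2), (0.0, 0.1), (5.0, 5.1), (5.2, 4.9)]
centers, groups = kmeans(docs, k=2)
print(sorted(len(g) for g in groups))  # [2, 2]
```

Once documents are grouped this way, the mixture weights over clusters become a small, searchable space rather than per-document decisions.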