❄️Andrew Zhao❄️ @_AndrewZhao

PhD @Tsinghua_Uni. Absolute Zero, ExpeL, Diver-CT. Ex-intern @MSFTResearch, @BIGAI. Interested in RL, Reasoning/Safety 4 LLMs, Agents. On the industry job market 2026.
andrewzh112.github.io
Joined September 2020

Tweets

1K
Followers

4K
Following

3K
Likes

3K

William Fedus @LiamFedus

5 hours ago

Today, @ekindogus and I are excited to introduce @periodiclabs. Our goal is to create an AI scientist. Science works by conjecturing how the world might be, running experiments, and learning from the results. Intelligence is necessary, but not sufficient. New knowledge is…

226 202 2K 924K 599

❄️Andrew Zhao❄️ @_AndrewZhao

5 hours ago

iykyk arxiv.org/pdf/2509.24527

1 0 23 508 11

Zhongwen Xu @zhongwen2009

a day ago

In case you didn't know about my recent work, Single-stream Policy Optimization (SPO), a group-free, low-variance policy gradient algorithm. Check out this blog: zhongwenxu.notion.site/Single-stream-… and the paper: arxiv.org/abs/2509.13232

Zhongwen Xu @zhongwen2009

2 days ago

5 29 296 31K 269

0 4 23 3K 15

❄️Andrew Zhao❄️ @_AndrewZhao

17 hours ago

paper of the day

9 23 406 37K 106

X Freeze @amXFreeze

21 hours ago

Macrohard is coming...

72 30 441 13K 12

Jason Weston @jaseweston

19 hours ago

🌀New work: Era of Real-World Human Interaction 🌀 📝: arxiv.org/abs/2509.25137 - RL *directly* from User Conversations - Organic replies + long-term history are learning signal - Trained on WildChat, beats RLHF at *user* level -> the future for personal Super Intelligence? 🧵1/6

5 41 217 13K 158

Zichen Liu @zzlccc

a day ago

After the crazy 极GRPO weekend, let's get rid of the scalar reward or any policy optimization related to it. We explored learning from *verbal feedback* and obtained interesting results:

Tianyu Pang @TianyuPang1

a day ago

After the crazy 极GRPO weekend, let's get rid of the scalar reward or any policy optimization related to it. We explored learning from *verbal feedback* and obtained interesting results:

11 71 383 53K 352

2 17 112 12K 67

DeepSeek @deepseek_ai

a day ago

🚀 Introducing DeepSeek-V3.2-Exp — our latest experimental model! ✨ Built on V3.1-Terminus, it debuts DeepSeek Sparse Attention (DSA) for faster, more efficient training & inference on long context. 👉 Now live on App, Web, and API. 💰 API prices cut by 50%+! 1/n

218 871 6K 1.1M 1K

Zhiqing Sun @EdwardSun0909

3 days ago

I don’t often tweet on technical topics, but I may have an opposite opinion here…

10 8 386 88K 93

Siddarth Venkatraman @siddarthv66

4 days ago

NO verifiers. NO Tools. Qwen3-4B-Instruct can match DeepSeek-R1 and o3-mini (high) with ONLY test-time scaling. Presenting Recursive Self-Aggregation (RSA) — the strongest test-time scaling method I know of! Then we use aggregation-aware RL to push further!! 📈📈 🧵below!

20 95 745 66K 569

Yingru Li @RichardYRLi

4 days ago

(1/x) Ever had your #LLM-#RL training mysteriously collapse? 📉 You're not alone. We saw #agentic RL runs fail with exploding #gradients, and found the culprit: a fundamental "training-inference mismatch." Our new #blog post demystifies this vicious cycle.…

9 41 251 22K 232

Songming Liu @songming_liu

4 days ago

😠💢😵‍💫Tired of endless data collection & fine-tuning every time you try out VLA? Meet RDT2, the first foundation model that zero-shot deploys on any robot arm with unseen scenes, objects & instructions. No collection. No tuning. Just plug and play🚀 Witness a clear sign of…

23 86 524 78K 404

Zhaocheng Zhu @zhu_zhaocheng

6 days ago

One thing people rarely mention about research: ideas set the upper bound of your work, but debugging sets the lower bound. Universities teach us how to chase impactful ideas, but they rarely teach us how to debug large, messy ML systems. Here are a few principles I found useful…

1 1 35 5K 23

xjdr @_xjdr

7 days ago

This is the command most people should run when using codex-cli: codex --search --model=gpt-5-codex -c model_reasoning_effort="high" --sandbox workspace-write -c sandbox_workspace_write.network_access=true

40 33 889 75K 1K

🐻熊狸 @bigeagle_xd

a week ago

qwen is crazy, way too hypercompetitive (太卷了)

7 4 136 12K 7

Ted Xiao @ CoRL 2025 @xiao_ted

a week ago

The field of robotics is undergoing a historic revolution right now. I’ve spent the last year thinking about how to mentally model the breakneck progress in robotics + AI. With the help of mascots like “The AGI Bro”, we can try to sift through the noise 🧵

2 26 112 13K 68

Ulyana Piterbarg @ulyanapiterbarg

a week ago

Proud to have been part of the team behind Gaia2 and ARE! ARE = a gym/platform for scaling up LLM agent envs for evals & RL Gaia2 = a new benchmark for hard & practical agent tasks (search, execution, ambiguity, time, noise, & multi-agent) tinyurl.com/aregaia2

9 46 275 72K 172

Romain Froger @froger_romain

a week ago

Most agent benchmarks assume static, perfect worlds. But real life is asynchronous, noisy, and ambiguous. 🌍 🚀 Meet Gaia2 + ARE: a new benchmark and open-source platform for creating environments and evaluating AI agents in (more) realistic environments.