Matthieu Lin @capybara_ai

PhD@TsinghuaCS Interested in Reinforcement Learning, LLM-based Agents, Alignment. linyuhongg.github.io Joined December 2023

Tweets

192
Followers

80
Following

1K
Likes

520

Nan Jiang @nanjiang_cs

2 days ago

My 3rd blogpost on PG, the topic I am least familiar with but get asked a lot, so I thought I'd just put together the very limited stuff I know on this topic. Somehow the post gets cynical from time to time🙃 nanjiang.cs.illinois.edu/2025/09/29/pg.…

1 21 135 16K 89

Andrew Zhao @_AndrewZhao

3 days ago

paper of the day

13 33 529 55K 129

Aleksa Gordić (水平问题) @gordic_aleksa

4 days ago

New in-depth blog post time: "Inside NVIDIA GPUs: Anatomy of high performance matmul kernels". If you want to deeply understand how one writes state of the art matmul kernels in CUDA read along. (Remember matmul is the single most important operation that transformers execute…

46 378 2K 253K 3K

Feng Yao @fengyao1909

a week ago

🆕 𝐔𝐩𝐝𝐚𝐭𝐞!! A few additional findings for 𝐑𝐨𝐥𝐥𝐨𝐮𝐭–𝐓𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐌𝐢𝐬𝐦𝐚𝐭𝐜𝐡: ① 𝐏𝐚𝐫𝐚𝐥𝐥𝐞𝐥𝐢𝐬𝐦 𝐝𝐢𝐟𝐟𝐞𝐫𝐞𝐧𝐜𝐞 is a huge driver of the gap, with Sequence Parallelism (SP) causing most high max mismatch. ② 𝐋𝐨𝐧𝐠𝐞𝐫 𝐬𝐞𝐪𝐮𝐞𝐧𝐜𝐞𝐬…

Feng Yao @fengyao1909

2 months ago

13 119 733 148K 661

2 29 176 17K 104

Jerry Tworek @MillionInt

a week ago

Science of RL optimization is likely humanity’s last open scientific problem

42 63 1K 119K 284

Elon Musk @elonmusk

2 weeks ago

Wood-fired pizza oven coming to @xai

Eric Jiang @veggie_eric

2 weeks ago

1K 171 5K 11.8M 266

3K 4K 41K 12.4M 1K

Elon Musk @elonmusk

a week ago

Help build Macrohard, the AI software company!

Arshdeep Singh @arsh99_singh

a week ago

469 258 4K 24.6M 588

5K 8K 65K 23.9M 4K

gabriel @GabrielPeterss4

2 weeks ago

keep a doc with all health related events and symptoms you have ever had for every new health concern, give it to chatgpt 5 pro along with your context, and it will find the craziest patterns and links between things, and a list of actions

20 9 402 24K 230

Sadhika Malladi @SadhikaMalladi

2 weeks ago

Excited to share that I will be starting as an Assistant Professor in CSE at UCSD (@ucsd_cse) in Fall 2026! I am currently recruiting PhD students who want to bridge theory and practice in deep learning - see here: cs.princeton.edu/~smalladi/recr…

37 68 513 75K 136

Thinking Machines @thinkymachines

3 weeks ago

Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference” We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to…

241 1K 8K 3.3M 5K

Andrew Zhao @_AndrewZhao

4 weeks ago

New banger dropped for task space exploration, from the OG himself arxiv.org/pdf/2509.04575

2 41 269 19K 254

Andrew Zhao @_AndrewZhao

4 weeks ago

Interesting findings! We also attempted something similar in our AZR paper section D.2, where the proposer needs to construct a composite function f(g,..g)

Lifan Yuan @lifan__yuan

4 weeks ago

13 88 425 42K 357

0 4 33 3K 13

Yiran Wu @YiranWu18

4 weeks ago

Introducing 🛡️ExCyTIn‑Bench: Evaluating LLM agents on Cyber Threat Investigations. It’s built on Azure tenant, a real Security Operations Center environment, covering 57 tables. Explore how LLMs fare in realistic, multi-hop incident detection! #Cybersecurity #AI #LLM #Benchmark

10 19 76 8K 43

Kunhao Zheng @KunhaoZ

a month ago

People love 𝗽𝗮𝘀𝘀@𝗸 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴. What to do if you have 𝟭𝟬𝟬 samples and you wanna optimize 𝗽𝗮𝘀𝘀@𝟭𝟬? ✨This is the reward. Presented in the analytic form. Next step? Pass it to GRPO and witness the magic.

Kunhao Zheng @KunhaoZ

5 months ago

12 135 833 132K 733

3 3 29 2K 16

Jacob Austin @jacobaustin132

2 months ago

Today we're putting out an update to the JAX TPU book, this time on GPUs. How do GPUs work, especially compared to TPUs? How are they networked? And how does this affect LLM training? 1/n

38 530 4K 388K 5K

Jack Morris @jxmnop

2 months ago

getting a phd is mostly about developing a system for maximizing the time you spend in flow state, over several years

44 105 2K 92K 643

Yi Wu @jxwuyi

2 months ago

🔍We introduce ASearcher, a search agent trained by end2end RL. Large-scale (up to 128 turns) RL with AReaL unlocks Long-Horizon Agentic Search (+20.8/+46.7% on GAIA/xBench). 💻Data, Code & Model: github.com/inclusionAI/AS… 📄Paper: arxiv.org/abs/2508.07976v #Agent #OpenSource #LLM #AGI

2 56 289 26K 155

Jack Morris @jxmnop

2 months ago

OpenAI hasn’t open-sourced a base model since GPT-2 in 2019. they recently released GPT-OSS, which is reasoning-only... or is it? turns out that underneath the surface, there is still a strong base model. so we extracted it. introducing gpt-oss-20b-base 🧵

158 467 6K 923K 4K

Wen-Tse Chen @WenzeChen2

2 months ago

[0/3] 🚀 Introducing Verlog – an open-source RL framework built specifically for training long-horizon, multi-turn LLM agents. 📊 Max episode length comparison: •VeRL / RAGEN → ~10 turns •verl-agent → ~50 turns •Verlog (ours) → 400+ turns 🔥 ⚙️ Technical foundation:…

2 71 398 35K 362

Feng Yao @fengyao1909

2 months ago

⚡𝐅𝐏𝟖 makes RL faster — but at the cost of performance. We present 𝐅𝐥𝐚𝐬𝐡𝐑𝐋, the first 𝐨𝐩𝐞𝐧–𝐬𝐨𝐮𝐫𝐜𝐞 & 𝐰𝐨𝐫𝐤𝐢𝐧𝐠 𝐑𝐋 𝐫𝐞𝐜𝐢𝐩𝐞 that applies 𝐈𝐍𝐓𝟖/𝐅𝐏𝟖 for rollout 𝐰𝐢𝐭𝐡𝐨𝐮𝐭 𝐥𝐨𝐬𝐢𝐧𝐠 𝐩𝐞𝐫𝐟𝐨𝐫𝐦𝐚𝐧𝐜𝐞 compared to 𝐁𝐅𝟏𝟔! 📝 Blog:…