Linh Le @linhlpv

PhD student at A2I2 Reinforcement Learning, Adaptation and Generalization

linhlpv.github.io

Joined December 2015

Tweets: 212

Followers: 72

Following: 468

Likes: 2K

Sergey Levine @svlevine

6 days ago

A new VLA for navigation that can take in goal images, positions, and language, and exhibits some pretty neat emergent language following!

noriaki_hirose @NoriakiHirose

6 days ago

4 57 296 61K 160

6 45 367 39K 158

Is scale all you need? Or is there still a role for incorporating domain knowledge and inductive bias? While I was in Heidelberg, I took some time to write a short essay on this question called "The Bittersweet Lesson". theoryandpractice.org/2025/09/The%20… #HLF25

2 9 107 11K 58

Richard Sutton @RichardSSutton

a week ago

For those really into it, here are another 50 minutes of my views on planning and action selection in options-based AI agents (like in the Oak architecture). youtube.com/watch?v=eJSoV2…

13 60 494 74K 484

Aran Komatsuzaki @arankomatsuzaki

a week ago

DiffusionNFT: RL for diffusion models via the forward process • Contrastive fine-tuning: positives vs negatives → implicit policy improvement • Works with any solver, no CFG, no trajectory storage • 25× more efficient than FlowGRPO • Boosts SD3.5-M: GenEval 0.24 → 0.98 in…

2 42 278 21K 195

Siddharth Ancha @siddancha

a week ago

Excited to announce that Streaming Flow Policy is accepted to CoRL’25 as an Oral presentation! 🎉 #CoRL2025 We just released a self-contained Jupyter notebook that trains and tests SFP in the Push-T environment: siddancha.github.io/streaming-flow… Looking forward to presenting this work…

Siddharth Ancha @siddancha

4 months ago

1 22 132 35K 103

7 24 242 24K 140

Nan Jiang @nanjiang_cs

a week ago

I was surprised by how many didn't know that (1) per-token MLE is whole-seq MLE, and (2) PG at token level is the same as PG at seq level (optimizing one big combinatorial action). The story is different if you introduce a fitted critic/Q-values or intermediate resets.
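Point (1) follows from the chain rule: the log-probability of a whole sequence is the sum of its per-token conditional log-probabilities, so maximizing one maximizes the other. The sketch below checks this numerically on a toy autoregressive model. The two-token vocabulary and the `next_token_probs` rule are hypothetical, chosen only for illustration and not taken from the tweet. Because the log-probabilities add up this way, the gradient of the sequence log-probability is likewise the sum of the per-token gradients, which is the intuition behind point (2).

```python
import itertools
import math

VOCAB = ["a", "b"]   # hypothetical toy vocabulary
SEQ_LEN = 3          # hypothetical fixed sequence length


def next_token_probs(prefix):
    """Toy autoregressive model: P(next token | prefix).

    The probability of "a" grows with the number of "a"s already seen.
    This rule is an illustrative assumption, not a real language model.
    """
    p_a = (1 + prefix.count("a")) / (2 + len(prefix))
    return {"a": p_a, "b": 1.0 - p_a}


def per_token_log_prob(seq):
    """Sum of per-token conditional log-probs (the per-token MLE objective)."""
    return sum(math.log(next_token_probs(seq[:t])[seq[t]]) for t in range(len(seq)))


def sequence_prob(seq):
    """Joint probability of the whole sequence, as a product of conditionals."""
    p = 1.0
    for t in range(len(seq)):
        p *= next_token_probs(seq[:t])[seq[t]]
    return p


all_seqs = list(itertools.product(VOCAB, repeat=SEQ_LEN))

# The joint distribution over whole sequences is properly normalized.
total_mass = sum(sequence_prob(s) for s in all_seqs)

# Largest gap between the summed token log-probs and the sequence log-prob.
max_gap = max(abs(per_token_log_prob(s) - math.log(sequence_prob(s))) for s in all_seqs)

print(f"total probability mass: {total_mass:.6f}")
print(f"largest gap between the two objectives: {max_gap:.2e}")
```

The same check would apply to any model that factorizes autoregressively. As the tweet notes, the picture changes once a fitted critic or intermediate resets are introduced.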

Nando de Freitas @NandoDF

a week ago

21 62 711 93K 729

9 37 360 106K 439

Michal Balcerak @X_MichalB

a week ago

Generative Modeling: What's After Flow Matching? Flow Matching lacks explicit modeling of scores on the data manifold. Introducing Energy Matching [NeurIPS 2025] unlocking exciting new inference-time capabilities! Paper: arxiv.org/pdf/2504.10612 Code: github.com/m1balcerak/Ene…

3 107 754 36K 559

Michal Nauman @mic_nau

3 weeks ago

Check out our new work on scaling RL via iterative computation. We apply flow-matching to value function learning and it works really well 🔥

Aviral Kumar @aviral_kumar2

3 weeks ago

9 79 674 40K 497

0 10 24 2K 6

Jyo Pari @jyo_pari

4 weeks ago

For agents to improve over time, they can’t afford to forget what they’ve already mastered. We found that supervised fine-tuning forgets more than RL when training on a new task! Want to find out why? 👇

18 143 890 140K 771

Sergey Levine @svlevine

3 weeks ago

The pi-05 model is now in openpi: github.com/Physical-Intel… Now with pytorch (πtorch?) support too!

12 90 795 78K 298

Preston Fu @preston_fu

3 weeks ago

With the right design decisions, value-based RL admits predictable scaling. value-scaling.github.io We wrote a blog post on our two papers challenging conventional wisdom that off-policy RL methods are fundamentally unpredictable.

4 57 445 37K 438

Nishanth Kumar @nishanthkumar23

3 weeks ago

World models hold a lot of promise for robotics, but they're data hungry and often struggle with long horizons. We learn models from a few (< 10) human demos that enable a robot to plan in completely novel scenes! Our key idea is to model *symbols* not pixels 👇

19 81 500 78K 377

Richard Sutton @RichardSSutton

a month ago

I was happy to give a more technical talk on how we might create an AI at RLC-2025 and AGI-2025 (video below). The Oak Architecture: A Vision of Super-Intelligence from Experience As AI has become a huge industry, to an extent it has lost its way. What is needed to get us back on…

21 101 676 59K 508

Gabriele Berton @gabriberton

a month ago

Finding an ML summer school has never been easier. Here is a GitHub repo with a comprehensive list of 50+ ML summer (and winter) schools all over the world (link in comments). Some of them are even free, and a few even offer scholarships, so you don't have to pay absolutely anything.

2 36 267 24K 364

Alexandre Brown 🇨🇦 @AlexandreBrown0

2 months ago

🚀 I'm excited to share our new paper: SegDAC: Segmentation-Driven Actor-Critic for Visual Reinforcement Learning 🧠 SegDAC combines large vision models with online RL to reason about its environment at the object and sub-object level, avoiding noisy pixel-level reasoning. 🛠️…

5 17 79 10K 56

Ilir Aliu - eu/acc @IlirAliu_

2 months ago

Humanoids finally move like humans… and can do more than copy. [Details + demos in thread 👇] A new framework, BeyondMimic, shows how to learn naturalistic whole-body control from human motion. But then goes further by composing those skills into versatile, zero-shot…

16 110 610 72K 234

Joseph Suarez 🐡 @jsuarez5341

2 months ago

At what point does perf optimization get ridiculous? During my PhD, everything was 500-5000 SPS. Then I got 10k and was very proud. Then 100k in early versions of PufferLib. Then 1M in 2.0... and now we're at up to 6M productive SPS on some RL envs.

6 8 224 12K 70

Perry Dong @perryadong

3 months ago

Fine-tuning pre-trained robotic models with online RL requires a way to train RL with expressive policies. Can we design an effective method for this? We propose EXPO, a sample-efficient online RL algorithm that enables stable fine-tuning of expressive policy classes. (1/6)

1 10 59 40K 46

Cansu Sancaktar @CcansuSancaktar

3 months ago

✨Introducing SENSEI✨ We bring semantically meaningful exploration to model-based RL using VLMs. With intrinsic rewards for novel yet useful behaviors, SENSEI showcases strong exploration in MiniHack, Pokémon Red & Robodesk. Accepted at ICML 2025🎉 Joint work with @cgumbsch 🧵

2 36 150 12K 68

Nan Jiang @nanjiang_cs

2 months ago

missing ICML, and I used this week to write my first technical blog on some recent thoughts on two different roles of simulators in RL and the confusions/misconceptions around them. Comments welcome! nanjiang.cs.illinois.edu/2025/07/16/sim…