Oleg Rybkin @_oleh

RL at scale at xAI · olehrybkin.com · Philadelphia · Joined January 2014

Tweets: 314, Followers: 994, Following: 428, Likes: 1K

Preston Fu @preston_fu

3 weeks ago

With the right design decisions, value-based RL admits predictable scaling. value-scaling.github.io We wrote a blog post on our two papers challenging conventional wisdom that off-policy RL methods are fundamentally unpredictable.

Aviral Kumar @aviral_kumar2

3 weeks ago

@preston_fu @_oleh and I wrote a blog post on scaling laws and value-function-based RL, summarizing our two papers in this direction and discussing open questions! value-scaling.github.io Check it out! Feedback & comments are very welcome!

Aviral Kumar @aviral_kumar2

4 weeks ago

We have been doing work on scaling laws for off-policy RL for some time now and we just put a new paper out: arxiv.org/abs/2508.14881 Here, @preston_fu @_oleh led a study on how to best allocate compute for training value functions in deep RL: 🧵⬇️

Sergey Levine @svlevine

4 weeks ago

Following up on our work on scaling laws for value-based RL (led by @_oleh and @preston_fu), we've been trying to figure out compute optimal parameters for value-based RL training. Check out Preston's post about our findings!

Preston Fu @preston_fu

4 weeks ago

Paul Zhou @zhiyuan_zhou_

4 weeks ago

How can we best scale up value-based RL? We need to use bigger models, which mitigate what we call “TD-overfitting” (more below! 👇🧵). Further, we need to scale batch size and UTD accordingly as the models get bigger. Great work led by @preston_fu and @_oleh

Preston Fu @preston_fu

4 weeks ago

Oleg Rybkin @_oleh

4 weeks ago

📈📈📈

Preston Fu @preston_fu

4 weeks ago

Oleg Rybkin @_oleh

2 months ago

Cool work by David and friends! Could this be the thing that finally makes everyone stop using Gaussians as their policies? 🤔

David McAllister @davidrmcall

2 months ago

Qiyang Li @qiyang_li

3 months ago

Everyone knows action chunking is great for imitation learning. It turns out that we can extend its success to RL to better leverage prior data for improved exploration and online sample efficiency! colinqiyangli.github.io/qc/ The recipe to achieve this is incredibly simple. 🧵 1/N

Oleg Rybkin @_oleh

4 months ago

Very insightful analysis that I mostly agree with (except the overly pessimistic title :)!

Seohong Park @seohong_park

4 months ago

Oleg Rybkin @_oleh

4 months ago

Really interesting result! Scaling value-based RL is hard and we are still missing much of the machinery to do it. @seohong_park shows that horizon is the critical issue.

Seohong Park @seohong_park

4 months ago

Seohong Park @seohong_park

4 months ago

We found a way to do RL *only* with BC policies. The idea is simple:
1. Train a BC policy π(a|s)
2. Train a conditional BC policy π(a|s, z)
3. Amplify(!) the difference between π(a|s, z) and π(a|s) using CFG
Here, z can be anything (e.g., goals for goal-conditioned RL). 🧵↓
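To make step 3 of the post above concrete, here is a minimal sketch of the classifier-free guidance (CFG) combination for a toy policy over a handful of discrete actions. The post does not say which policy class is used, so the discrete setting, the function names (`log_softmax`, `cfg_action_probs`), and the guidance weight `w` are illustrative assumptions, not the authors' implementation. Steps 1 and 2 (training the two BC policies) are not shown; the logits passed in stand in for their outputs at a single state.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cfg_action_probs(uncond_logits, cond_logits, w):
    """Combine two policies with a CFG-style rule (illustrative assumption).

    uncond_logits: stand-in for the BC policy pi(a|s)
    cond_logits:   stand-in for the conditional BC policy pi(a|s, z)
    w:             hypothetical guidance weight; w=0 recovers pi(a|s),
                   w=1 recovers pi(a|s, z), w>1 amplifies their difference
    """
    log_u = log_softmax(uncond_logits)
    log_c = log_softmax(cond_logits)
    # Guided log-probabilities, up to a constant:
    # log pi(a|s) + w * (log pi(a|s, z) - log pi(a|s))
    guided = [u + w * (c - u) for u, c in zip(log_u, log_c)]
    # Renormalize so the result is a valid distribution over actions.
    return [math.exp(x) for x in log_softmax(guided)]


# Toy example: the conditional policy mildly prefers action 0.
uncond = [0.0, 0.0, 0.0]
cond = [1.0, 0.0, 0.0]
for w in (0.0, 1.0, 3.0):
    print(w, [round(p, 3) for p in cfg_action_probs(uncond, cond, w)])
```

In this toy setting, raising `w` above 1 pushes more probability onto the action that the conditional policy prefers relative to the unconditional one, which is the "amplify the difference" effect the post describes.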

Paul Zhou @zhiyuan_zhou_

5 months ago

This was fun, thanks for having me @chris_j_paxton @micoolcho! See the podcast for a livestream of the robot in real time and me evaluating a policy live! Or check it out for yourself at auto-eval.github.io and eval your policy in real without breaking a sweat