Excited to present our work with @ashvinair and @svlevine: Offline RL with Implicit Q-Learning (IQL), a simple method that achieves SOTA performance on D4RL arxiv.org/abs/2110.06169 and runs ~4x faster than the prior SOTA github.com/ikostrikov/imp… Thread below
Actor-critic algorithms can fail in offline RL when the actor outputs out-of-dataset actions for the TD backups. What if we instead do TD learning using only the dataset actions? That is very stable, but it learns the behavior policy's value function, whereas we want the optimal value function.
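The "TD learning with dataset actions" idea above can be sketched as a SARSA-style backup that bootstraps only from the logged next action, never from an actor's (possibly out-of-dataset) action. This is a toy tabular sketch, not the paper's implementation; the dict-based critic and transition format are assumptions for illustration:

```python
def td_backup_dataset_actions(q, batch, gamma=0.99):
    """SARSA-style TD targets using only actions that appear in the dataset.

    q: toy tabular critic, a dict mapping (state, action) -> value.
    batch: iterable of (s, a, r, s_next, a_next) transitions, where a_next is
    the action the behavior policy actually took at s_next (hypothetical setup).
    """
    targets = {}
    for s, a, r, s_next, a_next in batch:
        # No max over actions: we bootstrap from the logged action a_next,
        # so q is never queried at out-of-dataset actions.
        targets[(s, a)] = r + gamma * q.get((s_next, a_next), 0.0)
    return targets
```

As the tweet notes, this avoids the instability from out-of-dataset actions, but because every backup uses the behavior policy's own actions, the fixed point is the behavior policy's value function rather than the optimal one.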
@ikostrikov @ashvinair @svlevine Congrats, looks really simple to implement and effective! Any plans to benchmark it on RLUnplugged? Would love to see a comparison to offline MuZero (arxiv.org/abs/2104.06294)
@ikostrikov @svlevine @ashvinair Looks really smart! I will definitely take a look at your repository and try to replicate the approach in my field.