RL is supposed to be slow and inefficient, right? Turns out that carefully implemented model-free RL can learn to walk from scratch in the real world in under 20 minutes! We took our robot for a "walk in the park" to test out our implementation. sites.google.com/berkeley.edu/w… thread->
Careful implementation of actor-critic methods can train very fast if we set up the task properly. We trained the robot entirely in the real world in both indoor and outdoor locations, each time learning policies in ~20 min.
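For readers unfamiliar with the actor-critic family: the critic in methods like SAC (the family the released code builds on) is trained toward a soft Bellman target. This is a minimal sketch of that target computation, not the authors' actual implementation; the function name and toy numbers are illustrative only.

```python
import numpy as np

def soft_q_target(reward, done, q1_next, q2_next, log_prob_next,
                  gamma=0.99, alpha=0.2):
    """Soft Bellman target for a SAC-style critic:
    y = r + gamma * (1 - done) * (min(Q1', Q2') - alpha * log pi(a'|s')).
    Taking the min of two target critics curbs overestimation;
    the -alpha*log_prob term adds the entropy bonus."""
    next_v = np.minimum(q1_next, q2_next) - alpha * log_prob_next
    return reward + gamma * (1.0 - done) * next_v

# Toy check: a terminal transition (done=1) bootstraps nothing,
# so the target is just the reward.
print(soft_q_target(reward=1.0, done=1.0, q1_next=5.0,
                    q2_next=4.0, log_prob_next=-1.0))  # 1.0
```

In practice the speedups come less from the update rule itself and more from implementation details around it (update-to-data ratio, normalization, action smoothing), which the paper discusses.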
Here are some examples of training (more videos on the website). Note that the policy on each terrain is different, dealing with dense mulch, soft surfaces, etc. With these training speeds, the robot adapts in real time.
By Laura Smith & @ikostrikov. arXiv: arxiv.org/abs/2208.07860 Website: sites.google.com/berkeley.edu/w… From here, exploring lifelong learning, continual adaptation, and other cool real-world RL applications will be really exciting!
BTW, much as I want to say we had some brilliant idea that made this possible, truth is that the key is really just good implementation, so the takeaway is "RL done right works pretty well". Though I am *very* impressed how well @ikostrikov & Laura made this work.
@svlevine @ikostrikov Can you comment on examples of what kinds of things were implemented better than in traditional codebases (e.g. stable_baselines3)?
@dav_ell @ikostrikov We'll get the code released shortly so you can see for yourself :) but there is a bit of discussion in the arxiv paper
@svlevine @dav_ell We've partially released our code here: github.com/ikostrikov/wal… The code for interfacing a real robot is coming soon.