What is essential for policy learning from fixed/precollected data? We find that supervised learning with just a depth-two MLP is competitive with SoTA algorithms. No TD learning, advantage reweighting, or Transformers! arxiv.org/abs/2112.10751 github.com/scottemmons/rvs
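One common form of "supervised learning for offline RL" conditions the policy on a desired outcome, such as reward or a goal, as the first reply below alludes to. Below is a minimal NumPy sketch of that idea. A depth-two MLP maps a state concatenated with an outcome to an action and is fit to logged actions by plain regression, with no TD targets. Several details are assumptions, not taken from the thread:

- "depth-two" is read as two hidden ReLU layers of width 64;
- the outcome is an undiscounted reward-to-go;
- the loss is mean squared error;
- the data are synthetic.

`reward_to_go`, `DepthTwoMLP`, `act`, and `train_demo` are hypothetical names. This is not the code in the linked repo.

```python
import numpy as np


def reward_to_go(rewards):
    """Undiscounted sum of rewards from each timestep to the end of the episode (assumed outcome)."""
    rewards = np.asarray(rewards, dtype=float)
    return np.cumsum(rewards[::-1])[::-1]


class DepthTwoMLP:
    """Two hidden ReLU layers mapping concat(state, outcome) -> action (hypothetical sizes)."""

    def __init__(self, in_dim, out_dim, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        # He initialization, scaled by each layer's fan-in.
        self.W1 = rng.normal(0.0, np.sqrt(2.0 / in_dim), (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, np.sqrt(2.0 / hidden), (hidden, hidden))
        self.b2 = np.zeros(hidden)
        self.W3 = rng.normal(0.0, np.sqrt(2.0 / hidden), (hidden, out_dim))
        self.b3 = np.zeros(out_dim)

    def forward(self, x):
        # Cache intermediates for the manual backward pass.
        self.x = x
        self.z1 = x @ self.W1 + self.b1
        self.h1 = np.maximum(self.z1, 0.0)
        self.z2 = self.h1 @ self.W2 + self.b2
        self.h2 = np.maximum(self.z2, 0.0)
        return self.h2 @ self.W3 + self.b3

    def sgd_step(self, x, y, lr):
        """One full-batch gradient step on mean squared error; returns the loss before the step."""
        diff = self.forward(x) - y
        loss = float(np.mean(np.sum(diff ** 2, axis=1)))
        g = 2.0 * diff / x.shape[0]              # dLoss/dOutput
        gz2 = (g @ self.W3.T) * (self.z2 > 0)    # back through the second ReLU
        gz1 = (gz2 @ self.W2.T) * (self.z1 > 0)  # back through the first ReLU
        # All gradients are computed before any parameter is modified.
        grads = [
            (self.W3, self.h2.T @ g), (self.b3, g.sum(axis=0)),
            (self.W2, self.h1.T @ gz2), (self.b2, gz2.sum(axis=0)),
            (self.W1, self.x.T @ gz1), (self.b1, gz1.sum(axis=0)),
        ]
        for param, grad in grads:
            param -= lr * grad  # in-place update of each parameter array
        return loss


def act(policy, state, outcome):
    """Query the policy for an action conditioned on a desired scalar outcome."""
    x = np.concatenate([state, [outcome]])[None, :]
    return policy.forward(x)[0]


def train_demo(seed=0, steps=500, lr=5e-3):
    """Fit the policy to synthetic (state, outcome) -> action data; returns (first_loss, final_loss, policy)."""
    rng = np.random.default_rng(seed)
    n, state_dim, act_dim = 256, 4, 2
    states = rng.normal(size=(n, state_dim))
    outcomes = rng.normal(size=(n, 1))
    x = np.concatenate([states, outcomes], axis=1)
    # Hypothetical logged actions that depend on both the state and the outcome.
    mix = rng.normal(size=(state_dim + 1, act_dim))
    actions = np.tanh(x @ mix)
    policy = DepthTwoMLP(state_dim + 1, act_dim, seed=seed)
    losses = [policy.sgd_step(x, actions, lr) for _ in range(steps)]
    final = float(np.mean(np.sum((policy.forward(x) - actions) ** 2, axis=1)))
    return losses[0], final, policy
```

At evaluation time you would call `act(policy, state, outcome)` with a high target outcome and let the policy produce a matching action. Replacing the scalar reward-to-go with a goal-state vector gives the goal-conditioned variant.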
@emmons_scott Great work Scott! Did you ever try having the agent learn to produce its desired goal state conditioned on the reward?
@emmons_scott This is awesome, very cool work Scott! Are any of the envs partially observable? In previous work we found that transformers with K=1 were sufficient for many D4RL tasks, but larger contexts were needed for Atari. I.e., MLPs are probably enough for D4RL, but that may not be true in general.