Scott Emmons @emmons_scott, Twitter Profile

Scott Emmons @emmons_scott

2 years ago

What is essential for policy learning from fixed/precollected data? We find that supervised learning with just a depth-two MLP is competitive with SoTA algorithms. No TD learning, advantage reweighting, or Transformers! arxiv.org/abs/2112.10751 github.com/scottemmons/rvs
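To make "supervised learning" on precollected data concrete, here is a minimal sketch of one way a logged trajectory could be turned into training pairs. It assumes reward conditioning in the form of the average reward over the remaining steps. The function name `reward_to_go_pairs` and the toy data are hypothetical, and this is not code from the linked repository. A small MLP would then be fit to predict each target action from its input by ordinary regression or classification.

```python
from typing import List, Sequence, Tuple


def reward_to_go_pairs(
    states: Sequence[Sequence[float]],
    actions: Sequence[Sequence[float]],
    rewards: Sequence[float],
) -> List[Tuple[List[float], List[float]]]:
    """Turn one logged trajectory into (input, target) pairs.

    input  = state concatenated with the average reward over the remaining steps
    target = the action that was logged in that state
    (Hypothetical helper; the conditioning choice is an assumption.)
    """
    if not (len(states) == len(actions) == len(rewards)):
        raise ValueError("states, actions, and rewards must have equal length")

    horizon = len(rewards)
    avg_future = [0.0] * horizon
    running_sum = 0.0
    # Walk backwards so each suffix sum costs O(1).
    for t in range(horizon - 1, -1, -1):
        running_sum += rewards[t]
        avg_future[t] = running_sum / (horizon - t)

    # Each pair is a plain supervised example: no TD targets, no reweighting.
    return [
        (list(s) + [g], list(a))
        for s, a, g in zip(states, actions, avg_future)
    ]


# Toy three-step trajectory with one-dimensional states and actions.
pairs = reward_to_go_pairs(
    states=[[0.0], [1.0], [2.0]],
    actions=[[1.0], [0.0], [1.0]],
    rewards=[1.0, 0.0, 2.0],
)
```

At evaluation time, under the same assumption, the policy would be queried with the current state plus a desired outcome value in place of the logged one.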

Trenton Bricken @TrentonBricken

2 years ago

@emmons_scott Great work Scott! Did you ever try having the agent learn to produce its desired goal state conditioned on the reward?

Misha Laskin @MishaLaskin

2 years ago

@emmons_scott This is awesome, very cool work Scott! Are any of the envs partially observable? In prev work we found that transformers with K=1 were sufficient for many D4RL tasks but larger contexts were needed for Atari. I.e. for D4RL MLPs are prob enough but that may not be generally true.

성영기 @RXaTcY7rccSI10q

2 years ago

@emmons_scott @berkeley_ai

FollowML @followML_

2 years ago

@emmons_scott
