i saw another tweet about someone beating a benchmark with rl after training directly on it, then open-sourcing the project. what's literally the point? are startups just showing off, or have we forgotten basic train/val/test splits ever since rl went mainstream?