1/3 All these methods look the same to you? That's the point of this paper! Simply adding the losses works as well as any fancy multi-task method, if one tunes the baseline properly. This matches my experience, and fits my philosophy: tune the simplest possible method -> win.
2/3 I've tried fancy multi-task methods almost every year, but they never outperformed my well-tuned "just add the losses" baseline. I never thought much of it, but this paper actually explores, both theoretically and empirically, why that is!
3/3 The TL;DR is that all these fancy methods really just add regularization in an indirect way. Properly regularizing (weight decay/dropout) the simple baseline always achieves the same win; it turns out (surprise, surprise) the papers never really tuned their baselines well.
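To make that concrete, here's a minimal sketch of the "just add the losses" baseline in PyTorch. Everything here (the toy model, the two tasks, the hyperparameters) is illustrative and not from the paper; the point is just that the losses are summed with no weighting scheme, and regularization comes from plain weight decay:

```python
import torch

# One shared model with outputs split across two task heads
# (columns 0-1: 2-way classification logits, column 2: regression output).
model = torch.nn.Sequential(
    torch.nn.Linear(64, 128), torch.nn.ReLU(), torch.nn.Linear(128, 3)
)
# Regularization lives in the optimizer: weight_decay is the knob to tune.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)

x = torch.randn(32, 64)              # dummy shared input batch
y_cls = torch.randint(0, 2, (32,))   # task 1: classification labels
y_reg = torch.randn(32)              # task 2: regression targets

out = model(x)
loss_cls = torch.nn.functional.cross_entropy(out[:, :2], y_cls)
loss_reg = torch.nn.functional.mse_loss(out[:, 2], y_reg)

# The whole "method": sum the losses. No gradient surgery,
# no learned task weights, no conflict resolution.
loss = loss_cls + loss_reg
opt.zero_grad()
loss.backward()
opt.step()
```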