1/3 All these methods look the same to you? That's the point of this paper! Simply adding the losses works as well as any fancy multi-task method, if one tunes the baseline properly. This matches my experience, and fits my philosophy: tune the simplest possible method -> win.
2/3 I've tried fancy multi-task methods almost every year, but they never outperformed my well-tuned "just add the losses" baseline. I never thought much of it, but this paper actually explores, both theoretically and empirically, why that is!
3/3 The TL;DR is that all these fancy methods really just add regularization in an indirect way. Properly regularizing (weight decay/dropout) the simple baseline always achieves the same win; it turns out (surprise, surprise) the papers never really tuned their baselines well.
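To make that concrete, here's a minimal sketch of the "just add the losses" baseline in PyTorch. Everything here (the toy model, the two tasks, the hyperparameters) is illustrative and not from the paper; the point is just that the losses are summed with no weighting scheme, and regularization comes from plain weight decay:

```python
import torch

# One shared model with outputs split across two task heads
# (columns 0-1: 2-way classification logits, column 2: regression output).
model = torch.nn.Sequential(
    torch.nn.Linear(64, 128), torch.nn.ReLU(), torch.nn.Linear(128, 3)
)
# Regularization lives in the optimizer: weight_decay is the knob to tune.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)

x = torch.randn(32, 64)              # dummy shared input batch
y_cls = torch.randint(0, 2, (32,))   # task 1: classification labels
y_reg = torch.randn(32)              # task 2: regression targets

out = model(x)
loss_cls = torch.nn.functional.cross_entropy(out[:, :2], y_cls)
loss_reg = torch.nn.functional.mse_loss(out[:, 2], y_reg)

# The whole "method": sum the losses. No gradient surgery,
# no learned task weights, no conflict resolution.
loss = loss_cls + loss_reg
opt.zero_grad()
loss.backward()
opt.step()
```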