1/3 All these methods look the same to you? That's the point of this paper! Simply adding the losses works just as well as any fancy multi-task method, if one tunes the baseline properly. This matches my experience, and fits my philosophy: tune the simplest possible method -> win.
2/3 I've tried fancy multi-task methods almost every year, but they never outperformed my well-tuned "just add the losses". I never thought much of it, but this paper actually explores both theoretically and empirically why that is!
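For context, a minimal sketch of what "just add the losses" means in practice, assuming a PyTorch-style setup with two hypothetical task heads (head_a, head_b) and batch keys (labels_a, targets_b) that are not from the thread:

```python
import torch
import torch.nn.functional as F

def multitask_loss(shared_features, batch, head_a, head_b):
    # Each task has its own head and its own loss term.
    loss_a = F.cross_entropy(head_a(shared_features), batch["labels_a"])
    loss_b = F.mse_loss(head_b(shared_features), batch["targets_b"])
    # The well-tuned baseline: the joint objective is simply the sum,
    # with tuning effort spent on learning rate / loss weights rather
    # than on extra multi-task machinery.
    return loss_a + loss_b
```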
@giffmana I think it's true in other fields of deep learning as well, such as optimizers, augmentations, and more. It's very hard to improve on strong simple baselines, and newer tricks often fail to show improvements in a fair, well-tuned comparison.
@giffmana After the function matching paper and having tried it out myself, I am fully convinced of your philosophy.