"linear layers" are affine and all the deep learning literature should be corrected.
@francoisfleuret Just pretend b is part of W and X has a 1
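A minimal NumPy sketch of that trick (shapes here are arbitrary, chosen only for illustration): absorb b as an extra column of W and append a constant 1 to x, and the affine map becomes purely linear in the augmented coordinates.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))   # weights
b = rng.standard_normal(3)        # bias
x = rng.standard_normal(4)        # input

y_affine = W @ x + b              # the usual affine "linear layer"

# Bias-absorption trick: [W | b] @ [x; 1] is linear in the augmented space.
W_aug = np.hstack([W, b[:, None]])  # shape (3, 5)
x_aug = np.append(x, 1.0)           # shape (5,)
y_linear = W_aug @ x_aug

assert np.allclose(y_affine, y_linear)
```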
@francoisfleuret Yes but it doesn’t roll off the tongue as nicely. I also just call it a weighted sum, people usually know what is meant :P
@francoisfleuret Linear versus Affine is a distinction @JustinDomke semi-regularly brought up in his ML class and I really appreciated it
@francoisfleuret The same issue comes up when teaching or discussing calculus. I really want to say “the key idea of calculus is to take a nonlinear function (difficult) and approximate it locally by a linear function (easy)”. But technically, we’re approximating by an affine function.
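Spelled out (a standard statement, not from the thread): the tangent approximation at a point a is affine in x, with a linear part plus a constant offset that vanishes only when the tangent line passes through the origin.

```latex
f(x) \;\approx\; \underbrace{f(a) + f'(a)\,(x - a)}_{\text{affine in } x}
\;=\; \underbrace{f'(a)}_{\text{linear part}} x
\;+\; \underbrace{f(a) - f'(a)\,a}_{\text{constant offset}}
```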
@francoisfleuret Just wait until you hear what the softargmax is commonly called… 🥴🥴🥴
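For anyone outside the joke: "softmax" is arguably a misnomer, since the function is a smooth approximation of (one-hot) argmax, not of max. A quick sketch with a temperature parameter (variable names are ours):

```python
import numpy as np

def softmax(z, temperature=1.0):
    """Numerically stable softmax with a temperature knob."""
    z = np.asarray(z, dtype=float) / temperature
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([1.0, 3.0, 2.0])

# As temperature -> 0, the output approaches the one-hot encoding of
# argmax(z) -- hence "softargmax" would be the more honest name.
for t in (1.0, 0.1, 0.01):
    print(t, softmax(z, t).round(3))
# Approximately: [0.09, 0.665, 0.245], then [0, 1, 0], then [0, 1, 0].
```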
@francoisfleuret In most deep learning literature a linear layer comprises an affine transformation followed by a nonlinear activation function. So a linear layer is more than just an affine transformation
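Worth noting that common frameworks side with the stricter reading: in PyTorch, for instance, nn.Linear applies only the affine map, and any nonlinearity has to be composed separately. A quick check (assuming PyTorch is installed):

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 3)   # computes y = x @ W.T + b, nothing more
x = torch.randn(2, 4)

# No activation is applied by nn.Linear itself; the literature's
# "affine + activation" block must be built explicitly:
block = nn.Sequential(nn.Linear(4, 3), nn.ReLU())

y = layer(x)
assert torch.allclose(y, x @ layer.weight.T + layer.bias)
```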