"linear layers" are affine and all the deep learning literature should be corrected.
@theshawwn @francoisfleuret I'm instantly put off when a library or framework considers the nonlinearity to be part of another layer (say, as an option on it).
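For concreteness, Keras is one library where this pattern shows up: Dense accepts the nonlinearity as a constructor option, though the same network can be written with the activation as its own layer. A minimal sketch:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Style being criticized: the nonlinearity is an option of the layer.
fused_style = tf.keras.Sequential([
    layers.Dense(64, activation="relu"),
])

# Alternative: the nonlinearity is a separate layer of its own.
separate_style = tf.keras.Sequential([
    layers.Dense(64),
    layers.Activation("relu"),
])
```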
@giffmana @theshawwn @francoisfleuret Do you want it to be part of the same layer or completely separate, as in not a part of any layer?
@giffmana @francoisfleuret Maybe, but we need a term between "layer" and "block". A block usually consists of conv -> norm -> relu repeated three or four times; I'd like a term for that "conv -> norm -> relu" unit. Conceptually I think of it as a "layer", but I know that's not quite right.
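A hedged PyTorch sketch of that in-between unit; the name ConvNormAct is hypothetical (torchvision ships a similar Conv2dNormActivation helper), but the grouping is the conv -> norm -> relu triple described above:

```python
import torch.nn as nn

class ConvNormAct(nn.Sequential):
    """One conv -> norm -> relu unit: the unnamed grouping in question."""
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__(
            nn.Conv2d(in_ch, out_ch, kernel_size,
                      padding=kernel_size // 2, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

# A "block" is then three or four of these units in a row:
block = nn.Sequential(
    ConvNormAct(64, 64),
    ConvNormAct(64, 64),
    ConvNormAct(64, 64),
)
```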
@giffmana @theshawwn @francoisfleuret I think it's sometimes done because cuDNN provides certain fused conv + activation kernels. So if you don't have a compiler that can do the fusion for you, bundling them into one layer can help performance.
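One way to get such fusion without relying on a compiler, as a hedged sketch: PyTorch's eager-mode fusion utility merges a conv + bn + relu triple into a single fused module. Which kernels actually run downstream depends on the backend; cuDNN exposes fused conv + bias + activation primitives such as cudnnConvolutionBiasActivationForward.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import fuse_modules

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, 3, padding=1)
        self.bn = nn.BatchNorm2d(16)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))

model = Net().eval()  # folding bn into conv requires eval mode
fused = fuse_modules(model, [["conv", "bn", "relu"]])

# bn has been folded into the conv's weights/bias, and conv + relu is now
# a single ConvReLU2d module that quantized backends map to one fused kernel.
print(fused.conv)
```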