"linear layers" are affine and all the deep learning literature should be corrected.
@theshawwn @francoisfleuret I'm instantly put off when a library or framework treats the nonlinearity as part of another layer (say, as an option on it).
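One concrete instance of the pattern being objected to: Keras's Dense layer takes the activation as a constructor argument, folding the nonlinearity into the affine layer rather than keeping it as a layer of its own. A sketch contrasting the two styles (the model itself is hypothetical, just for illustration):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Style 1: nonlinearity folded into the layer as an option.
fused = keras.Sequential([
    layers.Dense(64, activation="relu"),
    layers.Dense(10),
])

# Style 2: nonlinearity kept as its own layer.
separate = keras.Sequential([
    layers.Dense(64),
    layers.ReLU(),
    layers.Dense(10),
])
```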
@giffmana @francoisfleuret Maybe, but we need a term in between layer and block. A block usually consists of conv -> norm -> relu, repeated three or four times. I'd like a term for that "conv -> norm -> relu" part. I think of that as a "layer", conceptually. But I know that's not quite right.
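In PyTorch terms, the unnamed unit in question is the kind of thing people typically wrap in a small nn.Sequential; a minimal sketch of one "conv -> norm -> relu" unit and a block built from it (the helper name conv_norm_relu is hypothetical):

```python
import torch.nn as nn

def conv_norm_relu(in_ch, out_ch, kernel_size=3):
    """One "conv -> norm -> relu" unit; the thing that needs a name."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size,
                  padding=kernel_size // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

# A "block" in the sense above: the unit repeated three or four times.
block = nn.Sequential(
    conv_norm_relu(64, 64),
    conv_norm_relu(64, 64),
    conv_norm_relu(64, 64),
)
```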
@theshawwn @francoisfleuret That's the thing: having such a term somehow "freezes" that exact combination, or at least conv-normalizer-activation, and disincentivizes trying other stuff. At least that's what I've noticed happening.
@giffmana @francoisfleuret If the term were general enough, it wouldn't necessarily freeze that combo. On the other hand, it's hard to think of a different combo. What else have you used?