Disagree. As soon as you throw sparsity in (and depthwise/tiny-group conv is a form of sparsity) FLOPs detach from reality. That's why sparse nets are hard (arxiv.org/abs/2006.10901), and EffNetV2 actually UNDOES a lot of depthwise. EffNetV1 == MobileNetV3 == designed for CPU.
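[For context, a quick back-of-envelope sketch of why depthwise-separable convs look so cheap on paper. The shapes below are hypothetical, and the functions count textbook multiply-adds only — the thread's point is exactly that this paper saving often fails to show up in wall-clock time:]

```python
# Hypothetical sketch: nominal FLOPs of a standard conv vs. a
# depthwise-separable (depthwise + pointwise) replacement.
# Counts multiply-adds only; ignores bias, padding edge effects, etc.

def conv_flops(h, w, c_in, c_out, k):
    # standard k x k convolution, stride 1, 'same' padding
    return h * w * c_in * c_out * k * k

def depthwise_separable_flops(h, w, c_in, c_out, k):
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 conv to mix channels
    return depthwise + pointwise

# Illustrative mid-network feature map
h = w = 56
c_in = c_out = 128
k = 3

std = conv_flops(h, w, c_in, c_out, k)
sep = depthwise_separable_flops(h, w, c_in, c_out, k)
print(f"standard: {std:,}  separable: {sep:,}  ratio: {std / sep:.1f}x")
# Roughly a (1/c_out + 1/k^2)^-1 ~ 8x reduction on paper for these shapes,
# yet on real hardware the separable version can run slower, because
# FLOPs ignore memory traffic and arithmetic intensity.
```

The ratio is exactly why papers love depthwise convs, and the memory-bound behavior is why accelerators don't.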
@giffmana I've faced this too while working on image-to-image translation. Replaced all convs in the generator with depthwise + pointwise convs. To my surprise, inference time went UP by 30% on the CPU and 120% on the GPU!