Disagree. As soon as you throw sparsity in (and depthwise/tiny-group conv is a form of sparsity) FLOPs detach from reality. That's why sparse nets are hard (arxiv.org/abs/2006.10901), and EffNetV2 actually UNDOES a lot of depthwise. EffNetV1 == MobileNetV3 == designed for CPU.
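To make the FLOPs-vs-reality point concrete, here is a small sketch (my own illustration, not from the thread) counting multiply-accumulates for a standard 3x3 conv versus a depthwise-separable one on a hypothetical mid-network feature map; the shape and channel counts are assumptions chosen for illustration.

```python
# FLOP (MAC) counts: standard 3x3 conv vs. depthwise-separable conv.
# Shapes below are illustrative assumptions, not from any specific model.

def conv_flops(h, w, c_in, c_out, k):
    """MACs for a standard k x k convolution, stride 1, 'same' padding."""
    return h * w * c_out * c_in * k * k

def depthwise_separable_flops(h, w, c_in, c_out, k):
    """MACs for a depthwise k x k conv followed by a 1x1 pointwise conv."""
    depthwise = h * w * c_in * k * k
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise

h = w = 28
c_in = c_out = 256
k = 3

dense = conv_flops(h, w, c_in, c_out, k)
sep = depthwise_separable_flops(h, w, c_in, c_out, k)
print(f"standard:  {dense:,} MACs")       # 462,422,016
print(f"separable: {sep:,} MACs")         # 53,186,560
print(f"ratio:     {dense / sep:.1f}x fewer")  # ~8.7x

# The ~8-9x FLOP reduction rarely shows up as an 8-9x wall-clock speedup
# on GPUs/TPUs: the depthwise stage does far less arithmetic per byte
# moved, so it tends to be memory-bound. That gap is the sense in which
# FLOPs "detach from reality" once this kind of sparsity is involved.
```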
@giffmana This is true due to limitations in existing accelerator hardware. Our work, FAST, analyzes EfficientNet bottlenecks and shows a framework capable of automatically designing custom accelerators with 4x Perf/TDP on EfficientNet-B7 relative to TPU-v3. arxiv.org/abs/2105.12842
@DZhang50 I had not seen this paper yet; interesting approach. Not sure yet if I like or dislike this direction, as it risks locking us into a specific arch for a long time, although the approach itself certainly seems flexible.
@giffmana Thanks for the feedback! Under this approach, I think the goal would be to still have general-purpose ML accelerators, but also build a few optimized accelerators for specific popular workloads, eg EfficientNet and Transformers.