@amy_tabb @theshawwn I read your linked post and agree with everything, but it doesn't contradict my statement? 1. If you do DL, you already have at least 1 GPU. 2. R50-i1k took me 2 weeks on 1 GPU 5 years ago; it's much better now. 3. It is _the_ gold standard, but a paper can't spend 2 GPU-weeks or $5 of cloud on it? huh?
@giffmana @amy_tabb @theshawwn Before this specific demo it used to be more like $30 of cloud, and only if you already knew the exact hyperparameters you wanted and didn't have to train a baseline to compare against, yada yada. It's also not trivial to reserve 8 A100s on Google Cloud (or even 1) … you need approval for the resources.
@giffmana @amy_tabb @theshawwn I'm not saying you shouldn't include ImageNet results, just explaining what can be hard about doing it on a budget.
@amy_tabb @code_star @theshawwn (I'm not saying it's easy) Rebuttal to all points: 1. Most papers take pride in using the baseline's (R50) exact hparams for their method, or highlight their robustness -> no baseline training of your own and no big hparam search needed. (Ablations at a smaller scale are OK.)
@amy_tabb @code_star @theshawwn 2. Cloud is the $5–30, 20-minute option for the impatient, when it works. When it doesn't, there's still the 5-year-old single GPU; just have 2 weeks of patience (that's the worst case here, a single 5-year-old GPU, which I don't believe is the current standard!). One can prepare the dreaded coursework while it's training :)
@giffmana @amy_tabb @theshawwn I can think of a very painful rebuttal: a paper from my first year of my PhD, where I used a GTX 1060 in my personal computer (and my wife's) to run pruning experiments on CIFAR-10/100 with two models at 6 sparsities with 3 methods. Replicating it on ImageNet would have taken months 🙃
@code_star @amy_tabb @theshawwn That sounds more like an ablation or hparam search; note that I said "ablations at a smaller scale are OK". If you claimed the hparams to be "mostly robust", then you should be able to do just one run on ImageNet. If not, then you may want to try finding "early indicator" metrics.
@giffmana @amy_tabb @theshawwn Yeah, it was the core experiment actually. I was doing SVCCA analysis to see how the models' layer representations changed as pruning was applied. This particular experiment was unusual, I admit.
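Roughly, SVCCA compares two sets of layer activations by SVD-reducing each and then taking the mean canonical correlation between them. Below is a minimal NumPy sketch of that idea, assuming you have already collected activation matrices of shape (n_examples, n_neurons) from the same layer of a dense and a pruned model on the same inputs; the function names and the 99% variance threshold are illustrative, not taken from the thread.

```python
# Minimal SVCCA-style similarity sketch (hypothetical, not the thread authors' code).
import numpy as np

def _svd_reduce(acts, keep_var=0.99):
    """Project activations (n_examples x n_neurons) onto top singular directions."""
    acts = acts - acts.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(acts, full_matrices=False)
    cum = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(cum, keep_var)) + 1   # directions covering keep_var of variance
    return u[:, :k] * s[:k]                       # reduced representation, n_examples x k

def svcca_similarity(acts_a, acts_b, keep_var=0.99, eps=1e-10):
    """Mean canonical correlation between SVD-reduced activations of two layers."""
    a = _svd_reduce(acts_a, keep_var)
    b = _svd_reduce(acts_b, keep_var)
    n = a.shape[0]
    caa = a.T @ a / n + eps * np.eye(a.shape[1])  # covariance of a (regularized)
    cbb = b.T @ b / n + eps * np.eye(b.shape[1])  # covariance of b (regularized)
    cab = a.T @ b / n                             # cross-covariance
    # Whiten both sides via Cholesky factors; the singular values of the whitened
    # cross-covariance are the canonical correlations.
    inv_la = np.linalg.inv(np.linalg.cholesky(caa))
    inv_lb = np.linalg.inv(np.linalg.cholesky(cbb))
    corr = np.linalg.svd(inv_la @ cab @ inv_lb.T, compute_uv=False)
    return float(np.mean(np.clip(corr, 0.0, 1.0)))

# Usage (hypothetical): acts_dense, acts_pruned collected from the same layer
# on the same inputs, before and after pruning.
# print(svcca_similarity(acts_dense, acts_pruned))
```

A similarity near 1.0 would suggest the layer's representation survived pruning largely intact, while lower values would indicate the representation shifted, which is the kind of per-layer signal the thread describes tracking across sparsity levels.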