@amy_tabb @theshawwn I read your linked post and agree with everything, but it doesn't contradict my statement? 1. If you do DL, you already have at least 1 GPU. 2. R50-i1k took me 2 weeks on 1 GPU 5 years ago; it's much better now. 3. It is _the_ gold standard, but a paper can't spend 2 GPU-weeks or $5 of cloud on it? huh?
@giffmana @amy_tabb @theshawwn Before this specific demo it used to be more like $30 of cloud, and only if you already knew the exact hyperparameters you wanted and didn't have to train a baseline to compare against, yada yada. It's also not trivial to reserve 8 A100s on Google Cloud (or even 1) … you need approval for the resources.
@giffmana @amy_tabb @theshawwn I'm not saying you shouldn't include ImageNet results, just explaining what can be hard about doing it on a budget.
@amy_tabb @code_star @theshawwn (I'm not saying it's easy) Rebuttal to all points: 1. Most papers take pride in using the baseline's (R50) exact hparams for their method, or highlight their robustness -> no baseline training of your own and no big hparam search needed. (Ablations at a smaller scale are OK.)
@amy_tabb @code_star @theshawwn 2. Cloud is the $5–30, 20-minute option for the impatient, when it works. When it doesn't, there's still the 5-year-old single GPU; just have 2 weeks of patience (that's the worst case here, a single 5-year-old GPU, which I don't believe is the current standard!). One can prepare the dreaded coursework while it's training :)
@giffmana @amy_tabb @theshawwn I can think of a very painful rebuttal: a paper from my first year of my PhD, where I used a GTX 1060 in my personal computer (and my wife's) to run pruning experiments on CIFAR-10/100 with two models at 6 sparsities with 3 methods. Replicating it on ImageNet would have taken months 🙃
@code_star @amy_tabb @theshawwn That sounds more like an ablation or hparam search; note that I said "ablations at a smaller scale are OK". If you claimed the hparams to be "mostly robust", then you should be able to do just one run on ImageNet. If not, then you may want to try finding "early indicator" metrics.
@giffmana @amy_tabb @theshawwn Yeah, it was the core experiment actually. I was doing SVCCA analysis to see how the models' layer representations changed as pruning was applied. This particular experiment was unusual, I admit.
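Roughly, SVCCA compares two sets of layer activations by SVD-reducing each and then taking the mean canonical correlation between them. Below is a minimal NumPy sketch of that idea, assuming you have already collected activation matrices of shape (n_examples, n_neurons) from the same layer of a dense and a pruned model on the same inputs; the function names and the 99% variance threshold are illustrative, not taken from the thread.

```python
# Minimal SVCCA-style similarity sketch (hypothetical, not the thread authors' code).
import numpy as np

def _svd_reduce(acts, keep_var=0.99):
    """Project activations (n_examples x n_neurons) onto top singular directions."""
    acts = acts - acts.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(acts, full_matrices=False)
    cum = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(cum, keep_var)) + 1   # directions covering keep_var of variance
    return u[:, :k] * s[:k]                       # reduced representation, n_examples x k

def svcca_similarity(acts_a, acts_b, keep_var=0.99, eps=1e-10):
    """Mean canonical correlation between SVD-reduced activations of two layers."""
    a = _svd_reduce(acts_a, keep_var)
    b = _svd_reduce(acts_b, keep_var)
    n = a.shape[0]
    caa = a.T @ a / n + eps * np.eye(a.shape[1])  # covariance of a (regularized)
    cbb = b.T @ b / n + eps * np.eye(b.shape[1])  # covariance of b (regularized)
    cab = a.T @ b / n                             # cross-covariance
    # Whiten both sides via Cholesky factors; the singular values of the whitened
    # cross-covariance are the canonical correlations.
    inv_la = np.linalg.inv(np.linalg.cholesky(caa))
    inv_lb = np.linalg.inv(np.linalg.cholesky(cbb))
    corr = np.linalg.svd(inv_la @ cab @ inv_lb.T, compute_uv=False)
    return float(np.mean(np.clip(corr, 0.0, 1.0)))

# Usage (hypothetical): acts_dense, acts_pruned collected from the same layer
# on the same inputs, before and after pruning.
# print(svcca_similarity(acts_dense, acts_pruned))
```

A similarity near 1.0 would suggest the layer's representation survived pruning largely intact, while lower values would indicate the representation shifted, which is the kind of per-layer signal the thread describes tracking across sparsity levels.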