I feel it's worth reimplementing methods by yourself to better understand the domain you're interested in. While reimplementing >20 different knowledge distillation methods for torchdistill, I found many positive/negative aspects in recent KD studies.
@yoshitomo_cs Yes, as a learning experience, absolutely. But always pick (at least) one number from the paper and try to get as close as possible. Otherwise you may not learn much. I actually originally spent ~1mo doing this for BN when it first came out.
@giffmana I agree. As the process is very time-consuming, I would suggest people pick (at least) one target number from a well-benchmarked config if available. It also helps them compare the reproduced results with those in other studies, e.g., I picked T: R34, S: R18 (teacher ResNet-34, student ResNet-18) for ImageNet.
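For readers unfamiliar with that shorthand, here is a minimal sketch of vanilla knowledge distillation (Hinton et al.'s soft-target loss) with the ResNet-34 teacher / ResNet-18 student pairing mentioned above. The hyperparameters (temperature=4.0, alpha=0.5) are illustrative assumptions, not torchdistill's actual configuration.

```python
# Minimal sketch of vanilla knowledge distillation (Hinton et al.) for a
# T: ResNet-34 -> S: ResNet-18 setup. Hyperparameters (temperature=4.0,
# alpha=0.5) are illustrative assumptions, not torchdistill's config.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18, resnet34

teacher = resnet34(weights="IMAGENET1K_V1").eval()  # frozen pretrained teacher
student = resnet18()  # student trained from scratch

def kd_loss(student_logits, teacher_logits, targets, temperature=4.0, alpha=0.5):
    """Weighted sum of hard-label CE and soft-target KL divergence."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2  # rescale gradients as in Hinton et al.
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1 - alpha) * hard

# One training step on a dummy ImageNet-shaped batch, for illustration only
images = torch.randn(8, 3, 224, 224)
targets = torch.randint(0, 1000, (8,))
with torch.no_grad():
    t_logits = teacher(images)
loss = kd_loss(student(images), t_logits, targets)
loss.backward()
```

A fixed, widely reported teacher/student pair like this makes the reproduced numbers directly comparable across papers, which is the point of picking a target number up front.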
@yoshitomo_cs For doing ML, one really needs patience in all aspects of it =)
@giffmana Maybe I'm being demanding, but I'd say we need both patience and a spirit of reproducibility to make our lives easier :)