I've been plugging away on some ViT fine-tuning experiments this week while slowly recovering from a nasty cold. As part of the LAION-2B + CLIP exploration, I'm fine-tuning the recently released weights on ImageNet. Some expectations were upended, but interesting weights are inbound.
For the first runs I fine-tuned directly from the CLIP image tower weights (loadable in timm now via the HF hub models). This went okay, but I was hoping for more. I squeezed out 87.4 @ 224 L/14, 87.8 @ 336, 82.2 @ 224 B/32, 84.4 @ 384 B/32. H/14 @ 224 only reached 87.6. Not bad, but not wow.
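For anyone decoding the shorthand: "L/14 @ 224" means a ViT-Large with 14x14 pixel patches fine-tuned at 224x224 resolution. A minimal sketch of the patch-count arithmetic behind those resolutions follows. The helper `num_patches` and the `runs` list are hypothetical names for illustration, the scores are simply the numbers quoted above, and attaching 87.8 @ 336 to L/14 is my reading of the list order.

```python
def num_patches(image_size: int, patch_size: int) -> int:
    """Count the non-overlapping square patches a ViT sees (class token excluded)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size
    return side * side


# (model, patch size, fine-tune resolution, score quoted in the post)
# Assumption: the 336 result belongs to L/14, as the list order suggests.
runs = [
    ("L/14", 14, 224, 87.4),
    ("L/14", 14, 336, 87.8),
    ("B/32", 32, 224, 82.2),
    ("B/32", 32, 384, 84.4),
    ("H/14", 14, 224, 87.6),
]

for name, patch, res, score in runs:
    print(f"{name} @ {res}: {num_patches(res, patch)} patches, score {score}")
```

Going from 224 to 336 at patch size 14 more than doubles the sequence length, so the higher-resolution runs cost considerably more compute per image.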
@wightmanr I'm always amazed by the number of experiments you conduct all year round. It's really inspiring 🙂 I was wondering: what is your current hardware situation? Are you still training on cloud servers, or do you have a machine at home? If so, what are its specs? 🤓