I've been told timm has a lot of hidden features. Yes, the docs need improving, that's a WIP! Curious about one of those features I've been using a lot lately in CLIP ViT fine-tuning? Every model in timm, when used with the optimizer factory, supports layer-wise LR decay.
Also known as discriminative LR decay, this applies a decaying LR to the model params as you move away from the head. It's very useful when fine-tuning from a large pretraining dataset (or semi-/unsupervised pretraining -> supervised) without blowing away properties learned during pretraining.
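A minimal sketch of the idea, assuming the usual geometric schedule (in timm this is exposed through the optimizer factory, e.g. a `layer_decay` argument to `create_optimizer_v2`; the helper below is a hypothetical illustration, not timm's implementation):

```python
def layer_wise_lrs(base_lr, num_layers, decay=0.75):
    """Per-layer learning rates for layer-wise (discriminative) LR decay.

    The last layer group (the head, closest to the loss) gets the full
    base_lr; each group further from the head is multiplied by `decay`
    once more, so the earliest layers move slowest and keep their
    pretrained features.
    """
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]

# A 4-group model with base LR 1e-3 and decay 0.5 would get
# LRs of 1.25e-4, 2.5e-4, 5e-4, 1e-3 from earliest group to head.
print(layer_wise_lrs(1e-3, 4, decay=0.5))
```

In practice each entry would become the `lr` of one optimizer param group covering that block's parameters.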
@wightmanr Out of curiosity, which script are you using to fine-tune models like CLIP ViT? There's a JAX script to train CLIP in huggingface/examples/research_projects but it does not seem to rely on timm...
@wightmanr I doubt I'm skilled enough, but it is Hacktoberfest, if you have some baby doc issues that need fixing.
@wightmanr I would really love a collection of building blocks with @wightmanr quality. We were discussing this with @benjamin_warner today: high-quality building blocks (attention layers, ResBlocks, upsample) that are torchscriptable and as fast as possible. You have everything already.