🤗 Hub & Transformers now support zero-shot image classification. In this example, I'm using laion/CLIP-ViT-L-14-laion2B-s32B-b82K, which is "the best performing open-source ViT CLIP model released" (more info below): huggingface.co/laion/CLIP-ViT…
Zero-shot image classification means that you are not constrained to a fixed list of classes (such as the 1,000 ImageNet classes). Moreover, the classes do not have to be single words: they can be phrases, or even sentences.
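To make that concrete, here is a minimal sketch of how this could look with the Transformers `zero-shot-image-classification` pipeline. It is not the exact code behind the screenshot: the helper names `classify` and `best_label` are my own, and I'm assuming the pipeline returns its usual list of `{"score", "label"}` dicts. The first call downloads the model weights.

```python
from typing import Dict, List

# Model named in the post; any CLIP-style checkpoint on the Hub should work similarly.
MODEL_ID = "laion/CLIP-ViT-L-14-laion2B-s32B-b82K"


def classify(image: str, candidate_labels: List[str]) -> List[Dict]:
    """Score one image against free-form text labels (hypothetical helper)."""
    # Imported lazily so best_label() can be used without transformers installed.
    from transformers import pipeline

    classifier = pipeline("zero-shot-image-classification", model=MODEL_ID)
    # `image` can be a local path or a URL; the labels are supplied at call time,
    # so they can be single words, phrases, or whole sentences.
    return classifier(image, candidate_labels=candidate_labels)


def best_label(predictions: List[Dict]) -> str:
    """Return the highest-scoring label from pipeline-style output (hypothetical helper)."""
    return max(predictions, key=lambda p: p["score"])["label"]
```

Usage would be something like `best_label(classify("photo.jpg", ["cat", "a dog playing in the snow", "two people having dinner at a restaurant"]))`, where `photo.jpg` is a placeholder for your own image. Note that the label list is just an argument, so you can change the classes on every call without retraining anything.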