Dear #LazyTwitter, are there any papers comparing encoder-only and decoder-only models (of similar sizes) on sentence classification and document ranking tasks? Many thanks; I tried to find such a study but failed completely.
@srchvrs We are just now revising our text classification survey to include more decoder-only models. Most LLM papers don't bother to evaluate on plain-simple topic classification. TL;DR: SOTA in text classification is still encoder-only, despite the size differences. Survey preprint: arxiv.org/abs/2204.03954
@srchvrs Two exceptions: 1) The most promising prompting technique seems to be CARP: together with RoBERTa to assemble the few-shot demonstrations and a voting scheme, it gets an edge over BERT. arxiv.org/abs/2305.08377
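To make the retrieve-demonstrations-then-vote idea concrete, here is a minimal sketch, assuming RoBERTa mean-pooled embeddings for similarity and a user-supplied `call_llm(prompt)` hook; the pooling choice, `k`, and the number of votes are illustrative assumptions, not the paper's exact setup.

```python
from collections import Counter

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
encoder = AutoModel.from_pretrained("roberta-base")

def embed(texts):
    """Mean-pooled RoBERTa embeddings, L2-normalized for cosine similarity."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1)
    emb = (hidden * mask).sum(1) / mask.sum(1)
    return torch.nn.functional.normalize(emb, dim=-1)

def classify(query, train_texts, train_labels, call_llm, k=4, n_votes=5):
    """CARP-style sketch: call_llm(prompt, temperature) -> label string is a
    hypothetical stand-in for whatever LLM API you use."""
    # Retrieve the k training examples most similar to the query.
    sims = embed([query]) @ embed(train_texts).T
    top = sims.squeeze(0).topk(k).indices.tolist()
    demos = "\n".join(f"Text: {train_texts[i]}\nLabel: {train_labels[i]}" for i in top)
    prompt = f"{demos}\nText: {query}\nLabel:"
    # Sample several completions and majority-vote over the answers.
    votes = [call_llm(prompt, temperature=0.7) for _ in range(n_votes)]
    return Counter(votes).most_common(1)[0][0]
```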
@srchvrs 2) "Pushing the Limits": an ensemble of fine-tuned Llamas gives a solid boost (2-3 points on all datasets) over RoBERTa. arxiv.org/abs/2402.07470
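The ensembling step itself is simple; a minimal sketch, assuming each fine-tuned checkpoint is wrapped as a `text -> label` function (the paper's exact aggregation may differ):

```python
from collections import Counter

def ensemble_predict(models, text):
    """Majority vote over the labels predicted by each ensemble member."""
    votes = [model(text) for model in models]
    return Counter(votes).most_common(1)[0][0]

# Usage with dummy stand-ins for three fine-tuned checkpoints:
models = [lambda t: "sports", lambda t: "sports", lambda t: "politics"]
print(ensemble_predict(models, "The match ended 2-1."))  # -> "sports"
```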
@LukasGalke Nice, thank you very much for all the references. Your survey seems very relevant. Regarding Llama: well, yes, it's 13B, while RoBERTa is 100M, or 300M at most.
@srchvrs @LukasGalke Quite crazy how hypes go. BTW, that is also true if you stick a classification head over the encoder of something newer (T5-PILE or UL2 or something?)
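"Classification head over the encoder" translates to something like the sketch below: take the encoder half of an encoder-decoder model, mean-pool its hidden states, and add a linear layer. The model name, pooling, and head design are illustrative assumptions, not anything specified in the thread.

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, T5EncoderModel

class T5EncoderClassifier(nn.Module):
    def __init__(self, model_name="t5-base", num_labels=4):
        super().__init__()
        # Encoder half of T5 only; the decoder is discarded.
        self.encoder = T5EncoderModel.from_pretrained(model_name)
        self.head = nn.Linear(self.encoder.config.d_model, num_labels)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        mask = attention_mask.unsqueeze(-1)
        pooled = (hidden * mask).sum(1) / mask.sum(1)  # mean over real tokens
        return self.head(pooled)

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = T5EncoderClassifier()
batch = tokenizer(["great movie", "terrible service"],
                  padding=True, return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])
```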
@srchvrs @LukasGalke And also for the next survey (thanks, already shared this survey with a few people): x.com/ElronBandel/st…