Dear #LazyTwitter, are there any papers comparing encoder-only and decoder-only models (of similar sizes) on sentence classification and document ranking tasks? Many thanks; I tried to find such a study but failed completely.
@srchvrs We are just now revising our text classification survey to include more decoder-only models. Most LLM papers don't bother to evaluate on plain-simple topic classification. TL;DR: SOTA in text classification is still encoder-only, despite the size differences. Survey preprint: arxiv.org/abs/2204.03954
@srchvrs Two exceptions: 1) The most promising prompting technique seems to be CARP: together with RoBERTa to assemble the few-shot demonstrations and a voting scheme, it gets an edge over BERT. arxiv.org/abs/2305.08377
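To make the retrieve-demonstrations-then-vote idea concrete, here is a minimal sketch, assuming RoBERTa mean-pooled embeddings for similarity and a user-supplied `call_llm(prompt)` hook; the pooling choice, `k`, and the number of votes are illustrative assumptions, not the paper's exact setup.

```python
from collections import Counter

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
encoder = AutoModel.from_pretrained("roberta-base")

def embed(texts):
    """Mean-pooled RoBERTa embeddings, L2-normalized for cosine similarity."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1)
    emb = (hidden * mask).sum(1) / mask.sum(1)
    return torch.nn.functional.normalize(emb, dim=-1)

def classify(query, train_texts, train_labels, call_llm, k=4, n_votes=5):
    """CARP-style sketch: call_llm(prompt, temperature) -> label string is a
    hypothetical stand-in for whatever LLM API you use."""
    # Retrieve the k training examples most similar to the query.
    sims = embed([query]) @ embed(train_texts).T
    top = sims.squeeze(0).topk(k).indices.tolist()
    demos = "\n".join(f"Text: {train_texts[i]}\nLabel: {train_labels[i]}" for i in top)
    prompt = f"{demos}\nText: {query}\nLabel:"
    # Sample several completions and majority-vote over the answers.
    votes = [call_llm(prompt, temperature=0.7) for _ in range(n_votes)]
    return Counter(votes).most_common(1)[0][0]
```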
@srchvrs 2) "Pushing the Limits": an ensemble of fine-tuned Llamas gives a solid boost (2-3 points on all datasets) over RoBERTa. arxiv.org/abs/2402.07470
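The ensembling step itself is simple; a minimal sketch, assuming each fine-tuned checkpoint is wrapped as a `text -> label` function (the paper's exact aggregation may differ):

```python
from collections import Counter

def ensemble_predict(models, text):
    """Majority vote over the labels predicted by each ensemble member."""
    votes = [model(text) for model in models]
    return Counter(votes).most_common(1)[0][0]

# Usage with dummy stand-ins for three fine-tuned checkpoints:
models = [lambda t: "sports", lambda t: "sports", lambda t: "politics"]
print(ensemble_predict(models, "The match ended 2-1."))  # -> "sports"
```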
@LukasGalke Nice, thank you very much for all the references. Your survey seems very relevant. Regarding Llama: well, yes, it's 13B, while RoBERTa is 100M, or 300M at most.
@srchvrs @LukasGalke Quite crazy how hypes go. BTW, that is also true if you stick a classification head over the encoder of something newer (T5-PILE or UL2 or something?)
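"Classification head over the encoder" translates to something like the sketch below: take the encoder half of an encoder-decoder model, mean-pool its hidden states, and add a linear layer. The model name, pooling, and head design are illustrative assumptions, not anything specified in the thread.

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, T5EncoderModel

class T5EncoderClassifier(nn.Module):
    def __init__(self, model_name="t5-base", num_labels=4):
        super().__init__()
        # Encoder half of T5 only; the decoder is discarded.
        self.encoder = T5EncoderModel.from_pretrained(model_name)
        self.head = nn.Linear(self.encoder.config.d_model, num_labels)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        mask = attention_mask.unsqueeze(-1)
        pooled = (hidden * mask).sum(1) / mask.sum(1)  # mean over real tokens
        return self.head(pooled)

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = T5EncoderClassifier()
batch = tokenizer(["great movie", "terrible service"],
                  padding=True, return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])
```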
@srchvrs @LukasGalke And also for the next survey (thanks, already shared this survey with a few people): x.com/ElronBandel/st…