Attention is all you need... but how much of it do you need? Announcing H3 - a new generative language model that outperforms GPT-Neo-2.7B with only *2* attention layers! Accepted as a *spotlight* at #ICLR2023! 📣 w/ @tri_dao 📜 arxiv.org/abs/2212.14052 1/n
One key point: SSMs are *linear* in sequence length instead of quadratic, and have no fixed context length. Long context for everyone! We're super excited, so we're releasing our code and model weights today - up to 2.7B parameters! github.com/HazyResearch/H3 2/n
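To see why the linear-vs-quadratic point matters, here's a minimal sketch (not the H3 implementation - names, shapes, and the diagonal recurrence are illustrative assumptions): a toy state-space recurrence makes one pass over the sequence, O(L), while self-attention materializes an L x L score matrix, O(L^2).

```python
# Illustrative sketch only, not the H3 code: a toy diagonal SSM recurrence
# runs in O(L) time over a length-L sequence, while self-attention builds an
# L x L score matrix and is O(L^2). All names/shapes here are hypothetical.
import torch

def ssm_scan(u, A, B, C):
    """x_{t+1} = A*x_t + B*u_t, y_t = C·x_t, computed sequentially: O(L)."""
    L, d_state = u.shape[0], A.shape[0]
    x = torch.zeros(d_state)
    ys = []
    for t in range(L):               # single pass over the sequence
        x = A * x + B * u[t]         # elementwise update (diagonal A)
        ys.append((C * x).sum())
    return torch.stack(ys)

def attention_scores(q, k):
    """Self-attention materializes an L x L matrix: O(L^2) time and memory."""
    return torch.softmax(q @ k.T / k.shape[-1] ** 0.5, dim=-1)

L, d = 1024, 16
y = ssm_scan(torch.randn(L), torch.rand(d), torch.randn(d), torch.randn(d))  # linear in L
scores = attention_scores(torch.randn(L, d), torch.randn(L, d))              # quadratic in L
```

The same recurrence also has no fixed context length: the state `x` just keeps updating for as many steps as you feed it, whereas the attention matrix grows quadratically with the window.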