Extremely excited to announce Zamba! A 7B SSM with a novel architecture, competitive with Gemma-7B and Mistral-7B and significantly beating Llama2-7B, all while being trained on only 1T open training tokens.
While MoEs trade greater memory cost for reduced FLOPs, GPU memory is the key constraint for many people running models locally. With Zamba, we experimented with going in the other direction -- sharing global attention parameters to boost performance at the same parameter count.
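To make the parameter-sharing idea concrete, here is a minimal PyTorch sketch: a stack of SSM-style blocks that reuses a single global attention module every few layers, so attention capacity is added without a fresh parameter set per layer. The stand-in SSM block, layer count, and interleave period are assumptions for illustration, not Zamba's actual implementation.

```python
import torch
import torch.nn as nn

class SSMBlockStub(nn.Module):
    """Stand-in for an SSM/Mamba block (assumption; Zamba's real blocks differ)."""
    def __init__(self, d_model):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.mixer = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.SiLU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x):
        return x + self.mixer(self.norm(x))

class SharedAttentionBackbone(nn.Module):
    """SSM backbone that reuses ONE global attention block every `period` layers,
    so the attention weights are shared across all call sites."""
    def __init__(self, d_model=512, n_heads=8, n_layers=12, period=4):
        super().__init__()
        self.blocks = nn.ModuleList(SSMBlockStub(d_model) for _ in range(n_layers))
        # A single attention module: calling it repeatedly shares its parameters.
        self.shared_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn_norm = nn.LayerNorm(d_model)
        self.period = period

    def forward(self, x):
        for i, block in enumerate(self.blocks):
            x = block(x)
            if (i + 1) % self.period == 0:
                # Full (non-causal here for brevity) attention with shared weights.
                h = self.attn_norm(x)
                attn_out, _ = self.shared_attn(h, h, h, need_weights=False)
                x = x + attn_out
        return x

model = SharedAttentionBackbone()
tokens = torch.randn(2, 16, 512)   # (batch, seq_len, d_model)
out = model(tokens)                # same shape; attention params reused 3 times
print(out.shape)
```

The point of the sketch is the memory trade-off: the shared attention block contributes one set of weights no matter how many times it is applied, the opposite of an MoE, which multiplies parameters to save FLOPs.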
@BerenMillidge Thank you for working in the open, Beren!