@yacineMTB Bro, Yotsuba B? Really? I have to rethink our relationship.
@yacineMTB me thinking about getting H100 to run it.
@yacineMTB 405B dense is kinda crazy because GPT-4 is meant to be a mixture model with 200B parameters “per head”. This will perform extremely well, and all the inference optimisations will follow immediately from the community.
@yacineMTB why would you want MoE over dense for inference?