Just Loki @LokiJulianus
Prince With a Thousand Enemies · Nod · Joined April 2016
Tweets: 68K · Followers: 66K · Following: 311 · Likes: 51K
The $15 “silver” chain race.
Oh this LLLMM? It’s just predicting the next language model
> "I think it's time to admit defeat" How often do you see LLMs capitulate instead of doubling down or gaslighting you? Sadly 8B Llama is struggling with The Diamond Problem (as do all <10B models that don't cheat egregiously), but its attitude sure is more human-like now. https://t.co/825CTekSv5
In fairness to the lisp machine people: most higher level languages cannot easily implement a full common lisp interpreter even today and not for lack of trying.
He’s changed his Instagram name to Zuck lol.
And just like that.
Yes, both the 8B and 70B are trained way beyond Chinchilla-optimal - but we can eat the training cost to save you inference cost! One of the most interesting things to me was how quickly the 8B was still improving even at 15T tokens.
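For a sense of how far beyond Chinchilla-optimal that is: the commonly cited rule of thumb from the Chinchilla paper is roughly 20 training tokens per parameter. A minimal back-of-the-envelope sketch (the 20:1 ratio is an approximation, and the 15T-token figure comes from the Llama 3 announcement):

```python
# Rough Chinchilla-optimal comparison for Llama 3's reported 15T-token run.
# The ~20 tokens/parameter ratio is the approximate compute-optimal rule
# of thumb from Hoffmann et al. (2022); treat these numbers as illustrative.

CHINCHILLA_TOKENS_PER_PARAM = 20

def chinchilla_optimal_tokens(n_params: float) -> float:
    """Approximate compute-optimal token budget for a model with n_params parameters."""
    return CHINCHILLA_TOKENS_PER_PARAM * n_params

trained_tokens = 15e12  # 15T tokens, per the Llama 3 release notes

for name, params in [("Llama-3-8B", 8e9), ("Llama-3-70B", 70e9)]:
    optimal = chinchilla_optimal_tokens(params)
    ratio = trained_tokens / optimal
    print(f"{name}: ~{optimal / 1e9:.0f}B tokens optimal, trained on ~{ratio:.0f}x that")
```

By this heuristic the 8B was trained on roughly 90x its compute-optimal budget, which is exactly the "eat the training cost to save inference cost" trade the tweet describes.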
Llama 3 70B never stopped learning. He says the only reason they stopped its training was that they eventually had to decide: 'Do we want to spend our GPUs on training the 70B model further, or should we start training what's next?' https://t.co/NohXjF2TaH
Mistral-7B is dead.
Llama 3 has been my focus since joining the Llama team last summer. Together, we've been tackling challenges across pre-training and human data, pre-training scaling, long context, post-training, and evaluations. It's been a rigorous yet thrilling journey: 🔹Our largest models…
Lads, are we finally free of the debt inherited by OAi's decision to not open source gpt-3...
Llama-3-70B is as good as or better than Sonnet but ~10x cheaper, about as cheap as Haiku. Llama has just demolished everything below gpt-4 level
Hearing feedback from the community about the adverse impacts of false refusals, we developed new mitigations to address this. Llama 3 70B exhibits less than a third of the false refusals of Llama 2 70B, making Llama 3 our most helpful model to date.
I wish Twitter had a way to send all the people calling me antisemitic for talking about Palestine to the same place as the people calling me a genocide supporter for saying that blocking traffic is a stupid tactic, so they could argue with one another instead of me.
Congrats to @AIatMeta on Llama 3 release!! 🎉 ai.meta.com/blog/meta-llam… Notes: Releasing 8B and 70B (both base and finetuned) models, strong-performing in their model class (but we'll see when the rankings come in @lmsysorg :)) 400B is still training, but already encroaching…
Llama 3 delivers a major leap over Llama 2 and demonstrates SOTA performance on a wide range of industry benchmarks. The models also achieve substantially reduced false refusal rates, improved alignment and increased diversity in model responses — in addition to improved…