LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely---more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA.…
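For intuition about why LoRA is so much cheaper than full fine-tuning, here is a minimal sketch (my own generic illustration, not the post's actual experimental setup): the frozen weight W stays untouched and only a low-rank pair B·A is trained.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """Frozen base weight W plus a trainable low-rank update B @ A (rank r)."""
    r = A.shape[0]
    return W @ x + (alpha / r) * (B @ (A @ x))

d_out, d_in, r = 64, 32, 4
W = np.random.randn(d_out, d_in)      # frozen pretrained weight
A = np.random.randn(r, d_in) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))              # trainable up-projection, zero-init
x = np.random.randn(d_in)
y = lora_forward(x, W, A, B)
```

With B zero-initialized, the adapter starts as an exact no-op on the base model, and only r*(d_in + d_out) parameters are updated instead of d_in*d_out.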
Initial thoughts looking at these scores: the model is effectively GPT-5 level (while being 40% more expensive)
- Big Terminal Bench gains
- Similar to GPT-5, it has huge gains on TauBench - Telecom specifically.
- SWEBench is starting to plateau
Introducing Claude Sonnet 4.5—the best coding model in the world.
It's the strongest model for building complex agents. It's the best model at using computers. And it shows substantial gains on tests of reasoning and math.
I was never fully convinced that there's any hard ceiling for self-attention in transformers if one carefully applies classical remedies like multi-stage retrieval. Great job, whale!
🚀 Introducing DeepSeek-V3.2-Exp — our latest experimental model!
✨ Built on V3.1-Terminus, it debuts DeepSeek Sparse Attention (DSA) for faster, more efficient training & inference on long context.
👉 Now live on App, Web, and API.
💰 API prices cut by 50%+!
1/n
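The DSA mechanism itself isn't detailed in the thread; as a generic illustration of sparse attention (my assumption of the broad idea, not DeepSeek's actual design), each query can attend to only its k best-scoring keys instead of the full context:

```python
import numpy as np

def topk_sparse_attention(Q, K, V, k):
    """Each query attends only to its k highest-scoring keys (generic sketch)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])            # (n_q, n_k) logits
    kth = np.sort(scores, axis=-1)[:, -k][:, None]     # per-row k-th largest score
    masked = np.where(scores >= kth, scores, -np.inf)  # drop everything else
    w = np.exp(masked - masked.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                 # softmax over kept keys only
    return w @ V
```

Setting k to the full key count recovers dense softmax attention; smaller k trades a little accuracy for much cheaper long-context value mixing.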
I did not see that coming: Chain of Thought in AI Video generation!
LumaAI introduced Ray 3
- HDR Generation
- Fidelity "state of the art realism, physics, character consistency, and far superior instruction following"
- SOTA Physics & Controls
And much more. Some examples:
🚀 Introducing Qwen3-Next-80B-A3B — the FUTURE of efficient LLMs is here!
🔹 80B params, but only 3B activated per token → 10x cheaper training, 10x faster inference than Qwen3-32B (esp. at 32K+ context!)
🔹Hybrid Architecture: Gated DeltaNet + Gated Attention → best of speed &…
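The "80B params, 3B activated" ratio comes from mixture-of-experts routing. A minimal generic MoE sketch (my illustration of the standard top-k routing pattern, not Qwen's actual layer):

```python
import numpy as np

def moe_forward(x, router_w, experts, k=2):
    """Top-k MoE layer: only k of len(experts) experts run for this token."""
    logits = router_w @ x                         # one routing score per expert
    top = np.argsort(logits)[-k:]                 # indices of the k best experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                          # softmax over selected experts
    return sum(g * experts[i](x) for g, i in zip(gates, top))

d, n_experts = 16, 8
rng = np.random.default_rng(0)
experts = [lambda v, M=rng.standard_normal((d, d)): M @ v for _ in range(n_experts)]
router_w = rng.standard_normal((n_experts, d))
y = moe_forward(rng.standard_normal(d), router_w, experts, k=2)
```

Total parameter count scales with the number of experts, but per-token compute scales only with k, which is exactly how a model can be 80B "on disk" and ~3B "per token".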
Great to see this effort towards rigorous hyperparameter tuning. Two areas for improvement:
1. IIUC, the scaled up run here isn't actually tuned at all - its hparams are set via extrapolation
2. Sensitive hparams need a more granular sweep than power-of-2
x.com/percyliang/sta…
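To make the granularity point concrete (my own illustration of the two grids, not the setup in the linked thread): a power-of-2 sweep steps the hyperparameter by 2x each time, while a finer geometric grid can resolve a sharp optimum that falls between two powers of 2.

```python
import numpy as np

# Power-of-2 sweep: candidates spaced a factor of 2 apart.
coarse = [2.0 ** k for k in range(-12, -7)]     # 2^-12 ... 2^-8

# Finer geometric sweep over the same range: 4 candidates per octave
# (2^(1/4) spacing), so sensitive hparams like learning rate get resolved.
fine = np.geomspace(2.0 ** -12, 2.0 ** -8, num=17)
```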
Following up on my Newton–Schulz speedup post, here’s the code: github.com/thib-s/flash-n… (I'll do a PR soon in Dion/Muon)
And here’s how I squeezed out the extra gain ⬇️
🔍 How do we teach an LLM to *master* a body of knowledge?
In new work with @AIatMeta, we propose Active Reading 📙: a way for models to teach themselves new things by self-studying their training data. Results:
* 66% on SimpleQA w/ an 8B model by studying the wikipedia…
Good news: I managed to get an extra 1.6x speedup of the Newton–Schulz algorithm (which is at the core of Dion/Muon). It reaches nearly a 3x speedup over the plain torch implementation!
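For context, here is the plain iteration being accelerated, as a minimal NumPy sketch of the standard cubic Newton–Schulz orthogonalization (not the optimized kernel from the repo): it approximates the nearest orthogonal matrix to G using only matrix multiplies, which is why it suits GPUs and Muon-style optimizers.

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=25):
    """Cubic Newton-Schulz: X converges to the orthogonal polar factor of G."""
    X = G / (np.linalg.norm(G) + 1e-7)   # scale so all singular values are <= 1
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X  # pushes every singular value toward 1
    return X
```

Each step maps a singular value s to 1.5s - 0.5s^3, a fixed-point iteration that drives all singular values of the normalized input to 1 without ever computing an SVD.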
Motif 2.6B tech report is pretty insane, first time i see a model with differential attention and polynorm trained at scale!
> It's trained on 2.5T tokens, with a "data mixture schedule" to continuously adjust the mixture over training.
> They use WSD with a "Simple moving…
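Differential attention, as described in the Differential Transformer paper, subtracts a second attention map to cancel attention noise. A hedged sketch of that mechanism (my reading of the published formula, not Motif's actual implementation):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def diff_attention(Q1, K1, Q2, K2, V, lam=0.5):
    """Differential attention: (softmax map 1) - lam * (softmax map 2), then mix V."""
    d = Q1.shape[-1]
    A1 = softmax(Q1 @ K1.T / np.sqrt(d))
    A2 = softmax(Q2 @ K2.T / np.sqrt(d))
    return (A1 - lam * A2) @ V
```

With lam = 0 it reduces to ordinary softmax attention; the subtraction is meant to sharpen attention toward relevant context.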
So we went from
"LLM is memorizing dataset"
to
"LLM is not reasoning"
to
"LLM cannot do long / complex math proving"
to
"Math that LLM is doing is not REAL math. LLM can't do REAL math"
Where do we go from here?
I've finally solved steepest descent on Finsler-structured (matrix) manifolds more generally. This generalizes work by me, @jxbz, and @Jianlin_S on Muon, Orthogonal Muon, & Stiefel Muon.
---
The general solution turned out to be much simpler than I thought. And it should…
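The Finsler generalization itself isn't shown in the truncated thread; for context, a sketch of the known special case it extends (the norm-constrained steepest-descent problem behind Muon, under the spectral norm):

```latex
\Delta W \;=\; -\eta \,\operatorname*{arg\,max}_{\|T\| \le 1} \langle G,\, T \rangle ,
\qquad
G = U \Sigma V^\top \;\Rightarrow\; T^{\star} = U V^{\top}
\quad \text{(spectral-norm case)} .
```

That is, under the spectral norm the steepest-descent direction is the orthogonalized gradient, which Muon approximates with Newton–Schulz iterations rather than an SVD.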