LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely---more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA.…
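For intuition about why LoRA is so much cheaper than full fine-tuning, here is a minimal sketch (my own generic illustration, not the post's actual experimental setup): the frozen weight W stays untouched and only a low-rank pair B·A is trained.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """Frozen base weight W plus a trainable low-rank update B @ A (rank r)."""
    r = A.shape[0]
    return W @ x + (alpha / r) * (B @ (A @ x))

d_out, d_in, r = 64, 32, 4
W = np.random.randn(d_out, d_in)      # frozen pretrained weight
A = np.random.randn(r, d_in) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))              # trainable up-projection, zero-init
x = np.random.randn(d_in)
y = lora_forward(x, W, A, B)
```

With B zero-initialized, the adapter starts as an exact no-op on the base model, and only r*(d_in + d_out) parameters are updated instead of d_in*d_out.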
Initial thoughts looking at these scores: the model is effectively GPT-5 level (while being 40% more expensive)
- Big Terminal Bench gains
- Similar to GPT-5, it has huge gains on TauBench - Telecom specifically.
- SWEBench is starting to plateau
Introducing Claude Sonnet 4.5—the best coding model in the world.
It's the strongest model for building complex agents. It's the best model at using computers. And it shows substantial gains on tests of reasoning and math.
I was never fully convinced that there's any hard ceiling for self-attention in transformers if one carefully applies classical remedies like multi-stage retrieval. Great job, whale!
🚀 Introducing DeepSeek-V3.2-Exp — our latest experimental model!
✨ Built on V3.1-Terminus, it debuts DeepSeek Sparse Attention (DSA) for faster, more efficient training & inference on long context.
👉 Now live on App, Web, and API.
💰 API prices cut by 50%+!
1/n
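The DSA mechanism itself isn't detailed in the thread; as a generic illustration of sparse attention (my assumption of the broad idea, not DeepSeek's actual design), each query can attend to only its k best-scoring keys instead of the full context:

```python
import numpy as np

def topk_sparse_attention(Q, K, V, k):
    """Each query attends only to its k highest-scoring keys (generic sketch)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])            # (n_q, n_k) logits
    kth = np.sort(scores, axis=-1)[:, -k][:, None]     # per-row k-th largest score
    masked = np.where(scores >= kth, scores, -np.inf)  # drop everything else
    w = np.exp(masked - masked.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                 # softmax over kept keys only
    return w @ V
```

Setting k to the full key count recovers dense softmax attention; smaller k trades a little accuracy for much cheaper long-context value mixing.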
I did not see that coming: Chain of Thought in AI Video generation!
LumaAI introduced Ray 3
- HDR Generation
- Fidelity "state of the art realism, physics, character consistency, and far superior instruction following"
- SOTA Physics & Controls
And much more. Some examples:
🚀 Introducing Qwen3-Next-80B-A3B — the FUTURE of efficient LLMs is here!
🔹 80B params, but only 3B activated per token → 10x cheaper training, 10x faster inference than Qwen3-32B (esp. at 32K+ context!)
🔹Hybrid Architecture: Gated DeltaNet + Gated Attention → best of speed &…
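The "80B params, 3B activated" ratio comes from mixture-of-experts routing. A minimal generic MoE sketch (my illustration of the standard top-k routing pattern, not Qwen's actual layer):

```python
import numpy as np

def moe_forward(x, router_w, experts, k=2):
    """Top-k MoE layer: only k of len(experts) experts run for this token."""
    logits = router_w @ x                         # one routing score per expert
    top = np.argsort(logits)[-k:]                 # indices of the k best experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                          # softmax over selected experts
    return sum(g * experts[i](x) for g, i in zip(gates, top))

d, n_experts = 16, 8
rng = np.random.default_rng(0)
experts = [lambda v, M=rng.standard_normal((d, d)): M @ v for _ in range(n_experts)]
router_w = rng.standard_normal((n_experts, d))
y = moe_forward(rng.standard_normal(d), router_w, experts, k=2)
```

Total parameter count scales with the number of experts, but per-token compute scales only with k, which is exactly how a model can be 80B "on disk" and ~3B "per token".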
Great to see this effort towards rigorous hyperparameter tuning. Two areas for improvement:
1. IIUC, the scaled up run here isn't actually tuned at all - its hparams are set via extrapolation
2. Sensitive hparams need a more granular sweep than power-of-2
x.com/percyliang/sta…
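To make the granularity point concrete (my own illustration of the two grids, not the setup in the linked thread): a power-of-2 sweep steps the hyperparameter by 2x each time, while a finer geometric grid can resolve a sharp optimum that falls between two powers of 2.

```python
import numpy as np

# Power-of-2 sweep: candidates spaced a factor of 2 apart.
coarse = [2.0 ** k for k in range(-12, -7)]     # 2^-12 ... 2^-8

# Finer geometric sweep over the same range: 4 candidates per octave
# (2^(1/4) spacing), so sensitive hparams like learning rate get resolved.
fine = np.geomspace(2.0 ** -12, 2.0 ** -8, num=17)
```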
Following up on my Newton–Schulz speedup post, here’s the code: github.com/thib-s/flash-n… (I'll do a PR soon in Dion/Muon)
And here’s how I squeezed out the extra gain ⬇️
🔍 How do we teach an LLM to *master* a body of knowledge?
In new work with @AIatMeta, we propose Active Reading 📙: a way for models to teach themselves new things by self-studying their training data. Results:
* 66% on SimpleQA w/ an 8B model by studying the wikipedia…
Good news: I managed to get an extra 1.6x speedup of the Newton–Schulz algorithm (which is at the core of Dion/Muon). It reaches nearly a 3x speedup over the plain torch implementation!
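For context, here is the plain iteration being accelerated, as a minimal NumPy sketch of the standard cubic Newton–Schulz orthogonalization (not the optimized kernel from the repo): it approximates the nearest orthogonal matrix to G using only matrix multiplies, which is why it suits GPUs and Muon-style optimizers.

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=25):
    """Cubic Newton-Schulz: X converges to the orthogonal polar factor of G."""
    X = G / (np.linalg.norm(G) + 1e-7)   # scale so all singular values are <= 1
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X  # pushes every singular value toward 1
    return X
```

Each step maps a singular value s to 1.5s - 0.5s^3, a fixed-point iteration that drives all singular values of the normalized input to 1 without ever computing an SVD.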
Motif 2.6B tech report is pretty insane, first time i see a model with differential attention and polynorm trained at scale!
> It's trained on 2.5T tokens, with a "data mixture schedule" to continuously adjust the mixture over training.
> They use WSD with a "Simple moving…
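Differential attention, as described in the Differential Transformer paper, subtracts a second attention map to cancel attention noise. A hedged sketch of that mechanism (my reading of the published formula, not Motif's actual implementation):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def diff_attention(Q1, K1, Q2, K2, V, lam=0.5):
    """Differential attention: (softmax map 1) - lam * (softmax map 2), then mix V."""
    d = Q1.shape[-1]
    A1 = softmax(Q1 @ K1.T / np.sqrt(d))
    A2 = softmax(Q2 @ K2.T / np.sqrt(d))
    return (A1 - lam * A2) @ V
```

With lam = 0 it reduces to ordinary softmax attention; the subtraction is meant to sharpen attention toward relevant context.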
So we went from
"LLM is memorizing dataset"
to
"LLM is not reasoning"
to
"LLM cannot do long / complex math proving"
to
"Math that LLM is doing is not REAL math. LLM can't do REAL math"
Where do we go from here?
I've finally solved steepest descent on Finsler-structured (matrix) manifolds more generally. This generalizes work by me, @jxbz, and @Jianlin_S on Muon, Orthogonal Muon, & Stiefel Muon.
---
The general solution turned out to be much simpler than I thought. And it should…
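The Finsler generalization itself isn't shown in the truncated thread; for context, a sketch of the known special case it extends (the norm-constrained steepest-descent problem behind Muon, under the spectral norm):

```latex
\Delta W \;=\; -\eta \,\operatorname*{arg\,max}_{\|T\| \le 1} \langle G,\, T \rangle ,
\qquad
G = U \Sigma V^\top \;\Rightarrow\; T^{\star} = U V^{\top}
\quad \text{(spectral-norm case)} .
```

That is, under the spectral norm the steepest-descent direction is the orthogonalized gradient, which Muon approximates with Newton–Schulz iterations rather than an SVD.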