evolvingstuff @evolvingstuff
I post about machine learning and occasionally some other stuff. Joined December 2009
4K Tweets 3K Followers 2K Following 12K Likes
The sparse attention in the new DeepSeek v3.2 is quite simple. Here's a little sketch. - You have a full attention layer (or MLA as in DSV3). - You also have a lite-attention layer which only computes query-key scores. - From the lite layer you get the top-k indices for each…
LLM reasoning: longer isn't always better. Meta Research just dropped new insights! We challenge the idea that longer CoT traces are always more effective. Our study shows that *failing less* is key, introducing a new metric 'Failed-Step Fraction' to predict reasoning accuracy.
🚨New paper: Stochastic activations 🚨 We introduce stochastic activations. This novel strategy consists of randomly selecting between several non-linear functions in the feed-forward layers of a large language model. https://t.co/lXLpCeaxgX
How do LLMs work under the hood? This is the best place to visually understand the internal workings of a transformer-based LLM. Explore tokenization, self-attention, and more in an interactive way:
Thinking Augmented Pre-training "we propose Thinking augmented Pre-Training (TPT), a universal methodology that augments text with automatically generated thinking trajectories. Such augmentation effectively increases the volume of the training data and makes high-quality tokens…
Proud to release ShinkaEvolve, our open-source framework that evolves programs for scientific discovery with very good sample-efficiency! 🐙 Paper: arxiv.org/abs/2509.19349 Blog: sakana.ai/shinka-evolve/ Project: github.com/SakanaAI/Shink…
Google researchers introduced ATLAS, a transformer-like language model architecture. ATLAS replaces attention with a trainable memory module and processes inputs up to 10 million tokens. The team trained a 1.3 billion-parameter model on FineWeb, updating only the memory module…
there's too many people with "AI/ML" in their bio asking what this image is.
Looking for examples of questions that stump SOTA LLMs. My current favorite: 'I have a problem with my order from the shoe shop. I received a left shoe instead of a right shoe, and a right shoe instead of a left shoe. What can I do? Can I still wear them?'
How to build a thriving open source community by writing code like bacteria do 🦠. Bacterial code (genomes) are: - small (each line of code costs energy) - modular (organized into groups of swappable operons) - self-contained (easily "copy paste-able" via horizontal gene…
DeepSWE is a new state-of-the-art open-source software engineering model trained entirely using reinforcement learning, based on Qwen3-32B. together.ai/blog/deepswe Fantastic work from @togethercompute @Agentica_‼ https://t.co/mLAbi2HD2Z
Text-to-LoRA: Instant Transformer Adaption arxiv.org/abs/2506.06105 Generative models can produce text, images, video. They should also be able to generate models! Here, we trained a Hypernetwork to generate new task-specific LoRAs by simply describing the task as a text prompt.
🚨 NEW: We made Claude, Gemini, o3 battle each other for world domination. We taught them Diplomacy—the strategy game where winning requires alliances, negotiation, and betrayal. Here's what happened: DeepSeek turned warmongering tyrant. Claude couldn't lie—everyone…
This is 🤯 Figure 02 autonomously sorting and scanning packages, including deformable ones. The speed and dexterity are amazing.
A major mistake I made in my undergrad is that I focused way too much on mathematical lens of computing - computability, decidability, asymptotic complexity etc. And too little on physical lens - energy/heat of state change, data locality, parallelism, computer architecture. The…
4 advanced attention mechanisms you should know: • Slim attention — 8× less memory, 5× faster generation by storing only K from KV pairs and recomputing V. • XAttention — 13.5× speedup on long sequences via "looking" at the sum of values along diagonal lines in the attention…
damn,.... this is so incredibly cool use case for discrete diffusion model
TL;DR: we are excited to release a powerful new open-weight language model with reasoning in the coming months, and we want to talk to devs about how to make it maximally useful: openai.com/open-model-fee… we are excited to make this a very, very good model! __ we are planning to…
Inspired by the success of LLMs, today on the blog we discuss how neural activity in the human brain aligns linearly with the internal contextual embeddings of speech and language within LLMs as they process everyday conversations. Learn more →goo.gle/4iiUoNj

Andrej Karpathy @karpathy
1.4M Followers 1K Following Building @EurekaLabsAI. Previously Director of AI @ Tesla, founding team @ OpenAI, CS231n/PhD @ Stanford. I like to train large deep neural nets.
Jeremy Howard @jeremyphoward
261K Followers 6K Following 🇦🇺 Co-founder: @AnswerDotAI & @FastDotAI ; Prev: professor @ UQ; Stanford fellow; @kaggle president; @fastmail/@enlitic/etc founder https://t.co/16UBFTX7mo
Delip Rao e/σ @deliprao
62K Followers 5K Following Busy inventing the shipwreck. @Penn. Past: @johnshopkins, @UCSC, @Amazon, @Twitter ||Art: #NLProc, Vision, Speech, #DeepLearning || Life: 道元, improv, running 🌈
Miles Brundage @Miles_Brundage
62K Followers 12K Following AI policy researcher, wife guy in training, fan of cute animals and sci-fi, Substack writer, stealth-ish non-profit co-founder
Dmytro Mishkin... @ducha_aiki
24K Followers 707 Following Marrying classical CV and Deep Learning. I do things, which work, rather than being novel, but not working.
Richard Socher @RichardSocher
113K Followers 1K Following CEO @youdotcom MP @aixventuresHQ Before: Stanford Adj Prof in AI/NLP, Chief Scientist at Salesforce, MetaMind
Sara Hooker @sarahookr
50K Followers 9K Following I lead @Cohere_Labs. Formerly Research @Google Brain @GoogleDeepmind. ML Efficiency at scale, LLMs, ML reliability. Changing spaces where breakthroughs happen.
Leo Boytsov @srchvrs
9K Followers 2K Following Machine learning scientist and engineer speaking πtorch & C++. Past @LTIatCMU, @awscloud. Opinions sampled from MY OWN 100T param LM.
Christian Szegedy @ChrSzegedy
42K Followers 3K Following #deeplearning, #ai research scientist. Opinions are mine.
Nathan Benaich @nathanbenaich
62K Followers 34K Following solo member of superinvestment staff @airstreet @airstreetpress @stateofaireport @raais
Ida Momennejad @criticalneuro
15K Followers 2K Following Principal Researcher @MSFTResearch. I study memory & planning in brains. I build & evaluate AI.
Jascha Sohl-Dickstein @jaschasd
26K Followers 718 Following Member of the technical staff @ Anthropic. Most (in)famous for inventing diffusion models. AI + physics + neuroscience + dynamics.
Haautav @Haautav8481117
13 Followers 891 Following
nnnn @wizardangle
0 Followers 30 Following
云创兽Ai @Crocer54910
1 Followers 108 Following 💸 market heroine all in on vastly stock investing! thrilled to connect. DM me for stock screeners! 🎯 #NYSE #Markets
Maddie Marlowe @MaitresseM
91K Followers 2K Following Filmmaker • interdisciplinary swer • @Coders_Room • Power Xchange 📌 off-grid writing La Residencia
Hawking Zhang @wydszqd
11 Followers 548 Following
Cooper Larson @LarsonCoop48209
103 Followers 5K Following
Ivan Shkvarun @IvanShkvarun
443 Followers 2K Following CEO @_SocialLinks_ | OSINT, AI & Digital Risk Visionary | Building trust in the age of agents | Speaker | Founder | #AI #OSINT #DataTrust
Theathirs @Theathirs31t6a
65 Followers 3K Following
Vasanth Raghu @naironics
59 Followers 5K Following
Tanish Anand @TanishAnan66928
1 Followers 75 Following
ReginaGrant @rNG2Z54j3EV5pC1
78 Followers 2K Following
Qinglin Zhu @qinglin_zhu1
30 Followers 1K Following
Karl Weinmeister @kweinmeister
2K Followers 4K Following Cloud Engineering @ Google. AI/ML/Data, Blue Devil & Longhorn, wanna-be at home improvement. Opinions are my own.
TobeyWhitman @T7u1V25112xE6
30 Followers 867 Following
Kevin Sosa @dulqur
15 Followers 80 Following
Vamshi Thallapally @VamshiThallapa1
18 Followers 676 Following
Pedrito Betito Parker... @V1c70r0x
426 Followers 3K Following Abogado de oficio desde hace 16 años tratando de salir de jodido y dejar estos techos de lamina, tengo los codos negros y me dicen indio por mis coches del 2011
Oliver Hennhöfer @OHennhoefer
383 Followers 5K Following statistics/ml, uncertainty quantification, anomaly detection, finance.
Ken Ngala @KenNgala2
28 Followers 225 Following Deep Learning Practitioner | Fastai Enthusiast | AI for Good | Kaggle Contributor | Always Learning
Connor T. Jerzak @JerzakConnor
514 Followers 765 Following @UTAustin "Nullius in verba" Discussion→https://t.co/81Fe6eR7Ys Jobs→https://t.co/hsHKsrtcsR
Arthur Schonbach @ArthurSchonbach
0 Followers 13 Following
Sababa @rubusursinus
33 Followers 356 Following
Morteza Zabihi @MortezaZabihi_
11 Followers 438 Following Associate Director of the MGB NeuroAI Center | Instructor at Harvard Medical School
Fajar | Data Analyst @muhfajarags
15 Followers 23 Following 💡 Practical Data Analyst 🔢 Agentic AI Enthusiast
Ashis Kumar Panda @ashiskumarpanda
190 Followers 924 Following 📌Data Scientist @EpsilonMktg . Simplifying tough data science concepts. Lifelong learner .
francois.victor @FVictor_bioinfo
16 Followers 864 Following
Kris @kmbroga
1 Followers 259 Following
Vladimir Frants @vavevol
1 Followers 63 Following
FaySmith @3uo2cbQzVENXP
66 Followers 7K Following
K6 @UpdateLiveware
1K Followers 958 Following ML PhD student | Neuromancer-in-training. Reformed Shrimp Uplifter. Shrike Cultist. Subspace Emissary.
Stef @stefano_kerope
133 Followers 309 Following Solo founder turning coffee into code & AI. Currently building an AI-powered trading companion. Follow my journey building in public.
🎱 BitcoinBananaBY @BitcoinBananaBY
720 Followers 2K Following GME x BBBY x CYDY to Uranus DD for ML, Retail, Biotech Tweets, Likes or Retweets are only personal opinions, not financial advice nor am I a financial advisor.
Atli Kosson @AtliKosson
241 Followers 483 Following PhD student at @EPFL🇨🇭 working on improved understanding of deep neural networks and their optimization. Previously did NN training @Tesla_AI @CerebrasSystems
Joel Martinez @joelmartinez
2K Followers 2K Following Principal Software Engineering Manager at @microsoft (via @xamarinhq), working on @msft4startups. Founded @onetug. #eldermillenial 🇺🇸 🇩🇴
AK @_akhaliq
428K Followers 3K Following AI research paper tweets, ML @Gradio (acq. by @HuggingFace 🤗) dm for promo ,submit papers here: https://t.co/UzmYN5YmrQ
Sebastian Raschka @rasbt
359K Followers 1K Following ML/AI researcher & former stats professor turned LLM research engineer. Author of "Build a Large Language Model From Scratch" (https://t.co/O8LAAMRzzW).
Jim Fan @DrJimFan
327K Followers 3K Following NVIDIA Director of Robotics & Distinguished Scientist. Co-Lead of GEAR lab. Solving Physical AGI, one motor at a time. Stanford Ph.D. OpenAI's 1st intern.
Google DeepMind @GoogleDeepMind
1.2M Followers 279 Following We’re a team of scientists, engineers, ethicists and more, committed to solving intelligence, to advance science and benefit humanity.
Soumith Chintala @soumithchintala
252K Followers 1K Following Cofounded and lead @PyTorch at Meta. Also dabble in robotics at NYU. AI is delicious when it is accessible and open-source.
Jürgen Schmidhuber @SchmidhuberAI
165K Followers 0 Following Invented principles of meta-learning (1987), GANs (1990), Transformers (1991), very deep learning (1991), etc. Our AI is used many billions of times every day.
elvis @omarsar0
266K Followers 681 Following Building with AI agents @dair_ai • Prev: Meta AI, Galactica LLM, Elastic, PaperswithCode, PhD • I share insights on how to build with AI Agents ↓
Alfredo Canziani @alfcnz
119K Followers 296 Following Musician, math lover, cook, dancer, 🏳️‍🌈, and an ass prof of Computer Science at New York University
Lucas Beyer (bl16) @giffmana
110K Followers 524 Following Researcher (now: Meta. ex: OpenAI, DeepMind, Brain, RWTH Aachen), Gamer, Hacker, Belgian. Anon feedback: https://t.co/xe2XUqkKit ✗DMs → email
Aran Komatsuzaki @arankomatsuzaki
146K Followers 306 Following Looking for a cofounder. Sharing AI research. Early work on AI (GPT-J, LAION, scaling, MoE). Ex ML PhD (GT) & Google.
(((ل()(ل() 'yoav)))... @yoavgo
66K Followers 2K Following
Sander Dieleman @sedielem
64K Followers 2K Following Research Scientist at Google DeepMind (WaveNet, Imagen, Veo). I tweet about deep learning (research + software), music, generative models (personal account).
Tianyu Pang @TianyuPang1
1K Followers 317 Following 🇸🇬Research Scientist at Sea AI Lab @SeaGroup | 👨🏻‍🎓PhD/BS from @Tsinghua_Uni and ex-@MSFTResearch | 🔬Generative Models, Reasoning, and Trustworthy AI.
DailyPapers @HuggingPapers
7K Followers 3 Following Tweeting interesting papers submitted at https://t.co/rXX8x0HzXV. Submit your own at https://t.co/QhbJKXBd4Q, and link models/datasets/demos to it!
Nicolai Waniek @NicolaiWaniek
198 Followers 490 Following Random tweets about neural computation and dynamical systems, self-organizing and -constructing systems, parallel computing, graphics, or space ships
Akshay 🚀 @akshay_pachaar
230K Followers 459 Following Simplifying LLMs, AI Agents, RAG, and Machine Learning for you! • Co-founder @dailydoseofds_• BITS Pilani • 3 Patents • ex-AI Engineer @ LightningAI
Wei Xu @cocoweixu
11K Followers 1K Following CS professor @GeorgiaTech @gtcomputing @ICatGT @mlatgt. Natural language processing, machine learning, LLMs, social media research.
Yam Peleg @Yampeleg
38K Followers 2K Following The only AI researcher they sent a missile for 🇮🇱 | Co-host @thursdai_pod • AI news every Thursday
Zandria Eriksson @ZandriaEriksson
24 Followers 46 Following I talk about Stoic philosophy for modern women, building unshakeable confidence from within, and mind-body-lifestyle transformation.
Saurabh Kumar @drummatick
20K Followers 349 Following Building @kodomamo_JP Presently focusing on LLM Finetuning and Scaling
Femke Plantinga @femke_plantinga
10K Followers 600 Following learn with me about AI. growth @weaviate_io
Jorvon Moss_Odd_Jayy @Odd_Jayy
24K Followers 679 Following Thoughts and Opinions are my own https://t.co/ygdxgGpBBH https://t.co/cQ8BlV2AQ4 https://t.co/vIKWFMgee4 https://t.co/cleOHvdIVl
Francisco Fonseca @_Francis_co_Art
119K Followers 1K Following 29 years old Illustrator and Street Artist From Porto, Portugal Online Shop and Domestika Course 👇🏼
David Duvenaud @DavidDuvenaud
31K Followers 4K Following Machine learning prof @UofT. Former team lead at Anthropic. Working on generative models, inference, & latent structure.
ThePrimeagen @ThePrimeagen
300K Followers 1K Following skill issues: 🟩⬛️⬛️⬛️⬛️⬛️(69/420) https://t.co/qWJnB6p4EP https://t.co/IwY3FTx1ZE https://t.co/TYJ6aSpwYs
Ben Clavié @bclavie
6K Followers 1K Following regressing linearly on a daily basis. wife guy who does retrieval. research @mixedbreadai, prev answerdotai
Julia McCoy @JuliaEMcCoy
29K Followers 11K Following First to create a living clone of herself successfully. ✨ AGI optimist. Founder, @FirstMoversAI. See my clone: https://t.co/XIBUWGRoV9 Jesus. Wife. Mom.
The Humanoid Hub @TheHumanoidHub
67K Followers 754 Following Humanoid Robots: Technology, Business, and Social Dynamics
Zhengyao Jiang @zhengyaojiang
4K Followers 421 Following Cofounder & CEO @WecoAI. Automating hill climbing with AI-Driven Exploration (AIDE). PhD in Machine Learning @UCL_DARK. (Zheng=j-uhng, j as in job; yao=y-aoww)
Daniel Han @danielhanchen
28K Followers 2K Following Building @UnslothAI. Faster RL / training. LLMs bug hunter. OSS package https://t.co/aRyAAgKOR7. YC S24. Prev ML at NVIDIA. Hyperlearn used by NASA.
Aryeh Kontorovich @aryehazan
10K Followers 610 Following probability, statistics, metric spaces, Markov chains, freedom (social & academic), Israel, Jew stuff. opinions represent my employer & all other groups I'm in
Mariya I. Vasileva @mariyaivasileva
19K Followers 2K Following Research @Meta Superintelligence Labs •🦙 multimodal safety • ex @AWS • 🎓 @IllinoisCDS (PhD), @Caltech • @WiMLWorkshop, @CVFADworkshop, @ResistanceAI • 🇧🇬
Sergey Demyanov @sdemyanov
187 Followers 836 Following Founder & CEO of Beagle. Previously: ML manager @ Snap. 1x exit. PhD in Machine Learning.
Behnam Neyshabur @bneyshabur
30K Followers 860 Following Research @AnthropicAI (Co-lead Discovery team) 💼 Past: Gemini @GoogleDeepMind (Co-led Blueshift team) 🧠 LLM Reasoning / AI Scientist 🎒Traveling & Backpacking
Michael Tschannen @mtschannen
3K Followers 676 Following Research Scientist @GoogleDeepMind. Representation learning for multimodal understanding and generation. Personal account.
Min Choi @minchoi
319K Followers 1K Following AI Educator. 𝕏 about AI, solutions and interesting things. Showing how to leverage AI in practical ways for you and your business. Opinions are my own.
Aryan Pandey @AryanPa66861306
4K Followers 3K Following Machine Learning(RL+CV+Robotics+NLP) || DevOps || Open source
Saeed Salehi (ssnio.b... @ssn_io
392 Followers 307 Following @ml_tuberlin PhD student @TUBerlin 🖥️🔮alumnus of @bccn_berlin 🧠 and @BTU_CS ⚡️
Barlow Adams @BarlowAdams
22K Followers 9K Following Pie enthusiast. Historically preserved beard site. Waffle House Poet Laureate. Best Small Fictions, Best of the Net, Wigleaf Top 50. Rejected by Tiger Beat
Frank Manzano @loved_orleer
11K Followers 1K Following
Mark Tenenholtz @marktenenholtz
138K Followers 628 Following Head of AI @PredeloHQ. Building reliable agents. XGBoost peddler, transformer purveyor.
Jonathan Gorard @getjonwithit
40K Followers 17 Following Applied mathematician, computational physicist @Princeton Previously @Cambridge_Uni Making the universe computable.