Michael Bernadsky @OctothorpeVoid
Menlo Park, CA Joined October 2016
976 Tweets 74 Followers 501 Following 3K Likes
1/2) Have you noticed that the forward process in a diffusion model looks a lot like the reparameterization trick in VAEs? It turns out that there is a deep connection! Curious? Watch our new video on the Generative Memory Lab channels (link below)
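A minimal sketch of the resemblance mentioned above, assuming the standard DDPM parameterization (the tweet does not spell out a formula): the forward sample x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps has the same "mean plus scale times noise" shape as the VAE draw z = mu + sigma * eps. The function names and the `alpha_bar` value are illustrative assumptions, not taken from the video.

```python
import math
import random


def reparameterize(mu, sigma, eps):
    """VAE-style reparameterization: z = mu + sigma * eps, elementwise."""
    return [m + sigma * e for m, e in zip(mu, eps)]


def forward_diffuse(x0, alpha_bar, eps):
    """Assumed DDPM-style forward sample, written as a reparameterized Gaussian.

    mean  = sqrt(alpha_bar) * x0
    scale = sqrt(1 - alpha_bar)
    """
    mu = [math.sqrt(alpha_bar) * x for x in x0]
    sigma = math.sqrt(1.0 - alpha_bar)
    return reparameterize(mu, sigma, eps)


# Hypothetical toy input: a 3-dimensional "data point" and fresh Gaussian noise.
rng = random.Random(0)
x0 = [1.0, -2.0, 0.5]
eps = [rng.gauss(0.0, 1.0) for _ in x0]
x_t = forward_diffuse(x0, alpha_bar=0.64, eps=eps)
```

In both cases the randomness lives entirely in `eps`, so the output is a deterministic function of the mean, the scale, and the noise draw.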
A 4-year-old PyTorch bug where the all-reduce operation produces INCORRECT gradients with no warning. Still not patched. Initially reported by @DrJimFan. Sharing this in case anyone is having mysterious gradient explosions. github.com/pytorch/pytorc…
CHAT, it's that time of the month. What's THE MOST interesting work you've seen recently? I'll go first with this paper.
GEPA is crazy good - past typical human level prompting @DSPyOSS
One of the best papers of the recent week. The big takeaway: scaling up model size doesn’t just make models smarter in terms of knowledge, it makes them last longer on multi-step tasks, which is what really matters for agents. Shows that small models can usually do one step…
the emergence of attention sinks in LLMs is so fascinating, especially the fact that some of them are useful.
.@gneubig and I are co-teaching a new class on LM inference this fall! We designed this class to give a broad view on the space, from more classical decoding algorithms to recent methods for LLMs, plus a wide range of efficiency-focused work. website: phontron.com/class/lminfere…
One of the cleanest DeepResearch papers I have read in a while. They RL gpt-oss-20b into a SOTA model on Humanity's Last Exam with search, visit, code tools. LLM as a judge for rewards and REINFORCE variant was used with medium-difficulty data which required search.
Very cool thread about whether LLMs can do multi-hop reasoning without CoT. If you're curious, read the full thread; it's well written and answers the question clearly.
WSM replaces learning rate decay with constant LR training and checkpoint merging, achieving the same or better results without modifying the training process. It shows that decay is just a form of weighted gradient averaging and uses this insight to merge checkpoints in a way…
The Big Fat LLM Architecture Comparison > Qwen3 Dense > Qwen3 MoE > Llama 4 > DeepSeek-V3/R1 > GPT-OSS > OLMo 2 > Gemma 3 > Mistral Small 3.1 > SmolLM3 > Kimi 2 MoE, MLA, GQA, sliding window, normalization etc. etc.
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference” We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to…
Qwen3-Next is a hybrid: GatedAttention (for the outlier fix) plus a Gated DeltaNet RNN for KV savings. All new models will be either sink + SWA hybrids like gpt-oss or gated attention + linear RNN hybrids (Mamba, Gated DeltaNet, etc.) like Qwen3-Next. The age of pure attention for the time-mixing layer is over.
It took me 2 days to finally grasp how fixed sparse attention works. And when I did, I realised this is one of the most genius ideas I have ever had the pleasure to encounter.
KV cache compression techniques ▪️KV caching (basic) – stores previously computed Keys and Values in memory and calculates attention only for new tokens. ▪️ Quantization – represents KV cache with fewer bits. ▪️ Low-rank decomposition – compresses the KV cache into smaller…
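A rough sketch of the first two items above, under simplifying assumptions: a single attention head, plain Python lists instead of tensors, and symmetric per-vector int8 quantization. The names `KVCache`, `attend`, `quantize_int8`, and `dequantize` are hypothetical and do not come from any particular library.

```python
import math


def attend(q, keys, values):
    """Scaled dot-product attention of one query over all stored keys/values."""
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q)) for k in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


class KVCache:
    """Basic KV caching: keep past keys/values, attend only for the new token."""

    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, q, k, v):
        # Only the new token's key and value are appended; older ones are reused.
        self.keys.append(k)
        self.values.append(v)
        return attend(q, self.keys, self.values)


def quantize_int8(vec):
    """Symmetric int8 quantization of one vector: fewer bits per cached entry."""
    scale = max(abs(x) for x in vec) / 127 or 1.0  # fall back to 1.0 for all-zero input
    return [round(x / scale) for x in vec], scale


def dequantize(qvec, scale):
    """Approximate reconstruction of the original floats."""
    return [x * scale for x in qvec]
```

A quantized cache would store the integer vectors plus one scale each and dequantize on read, trading a small reconstruction error (at most half a quantization step per entry here) for memory.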
I have been fine-tuning LLMs for over two years now! Here are the top 5 LLM fine-tuning techniques, explained visually:
Another impressive paper by Google DeepMind. It takes a closer look at the limits of embedding-based retrieval. If you work with vector embeddings, bookmark this one. Let's break down the technical details:
“Everyone knows” what an autoencoder is… but there's an important complementary picture missing from most introductory material. In short: we emphasize how autoencoders are implemented—but not always what they represent (and some of the implications of that representation).🧵
You can now run 100B parameter models on your local CPU without GPUs. Microsoft finally open-sourced their 1-bit LLM inference framework called bitnet.cpp: > 6.17x faster inference > 82.2% less energy on CPUs > Supports Llama3, Falcon3, and BitNet models
The technical report of @Meituan_LongCat LongCat-Flash is crazy good and full of novelty. The model is a 560B passive ~27B active MoE with adaptive number of active parameters depending on the context thanks to the Zero-Computational expert. 1) New architecture > Layers have 2…

Yswauikui @Yswauikui2309
108 Followers 3K Following
KamaWesley @8Y7s1w4cE48BS03
16 Followers 475 Following
DaisyWill @JCFcBwa6ou6iE
15 Followers 471 Following
nana🦄 @ds_nana_
33K Followers 10K Following data scientist from a non tech background. code with coffee. share my learnings here #datascience #python #rstats #sql
Venie @Venie1685
115 Followers 3K Following
Sririecac @Sririecac16889
33 Followers 2K Following
AI Linux (Sbnb Linux) @sbnb_io
317 Followers 3K Following Linux distro for AI computers ("GPUters"). Go from bare-metal to running AI workloads - in minutes, fully automated.
Xawap @Xawap462873
97 Followers 2K Following
AntoniaMalory @Kd3V9uxuUW0U8
29 Followers 1K Following
. @Bourb0nCap
329 Followers 2K Following Growth Investments - Long Term | Insider News: @BourbonInsider Find below the link to my Patreon and all my sources
Foohal @Foohal7651
19 Followers 1K Following
Lily Rowe @LilyRowe320431
99 Followers 4K Following
Rofl fights @roflfights
76K Followers 6K Following fights that got me cryin 😭 Not financial advice & Do your own research credit ≠ endorsement
Ceejus @Ceejus8005982
100 Followers 2K Following
Ohalnoo @Ohalnoo377475
23 Followers 1K Following
Etworler @Etworler171
6 Followers 897 Following
Ruthie Gutkowski @GutkowskiR46413
41 Followers 3K Following
Karlie Marks @MarksKarli94323
59 Followers 3K Following
小林 拓哉 @dedoytou94248
34 Followers 4K Following I can make it through the rain. I can stand up once again on my own.
Nayan Saxena @SaxenaNayan
3K Followers 2K Following Teaching machines how to think, express, and create – one model at a time. https://t.co/Myd0l6JD0Q
NatalieJohnny @9mbtx88QtjzBX
54 Followers 6K Following
Codi @fujisakika40503
67 Followers 7K Following
Carlene @o_carlene82
175 Followers 3K Following
David Ceesay @DavidCeesa85642
29 Followers 413 Following
Cennarsh @CennarshnUiogG
47 Followers 5K Following
Tharteth @ThartethVAYk
67 Followers 4K Following
TammyAlbert @vs3dqy5R84Lf87
73 Followers 7K Following
Chnesos @ChnesosNHJ2Hl
47 Followers 4K Following
NatalieJohnson @0G8lWPx8dfO8dI
75 Followers 7K Following
Doutaez @doutaez2474
131 Followers 7K Following I'm new to Twitter accounts so I tried the messaging feature and it's great to meet you.
Slartee @SlarteekgBfqQ
41 Followers 4K Following
MavisTobias @1uw8UQ3a7j0wv8y
69 Followers 7K Following
Sethough @sethough16280
103 Followers 5K Following
Thiroythat @ThiroythatYoy
56 Followers 5K Following
AdaFowler @b949Da68z5OQKa4
68 Followers 7K Following
Toytreigh @ToytreighIU8
67 Followers 7K Following
CharlotteBray @0QrlEk8zWVbdJ
82 Followers 7K Following
Emma Dumont @oOL5eGGQs6k1W7f
20 Followers 2K Following
Thawneaun @ThawneaunVNx
17 Followers 422 Following
TinaTyler @4NfRFS3s02qxQ
69 Followers 7K Following
NydiaBessemer @DN038N6IzwjXv
64 Followers 7K Following
SaraBelle @H0Zc040U557Dq
63 Followers 7K Following
JenniferStowe @58jYyl7rh2Q95
13 Followers 1K Following
Tysme @Tysme0WkV
21 Followers 2K Following
Thortes @Thortes126007
62 Followers 7K Following
Kimi.ai @Kimi_Moonshot
53K Followers 100 Following Built by Moonshot AI to empower everyone to be superhuman.
Bourbon Capital @BourbonCap
54K Followers 99 Following Growth Investments - Long Term | Insider News: @BourbonInsider Find below the link to my Patreon and all my sources
Today Years Old @todayyearsoldig
1.1M Followers 107 Following Your source for the latest trends, discoveries, and most shocking truths & little-known facts about the world. 🚀 DM us your findings!
Ostris @ostrisai
9K Followers 285 Following AI / ML researcher and developer. Creator of AI Toolkit - https://t.co/Thqof0Gxpj Support my work - https://t.co/Isg2EXrP7s
The Money Cruncher, C... @money_cruncher
143K Followers 172 Following A licensed CPA talking about personal finance. I write https://t.co/vyvJ476LiL for 18,000 readers Not a financial/tax advice
bad girl fights @badgirlfights
113K Followers 6K Following where the prettiest girls throw the ugliest punches 💅👊 turn noti’s on 💘 Not financial advice & Do your own research credit ≠ endorsement
𝗿𝗮𝗺𝗮𝗸... @techwith_ram
7K Followers 726 Following Sr. DS. AI news. Memes. Φ² = Φ + 1 → My models strive for elegance & efficiency. 🥦 https://t.co/k0P7ZvFN2M 🍐 https://t.co/dTfpCYgtck {soon}
Songlin Yang @SonglinYang4
14K Followers 3K Following research @MIT_CSAIL @thinkymachines. work on scalable and principled algorithms in #LLM and #MLSys. in open-sourcing I trust 🐳. she/her/hers
LTX Studio @LTXStudio
22K Followers 28 Following The storytelling platform to give form to your imagination.
Ivan Skorokhodov @isskoro
3K Followers 490 Following Research Scientist @Snap. I like neural networks and neural networks like me.
HPC Papers @HPCPapers
304 Followers 115 Following Latest submissions in Distributed, Parallel, and Cluster Computing to @arxiv.
marimo @marimo_io
5K Followers 8 Following An open-source reactive Python notebook: reproducible, git-friendly, execute as scripts, share as apps 🌐 https://t.co/YMoQiRxnLW 💬 https://t.co/F2LZUXvGMb
Tom Dörr @tom_doerr
102K Followers 2K Following Follow for posts about GitHub repos, DSPy, and agents Subscribe for top posts DM to share your AI project (Due to volume of DMs I'll prioritize subscribers)
Prime Intellect @PrimeIntellect
48K Followers 28 Following find compute. train models. contribute to open superintelligence. https://t.co/ZRZOsRRbwr
Peter Tierney @drpetertierney
6K Followers 464 Following Performance | Health | Sports Science | Research | Innovation | Prev @chelseafc @lululemon @england @leinsterrugby | PhD ProfDip MSc BSc @ucddublin | My views
Coach Wayland | Perfo... @WSWayland
15K Followers 950 Following Architect of strength, curator of performance. Craftsman. Consultant to @ETPI 🇪🇺 @hplanner_tour, England 🏴 Golf & UFC Fighters, High Performers.
DeepSeek @deepseek_ai
971K Followers 0 Following Unravel the mystery of AGI with curiosity. Answer the essential question with long-termism.
Tengyu Ma @tengyuma
37K Followers 564 Following Assistant professor at Stanford; Co-founder of Voyage AI (https://t.co/wpIITHLgF0) ; Working on ML, DL, RL, LLMs, and their theory.
kalomaze @kalomaze
19K Followers 2K Following ML researcher (@primeintellect), speculator • extremely silly jester
LightCounting @lightcounting
2K Followers 509 Following LightCounting is an optical communications market research company, offering market intelligence, analysis & forecasts. Retweets ≠ endorsements.
Lightricks @Lightricks
4K Followers 474 Following Bridging the gap between imagination and creation✨ Powering @ltx_video @facetune @photoleapapp @videoleapapp @popularpays @ltxstudio
Quiver Quantitative @QuiverQuant
355K Followers 782 Following Bridging the information gap between Main Street and Wall Street. Disclaimer: https://t.co/dIbqx0Q4fW
Bogáta Timár @BogataTimar
5K Followers 191 Following Proto-Uralic unicorn doing PhD in Tartu, from Hungary. Linguist, makeshift musician, teacher. My heart is in the Volga region. Also on @bogatatimar.bsky.social
Insider Radar @InsiderRadar
40K Followers 1 Following Alerts on significant insider trading activity, from @QuiverQuant
Arthur Douillard @Ar_Douillard
8K Followers 2K Following Distributed Learning @ deepmind | DiLoCo, DiPaCo. Continual Learning PhD @ Sorbonne
Vitaliy Chiley @vitaliychiley
3K Followers 1K Following LLM Research @ Meta. ex @DataBricks (@DBRXMosaicAI), @CerebrasSystems
Noam Brown @polynoamial
92K Followers 856 Following Researching reasoning @OpenAI | Co-created Libratus/Pluribus superhuman poker AIs, CICERO Diplomacy AI, and OpenAI o3 / o1 / 🍓 reasoning models
Sander Dieleman @sedielem
64K Followers 2K Following Research Scientist at Google DeepMind (WaveNet, Imagen, Veo). I tweet about deep learning (research + software), music, generative models (personal account).
Alyssa @alyssakrejmas
8K Followers 6K Following Founder @withnouri blending thoughtful human care & tech to support parents • @theinnocrew supporting founders through events/authentic convos
Mariya I. Vasileva @mariyaivasileva
19K Followers 2K Following Research @Meta Superintelligence Labs •🦙 multimodal safety • ex @AWS • 🎓 @IllinoisCDS (PhD), @Caltech • @WiMLWorkshop, @CVFADworkshop, @ResistanceAI • 🇧🇬
pytorch to atoms @PytorchToAtoms
772 Followers 227 Following
Claudio Jolowicz @cjolowicz
962 Followers 448 Following Author, software engineer, musician, husband, dad. Hypermodern Python. he/him @[email protected] Pre-order https://t.co/6Bm9xJnooa or your local bookstore
Nancy Pelosi Stock Tr... @PelosiTracker_
1.2M Followers 524 Following Highlighting Politicians' trades so we can invest alongside Goal: get them banned from trading. $800,000,000 invested on @joinautopilot_ so far
Linoy Tsaban🎗️ @linoy_tsaban
5K Followers 1K Following Exploring the world of AI Art as a ML engineer @HuggingFace 🤗 | #BringThemHome 🎗️
Frank (Haofan) Wang @Haofan_Wang
2K Followers 518 Following Building @lovart_ai. Open-source @instantx_ai. Alumni @CarnegieMellon.
Miguel Angel Bautista @itsbautistam
3K Followers 195 Following I am a research scientist at MLR, working on generative modeling of all the things (image, 3D, graphs, etc). I like to make complex approaches Simple 🇪🇸🇺🇸
skunkworks @skunkworks_ai
5K Followers 3 Following OSS ai research community building basement agi https://t.co/9TNVZeJYjd
OpenRouter @OpenRouterAI
55K Followers 308 Following Discover and use the latest LLMs. 500+ models (incl. 50+ free), explorable data, private chat, & a unified API. https://t.co/qJG5mKrigL
Denis Moskvin, ανε... @deniok
1K Followers 256 Following Any resemblance to the views of HSE University management in this Twitter account is purely coincidental.
Azalia Mirhoseini @Azaliamirh
15K Followers 528 Following Asst. Prof. of CS at Stanford, Google DeepMind. Prev: Anthropic, Google Brain. Co-Creator of MoEs, AlphaChip, Test Time Scaling Laws.
Finster AI @finsterai
356 Followers 21 Following Building AI agents that integrate disparate sources of financial and market data. Founded by an ex DeepMind and JP Morgan team. Backed by @HoxtonVentures.