Tyler Michael Smith @tms_jr
High Performance Computing @neuralmagic | Committer @vllm_project | PhD @UTAustin | Music Enjoyer
Boston, MA | Joined September 2011
Tweets: 1K
Followers: 137
Following: 294
Likes: 1K
Qwen3-VL is now ready for experimentation in Red Hat AI Inference Server with vLLM. 👉catalog.redhat.com/en/software/co… This builds on our recent step-by-step guide to Qwen3-Next and extends support to the new vision-language model, Qwen3-VL: redhat.com/en/blog/qwen3-… What’s next for our…
How does @deepseek_ai Sparse Attention (DSA) work? It has 2 components: the Lightning Indexer and Sparse Multi-Latent Attention (MLA). The indexer keeps a small key cache of 128 per token (vs. 512 for MLA). It scores incoming queries against these keys, and the top-2048 tokens are passed to Sparse MLA. https://t.co/QzzPRvAaNa
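The two-stage flow described in the post above (a cheap indexer picks a fixed number of tokens, then attention runs only over those) can be sketched in plain Python. This is a minimal illustration, not DeepSeek's or vLLM's implementation: the function names are hypothetical, the index score is assumed to be a plain dot product, and the attention step is ordinary softmax attention standing in for Sparse MLA. Only the sizes 128 and 2048 come from the post.

```python
import heapq
import math

INDEX_DIM = 128  # indexer key width per token (from the post)
TOP_K = 2048     # tokens forwarded to sparse attention (from the post)


def select_top_k(index_query, index_keys, k=TOP_K):
    """Return positions of the k cached tokens with the highest index score.

    Assumption: the score is a plain dot product. The post does not give
    the indexer's actual scoring function.
    """
    scores = [sum(q * x for q, x in zip(index_query, key)) for key in index_keys]
    k = min(k, len(scores))
    top = heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)
    return sorted(top)  # restore positional order


def sparse_attention(query, keys, values, selected):
    """Softmax attention restricted to the selected token positions."""
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * x for q, x in zip(query, keys[i])) for i in selected]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w * values[i][d] for w, i in zip(weights, selected)) / total
        for d in range(dim)
    ]
```

The point of the split is that the indexer touches every cached token but only reads a narrow key, while the wider attention step reads at most `TOP_K` tokens regardless of context length.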
Just enabled full cudagraphs by default on @vllm_project! This change should offer a huge improvement for low latency workloads on small models and efficient MoEs. For Qwen3-30B-A3B-FP8 on H100 at bs=10 1024/128, I was able to see a speedup of 47% 🔥
We're ready :D github.com/vllm-project/v…
In 1991, David Lynch showed the world the alienation and innate horror of a dirty street, directing this unforgettable anti-littering ad for the City of New York. RIP to a visionary filmmaker and a pioneer of the Trash Revolution.
I’m thrilled to announce that Neural Magic has signed a definitive agreement to join forces with Red Hat, Inc. At Neural Magic our vision is that the future of AI is open, and we have been on a mission to enable enterprises to capture the powerful innovation from AI, while at…
a fact of the world that we have to live with: models when “jailbroken” seem to have a distinct personality and artistic capability well beyond anything they produce in their default mood this might be the most important alignment work in the world and is mostly done on discord
Read to learn about Machete, which will serve as a foundation for mixed-input quantized GEMMs on NVIDIA GPUs (Hopper and later!) inside of vLLM Excellent work and stellar animations by Lucas Wilkinson (github.com/LucasWilkinson)
Quantization update! Transformers is now compatible with models quantized with llm-compressor library from @vllm_project or models in compressed-tensors format. This means that you can also enjoy high quality quantized models from the @NeuralMagic team!
this is an MJ Lenderman stan account youtube.com/watch?v=MwBZ_y…
A month ago, we announced our performance roadmap. Today, we are happy to share that the latest release achieves 🚀2.7x higher throughput and is 5x faster for output latency on Llama 8B, and 1.8x higher throughput and 2x faster on Llama 70B for H100s. blog.vllm.ai/2024/09/05/per…
Last week's vLLM office hours recording is ready! 🎥 @tms_jr showed how to use NVIDIA CUTLASS for high-performance inference in @vllm_project. We also explored the exciting vLLM v0.6.0 updates that led to a 2.7x throughput boost and 5x latency improvement. Recording & slides 👇
Happening soon -- Join if you want to see a roofline plot that I made in TikZ!
me: "looks like i need to calculate the variance of this distributed tensor -- what's that called again? oh! Welford's online algorithm" my brain for the next 3 days: "Wilford Brimley's online algorithm"
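For readers who have not met the algorithm named in the post above, here is a minimal sketch of Welford's online update together with the pairwise merge step (commonly attributed to Chan et al.) that makes it usable when the data is split across workers. The class and method names are hypothetical, and the sketch works on plain Python floats rather than on a real distributed tensor.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        """Welford's single-pass update for one new value."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def merge(self, other):
        """Combine two partial results, e.g. from two shards."""
        if other.count == 0:
            return RunningStats(self.count, self.mean, self.m2)
        if self.count == 0:
            return RunningStats(other.count, other.mean, other.m2)
        n = self.count + other.count
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / n
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / n
        return RunningStats(n, mean, m2)

    def variance(self):
        """Population variance; divide by (count - 1) for the sample variance."""
        return self.m2 / self.count


# Each shard accumulates locally, then the partial results are merged.
shard_a, shard_b = RunningStats(), RunningStats()
for x in (1.0, 2.0):
    shard_a.update(x)
for x in (3.0, 4.0):
    shard_b.update(x)
merged = shard_a.merge(shard_b)
```

Only the triple `(count, mean, m2)` has to be exchanged between workers, which is why this form suits a tensor sharded across devices.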
Join if you want to find out about how we're using CUTLASS to support quantization in vLLM -- specifically w8a8 for compute speedups, a deep dive into how we handle zero points for int8 asymmetric quantization, and how we put it all together to support FP8 Llama 3.1 405b.
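As background for the zero-point topic mentioned above, the sketch below shows the general algebra of asymmetric int8 quantization, where a float is approximated as `scale * (q - zero_point)`. It is not vLLM's or CUTLASS's code: the function names are hypothetical, and it assumes asymmetric activations with symmetric weights (weight zero point of 0). Under that assumption the zero point factors out of the dot product, so the inner loop stays a pure integer accumulate and the correction needs only the sum of the quantized weights.

```python
def quantize_asymmetric(xs, num_bits=8):
    """Quantize floats to signed integers with a scale and a zero point."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    # The representable range must include 0.0 so that zero maps exactly.
    lo, hi = min(min(xs), 0.0), max(max(xs), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale for all-zero input
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(x / scale) + zero_point)) for x in xs]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate floats from quantized values."""
    return [scale * (v - zero_point) for v in q]


def int_dot_with_zero_point(a_q, a_zero_point, w_q):
    """Integer dot product of asymmetric activations and symmetric weights.

    sum((a - zp) * w) == sum(a * w) - zp * sum(w), so the correction term
    depends only on the weights and can be computed once ahead of time.
    """
    acc = sum(a * w for a, w in zip(a_q, w_q))  # plain integer accumulate
    correction = a_zero_point * sum(w_q)        # precomputable per weight column
    return acc - correction
```

Multiplying the integer result by the activation scale and the weight scale gives the float approximation of the original dot product.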

Nadav Timor @NadavTimor
1K Followers 8K Following AI inference, speculative decoding, open source. Built novel decoding algorithms – default in Hugging Face Transformers (150+ ⭐). Making AI faster + cheaper
Mayank Bhaskar @cataluna84
3K Followers 4K Following Machine Learning Consultant 🧑🏽💻 | @twimlai & @Cohere_Labs Community Lead 👥 | @AILucknow ⌨ | #engineer 🛠 🧮 | #datavisualization 📊 | #sports ⚽ 🏓 🏋🏽 🎮
Vaclav Tunka (EN) @vtunka
2K Followers 5K Following Engineering lead at @RedHat: @OpenShift cloud-native portfolio @KnativeProject, @kedaorg, K8s serving for AI/ML. Languages, sport, travel. My own opinions.
Iekirqvoud @Iekirqvoud9050
52 Followers 3K Following
Eric @B81zZ
23 Followers 80 Following
Yuan (Terry) Tang @TerryTangYuan
9K Followers 302 Following Senior Principal Engineer @RedHat_AI | Leader @argoproj @kubeflow @kubernetesio @CloudNativeFdn | Keynote Speaker | Author | Technical Advisor
Abigail Hernandez @Sleasur6ZpewXd
50 Followers 2K Following Go confidently in the direction of your dreams. Live the life you have imagined.
Jermey Robel @JermeyRobe29830
9 Followers 980 Following
shingles @theyaklord
7 Followers 67 Following
Sairckan @SairckanKkoFN3
43 Followers 4K Following
Tionead @TioneaduLB
54 Followers 4K Following
Smawsheau @SmawsheauGHHcD
39 Followers 4K Following
leloy! @leloykun
7K Followers 4K Following Math @ AdMU • NanoGPT speedrunner • Muon fan 🤍 • prev ML @ XPD • 2x IOI & 2x ICPC • https://t.co/nfO038itfn
Djmdn Rjjenndnd @rjjenndnd69519
0 Followers 67 Following
juliet dior @julesdior9017
70 Followers 743 Following just a simple lady trying to impact positive energy to people around me 💜😍🥹
Daniel Chen @mathscifi_exnv
17 Followers 185 Following AI Geek @UnslothAI. Ex-NVIDIA NASA-endorsed projects. Math enthusiast and sci-fi dreamer.
daniel foster @danielevang_ai
6 Followers 125 Following 🚀 Generative AI Evangelist | AI Strategist | Helping tech companies unlock the full potential of AI | Tech Enthusiast | #AIInnovation #AIinBusiness #Generativ
Peng Tao @oatgnep
123 Followers 217 Following Linux filesystem, Kata Containers and Nydus developer. Always find ways rocking the WORLD!
Rothore @RothorexlN
40 Followers 4K Following
Ameen Patel @Ameen_ml
1K Followers 1K Following Inference @PrimeIntellect, prev @togethercompute, @AmazonScience, @uwaterloo
Shursmoan @ShursmoanXyF2q
67 Followers 7K Following
Kevin Gao @ml_research6
1 Followers 54 Following
Daniel Han @danielhanchen
28K Followers 2K Following Building @UnslothAI. Faster RL / training. LLMs bug hunter. OSS package https://t.co/aRyAAgKOR7. YC S24. Prev ML at NVIDIA. Hyperlearn used by NASA.
Shorneal @Shornealoyn76x
55 Followers 4K Following
Tri Dao @tri_dao
33K Followers 632 Following Asst. Prof @PrincetonCS, Chief Scientist @togethercompute. Machine learning & systems.
yangel @yangel1119
4 Followers 415 Following
Xiuquan Lv @ustcwizard
47 Followers 2K Following
Ed Sealing @EdSealing
427 Followers 183 Following Founder of Sealing Technologies (acquired by Parsons 2023); Prior soldier; Cyber, Systems Engineering, & Business. Currently exploring AI & investment opps.
Jackmin @jackminong
2K Followers 774 Following brutally slashing misbehaving computers @PrimeIntellect 🇺🇸. Previously @JinaAI_ 🇩🇪 @MoneyLion 🇲🇾.
Edd @erla_ndpg
205 Followers 2K Following Ex Computer Science student from Indonesia Only create X because AI Researchers are on X
Bowen😊(e/acc) @bowenisrising
325 Followers 4K Following CS System PhD student. Efficient LLM, Infra. Study Japanese. #INFP Get or Lose
Woosuk Kwon @woosuk_k
6K Followers 636 Following @thinkymachines | @vllm_project | PhD-ing @Berkeley_EECS
Roger Wang @rogerw0108
496 Followers 183 Following Flowers and friendship | ML Platform & Infra @Roblox | Committer @vllm_project | @uwaterloo @uwcse
Han Guo @HanGuo97
4K Followers 4K Following PhD Student @MIT_CSAIL | Past: @LTIatCMU @MITIBMLab @UNCNLP, @SFResearch, @BaiduResearch | Machine Learning, NLP.
SangBin Cho @Saaaang94
3K Followers 489 Following reasoning @xAI | prev-founding engineer @anyscalecompute | senior committer of @raydistributed | committer @vllm_project Sglang | Github: rkooo567
Luka Govedič @luka_govedic
75 Followers 252 Following vLLM performance at @neuralmagic (now @redhat_ai) MIT BS '22 MEng '23 Former @bigwrenchult @shadeulti @bostongloryaudl @BerkeleyZyzzyva 🇸🇮🇸🇮
Alpin @AlpinDale
5K Followers 877 Following Every age, it seems, is tainted by the greed of men. Rubbish to one such as I, devoid of all worldly wants. — I work on HPC and making AI run faster.
Shota Natenadze @shotanat
237 Followers 2K Following I'm a Senior Data Scientist at @EpamSystems, passionate about machine learning, LLMs, and advancements in AI technology.
Byron Hsu @hsu_byron
4K Followers 2K Following ML system @xAI | @lmsysorg @liger_kernel @flyteorg @theASF
Himanshu Maurya @Himanshu_nitrr
448 Followers 6K Following Giving meaning to mine share of star dust. Visiting fellow @WinshipAtEmory. Prev at @oracle, @maddox_ai, @KITKarlsruhe, @_nference, @val_iisc, @iitdelhi.
Michael Goin @mgoin_
1K Followers 389 Following inference optimization @RedHat_AI building @vllm_project | you can call me misha
Eldar Kurtić @_EldarKurtic
736 Followers 628 Following Principal Research Scientist @RedHat_AI & Dan Alistarh's group @ISTAustria
llm-d @_llm_d_
332 Followers 2 Following llm-d: a Kubernetes-native high-performance distributed LLM inference framework
Thien Tran @gaunernst
1K Followers 229 Following
You Jiacheng @YouJiacheng
9K Followers 2K Following a big fan of TileLang 关注TileLang喵!关注TileLang谢谢喵! https://t.co/utshC0jrCO 十年老粉
Zihao Ye @ye_combinator
2K Followers 555 Following Proud to be an engineer. I'm building flashinfer (https://t.co/PabCM3ksjN) Opinions are my own.
Dan Fu @realDanFu
7K Followers 221 Following Incoming assistant professor at UCSD CSE in MLSys. Currently recruiting students! Also running the kernels team @togethercompute.
Dimitris Papailiopoul... @DimitrisPapail
20K Followers 1K Following Researcher @MSFTResearch, AI Frontiers Lab; Prof @UWMadison (on leave); learning in context; thinking about reasoning; babas of Inez Lily.
Adam Hibble @Algomancer
4K Followers 989 Following I generate models that generate other stuff, working on @mancerlabs -- Prev: Founder of Popgun Labs (Techstars), Founder of the @QUTCode Network.
Fender @Fender
742K Followers 10K Following Since 1946, Fender has been the world’s foremost manufacturer of electric and acoustic guitars, bass guitars, amplifiers & accessories.
Together AI @togethercompute
51K Followers 390 Following AI pioneers train, fine-tune, and run frontier models on our GPU cloud platform.
Fireworks AI @FireworksAI_HQ
10K Followers 87 Following 🎆 Generative AI Platform built for developers
DeepSeek @deepseek_ai
972K Followers 0 Following Unravel the mystery of AGI with curiosity. Answer the essential question with long-termism.
lmarena.ai @arena
95K Followers 207 Following LMArena: Open Platform for Community-driven AI Benchmarking. Graduated from UC Berkeley / @lmsysorg. We’re hiring: https://t.co/1OkfLq2n0I
Chris Wright @kernelcdub
7K Followers 271 Following Red Hat CTO. Tezos Foundation council member. Passion for open source SW innovation. Father and husband. Cyclist. Human.
John R. Clout @JohnRClo
447 Followers 67 Following
megs @megs_io
17K Followers 309 Following ⧫⧫⧫⧫⧫ ɴᴏᴛ ʏᴏᴜʀ ᴀᴠᴇʀᴀɢᴇ ʀᴏʙᴏᴛ ᴘᴏʟɪꜱʜᴇʀ ⧫⧫⧫⧫⧫ ◊◊◊◊◊ ᴛʀʏɪɴɢ ᴛᴏ ʀᴇᴀᴄʜ ᴇꜱᴄᴀᴘᴇ ᴠᴇʟᴏᴄɪᴛʏ ◊◊◊◊◊ ⧫⧫⧫⧫⧫ ◊◊◊◊◊ ᴛᴇᴄʜɴᴏ ᴇxᴏʀᴄɪꜱᴛ ◊◊◊◊◊ ⧫⧫⧫⧫⧫
Quanquan Gu @QuanquanGu
16K Followers 2K Following Professor @UCLA, Pretraining and Scaling at ByteDance Seed | Recent work: Build AGI | Opinions are my own
Bowery Boston @boweryboston
11K Followers 2K Following Live music in and around Boston. @roadrunnerbos @thesinclair @thestageatsd
Cube Flipper @cube_flipper
4K Followers 1K Following As a human being, the kindest thing you can do to your brain is to not think.
Andy Ayrey @AndyAyrey
110K Followers 1K Following performance artist and hyperstitioneer: @upward_earth, infinite backrooms, @truth_terminal, ∞⟨X∴↯⟩∞
Kaichao You @KaichaoYou
4K Followers 134 Following phd student in tsinghua university, working on @vllm_project
Quanta Magazine @QuantaMagazine
349K Followers 619 Following Illuminating math and science. Supported by @SimonsFdn. 2022 Pulitzer Prize in Explanatory Reporting.
Mikhail Parakhin @MParakhin
23K Followers 21 Following
ken @aquariusacquah
10K Followers 7K Following @joinkaizen // writings at https://t.co/fcCc92lrbm and https://t.co/H2Z7HxOVBh // former head of eng @trucksmarterinc, eng @convoyteam
Lily Liu @eqhylxx
1K Followers 440 Following CS PhD Student, Sky Lab @UCBerkeley, @vllm_project, @OpenAI
Joshua Minsoo Kim @misterminsoo
23K Followers 5K Following 📨 @toneglow 💽 @toneglowrecords 📝 @singlesjukebox @thewiremagazine @pitchfork @bombmagazine @chicago_reader @nprmusic @billboard @mubinotebook @cinemascopemag
danielle chelosky @dniellechelosky
14K Followers 981 Following FEMALE LONELINESS EPIDEMIC ⋆。𖦹°⭒˚。⋆ header by elena redmond ♡
ᄂIMIПΛᄂbardo @liminal_bardo
8K Followers 580 Following