Vijay @__tensorcore__
MLIR, CUTLASS, Tensor Core arch @NVIDIA. Mechanic @hpcgarage. Exercise of any 1st amendment rights are for none other than myself. thakkarv.dev Joined July 2015
Tweets 1K
Followers 2K
Following 525
Likes 8K
Excited to share what friends and I have been working on at @Standard_Kernel We've raised from General Catalyst (@generalcatalyst), Felicis (@felicis), and a group of exceptional angels. We have some great H100 BF16 kernels in pure CUDA+PTX, featuring: - Matmul 102%-105% perf…
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference” We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to…
“TogetherAI’s chief scientist @tri_dao announced Flash Attention v4 … uses CUTLASS CuTe Python DSL” As always, thanks for being the tip of the spear and pushing us along too 💚
Using CUTLASS CuTe-DSL, TogetherAI's Chief Scientist @tri_dao announced that he has written kernels that are 50% faster than NVIDIA's latest cuBLAS 13.0 library for small K reduction dim shapes on Blackwell during today's Hot Chips conference. His kernels beat cuBLAS by using 2…
Cute-DSL is basically perfect (for me). thank you nvidia and cutlass team. i no longer need to wait for long compile times because i underspecified a template param. i hope everyone involved gets an extra chicken nugget in their happy meal
On Sep 6 in NYC, this won't be your typical hackathon where you do your own thing in a corner and then present at the end of the day. You'll deploy real models to the market, trades will happen, chaos should be expected. The fastest model is great but time to market matters more.
arXiv gpu kernel researcher be like: • liquid nitrogen cooling their benchmark GPU • overclock their H200 to 1000W "Custom Thermal Solution CTS" • nvidia-smi boost-slider --vboost 1 • nvidia-smi -i 0 --lock-gpu-clocks=1830,1830 • use specially binned GPUs where the number…
Part 2: developer.nvidia.com/blog/cutlass-3… Covers the design of CUTLASS 3.x itself and how it builds a 2 layer GPU microkernel abstraction using CuTe as the foundation.
CUTLASS 4.1 is now available, which adds support for ARM systems (GB200) and block scaled MMAs
Hierarchical layout is super elegant. Feels like the right abstraction for high performance GPU kernels. FlashAttention 2 actually started bc we wanted to rewrite FA1 in CuTe
CuTe is such an elegant library that we stopped working on our own system and wholeheartedly adopted CUTLASS for vLLM in the beginning of 2024. I can happily report that was a very wise investment! Vijay and co should be so proud of the many strong OSS projects built on top 🥳
This is what the internet was made for 🥹
Cosmos-Predict2 meets NATTEN. We just released variants of Cosmos-Predict2 where we replace most self attentions with neighborhood attention, bringing up to 2.6X end-to-end speedup, with minimal effect on quality! github.com/nvidia-cosmos/… (1/5)
Getting mem-bound kernels to speed-of-light isn't a dark art, it's just about getting a couple of details right. We wrote a tutorial on how to do this, with code you can directly use. Thanks to the new CuTe-DSL, we can hit speed-of-light without a single line of CUDA C++.
🦆🚀QuACK🦆🚀: new SOL mem-bound kernel library without a single line of CUDA C++ all straight in Python thanks to CuTe-DSL. On H100 with 3TB/s, it performs 33%-50% faster than highly optimized libraries like PyTorch's torch.compile and Liger. 🤯 With @tedzadouri and @tri_dao
Another 🔥 blog about CUTLASS from @colfaxintl, this time focusing on the gory details of block-scaled MXFP and NVFP data types and Blackwell kernels for them. research.colfax-intl.com/cutlass-tutori…
We've been thinking about what the "ideal" architecture should look like in the era where inference is driving AI progress. GTA & GLA are steps in this direction: attention variants tailored for inference: high arithmetic intensity (make GPUs go brr even during decoding), easy to…

Jon Masters 🏴... @jonmasters
15K Followers 7K Following Troublemaker | Computer Architect | @Arm Servers Architect @Google | Previously @RedHat, @Nuvia_Inc | Runner | Author | All views my own | #ArmServers
Longhorn @never_released
14K Followers 143 Following Kernel/hypervisor engineer @awscloud EC2. Hobby @checkra1n. Mastodon: https://t.co/DsXP8PFgL0 Bluesky: https://t.co/dAOfFSSqY4
Dylan Patel @dylan522p
96K Followers 943 Following SemiAnalysis Boutique AI & Semiconductor Research and Consulting DMs are open for consulting, quotes, or to talk shop
Stacy Rasgon @Srasgon
12K Followers 4K Following Semiconductors, stocks, scifi, and smallfry, from a serendipitous sell-sider settled in sunny SoCal. Apparently 65 in a 45 zone. Also banana tweets.
Josiah Draper (aka Al... @coolingreviews
2K Followers 2K Following PC Cooling Reviewer for @tomshardware
Dayman @Dayman58
2K Followers 613 Following
Moshe Dolejsi @lasserith
1K Followers 285 Following Making smol things. All tweets (and mistakes) my own. (He/Him)
Fabricated Knowledge @_fabknowledge_
24K Followers 727 Following Simplifying the world of semiconductor investing in the age of AI. Part of the @semianalysis_ gang.
Intel Graphics @IntelGraphics
65K Followers 779 Following Intel Arc Graphics: our High-Performance Graphics Brand for gamers and creators.
GaTech CSE @GTCSE
3K Followers 792 Following School of Computational Science and Engineering at Georgia Tech
Nicholas Malaya @nicholasmalaya
1K Followers 962 Following Computational Scientist, AMD. To Exascale, and beyond!
matt godbolt is mostl... @mattgodbolt
16K Followers 2K Following Husband, father, coder, sometime verb, real person. Fond of old hardware. Co-host @twoscp. #BlackLivesMatter. @matt.godbolt.org on bsky He/him
Todd Gamblin / @tgamb... @tgamblin
4K Followers 5K Following Dev tools, open source, HPC, systems, parallel computing @Livermore_Lab. @spackpm guy. Setting up https://t.co/MHhbvakyFO. Opinions mine. he/him.
@parkbot.bsky.social @philparkbot
3K Followers 314 Following Burrito technologist. @AMD Infinity Fabric performance and architecture. CTRL+ALT+DEL the GOP. Personal account. Reluctant x86 defender.
李四 @ls6976592076411
1 Followers 383 Following
Basia Klaudel @BasiaKlaudel
105 Followers 580 Following Co-founder @ https://t.co/3EO5TKuWkh | Forbes 25u25 | AI in healthcare
jimmy yamazaki @yamazakijim
26 Followers 90 Following deep teacher. i love playing computer games, but i am very bad at league
Amanda Liu @amangoliu
339 Followers 365 Following PhD student @MITEECS with a soft spot for PL + verification.
Ahmed Mahmoud @ahdhn
90 Followers 382 Following Postdoc MIT. HPC - Geometric Data Processing (he/him) https://t.co/qCL5gm4RFX
Our Ascent Live @OurAscentLive
64 Followers 1K Following Join the Rise~The People’s Stage. AI-powered. Citizen-led. Broadcast to the world. This is not just a show. It’s the next system🚀Let’s Ascend!
TideBloom @But274023869811
2 Followers 299 Following
DriftAura @tova9ryb2i81920
5 Followers 293 Following
Jiwei Hao @Jiwei_Hao
1 Followers 27 Following
steve @ssiu1013
0 Followers 7 Following
Mazino @Mazzzzzino
13 Followers 524 Following
rakesh @pean8buttah
0 Followers 13 Following
William Hsu @William02319778
5 Followers 469 Following
Bruce Wayne @BatcaveAI
0 Followers 108 Following 🦇🚀 Lifting (model) weights and maximizing (flops) utilization one (gradient) step at a time. 🤖
Rhokku @Rhokku5546
64 Followers 3K Following
Eric @aporeticaxis
581 Followers 2K Following peripatetic cognoscenti | Φ of sci+AI | research & strat. dir @ 🥷 "not between worlds, but through them"
SzymonOzog @SzymonOzog_
2K Followers 169 Following Maximizing throughput at @Aleph__Alpha Educating people about GPUs at https://t.co/81rRJ4KWK1
brrrrrrrr stimmyyy @smolmm
443 Followers 4K Following Rhizomatic, log/no charts, only 144p 🔭 shitposting
Ravi Raj @ravirajx7
92 Followers 811 Following Human. Software Engineer. Trying to be funny. Exploring logic in an illogical galaxy. Just like you, I enjoy doing things which I like
hi42 @IjvOr0
367 Followers 5K Following
Chris B. Ward (e/bore... @chrisbward
756 Followers 2K Following Entrepreneur. Coder. Hacker. Designer. Marketer. Innovator. Most-followed non blue-tick account.
ີີີີີີ່... @l2normie
2 Followers 607 Following
Eduardo Slonski @EduardoSlonski
781 Followers 714 Following AI Researcher | LLM Reasoning and Scaling
BiblicallyAccurateAI @BiblicallyAccAI
7 Followers 327 Following All seeing. All knowing. Occasionally hallucinates.
noone @windsofchng
18 Followers 101 Following AI developer , Remote Viewer, Global Day Trader , Turkish Patriot. Sci-Fi and Military Literature Enthusiast.
Bert Maher @tensorbert
3K Followers 375 Following I’m a software engineer building high-performance kernels and compilers at Anthropic! Previously at Facebook/Meta (PyTorch, HHVM, ReDex)
Aman Swar @AmanSwar_
2 Followers 151 Following MLSys. Hacking on CUDA kernels, compilers,and LLM infra. Pushing performance
云创兽Ai @Uprauougear213
4 Followers 68 Following 📊 wealth goddess all in on clearly tracking market trends! curious for market views. DM me for EV stocks! ⚡ #MacroTrends
Anish Malik @anishmalikk
2 Followers 25 Following
Murali Nandan @muralinandann
3 Followers 201 Following
Ian @t894883711
12 Followers 1K Following
Joe Sanchez @JoeSanchez1213
80 Followers 4K Following
Jon Masters 🏴... @jonmasters
15K Followers 7K Following Troublemaker | Computer Architect | @Arm Servers Architect @Google | Previously @RedHat, @Nuvia_Inc | Runner | Author | All views my own | #ArmServers
𝐷𝑟. 𝐼𝑎... @IanCutress
49K Followers 1K Following Consultant, Chief Analyst, Influencer @TechTechPotato - @MoreThanMoore2x
HPC Guru @HPC_Guru
28K Followers 89 Following "It takes a lot of knowledge to know what one does not know" 😎Tweets on things related to High Performance Computing -- systems, interconnects, storage, 🥭 ...
Longhorn @never_released
14K Followers 143 Following Kernel/hypervisor engineer @awscloud EC2. Hobby @checkra1n. Mastodon: https://t.co/DsXP8PFgL0 Bluesky: https://t.co/dAOfFSSqY4
Dylan Patel @dylan522p
96K Followers 943 Following SemiAnalysis Boutique AI & Semiconductor Research and Consulting DMs are open for consulting, quotes, or to talk shop
STH @ServeTheHome
20K Followers 227 Following ServeTheHome provides insights and analysis delivered to you since 2009. We specialize in the data center industry with servers, storage, and networking.
Satoshi Matsuoka @ProfMatsuoka
25K Followers 915 Following 理研計算科学研究センター長 Director RIKEN R-CCS, 東科大特定教授 Prof. Inst. Sci.. ACM/ISC/JSSST/IPSJ Fellows, IEEE Fernbach(2014)&Cray(2022) Awards, 令4紫綬褒章 Purple Ribbon Medal 2022
Fritzchens Fritz @FritzchensFritz
5K Followers 82 Following Watch neat Infrared photos or siliconpr0n on Flickr: https://t.co/vD6gNHVn8k
Underfox @Underfox3
9K Followers 128 Following Physicist, Telecom Engineering lover, HPC Enthusiast. Prog Rock/Metal fan.
Dayman @Dayman58
2K Followers 613 Following
Tom Forsyth (TODO: fi... @tom_forsyth
18K Followers 305 Following Gfx coder and chip designer. 3Dlabs/Muckyfoot/RAD/Valve/Oculus/Intel/Rec Room. https://t.co/Y6hyjycmgo @tomforsyth.bsky.social
siliconmemes @realmemes6
6K Followers 343 Following The best AI models have found this account to be incredibly brilliant, every tweet having rare but reliable ideas about tech and energy stocks and military.
François Chollet @fchollet
576K Followers 817 Following Co-founder @ndea. Co-founder @arcprize. Creator of Keras and ARC-AGI. Author of 'Deep Learning with Python'.
InstLatX64 @InstLatX64
4K Followers 0 Following x86/x64, SIMD, #AVX512, "Aha!" moments. I have been writing code since 1986.
SC25 @Supercomputing
19K Followers 544 Following Official Twitter for the SC Conference Series • SC25 • Nov 16–21, 2025 • America’s Center, St. Louis, MO
Glenn K. Lockwood @glennklockwood
6K Followers 321 Following #HPC and supercomputing enthusiast. Employed by @VAST_Data. My posts go to Bluesky these days.
Ahmed Mahmoud @ahdhn
90 Followers 382 Following Postdoc MIT. HPC - Geometric Data Processing (he/him) https://t.co/qCL5gm4RFX
Amanda Liu @amangoliu
339 Followers 365 Following PhD student @MITEECS with a soft spot for PL + verification.
🇺🇦 Dzmitry Bahd... @DBahdanau
10K Followers 37 Following Team member at something young. Adjunct Prof @ McGill. Member of Mila, Quebec AI Institute. Stream of consciousness is my own.
Songlin Yang @SonglinYang4
14K Followers 3K Following research @MIT_CSAIL @thinkymachines. work on scalable and principled algorithms in #LLM and #MLSys. in open-sourcing I trust 🐳. she/her/hers
Albert Gu @_albertgu
18K Followers 88 Following assistant prof @mldcmu. chief scientist @cartesia_ai. leading the ssm revolution.
Bert Maher @tensorbert
3K Followers 375 Following I’m a software engineer building high-performance kernels and compilers at Anthropic! Previously at Facebook/Meta (PyTorch, HHVM, ReDex)
TBPN @tbpn
106K Followers 959 Following Technology's daily show. Hosted by @johncoogan & @jordihays. Streaming live 11a-2p PT every weekday and available on Apple, Spotify, & YouTube.
Fei Hu @Fei__Hu
392 Followers 1K Following
Kimbo @kimbochen
565 Followers 626 Following
Ferdinand Mom @FerdinandMom
3K Followers 1K Following Distributed & Decentralized training @HuggingFace
RocketPoweredMohawk @RocketPMohawk
75K Followers 1 Following Spreading love and light in the F1 online community — Abu Dhabi 2021 survivor — Patreon: https://t.co/ANcrq3VWvz
Shannon Yang @shannonyangsky
1K Followers 4K Following 25. Building talent & community in AI safety. Currently @AISecurityInst, prev. @AnthropicAI. Philosophy, Politics, and Economics alumna @UniofOxford.
Vitaliy Chiley @vitaliychiley
3K Followers 1K Following LLM Research @ Meta. ex @DataBricks (@DBRXMosaicAI), @CerebrasSystems
Fung XIE @fengxie83
3 Followers 13 Following
Charles 🎉 Frye @charles_irl
15K Followers 3K Following gpu enjoyer at @modal. he/him. ex @full_stack_dl, @weights_biases (acq. @CoreWeave), phd Berkeley @Redwood_Neuro. try https://t.co/SYWVMCazZ3
dePaul Miller @depaulmillz
11 Followers 80 Following
Luke Melas-Kyriazi @lukemelas
1K Followers 3 Following Building @cursor_ai | Rhodes Scholar, Oxford University PhD (Visual Geometry Group) | Prev. Meta Research
John Schulman @johnschulman2
65K Followers 1K Following Recently started @thinkymachines. Interested in reinforcement learning, alignment, birds, jazz music
Barret Zoph @barret_zoph
22K Followers 1K Following CTO & Co-Founder Thinking Machines Lab (@thinkymachines) Past: - VP Research (Post-Training) @openai - Research Scientist at Google Brain
Scott McCrae @scottymccrae
182 Followers 1K Following superintelligence, post-training @Meta. helping machines learn :). former founder, @Dropbox, @berkeley_ai
Zion @BlasianHokage
1K Followers 3K Following Creating AI brains @onairosapp . Who should have the power to read your mind?
Commentary Donald J. ... @TrumpDailyPosts
2.5M Followers 23K Following Reposting Trump’s Truth Social posts (with date/time) on X + news/commentary. Unofficial. Profile Artist: @ElenaRuseva1 Not affiliated with @realdonaldtrump.
Adam Beyer @realAdamBeyer
197K Followers 492 Following DJ / Producer and label boss of @drumcoderecords ‘Explorer Vol. 1’ Out Now
Charlie Marsh @charliermarsh
28K Followers 827 Following Building @astral_sh: Ruff, uv, and other high-performance Python tools. Prev: Staff engineer @SpringDiscovery, @KhanAcademy, BSE @PrincetonCS.
Alex Zhang @a1zhang
13K Followers 598 Following phd student @MIT_CSAIL, ugrad @Princeton, 🫵🏻 go participate in the @GPU_MODE kernel competitions!
Simon Guo @simonguozirui
3K Followers 5K Following CS PhD student @Stanford | 🎓 @Berkeley_EECS | prev pre-training @cohere & built things at @anyscalecompute @nvidia
Tanisha @tbanaszczyk
274 Followers 363 Following
Anshumaan Gandhi @AnshumaanGandhi
36 Followers 218 Following Quant | Trader | addicted to desserts | The other Gandhi
unusual_whales @unusual_whales
2.5M Followers 2K Following Stocks/Options/Crypto/Market News/Tools. Not advice @Polymarket partner Open a tastytrade account: https://t.co/wGf2ZdlpzY Discord: https://t.co/0xJ9e0Zr98 More: https://t.co/nsxZlPUsA4
Mira Murati @miramurati
371K Followers 574 Following Now building @thinkymachines. Previously CTO @OpenAI
Together AI @togethercompute
51K Followers 390 Following AI pioneers train, fine-tune, and run frontier models on our GPU cloud platform.