It's a bird. It's a plane. It's Gaussian splatting.
When @Framestore needed to bring Kryptonian tech to Earth, they turned to Gaussian splatting. In the recently released Superman, every shot of Kal-El's parents is a dynamic splat.
I spoke to the team at Framestore about…
Today we're putting out an update to the JAX TPU book, this time on GPUs. How do GPUs work, especially compared to TPUs? How are they networked? And how does this affect LLM training? 1/n
announcing the @GPU_MODE x @scaleml summer speaker series happening next week, a 5⃣-day series where top researchers will teach about the algorithmic and systems-level advances that underpin `gpt-oss`!
all content will be live-streamed & recorded for FREE on GPU MODE's YouTube!
for those curious about how a 1M-context model is even possible, here is a 47-min deep dive on the minimax-01 open model
in it we cover the lightning linear attention mechanism, the hybridization strategy to make it work, and how to go beyond and make the model multimodal
wild stuff
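For a rough feel of why linear attention scales to million-token contexts, here's a minimal sketch of the generic kernelized linear-attention recurrence. The feature map and the Python loop are illustrative assumptions, not MiniMax's lightning attention kernels:

```python
import numpy as np

def linear_attention(Q, K, V):
    """Minimal causal linear-attention sketch (illustrative only).

    Instead of materializing the T x T softmax matrix, keep a running state
    S_t = sum_{i<=t} phi(k_i) v_i^T and a running normalizer, so the cost
    grows linearly in sequence length.
    """
    phi = lambda x: np.maximum(x, 0.0) + 1.0   # simple positive feature map (assumption)
    T, d = Q.shape
    S = np.zeros((d, V.shape[1]))              # running key-value state
    z = np.zeros(d)                            # running normalizer
    out = np.zeros_like(V)
    for t in range(T):
        q, k = phi(Q[t]), phi(K[t])
        S += np.outer(k, V[t])                 # fold in the current key/value
        z += k
        out[t] = (q @ S) / (q @ z + 1e-6)      # normalized read of the state
    return out
```

The point of the recurrence is that memory and per-token compute stay constant as the context grows, which is what makes 1M tokens plausible at all.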
Sparsity can make your LoRA fine-tuning go brrr 💨
Announcing SparseLoRA (ICML 2025): up to 1.6-1.9x faster LLM fine-tuning (2.2x fewer FLOPs) via contextual sparsity, while maintaining performance on tasks like math, coding, chat, and ARC-AGI 🤯
🧵1/
z-lab.ai/projects/spars…
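As a rough illustration of the headline idea, contextual sparsity, here's a toy where the LoRA update is only computed for a context-dependent subset of output channels. The class name, the saliency proxy, and the top-k rule are assumptions for the sketch, not the SparseLoRA implementation:

```python
import torch
import torch.nn as nn

class SparseLoRALinearSketch(nn.Module):
    """Toy sketch: apply the LoRA update only to a context-dependent subset
    of output channels. Illustrative assumption, not the SparseLoRA code."""

    def __init__(self, base: nn.Linear, rank: int = 8, keep_ratio: float = 0.5):
        super().__init__()
        self.base = base
        for p in self.base.parameters():        # frozen pretrained weights
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.02)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.keep = max(1, int(keep_ratio * base.out_features))

    def forward(self, x):
        y = self.base(x)                                      # dense frozen path
        scores = y.abs().mean(dim=tuple(range(y.dim() - 1)))  # cheap per-channel saliency proxy
        idx = scores.topk(self.keep).indices                  # "contextually" active channels
        delta = (x @ self.A.t()) @ self.B[idx].t()            # low-rank update for kept rows only
        return y.index_add(y.dim() - 1, idx, delta)           # add the update on those channels
```

The FLOP savings in this kind of scheme come from skipping the adapter math (and, in the real system, parts of the base computation) for channels the predictor deems inactive for the current input.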
We know quadratic-time Attention and its linear-time variants, such as linear attention and State Space Models. But what lies in between?
Introducing Log-Linear Attention with:
- Log-linear time training
- Log-cost inference (in both time and memory)
- Hardware-efficient Triton kernels
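To make the "in between" concrete, here is a toy of the general log-linear idea: each query reads O(log T) summaries of power-of-two-sized past segments instead of all T tokens (quadratic) or one running state (linear). Purely illustrative, and not the paper's algorithm or its Triton kernels:

```python
import numpy as np

def log_linear_attention_toy(Q, K, V):
    """Toy only: each position attends to O(log t) mean-pooled segment
    summaries, chosen Fenwick-style. Real log-linear methods maintain these
    summaries incrementally; here they are recomputed for clarity."""
    T, d = Q.shape
    out = np.zeros_like(V)
    for t in range(T):
        # split [0, t] into O(log t) power-of-two blocks by peeling the lowest set bit
        segs, hi = [], t + 1
        while hi > 0:
            lo = hi - (hi & -hi)
            segs.append((lo, hi))
            hi = lo
        ks = np.stack([K[lo:hi].mean(0) for lo, hi in segs])  # one summary key per block
        vs = np.stack([V[lo:hi].mean(0) for lo, hi in segs])  # one summary value per block
        w = np.exp(ks @ Q[t] / np.sqrt(d))
        out[t] = (w / w.sum()) @ vs
    return out
```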
I think after Attention (learnt back in 2020), Flash-Attention is the only machine learning algorithm that gave me immense happiness once I cracked it. #MachineLearning #LLM
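The part that usually makes it "click" is the online softmax: process K/V in blocks while carrying a running max and normalizer, so the full T x T score matrix is never materialized. A minimal single-head NumPy sketch of that trick (illustrative, not the fused CUDA kernel):

```python
import numpy as np

def flash_attention_sketch(Q, K, V, block=64):
    """Blockwise attention with a running max and normalizer (non-causal)."""
    T, d = Q.shape
    out = np.zeros_like(V)
    m = np.full(T, -np.inf)                   # running row-wise max of scores
    l = np.zeros(T)                           # running softmax denominator
    for start in range(0, T, block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        S = Q @ Kb.T / np.sqrt(d)             # scores for this block only
        m_new = np.maximum(m, S.max(axis=1))
        scale = np.exp(m - m_new)             # rescale previous accumulators
        P = np.exp(S - m_new[:, None])
        l = l * scale + P.sum(axis=1)
        out = out * scale[:, None] + P @ Vb
        m = m_new
    return out / l[:, None]
```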
Spent last week building an #LLM completely from scratch, and I mean everything: an optimized BPE tokenizer, Linear layers, SwiGLU, RoPE attention, full Transformer blocks, AdamW, cross_entropy, top-p decoding, and more, then trained it on children's stories.
github.com/bargav25/llm
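Of the pieces listed, top-p decoding is the easiest to show in a few lines; this is a generic sketch of nucleus sampling, not code from the linked repo:

```python
import numpy as np

def top_p_sample(logits, p=0.9, temperature=1.0, rng=None):
    """Nucleus (top-p) sampling: keep the smallest set of tokens whose
    cumulative probability reaches p, renormalize, and sample from it."""
    if rng is None:
        rng = np.random.default_rng()
    scaled = logits / temperature
    probs = np.exp(scaled - np.max(scaled))          # stable softmax
    probs /= probs.sum()
    order = np.argsort(-probs)                       # tokens by descending probability
    csum = np.cumsum(probs[order])
    cutoff = np.searchsorted(csum, p) + 1            # smallest prefix covering mass p
    keep = order[:cutoff]
    kept = probs[keep] / probs[keep].sum()           # renormalize over the nucleus
    return int(rng.choice(keep, p=kept))
```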
7K Followers · 221 Following · Incoming assistant professor at UCSD CSE in MLSys. Currently recruiting students! Also running the kernels team @togethercompute.
15K Followers · 528 Following · Asst. Prof. of CS at Stanford, Google DeepMind. Prev: Anthropic, Google Brain. Co-Creator of MoEs, AlphaChip, Test Time Scaling Laws.
46K Followers · 1K Following · AI Developer Experience @GoogleDeepMind | prev: Tech Lead at @huggingface, AWS ML Hero 🤗 Sharing my own views and AI News 🧑🏻💻 https://t.co/7IosdlNz22
18K Followers · 4K Following · Associate Professor at UC Berkeley. Former Research Scientist at Google DeepMind. ML/AI Researcher working on foundations of LLMs and deep learning.
1K Followers · 8K Following · AI inference, speculative decoding, open source. Built novel decoding algorithms – default in Hugging Face Transformers (150+ ⭐). Making AI faster + cheaper.
26K Followers · 229 Following · getting us to singularity with friends
computers can be understood: https://t.co/doHE1Qv2Sj
x @GoogleDeepMind @Microsoft
tensor core maximalist
83K Followers · 324 Following · All things AI for developers from @NVIDIA.
Additional developer channels: @NVIDIADeveloper, @NVIDIAHPCDev, and @NVIDIAGameDev.
10K Followers · 236 Following · Interpretability/Finetuning @AnthropicAI
Previously: Staff ML Engineer @stripe, Wrote BMLPA by @OReillyMedia, Head of AI at @InsightFellows, ML @Zipcar