🔥 This is a great read!! Exercise for the reader: how would Blackwell Ultra with 2x exponential cores impact the design of FA4?
developer.nvidia.com/blog/inside-nv…
Day 1 with @modal notebooks and it's so much fun! Switching between CPU and GPU across cells while keeping the same environments and volumes is 🤌
* Run CPU nodes to download checkpoints and do simple dev work with vLLM for testing
* Scale out to B200 when ready!
Just enabled full cudagraphs by default on @vllm_project! This change should offer a huge improvement for low latency workloads on small models and efficient MoEs
For Qwen3-30B-A3B-FP8 on H100 at bs=10, 1024/128, I saw a speedup of 47% 🔥
Announcing the first Codex open source fund grant recipients:
⬩vLLM - inference serving engine @vllm_project
⬩OWASP Nettacker - automated network pentesting @iotscan
⬩Pulumi - infrastructure as code in any language @PulumiCorp
⬩Dagster - cloud-native data pipelines @dagster…
🙏 @deepseek_ai's highly performant inference engine is built on top of vLLM. Now they are open-sourcing the engine the right way: instead of a separate repo, they are bringing changes to the open source community so everyone can immediately benefit!
github.com/deepseek-ai/op…
Robert and I started contributing to vLLM around the same time and today is my turn.
Back then vLLM had only about 30 contributors. One year later, the project has received contributions from 800+ community members!
and we're just getting started
github.com/vllm-project/v…
We landed the first batch of enhancements to the @deepseek_ai models, starting with MLA and CUTLASS FP8 kernels. Compared to v0.7.0, we offer ~3x the generation throughput, ~10x the memory capacity for tokens, and horizontal context scalability with pipeline parallelism.