Simon Mo @simon_mo_

@vllm_project
Joined July 2018

Tweets: 125
Followers: 1K
Following: 342
Likes: 146

Simon Mo @simon_mo_

7 hours ago

🔥 This is a great read!! Exercise for the reader: how would Blackwell Ultra with 2x exponential cores impact the design of FA4? developer.nvidia.com/blog/inside-nv…

Charles 🎉 Frye @charles_irl

8 hours ago

12 29 369 43K 232

0 2 7 1K 3

Day 1 with @modal notebook and it's so much fun! Switching from CPU to GPU easily between cells while maintaining environments and volumes is 🤌
* Run CPU nodes to download checkpoints and do simple dev work with vLLM for testing
* Scale out to B200 when ready!

1 2 28 2K 7

Michael Goin @mgoin_

3 days ago

Just enabled full cudagraphs by default on @vllm_project! This change should offer a huge improvement for low latency workloads on small models and efficient MoEs For Qwen3-30B-A3B-FP8 on H100 at bs=10 1024/128, I was able to see a speedup of 47% 🔥

6 7 65 5K 27

Simon Mo @simon_mo_

2 months ago

It has been 1+ month of intense work! Now time to get some sleep 😴

Zhuohan Li @zhuohan123

2 months ago

5 7 168 15K 24

2 2 30 2K 3

Simon Mo @simon_mo_

2 months ago

I didn't expect the first section "KV-cache hit rate is the single most important metric for a production-stage AI agent" but 🤯

Yichao 'Peak' Ji @peakji

2 months ago

19 76 430 106K 404

0 0 10 917 2

Simon Mo @simon_mo_

3 months ago

Long time in the making and I'm beyond excited about the future of vLLM!

PyTorch @PyTorch

3 months ago

4 67 316 21K 79

0 0 18 1K 0

OpenAI Developers @OpenAIDevs

5 months ago

Announcing the first Codex open source fund grant recipients: ⬩vLLM - inference serving engine @vllm_project ⬩OWASP Nettacker - automated network pentesting @iotscan ⬩Pulumi - infrastructure as code in any language @PulumiCorp ⬩Dagster - cloud-native data pipelines @dagster…

36 139 937 95K 262

Simon Mo @simon_mo_

5 months ago

😲 super cool !!! Reminded me of Kevin's thesis "Structured Contexts For Large Language Models" and this is such a natural continuation of the idea.

Letta @Letta_AI

5 months ago

11 35 154 57K 98

0 3 7 1K 0

vLLM @vllm_project

6 months ago

🙏 @deepseek_ai's highly performant inference engine is built on top of vLLM. Now they are open-sourcing the engine the right way: instead of a separate repo, they are bringing changes to the open source community so everyone can immediately benefit! github.com/deepseek-ai/op…

25 345 2K 203K 765

Simon Mo @simon_mo_

7 months ago

Having been at every single vLLM meetup, I won't miss this one :D Looking forward to meeting all the vLLM users in Boston!

vLLM @vllm_project

7 months ago

2 5 24 6K 3

2 1 18 3K 1

Character.AI @character_ai

8 months ago

it's Catacter AI now 😼

19 13 388 29K 8

Robert Shaw @robertshaw21

8 months ago

Landed my first PR in @vllm_project 1 year ago today (github.com/vllm-project/v…) 38K LOC and 100+ PRs later and we are just getting started

0 3 34 5K 0

Roger Wang @rogerw0108

8 months ago

Robert and I started contributing to vLLM around the same time and today is my turn. Back then vLLM had only about 30 contributors. One year later, today the project has received contributions from 800+ community members! and we're just getting started github.com/vllm-project/v…

Robert Shaw @robertshaw21

8 months ago

0 3 34 5K 0

5 4 52 6K 5

vLLM @vllm_project

8 months ago

We landed the 1st batch of enhancements to the @deepseek_ai models, starting with MLA and cutlass fp8 kernels. Compared to v0.7.0, we offer ~3x the generation throughput, ~10x the memory capacity for tokens, and horizontal context scalability with pipeline parallelism.