EmbeddedLLM @EmbeddedLLM

Your open-source AI ally. We specialize in integrating LLMs into your business.

Joined October 2023

Tweets: 388
Followers: 886
Following: 1K
Likes: 356

vLLM @vllm_project

19 hours ago

Getting ready to try DeepSeek-V3.2-Exp from @deepseek_ai? vLLM is here to help! We have verified that it works on H200 machines and on many other hardware platforms, thanks to the hardware plugin mechanism. Check out the recipes at docs.vllm.ai/projects/recip… for more details 😍 Note: currently…

vLLM @vllm_project

22 hours ago

10 90 603 72K 354

0 13 128 10K 34

vLLM @vllm_project

22 hours ago

How does @deepseek_ai Sparse Attention (DSA) work? It has 2 components: the Lightning Indexer and Sparse Multi-head Latent Attention (MLA). The indexer keeps a small key cache of 128 per token (vs. 512 for MLA). It scores cached tokens against each incoming query, and the top-2048 tokens are passed to Sparse MLA.
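The two-stage idea in the post above (a cheap indexer picks a token subset, then attention runs only over that subset) can be sketched in plain Python. This is an illustrative toy, not vLLM's or DeepSeek's implementation: the dot-product scoring, the function names `top_k_indices`, `sparse_attention`, and `demo`, and the tiny sizes (4-dim index keys, 8-dim attention keys, top-16 of 64 tokens) are all assumptions standing in for the 128 / 512 / top-2048 figures quoted in the post.

```python
import math
import random


def top_k_indices(index_query, index_keys, k):
    """Indexer stage: score every cached token with a cheap dot product
    (an assumed stand-in scoring function) and keep the k highest."""
    scores = [sum(q * x for q, x in zip(index_query, key)) for key in index_keys]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])  # return the kept positions in sequence order


def sparse_attention(query, keys, values, selected):
    """Attention stage: softmax attention restricted to the selected positions."""
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * x for q, x in zip(query, keys[i])) for i in selected]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w * values[i][d] for w, i in zip(weights, selected)) / total
        for d in range(dim)
    ]


def demo(num_tokens=64, index_dim=4, attn_dim=8, k=16, seed=0):
    """Run both stages on random data; all sizes here are toy values."""
    rng = random.Random(seed)

    def rand_vec(n):
        return [rng.gauss(0.0, 1.0) for _ in range(n)]

    index_keys = [rand_vec(index_dim) for _ in range(num_tokens)]  # small indexer cache
    keys = [rand_vec(attn_dim) for _ in range(num_tokens)]         # larger attention cache
    values = [rand_vec(attn_dim) for _ in range(num_tokens)]

    selected = top_k_indices(rand_vec(index_dim), index_keys, k)
    output = sparse_attention(rand_vec(attn_dim), keys, values, selected)
    return selected, output
```

The point of the split is cost: the indexer touches every cached token but only through its small key, while the more expensive attention step touches at most `k` tokens regardless of context length.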

DeepSeek @deepseek_ai

23 hours ago

206 816 6K 962K 978

10 90 603 72K 354

vLLM @vllm_project

2 days ago

🚀 New in vLLM: dots.ocr 🔥 A powerful multilingual OCR model from @xiaohongshu hi lab is now officially supported in vLLM! 📝 Single end-to-end parser for text, tables (HTML), formulas (LaTeX), and layouts (Markdown) 🌍 Supports 100 languages with robust performance on…

merve @mervenoyann

2 months ago

55 376 3K 281K 4K

17 89 686 63K 518

Red Hat AI @RedHat_AI

4 days ago

Missed our latest vLLM office hours? We covered hybrid models as first-class citizens in @vllm_project. ✅ Hybrid model support in v1 ✅ Mamba, Mamba2, linear attention ✅ Performance from v0 → v1 ▶️ Recording: youtube.com/live/uWQ489ONv… 📑 Slides: docs.google.com/presentation/d…

1 9 10 2K 4

Hao Zhang @haozhangml

6 days ago

We keep pushing the limits of speculative decoding (SD) in LLM inference -- check out our latest NeurIPS’25 paper: Lookahead Reasoning (LR). The high-level rationale is pretty simple: SD alone isn’t enough now: as GPUs get stronger (H200 -> B200 -> Rubin CPX), we'll be able to…

Hao AI Lab @haoailab

7 days ago

3 24 137 41K 76

7 41 345 35K 228

Roger Wang @rogerw0108

6 days ago

Day-0 support on one of the most anticipated model releases🚀🚀🚀 Detailed deployment guide coming soon at vllm-recipes docs.vllm.ai/projects/recip…

Qwen @Alibaba_Qwen

7 days ago

82 315 2K 315K 561

0 9 51 7K 14

vLLM @vllm_project

2 weeks ago

Congrats to @deepseek_ai ! DeepSeek-R1 was published in Nature yesterday as the cover article, and vLLM is proud to have supported its RL training and inference🥰

12 145 1K 209K 373

PyTorch @PyTorch

3 weeks ago

Disaggregated Inference at Scale with #PyTorch & #vLLM: Meta’s vLLM disagg implementation improves inference efficiency in latency & throughput vs its internal stack, with optimizations now being upstreamed to the vLLM community. 🔗 hubs.la/Q03J87tS0

7 31 207 39K 101

Elon Musk @elonmusk

3 weeks ago

@zephyr_z9 😂 Although AMD is now working pretty well for small to medium sized models

121 209 2K 582K 202

vLLM @vllm_project

3 weeks ago

Welcome Qwen3-Next! You can run it efficiently on vLLM with accelerated kernels and native memory management for hybrid models. blog.vllm.ai/2025/09/11/qwe…

Qwen @Alibaba_Qwen

3 weeks ago

172 723 4K 905K 2K

10 44 315 60K 53

vLLM @vllm_project

3 weeks ago

Deep dive into optimizing weight transfer step by step and improving it 60x!

Lequn Chen @abcdabcd987

3 weeks ago

8 134 513 63K 300

0 23 248 18K 139

vLLM @vllm_project

3 weeks ago

⚡️ Efficient weight updates for RL at trillion-parameter scale 💡 Best practice from Kimi @Kimi_Moonshot vLLM is proud to collaborate with checkpoint-engine: • Broadcast weight sync for 1T params in ~20s across 1000s of GPUs • Dynamic P2P updates for elastic clusters •…

Kimi.ai @Kimi_Moonshot

3 weeks ago

65 271 2K 346K 987

6 42 330 23K 123

EmbeddedLLM @EmbeddedLLM

3 weeks ago

vLLM Singapore Meetup: Highlights. Thanks to everyone who joined! Check out the slides by @vllm_project DarkLight1337 with tjtanaa / @EmbeddedLLM. V1 is here: faster startup, stronger CI & perf checks. Scaling MoE: clear Expert Parallelism (EP) setup for single/multi-node +…

1 6 25 3K 5

vLLM @vllm_project

4 weeks ago

vLLM is proud to support the great Kimi update from @Kimi_Moonshot , better tool-calling, longer context, and more! Check the deployment guide at huggingface.co/moonshotai/Kim… 🔥

Kimi.ai @Kimi_Moonshot

4 weeks ago

151 384 3K 602K 673

2 24 208 17K 33

vLLM @vllm_project

4 weeks ago

Amazing blogpost from @gordic_aleksa explaining internals of vLLM😍

Aleksa Gordić (水平问题) @gordic_aleksa

4 weeks ago

62 404 3K 311K 3K

3 48 382 28K 204

Junchen Jiang @JunchenJiang

a month ago

Go LMCache 🚀

EmbeddedLLM @EmbeddedLLM

a month ago

0 3 11 625 3

1 1 10 425 0

WEKA @WekaIO

a month ago

Big energy at the @vllm_project Meetup in Singapore! WEKA’s Ronald Pereira shared how NeuralMesh Axon + Augmented Memory Grid can boost vLLM inferencing. Shoutout to @EmbeddedLLM + @AMD for the strong collaboration.

0 5 8 273 0

vLLM @vllm_project

a month ago

🚀 Exciting news: DeepSeek-V3.1 from @deepseek_ai now runs on vLLM! 🧠 Seamlessly toggle Think / Non-Think mode per request ⚡ Powered by vLLM’s efficient serving — scale to multi-GPU with ease 🛠️ Perfect for agents, tools, and fast reasoning workloads 👉 Guide & examples:…

DeepSeek @deepseek_ai

a month ago

529 2K 15K 2.0M 3K

6 37 322 23K 84

vLLM @vllm_project

a month ago

Great example from @skypilot_org showing how to use vLLM to serve Kimi K2 from @Kimi_Moonshot 😀

SkyPilot @skypilot_org

a month ago

1 1 4 1K 0

0 6 10 1K 2

vLLM @vllm_project

a month ago

🚀 GLM-4.5 meets vLLM! @Zai_org's latest GLM-4.5 & GLM-4.5V models bring hybrid reasoning, coding & intelligent agent capabilities, now fully supported in vLLM for fast, efficient inference on NVIDIA Blackwell & Hopper GPUs! Read more 👉 blog.vllm.ai/2025/08/19/glm…