vLLM is proud to support the great Kimi update from @Kimi_Moonshot , better tool-calling, longer context, and more!
Check the deployment guide at huggingface.co/moonshotai/Kim… 🔥
vLLM is proud to support the great Kimi update from @Kimi_Moonshot , better tool-calling, longer context, and more!
Check the deployment guide at huggingface.co/moonshotai/Kim… 🔥 https://t.co/ysxqSJVTBU
Big energy at the @vllm_project Meetup in Singapore!
WEKA’s Ronald Pereira shared how NeuralMesh Axon + Augmented Memory Grid can boost vLLM inferencing.
Shoutout to @EmbeddedLLM + @AMD for the strong collaboration.
🚀 Exciting news: DeepSeek-V3.1 from @deepseek_ai now runs on vLLM!
🧠 Seamlessly toggle Think / Non-Think mode per request
⚡ Powered by vLLM’s efficient serving — scale to multi-GPU with ease
🛠️ Perfect for agents, tools, and fast reasoning workloads
👉 Guide & examples:…
🚀 Exciting news: DeepSeek-V3.1 from @deepseek_ai now runs on vLLM!
🧠 Seamlessly toggle Think / Non-Think mode per request
⚡ Powered by vLLM’s efficient serving — scale to multi-GPU with ease
🛠️ Perfect for agents, tools, and fast reasoning workloads
👉 Guide & examples:… https://t.co/AiiFbGH8Go
Meet TokenVisor from @EmbeddedLLM: a first of its kind open-source command center for the AMD Instinct neocloud ecosystem.
✅ No proprietary walls
✅ Custom pricing + usage control
✅ Built with AMD, for real AI flexibility
embeddedllm.com/newsroom
🇸🇬 Join us for the vLLM Meetup in Singapore on Aug 27, 2025 (Wednesday), 6-8:30 PM @SGInnovateWe will discuss efficient LLM inference with talks on:
- @EmbeddedLLM on Latest vLLM advancements
- @AMD on optimizing inference on Data Center GPUs
- @WekaIO presenting vLLM + LMCache +…
👀 we care a lot about correctness, ran many evals and stared at many tensors to compare them. numerics of vLLM on hopper should be solid and verified! if you run into any correctness issue on vLLM, we would love to know and debug them!
👀 we care a lot about correctness, ran many evals and stared at many tensors to compare them. numerics of vLLM on hopper should be solid and verified! if you run into any correctness issue on vLLM, we would love to know and debug them! https://t.co/mghuBdbK9f
Thank you @OpenAI for open-sourcing these great models! 🙌
We’re proud to be the official launch partner for gpt-oss (20B & 120B) – now supported in vLLM 🎉
⚡ MXFP4 quant = fast & efficient
🌀 Hybrid attention (sliding + full)
🤖 Strong agentic abilities
🚀 Easy deployment
👉🏻…
Thank you @OpenAI for open-sourcing these great models! 🙌
We’re proud to be the official launch partner for gpt-oss (20B & 120B) – now supported in vLLM 🎉
⚡ MXFP4 quant = fast & efficient
🌀 Hybrid attention (sliding + full)
🤖 Strong agentic abilities
🚀 Easy deployment
👉🏻…
.@vllm_project V1 is now fully optimized for AMD Instinct™ MI300X GPUs?! 🚀🚀🚀
This major architectural upgrade to the popular LLM inference engine brings significant performance gains to AMD hardware.
Dive into the details in the thread below...
The future of optimization and GPU kernel writing, proud of the team!!! Generating Efficient AI-centric Kernels (GEAK) is our OpenAI Triton Kernel agent using inference-time scaling to generate a high performance numerically accurate kernel and two Triton evaluation benchmarks.…
🔥 Step 3 support just landed in vLLM!
Both the 321B text-only and vision-language models are now fully supported ✌️
Step 3 is a blazing-fast, cost-effective VLM with MFA & AFD for efficient inference—even on modest GPUs.
🚄 Up to 4,039 tok/sec/GPU!
We will optimize it further…
🔥 Step 3 support just landed in vLLM!
Both the 321B text-only and vision-language models are now fully supported ✌️
Step 3 is a blazing-fast, cost-effective VLM with MFA & AFD for efficient inference—even on modest GPUs.
🚄 Up to 4,039 tok/sec/GPU!
We will optimize it further…
You might know LMCache Lab for our KV cache optimizations that make LLM prefilling a breeze. But that’s not all! We’re now focused on speeding up decoding too—so your LLM agents can generate new content even faster. In other words: you can save on your LLM serving bills by…
AMD teams contributing to the llama.cpp codebase. Great support from the community with the review process. Exciting to see this open-source collaboration!
AMD teams contributing to the llama.cpp codebase. Great support from the community with the review process. Exciting to see this open-source collaboration!
Intern-S1 is supported in vLLM now, thanks for the joint efforts between the vLLM team and the InternLM team @intern_lm ♥️
The easy way:
uv pip install vllm --extra-index-url wheels.vllm.ai/nightly
vllm serve internlm/Intern-S1 --tensor-parallel-size 8 --trust-remote-code
Intern-S1 is supported in vLLM now, thanks for the joint efforts between the vLLM team and the InternLM team @intern_lm ♥️
The easy way:
uv pip install vllm --extra-index-url wheels.vllm.ai/nightly
vllm serve internlm/Intern-S1 --tensor-parallel-size 8 --trust-remote-code https://t.co/Aq1AfE8qjA
3K Followers 5K FollowingHumanist technologist and AI optimist. Currently CTO at @welcomeaccount_. Building for an inclusive economy through #AI, #MachineLearning, and #Tech4Good
296 Followers 2K FollowingHacker in a hoodie with strong hands.
Crypto native, class of 2017.
Mostly disinformation and conspiracy theories.
DYOR. Break through the bs.
2K Followers 2K FollowingNeuralMesh™ by WEKA® - The world's only storage system purpose-built for AI. Accelerate performance, deploy anywhere, grow stronger with scale.
541 Followers 2K FollowingSenior Principal Architect @RedHat | Exploring the many facets of modern software engineering (...and every once in a while: sports). Build better software.
373 Followers 896 FollowingHusband and proud father of twin boys, Manager at AMD. Focusing on GPUs and Radeon Software. #ProudAMDer Opinions are my own
1K Followers 4K FollowingSerial Entrepreneur Eternal Wife 1 and CEO Of All Elon Musk Companies Forever Owner Of Tesla, X, and Indigenous Nation Engineering Physics Major
272 Followers 341 FollowingUltrasonic medical equipment, then decades of PC based digital entertainment: sound cards, GPUs, the All-In-Wonder TV tuner series, HDTV design, HDR, VR, next??
22K Followers 7K FollowingThe first person in his village to have a Twitter account, my timeline reflects the collective voice and views of my fellow villagers . O.P.F.C
182 Followers 762 FollowingI am constantly learning new knowledge in the field of machine learning, AI, and LLM software. I will be happy to answer any of your questions in this area.
24K Followers 1 Followingcovering the latest AI & LLM research /// see "highlights" for all previous weekly threads /// building the best AI paper search engine @findmypapersai