🧠For Qwen3-Next’s Day 0 support in SGLang, one tricky part was enabling spec decoding with the Hybrid Linear Model—since SSM & conv caches only store the last position (unlike KV cache).
🚀After tons of effort with @qingquan_song, we achieved >2× speedup!
Benchmarks below https://t.co/M4mSNbUDgI
Special thanks to my old friends from the SGLang community, especially @hebiao064, @qingquan_song, and more (sry I don’t know their X accounts 🥹), who helped support the hybrid model MTP. For linear attention, the eviction during the eagle verification phase is different from the regular…
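The cache issue behind the hybrid-model MTP work can be sketched minimally. Everything below is illustrative toy code (`ToySSMCache`, `verify_with_rollback`, and the decay recurrence are all made up for this sketch, not SGLang's actual implementation): since a linear-attention/SSM cache holds only the latest recurrent state, verifying speculative draft tokens requires snapshotting the state and replaying only the accepted prefix, whereas a positional KV cache could simply truncate the rejected tail.

```python
import numpy as np

class ToySSMCache:
    """Toy linear-attention/SSM cache: keeps only the latest recurrent
    state, unlike a KV cache that stores one entry per position."""
    def __init__(self, dim):
        self.state = np.zeros(dim)

    def step(self, x, decay=0.9):
        # simple linear recurrence standing in for the real SSM update
        self.state = decay * self.state + x
        return self.state

def verify_with_rollback(cache, draft_tokens, accept_fn):
    """Speculative-decoding verification for a state-only cache:
    checkpoint the state, advance through all drafts, then restore
    the checkpoint and replay just the accepted prefix."""
    snapshot = cache.state.copy()           # checkpoint before drafts
    outputs = [cache.step(t) for t in draft_tokens]
    n_accept = accept_fn(outputs)           # how many drafts verified
    cache.state = snapshot                  # roll back: no per-position history
    for t in draft_tokens[:n_accept]:       # replay accepted tokens only
        cache.step(t)
    return n_accept
```

The snapshot/replay pattern is the key difference from KV-cache eviction: there is no per-position history to truncate, so rejected drafts must never leave a trace in the state.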
🚀 Introducing Qwen3-Next-80B-A3B — the FUTURE of efficient LLMs is here!
🔹 80B params, but only 3B activated per token → 10x cheaper training, 10x faster inference than Qwen3-32B (esp. at 32K+ context!)
🔹Hybrid Architecture: Gated DeltaNet + Gated Attention → best of speed &…
We’re live! 🎉
This is the official account for slime — an open-source, SGLang-native post-training framework for RL scaling.
Kicking things off with our first milestone → v0.1.0 release 🧪
Blog: thudm.github.io/slime/blogs/re…
Follow us to run RL faster ⚡️
🚀 Introducing the first OSS example of fine-tuning gpt-oss with MXFP4 QAT! Powered by NVIDIA ModelOpt + SGLang.
Highlights
1. Fine-tune gpt-oss while keeping the original MXFP4 format
2. Preserve FP4 efficiency and recover accuracy
3. Deploy seamlessly with SGLang!
Full Blog👇
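For readers unfamiliar with MXFP4, here is a rough NumPy sketch of the fake-quantization op that QAT inserts into the forward pass (with gradients passed straight through in the backward pass). The block size of 32, the E2M1 value table, and the power-of-two shared scale follow the OCP microscaling format, but `mxfp4_fake_quant` itself is a hypothetical name for illustration, not ModelOpt's API:

```python
import numpy as np

# representable magnitudes of an FP4 E2M1 element
FP4_VALUES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def mxfp4_fake_quant(x, block=32):
    """Quantize-dequantize x in MXFP4 style: blocks of `block` elements
    share one power-of-two (E8M0-like) scale; each element is rounded
    to the nearest FP4 E2M1 value. x.size must be divisible by `block`."""
    shape = x.shape
    x = x.reshape(-1, block)
    amax = np.abs(x).max(axis=1, keepdims=True)
    amax = np.where(amax == 0, 1.0, amax)           # avoid log2(0)
    # shared scale: power of two mapping the block max toward FP4 max (6.0)
    scale = 2.0 ** np.floor(np.log2(amax / 6.0))
    scaled = x / scale
    # full signed candidate set ±{0, 0.5, 1, 1.5, 2, 3, 4, 6}
    cand = np.concatenate([-FP4_VALUES[:0:-1], FP4_VALUES])
    idx = np.abs(scaled[..., None] - cand).argmin(-1)
    return (cand[idx] * scale).reshape(shape)       # dequantized values
```

In QAT the model trains against these rounded values so that accuracy lost to the FP4 grid can be recovered, while the deployed checkpoint keeps the original MXFP4 format.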
✅ We’re excited to support @Alibaba_Qwen’s Qwen3-Coder on SGLang! With tool call parser and expert parallelism enabled, it runs smoothly with flexible configurations. Just give it a try! 🔗 github.com/zhaochenyang20…
Bye Qwen3-235B-A22B, hello Qwen3-235B-A22B-2507!
After talking with the community and thinking it through, we decided to stop using hybrid thinking mode. Instead, we’ll train Instruct and Thinking models separately so we can get the best quality possible. Today, we’re releasing…
Meet Qwen-VLo, your AI creative engine:
• Concept-to-Polish: Turn rough sketches or text prompts into high-res visuals
• On-the-Fly Edits: Refine product shots, adjust layouts or styles with simple commands
• Global-Ready: Generate images in multiple languages
• Progressive…
We're excited to release OME, a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs). It optimizes the deployment and operation of LLMs by automating model management, intelligent runtime selection, efficient resource…
Huge thanks to @AMD for donating an MI350 to SGLang! This advanced AI accelerator is making a meaningful difference—enabling us to move faster in developing scalable LLM systems and pushing the limits of inference optimization.
Special thanks to our awesome infra partner…
🚀 Proud to introduce the Qwen3-Embedding and Qwen3-Reranker Series – setting new standards in multilingual text embedding and relevance ranking!
✨ Highlights:
✅ Available in 0.6B / 4B / 8B versions
✅ Supports 119 languages
✅ State-of-the-Art performance on MMTEB, MTEB, …