🧠For Qwen3-Next’s Day 0 support in SGLang, one tricky part was enabling spec decoding with the Hybrid Linear Model—since SSM & conv caches only store the last position (unlike KV cache).
🚀After tons of effort with @qingquan_song, we achieved >2× speedup!
Benchmarks below https://t.co/M4mSNbUDgI
Special thanks to my old friends from the SGLang community, especially @hebiao064, @qingquan_song, and more (sry I don’t know their X accounts 🥹), who helped support the hybrid model MTP. For linear attention, the eviction during the eagle verification phase is different from the regular…
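The cache issue behind the hybrid-model MTP work can be sketched minimally. Everything below is illustrative toy code (`ToySSMCache`, `verify_with_rollback`, and the decay recurrence are all made up for this sketch, not SGLang's actual implementation): since a linear-attention/SSM cache holds only the latest recurrent state, verifying speculative draft tokens requires snapshotting the state and replaying only the accepted prefix, whereas a positional KV cache could simply truncate the rejected tail.

```python
import numpy as np

class ToySSMCache:
    """Toy linear-attention/SSM cache: keeps only the latest recurrent
    state, unlike a KV cache that stores one entry per position."""
    def __init__(self, dim):
        self.state = np.zeros(dim)

    def step(self, x, decay=0.9):
        # simple linear recurrence standing in for the real SSM update
        self.state = decay * self.state + x
        return self.state

def verify_with_rollback(cache, draft_tokens, accept_fn):
    """Speculative-decoding verification for a state-only cache:
    checkpoint the state, advance through all drafts, then restore
    the checkpoint and replay just the accepted prefix."""
    snapshot = cache.state.copy()           # checkpoint before drafts
    outputs = [cache.step(t) for t in draft_tokens]
    n_accept = accept_fn(outputs)           # how many drafts verified
    cache.state = snapshot                  # roll back: no per-position history
    for t in draft_tokens[:n_accept]:       # replay accepted tokens only
        cache.step(t)
    return n_accept
```

The snapshot/replay pattern is the key difference from KV-cache eviction: there is no per-position history to truncate, so rejected drafts must never leave a trace in the state.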
🚀 Introducing Qwen3-Next-80B-A3B — the FUTURE of efficient LLMs is here!
🔹 80B params, but only 3B activated per token → 10x cheaper training, 10x faster inference than Qwen3-32B (esp. at 32K+ context!)
🔹Hybrid Architecture: Gated DeltaNet + Gated Attention → best of speed &…
We’re live! 🎉
This is the official account for slime — an open-source, SGLang-native post-training framework for RL scaling.
Kicking things off with our first milestone → v0.1.0 release 🧪
Blog: thudm.github.io/slime/blogs/re…
Follow us to run RL faster ⚡️
🚀 Introducing the first OSS example of fine-tuning gpt-oss with MXFP4 QAT! Powered by NVIDIA ModelOpt + SGLang.
Highlights
1. Fine-tune gpt-oss while keeping the original MXFP4 format
2. Preserve FP4 efficiency and recover accuracy
3. Deploy seamlessly with SGLang!
Full Blog👇
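For readers unfamiliar with MXFP4, here is a rough NumPy sketch of the fake-quantization op that QAT inserts into the forward pass (with gradients passed straight through in the backward pass). The block size of 32, the E2M1 value table, and the power-of-two shared scale follow the OCP microscaling format, but `mxfp4_fake_quant` itself is a hypothetical name for illustration, not ModelOpt's API:

```python
import numpy as np

# representable magnitudes of an FP4 E2M1 element
FP4_VALUES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def mxfp4_fake_quant(x, block=32):
    """Quantize-dequantize x in MXFP4 style: blocks of `block` elements
    share one power-of-two (E8M0-like) scale; each element is rounded
    to the nearest FP4 E2M1 value. x.size must be divisible by `block`."""
    shape = x.shape
    x = x.reshape(-1, block)
    amax = np.abs(x).max(axis=1, keepdims=True)
    amax = np.where(amax == 0, 1.0, amax)           # avoid log2(0)
    # shared scale: power of two mapping the block max toward FP4 max (6.0)
    scale = 2.0 ** np.floor(np.log2(amax / 6.0))
    scaled = x / scale
    # full signed candidate set ±{0, 0.5, 1, 1.5, 2, 3, 4, 6}
    cand = np.concatenate([-FP4_VALUES[:0:-1], FP4_VALUES])
    idx = np.abs(scaled[..., None] - cand).argmin(-1)
    return (cand[idx] * scale).reshape(shape)       # dequantized values
```

In QAT the model trains against these rounded values so that accuracy lost to the FP4 grid can be recovered, while the deployed checkpoint keeps the original MXFP4 format.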
✅ We’re excited to support @Alibaba_Qwen’s Qwen3-Coder on SGLang! With tool call parser and expert parallelism enabled, it runs smoothly with flexible configurations. Just give it a try! 🔗 github.com/zhaochenyang20…
Bye Qwen3-235B-A22B, hello Qwen3-235B-A22B-2507!
After talking with the community and thinking it through, we decided to stop using hybrid thinking mode. Instead, we’ll train Instruct and Thinking models separately so we can get the best quality possible. Today, we’re releasing…
Meet Qwen-VLo, your AI creative engine:
• Concept-to-Polish: Turn rough sketches or text prompts into high-res visuals
• On-the-Fly Edits: Refine product shots, adjust layouts or styles with simple commands
• Global-Ready: Generate images in multiple languages
• Progressive…
We're excited to release OME, a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs). It optimizes the deployment and operation of LLMs by automating model management, intelligent runtime selection, efficient resource…
Huge thanks to @AMD for donating an MI350 to SGLang! This advanced AI accelerator is making a meaningful difference—enabling us to move faster in developing scalable LLM systems and pushing the limits of inference optimization.
Special thanks to our awesome infra partner…
🚀 Proud to introduce the Qwen3-Embedding and Qwen3-Reranker Series – setting new standards in multilingual text embedding and relevance ranking!
✨ Highlights:
✅ Available in 0.6B / 4B / 8B versions
✅ Supports 119 languages
✅ State-of-the-Art performance on MMTEB, MTEB, …