Zhepei Wei @weizhepei

Ph.D. Student @CS_UVA | Research Intern @Meta. Previously @AmazonScience. Research interest: ML/NLP/LLM.
cs.virginia.edu/~tqf5qb/
Charlottesville, VA
Joined January 2016

Tweets: 88
Followers: 192
Following: 533
Likes: 2K

Rohan Paul @rohanpaul_ai

4 weeks ago

OpenAI released new paper. "Why language models hallucinate" Simple ans - LLMs hallucinate because training and evaluation reward guessing instead of admitting uncertainty. The paper puts this on a statistical footing with simple, test-like incentives that reward confident…

97 345 3K 368K 2K

Prophet Arena @ProphetArena

a month ago

🔮 Introducing Prophet Arena — the AI benchmark for general predictive intelligence. That is, can AI truly predict the future by connecting today’s dots? 👉 What makes it special? - It can’t be hacked. Most benchmarks saturate over time, but here models face live, unseen…

90 148 1K 445K 874

Jiaxin Huang @jiaxinhuang0229

2 months ago

Thrilled to share this exciting work, R-Zero, from my student @ChengsongH31219 where LLM learns to reason from Zero human-curated data! The framework includes co-evolution of a "Challenger" to propose difficult tasks and a "Solver" to solve them. Check out more details in the…

ChengSong Huang @ChengsongH31219

2 months ago

3 36 135 14K 95

1 4 23 2K 1

ChengSong Huang @ChengsongH31219

2 months ago

🚀🚀Excited to share our paper R-Zero: Self-Evolving Reasoning LLM from Zero Data ! How to train LLM without data? R-Zero teaches Large Language Models to reason starting with nothing but a base model. No data required!!! Paper: arxiv.org/abs/2508.05004 Code:…

3 36 135 14K 95

AK @_akhaliq

2 months ago

R-Zero Self-Evolving Reasoning LLM from Zero Data

13 83 552 63K 483

Anthropic @AnthropicAI

2 months ago

We’re running another round of the Anthropic Fellows program. If you're an engineer or researcher with a strong coding or technical background, you can apply to receive funding, compute, and mentorship from Anthropic, beginning this October. There'll be around 32 places.

61 215 2K 586K 1K

Scale AI @scale_AI

2 months ago

As AI agents start taking real actions online, how do we prevent unintended harm? We teamed up with @OhioState and @UCBerkeley to create WebGuard: the first dataset for evaluating web agent risks and building real-world safety guardrails for online environments. 🧵

7 22 87 31K 18

Quentin Gallouédec @QGallouedec

2 months ago

There will be *no more than 5 days* between the release of GSPO and its implementation in TRL

10 11 322 34K 167

Yang Yue @YangYue_THU

2 months ago

New paper alert: Unifies insights from Limit-of-RLVR and ProRL — does current RLVR actually expand reasoning? Turns out: RLVR is mostly an efficient sampler with shrinking, very rarely an explorer with expanding. Explore is holy grail for LLM and may entail beyond 0/1 reward.

4 17 125 9K 104

Chujie Zheng @ChujieZheng

2 months ago

Proud to introduce Group Sequence Policy Optimization (GSPO), our stable, efficient, and performant RL algorithm that powers the large-scale RL training of the latest Qwen3 models (Instruct, Coder, Thinking) 🚀 📄 huggingface.co/papers/2507.18…

30 249 2K 319K 1K

Zhepei Wei @weizhepei

2 months ago

Highlight of my #ICML2025 poster session: “So… did you train your model on the test set?” 😅 Probably the ML community’s new “standard practice” question — sadly necessary, but here we are 🤦‍♂️

0 0 2 234 0

Gautam Kamath @thegautamkamath

5 months ago

I wrote a post on how to connect with people (i.e., make friends) at CS conferences. These events can be intimidating so here's some suggestions on how to navigate them. I'm late for #ICLR2025 #NAACL2025, but just in time for #AISTATS2025 and timely for #ICML2025 acceptances! 1/4

5 94 674 83K 804

Haolin Liu @HaolinLiu616

3 months ago

🚨 LLM-as-a-Judge in RLVR can be easily hacked, even GPT-4o. Simple sentences can trick top models into false positives, although the task is just to compare a given solution to a reference answer. 📊 What we found: 1️⃣ Figure 1: “:” and “Thought process:” fool nearly all models…

elvis @omarsar0

3 months ago

13 122 698 99K 637

0 3 19 2K 1

Zhepei Wei @weizhepei

3 months ago

Thrilled to present three works at #ICML2025!🥳 🚀AdaDecode — Wed 7/16, East Exhibition Hall A-B (#E-2605) 🔢Negative Reinforcement for Reasoning — Fri 7/18, AI for Math Workshop 🤖WebAgent-R1 — Sat 7/19, Workshop on Computer Use Agents Feel free to stop by and chat about #LLMs!

0 5 16 1K 1

Yu Meng @yumeng0818

3 months ago

Will be at #ICML2025 next week! We'll present the following works: 🛠️ LarPO: Tue 7/15 (Poster Session 1 East) 🚀 AdaDecode: Wed 7/16 (Poster Session 3 East) 🧮 Negative Reinforcement for Reasoning: Fri 7/18 (AI for Math Workshop) Happy to chat about latest research in LLMs🤩

0 8 27 2K 1

Zengzhi Wang @SinclairWang1

3 months ago

What Makes a Base Language Model Suitable for RL? Rumors in the community say RL (i.e., RLVR) on LLMs is full of “mysteries”: (1) Is the magic only happening on Qwen + Math? (2) Does the "aha moment" only spark during math reasoning? (3) Is evaluation hiding some tricky traps?…

10 87 509 91K 477

Lex Fridman @lexfridman

4 months ago

Here's my conversation with Terence Tao, one of the greatest mathematicians in history. We talk about the hardest problems in mathematics & physics, and how AI might help us humans to solve them. This conversation was a huge honor for me. I can't quite put it into words, but…

302 739 5K 1.2M 3K

Zhepei Wei @weizhepei

4 months ago

Nice work! In our recent paper WebAgent-R1 (arxiv.org/abs/2505.16421), we also observed a similar finding—test-time scaling via increased interactions! Feels like we’re not far from discovering new scaling laws for agents!🤩

Aviral Kumar @aviral_kumar2

4 months ago

1 11 121 11K 79

0 0 10 853 0

Jiaxin Huang @jiaxinhuang0229

4 months ago

🚀🚀Excited to share our new work on Speculative Decoding by @shrangoh! We tackle a key limitation in draft models which predict worse tokens at later positions, and present PosS that generates high-quality drafts!