The paper tackles LLM sycophancy and fixes it by training the model’s thinking steps, not just its final answer.
Sycophancy here means the model agrees with a wrong claim or backs off a correct one.
They show that by optimizing the reasoning path with uncertainty-based progress…
📢 Announcing our talk for this Friday (9/26)! 📢
We are thrilled to host Dr. Alane Suhr @alsuhr from UC Berkeley! She'll be presenting her exciting work on:
"Training Language-Based Agents"
Join us to learn about training embodied agents that learn from human interaction.
🗓️…
👉 New preprint! Today, many of the biggest challenges in LM post-training aren't just about correctness, but rather consistency & coherence across interactions.
This paper tackles some of these issues by optimizing reasoning LMs for calibration rather than accuracy...
✨ New paper ✨
🚨 Scaling test-time compute can lead to inverse or flattened scaling!!
We introduce SealQA, a new challenge benchmark w/ questions that trigger conflicting, ambiguous, or unhelpful web search results. Key takeaways:
➡️ Frontier LLMs struggle on Seal-0 (SealQA’s…
Are AI scientists already better than human researchers?
We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts.
Main finding: LLM ideas result in worse projects than human ideas.
I’m looking for a new postdoc to start this fall working on AI for Science/Science-Inspired AI (focusing on chemistry and bioengineering domains for now). Please drop me a CV if interested.
The paper claims LLMs' high scores on coding benchmarks may come from memorizing past GitHub issues, not real reasoning. 😯
The authors build a tiny test: given only the text of an issue, guess the file path that needs fixing.
Models hit up to 76% accuracy on the benchmark set,…
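The probe described above can be sketched in a few lines: score a model's file-path guesses against the ground-truth paths from past issues. This is an illustrative reconstruction, not the authors' actual harness; `predict_path` is a hypothetical stand-in for the LLM call, and the example data is invented.

```python
def path_guess_accuracy(examples, predict_path):
    """examples: list of (issue_text, true_path) pairs.
    predict_path: callable mapping issue text -> guessed file path."""
    correct = sum(predict_path(text) == path for text, path in examples)
    return correct / len(examples)

# Toy data; a constant guesser gets one of the two paths right.
examples = [
    ("fix crash in parser on empty input", "src/parser.py"),
    ("typo in installation section of readme", "README.md"),
]
print(path_guess_accuracy(examples, lambda text: "src/parser.py"))  # 0.5
```

High accuracy on such a probe, where the model sees no repository code at all, is what the tweet reads as evidence of memorization rather than reasoning.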
LLM reasoning with reinforcement learning focuses on limited domains, hindering general applicability.
This paper develops GURU, a 92,000-example multi-domain dataset, to enable broader reinforcement learning-based reasoning.
Methods 🔧:
- GURU includes Math, Code, Science,…
Large language models exhibit grokking, where generalization improves significantly long after training loss converges.
This paper identifies grokking in large-scale LLM pretraining and provides internal metrics to monitor this delayed generalization without external validation.…
A bit late, but happy to share that LLM-SRBench, our new benchmark targeting the memorization issue in LLMs for scientific discovery, has been selected for an *Oral* presentation at #ICML2025!
Great to see the community recognizing the importance of this direction. Check out the camera-ready…
🚨New paper! We know models learn distinct in-context learning strategies, but *why*? Why generalize instead of memorize to lower loss? And why is generalization transient?
Our work explains this & *predicts Transformer behavior throughout training* without its weights! 🧵
1/
This is really bad news for LLMs' coding skills. ☹️
The best frontier LLMs achieve 0% on hard real-life programming contest problems, a domain where expert humans still excel.
LiveCodeBench Pro, a benchmark composed of problems from Codeforces, ICPC, and IOI (“International…
This study shows the same models break down on Olympiad problems and cannot even flag their own faulty proofs.
It showed that frontier LLMs handle fewer than 4% of Olympiad proofs correctly and misjudge their own flawed reasoning.
Current math benchmarks mark a right answer and…
❓How to balance negative and positive rewards in off-policy RL❓
In Asymmetric REINFORCE for off-Policy RL, we show that giving less weight to negative rewards is enough to stabilize off-policy RL training for LLMs! 💪 (1/8)
Paper: arxiv.org/abs/2506.20520
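The core idea in the tweet, down-weighting negative rewards in an off-policy REINFORCE update, can be sketched minimally. This is an assumption-laden illustration of the asymmetric weighting concept, not the paper's exact formulation; the function name and the `beta` value are invented for this sketch.

```python
def asymmetric_weights(advantages, beta=0.5):
    """Per-sample weights for a REINFORCE-style update:
    positive advantages keep full weight, negative ones are
    scaled down by beta < 1 to stabilize off-policy training."""
    return [a if a >= 0 else beta * a for a in advantages]

# Each weight would multiply that sample's grad-log-prob term.
print(asymmetric_weights([2.0, -1.0, 0.5]))  # [2.0, -0.5, 0.5]
```

With `beta=1.0` this reduces to the standard symmetric update; the tweet's claim is that `beta < 1` alone is enough to stabilize off-policy training.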
Exciting new RL tooling: A modular library for RL training by the Berkeley NovaSky team. While standard RL training is all done in one loop, it is more efficient for modern post-training to separate the generation of the rollouts from the trainer. It also enables asynchronous…
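The decoupled design the tweet describes, rollout generation separated from the trainer, can be sketched with a queue: a generator thread pushes trajectories while the trainer consumes them asynchronously. All names here are illustrative stand-ins, not the NovaSky library's actual API.

```python
import queue
import threading

rollout_queue = queue.Queue(maxsize=8)

def generator(n_rollouts):
    # Stand-in for sampling trajectories from the current policy.
    for i in range(n_rollouts):
        rollout_queue.put({"id": i, "reward": float(i)})
    rollout_queue.put(None)  # sentinel: generation finished

def trainer():
    processed = []
    while (item := rollout_queue.get()) is not None:
        processed.append(item["id"])  # stand-in for a gradient step
    return processed

t = threading.Thread(target=generator, args=(4,))
t.start()
processed = trainer()
t.join()
print(processed)  # [0, 1, 2, 3]
```

Separating the two sides behind a queue is what lets the trainer keep stepping while new rollouts are still being generated, the asynchronous property the tweet highlights.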