📜 Paper on new pretraining paradigm: Synthetic Bootstrapped Pretraining
SBP goes beyond next-token supervision within a single document: it leverages inter-document correlations to synthesize new training data, with no teacher model needed. Validation: a 3B model pretrained from scratch on 1T tokens. 🧵
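Purely as a reading aid, here is a minimal sketch of how such a pipeline could look, assuming the recipe is: pair semantically related documents, finetune the base LM as a synthesizer that maps one document of a pair to the other, then sample new documents to mix into pretraining. `pair_documents`, `lm.train_step`, and `lm.sample` are hypothetical names, not the paper's API.

```python
# Hedged sketch of a Synthetic Bootstrapped Pretraining loop; not the paper's code.
import numpy as np
from typing import List, Tuple

def pair_documents(corpus: List[str], embed, threshold: float = 0.8) -> List[Tuple[str, str]]:
    """Pair documents whose embeddings are similar (hypothetical retrieval step)."""
    vecs = np.stack([embed(d) for d in corpus])
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
    sims = vecs @ vecs.T
    return [(corpus[i], corpus[j])
            for i in range(len(corpus))
            for j in range(i + 1, len(corpus))
            if sims[i, j] >= threshold]

def train_synthesizer(lm, pairs):
    """Finetune the same base LM to model p(doc_b | doc_a); no external teacher."""
    for doc_a, doc_b in pairs:
        lm.train_step(prompt=doc_a, target=doc_b)  # hypothetical API
    return lm

def synthesize_corpus(lm, corpus: List[str], samples_per_doc: int = 1) -> List[str]:
    """Sample fresh documents conditioned on real ones; these join the pretraining mix."""
    return [lm.sample(prompt=doc)                  # hypothetical API
            for doc in corpus for _ in range(samples_per_doc)]
```

Pretraining then continues on the real corpus plus the synthetic set; the mixing ratio is a tuning knob.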
I had the chance to join the TWIML podcast to talk about my group’s ICML 2025 papers! We dug into the surprising limitations of modern pre-training: where it breaks down, why it matters, and what new directions might help us move past these barriers.
Since compute grows faster than the web, we think the future of pre-training lies in the algorithms that will best leverage ♾ compute
We find simple recipes that improve the asymptote of compute scaling laws, making them roughly 5x more data-efficient and offering better performance with sufficient compute
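To unpack "improving the asymptote": under a saturating scaling law L(D) = E + A * D^(-alpha), the irreducible term E is the asymptote, and lowering it means the same loss is reached with far less data. The constants below are invented purely for illustration; only the shape of the argument matters.

```python
# Toy illustration of "lower asymptote => data efficiency"; constants are invented.

def loss(D, E, A, alpha):
    return E + A * D ** (-alpha)             # saturating data scaling law

baseline = dict(E=1.80, A=50.0, alpha=0.3)   # made-up baseline recipe
improved = dict(E=1.74, A=50.0, alpha=0.3)   # made-up recipe with a lower asymptote

D = 1e9                                      # tokens seen by the baseline
target = loss(D, **baseline)

# Data the improved recipe needs to match the baseline's loss:
# solve E' + A * D'**(-alpha) = target  =>  D' = (A / (target - E'))**(1/alpha)
D_improved = (improved["A"] / (target - improved["E"])) ** (1 / improved["alpha"])
print(f"data-efficiency factor ~ {D / D_improved:.1f}x")  # ~4.8x with these toy numbers
```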
Researchers are working on ways to prevent large language models (LLMs) from simply memorizing information instead of truly learning. They found that removing memorized parts directly can harm the model's ability to learn new things. Their solution, called MemSinks, creates…
I had early sneak peeks into this exciting work on rethinking pretraining—credits to @gaurav_ghosal, my constant buddy through countless late nights at CMU. It’s been a blast building pretraining frameworks and sharing insights. @gaurav_ghosal’s energy is absolutely unmatched!
One thing years of memorization research has made clear: unlearning is fundamentally hard. Neurons are polysemantic & concepts are massively distributed. There’s no clean 'delete'.
We need architectures that are "unlearnable by design".
Introducing Memorization Sinks 🛁⬇️
There’s been a lot of work on unlearning in LLMs, trying to erase memorization without hurting capabilities — but we haven’t seen much success.
❓What if unlearning is actually doomed from the start?
👇This thread explains why and how *memorization sinks* offer a new way forward.
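Not from the thread, but one way to make "unlearnable by design" concrete. This is a rough sketch under my own assumptions, not the authors' code: hidden units split into a shared partition and a sink partition, a hash of the document id picks which sink units fire during training, and all sinks are zeroed at inference so document-specific memorization can be dropped cleanly.

```python
import hashlib
import torch
import torch.nn as nn

class MemSinkMLP(nn.Module):
    """Hedged sketch of a memorization-sink MLP block (illustrative, not the paper's code)."""

    def __init__(self, d_model=512, d_shared=1536, d_sink=512, sinks_per_doc=64):
        super().__init__()
        self.d_shared, self.d_sink = d_shared, d_sink
        self.sinks_per_doc = sinks_per_doc
        self.up = nn.Linear(d_model, d_shared + d_sink)
        self.down = nn.Linear(d_shared + d_sink, d_model)

    def sink_mask(self, doc_id: str) -> torch.Tensor:
        # Deterministically pick this document's sink units from a hash of its id.
        seed = int(hashlib.sha256(doc_id.encode()).hexdigest(), 16) % (2**31)
        g = torch.Generator().manual_seed(seed)
        idx = torch.randperm(self.d_sink, generator=g)[: self.sinks_per_doc]
        mask = torch.zeros(self.d_sink)
        mask[idx] = 1.0
        return mask

    def forward(self, x: torch.Tensor, doc_id: str = None) -> torch.Tensor:
        h = torch.relu(self.up(x))
        shared, sink = h[..., : self.d_shared], h[..., self.d_shared:]
        if self.training and doc_id is not None:
            sink = sink * self.sink_mask(doc_id).to(sink.device)  # doc-specific sinks active
        else:
            sink = torch.zeros_like(sink)                         # sinks dropped at inference
        return self.down(torch.cat([shared, sink], dim=-1))
```

The point of the design: memorization gets a designated, per-document place to live, so "forgetting" a document is masking its sinks rather than hunting for a clean delete among polysemantic, distributed neurons.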
🚨 Super excited to finally share our Safety Pretraining work — along with all the artifacts (safe data, models, code)!
In this thread 🧵, I’ll walk through our journey — the key intermediate observations and lessons, and how they helped shape our final pipeline.
The new OpenAI paper “Why Language Models Hallucinate” is more like PR than research.
The claim that hallucinations arise because training/evaluation reward guessing over abstaining is decades-old (reject option classifiers, selective prediction).
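For readers who haven't met the prior art: a reject-option classifier is exactly this "abstain instead of guess" mechanism. A minimal textbook version (standard technique, not code from either line of work):

```python
import numpy as np

def predict_with_reject(probs: np.ndarray, threshold: float = 0.9) -> np.ndarray:
    """Classic reject-option rule: answer only when confident, else abstain (-1).

    probs: (n_examples, n_classes) predicted class probabilities.
    """
    conf = probs.max(axis=1)
    preds = probs.argmax(axis=1)
    return np.where(conf >= threshold, preds, -1)

# If wrong answers cost more than abstaining, thresholding is optimal; if an eval
# scores wrong answers and abstentions identically (accuracy-only), guessing wins.
probs = np.array([[0.95, 0.05], [0.55, 0.45]])
print(predict_with_reject(probs))  # [ 0 -1]
```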
1/ Excited to share the first in a series of my research updates on LLM pretraining 🚀.
Our new work shows *distilled pretraining*, increasingly used to train deployable models, has trade-offs (objective sketched after the list below):
✅ Boosts test-time scaling
⚠️ Weakens in-context learning
✨ Needs tailored data curation
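For context, here is the generic distilled-pretraining objective: next-token cross-entropy mixed with a temperature-softened KL term pulling the student toward the teacher's token distribution. This is a sketch of the standard recipe; `alpha` and `T` are my placeholders, not the thread's settings.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=1.0, alpha=0.5):
    """Generic distillation objective (sketch; not the paper's exact loss).

    Shapes: logits (batch, seq, vocab), labels (batch, seq).
    """
    # Usual next-token cross-entropy against the hard labels.
    ce = F.cross_entropy(student_logits.flatten(0, 1), labels.flatten())
    # KL from the teacher's softened token distribution to the student's.
    s = F.log_softmax(student_logits / T, dim=-1).flatten(0, 1)
    t = F.log_softmax(teacher_logits / T, dim=-1).flatten(0, 1)
    kl = F.kl_div(s, t, log_target=True, reduction="batchmean") * (T * T)
    return alpha * ce + (1 - alpha) * kl
```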
🤖 Some company just released a new set of open-weight LLMs well-suited for your production environment. However, you suspect that the models might be trained with backdoors or other hidden malicious behaviors. Is it still possible to deploy these models worry-free? (1/7)
@abitha___ will be presenting our work on training language models to predict further into the future beyond the next token and the benefits this objective brings.
x.com/gm8xx8/status/…
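As background on the objective: a common way to train a model to predict beyond the next token is to attach extra prediction heads, one per horizon. A hedged sketch under my assumptions (one linear head per horizon k = 1..K, equal loss weights; not necessarily the paper's exact setup):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenHeads(nn.Module):
    """Sketch of a multi-token prediction objective (illustrative assumptions)."""

    def __init__(self, d_model: int, vocab: int, horizons: int = 4):
        super().__init__()
        self.heads = nn.ModuleList([nn.Linear(d_model, vocab) for _ in range(horizons)])

    def loss(self, hidden: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        """hidden: (batch, seq, d_model) from the trunk; tokens: (batch, seq)."""
        total = torch.tensor(0.0, device=hidden.device)
        for k, head in enumerate(self.heads, start=1):
            # Position t predicts token t+k, so trim k positions off each end.
            logits = head(hidden[:, :-k])              # (B, S-k, V)
            target = tokens[:, k:]                     # (B, S-k)
            total = total + F.cross_entropy(logits.flatten(0, 1), target.flatten())
        return total / len(self.heads)
```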
In "Mind Your Step (by Step): Chain‑of‑Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse", we connect human "overthinking" insights to LLM reasoning, offering a new lens on when thinking‑out‑loud backfires.
📄 Read the full paper: arxiv.org/abs/2410.21333…
In "Mind Your Step (by Step): Chain‑of‑Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse", we connect human "overthinking" insights to LLM reasoning, offering a new lens on when thinking‑out‑loud backfires.
📄 Read the full paper: arxiv.org/abs/2410.21333…