Greg Durrett @gregd_nlp

Associate professor at NYU (Courant CS + Center for Data Science) | advisor for @bespokelabsai | large language models and NLP | he/him

Joined December 2017

Tweets: 1K
Followers: 8K
Following: 893
Likes: 3K

Ari Holtzman @universeinanegg

20 hours ago

For those who missed it, we just released a little LLM-backed game called HR Simulator™. You play an intern ghostwriting emails for your boss. It’s like you’re stuck in corporate email hell…and you’re the devil 😈 Link and an initial answer to “WHY WOULD YOU DO THIS?” below

3 7 31 8K 5

Greg Durrett @gregd_nlp

5 days ago

Check out this feature about AstroVisBench, our upcoming NeurIPS D&B paper about code workflows and visualization in the astronomy domain! Great testbed for the interaction of code + VLM reasoning models.

CosmicAI @CosmicAI_Inst

5 days ago

1 4 7 5K 1

1 3 20 4K 4

CosmicAI @CosmicAI_Inst

5 days ago

Exciting news! Introducing AstroVisBench: A Code Benchmark for Scientific Computing and Visualization in Astronomy! A new benchmark developed by researchers at CosmicAI is testing how well LLMs implement scientific workflows in astronomy and visualize results.

1 4 7 5K 1

Chantal @ChantalShaib

6 days ago

"AI slop" seems to be everywhere, but what exactly makes text feel like slop? In our new work (w/ @TuhinChakr, @dgolano, @byron_c_wallace) we provide a systematic attempt at measuring AI slop in text! arxiv.org/abs/2509.19163 🧵 (1/7)

Aidan McLaughlin @aidan_mclau

8 months ago

355 72 965 251K 306

14 35 219 32K 143

Liyan Tang @LiyanTang4

2 weeks ago

Our paper "ChartMuseum 🖼️" is now accepted to #NeurIPS2025 Datasets and Benchmarks Track! Even the latest models, such as GPT-5 and Gemini-2.5-Pro, still cannot do well on challenging 📉chart understanding questions, especially on those that involve visual reasoning 👀!

Liyan Tang @LiyanTang4

4 months ago

2 31 77 14K 30

1 16 33 3K 2

Jessy Li @jessyjli

2 weeks ago

To appear #NeurIPS2025: Can AI aid scientists amidst their own workflows, when they do not know step-by-step workflows and may not know, in advance, the kinds of scientific utility a visualization would bring? The @CosmicAI_Inst presents ✨AstroVisBench:

Sebastian Joseph @sebajoed

4 months ago

1 8 23 9K 3

0 8 34 3K 6

Li S. Yifei @realliyifei

4 weeks ago

How well can LLMs & deep research systems synthesize long-form answers to *thousands of research queries across diverse domains*? Excited to announce 🎓📖 ResearchQA: a large-scale benchmark to evaluate long-form scholarly question answering at scale across 75 fields, using…

1 22 59 7K 32

Mahesh Sathiamoorthy @madiator

a month ago

We are looking to hire an intern who can work on our RL environment curation and data curation stack. Prior experience with post-training is a must, as well as great software engineering skills. DM me or email your resume to hiring at bespokelabs dot ai. In addition, we are…

2 13 121 19K 76

Megan Richards @megan_richards_

a month ago

I'm shocked at how poorly this is advertised, so here's a PSA: NSF has a GRFP-like program specifically for computing disciplines called CISE. The program provides the same 3 years of PhD funding PLUS a year-long mentorship program for the application cycle.

1 14 48 8K 30

Yuhan Liu @YuhanLiu_nlp

a month ago

👀Have you asked an LLM to provide a more detailed answer after inspecting its initial output? Users often provide such implicit feedback during interaction. ✨We study implicit user feedback found in LMSYS and WildChat. #EMNLP2025

2 21 74 18K 39

Amy Pavel @amypavel

a month ago

📣I've joined @berkeleyeecs as an Assistant Professor! My lab will join me soon to continue our research in accessibility, HCI, and supporting communication! I'm so excited to make new connections at @UCBerkeley and in the Bay Area more broadly, so please reach out to chat!

49 20 698 46K 68

Dan Jurafsky @jurafsky

a month ago

Now that school is starting for lots of folks, it's time for a new release of Speech and Language Processing! Jim and I added all sorts of material for the August 2025 release! With slides to match! Check it out here: web.stanford.edu/~jurafsky/slp3/

7 70 395 33K 162

Greg Durrett @gregd_nlp

a month ago

Come work with us at Bespoke! It's a fantastic team and a great opportunity to work on the cutting edge of data curation for reasoning!

Alex Dimakis @AlexGDimakis

a month ago

Come work with us at Bespoke! It's a fantastic team and a great opportunity to work on the cutting edge of data curation for reasoning!

6 14 143 236K 85

0 3 48 8K 16

Tomer Wolfson @TomerWolfson

a month ago

Many factual QA benchmarks have become saturated, yet factuality still poses a very real issue! ✨We present MoNaCo, an Ai2 benchmark of human-written time-consuming questions that, on average, require 43.3 documents per question!✨ 📣Blogpost: allenai.org/blog/monaco 🧵(1/5)

1 14 41 4K 12

Shangbin Feng @shangbinfeng

a month ago

Two caveats with self-alignment: ⚠️ A single model struggles to reliably judge its own generation. ⚠️ A single model struggles to reliably generate diverse responses to learn from. 👉 Introducing Sparta Alignment, where multiple LMs collectively align through ⚔️ combat.

2 10 33 3K 20

Mosh Levy @mosh_levy

2 months ago

Producing reasoning texts boosts the capabilities of AI models, but do we humans correctly understand these texts? Our latest research suggests that we do not. This highlights a new angle on the "Are they transparent?" debate: they might be, but we misinterpret them. 🧵

8 27 135 27K 99

Jessy Li @jessyjli

2 months ago

The Echoes in AI paper showed quite the opposite, also with a story continuation setup. Additionally, we present evidence that both *syntactic* and *discourse* diversity measures show strong homogenization that the lexical and cosine measures used in this paper do not capture.

Ethan Mollick @emollick

2 months ago

47 132 872 74K 432

2 14 37 6K 21

Rui Zhang @ruizhang_nlp

2 months ago

📢 Call for Papers: NewSumm 2025 - The 5th New Frontiers in Summarization Workshop at EMNLP 2025 The summarization research community is invited to submit to NewSumm 2025, co-located with EMNLP 2025! As LLMs continue to transform our field, we're expanding beyond traditional…