Sam Bowman @sleepinyourhat

AI alignment + LLMs at NYU & Anthropic. Views not employers'. No relation to @s8mb. I think you should join @givingwhatwecan. cims.nyu.edu/~sbowman/
San Francisco (mostly) · Joined July 2011

Tweets: 2K
Followers: 35K
Following: 3K
Likes: 70K

david rein @idavidrein

21 hours ago

I very distinctly remember while I was in the thick of it making GPQA telling @rgblong that “I knew the project was going to be ambitious/hard, but I didn’t appreciate what that actually meant” In retrospect I probably still would’ve done it, but we basically had to restart the…

Patrick Collison @patrickc

2 months ago

Owain Evans @OwainEvans_UK

7 days ago

Full lecture slides and reading list for Roger Grosse's class on AI Alignment are up: alignment-w2024.notion.site

Sam Bowman @sleepinyourhat

7 days ago

🤖🥇🤖

Arjun Panickssery is in London @panickssery

a week ago

David Krueger @DavidSKrueger

a week ago

I’m super excited to release our 100+ page collaborative agenda - led by @usmananwar391 - on “Foundational Challenges In Assuring Alignment and Safety of LLMs” alongside 35+ co-authors from NLP, ML, and AI Safety communities! Some highlights below...

Sasha Rush @srush_nlp

3 weeks ago

I like to think of myself as a researcher, but almost certainly the most valuable use of my time is writing US Visa letters.

Cem Anil @cem__anil

3 weeks ago

One of our most crisp findings was that in-context learning usually follows simple power laws as a function of number of demonstrations. We were surprised we didn’t find this stated explicitly in the literature. Soliciting pointers: have we missed anything?

Anthropic @AnthropicAI

3 weeks ago

Ethan Perez @EthanJPerez

3 weeks ago

This is the most effective, reliable, and hard to train away jailbreak I know of. It's also principled (based on in-context learning) and predictably gets worse with model scale and context length.

Anthropic @AnthropicAI

3 weeks ago

Sam Bowman @sleepinyourhat

3 weeks ago

Interesting and concerning new results from @cem__anil et al.: Many-shot prompting for harmful behavior gets predictably more effective at overcoming safety training with more examples, following a power law.

Anthropic @AnthropicAI

3 weeks ago

Chris Olah @ch402

4 weeks ago

I'm incredibly excited to have Craig joining us on the Anthropic Interpretability team! I've been a huge fan of @GoogleColab for nearly a decade (I used it internally at Google!) and have really admired Craig's work on it.

Craig Citro @craigcitro

4 weeks ago

Ethan Perez @EthanJPerez

4 weeks ago

I'll be a research supervisor for MATS this summer. If you're keen to collaborate with me on alignment research, I'd highly recommend filling out the short app (deadline today)! Past projects have led to some of my papers on debate, chain of thought faithfulness, and sycophancy

Ryan Kidd @ryan_kidd44

2 months ago

Jason D. Clinton @JasonDClinton

4 weeks ago

x.com/i/article/1772…

Rohin Shah @rohinmshah

a month ago

Despite the constant arguments on p(doom), many agree that *if* AI systems become highly capable in risky domains, *then* we ought to mitigate those risks. So we built an eval suite to see whether AI systems are highly capable in risky domains. x.com/tshevl/status/…

Toby Shevlane @tshevl

a month ago

Jesse Mu @jayelmnop

a month ago

We’re hiring for the adversarial robustness team @AnthropicAI! As an Alignment subteam, we're making a big effort on red-teaming, test-time monitoring, and adversarial training. If you’re interested in these areas, let us know! (emails in 🧵)

Anthropic @AnthropicAI

a month ago

Today we're releasing Claude 3 Haiku, the fastest and most affordable model in its intelligence class. Haiku is now available in the API and on claude.ai for Claude Pro subscribers.

Sam Bowman @sleepinyourhat

a month ago

🚨📄 Following up on "LMs Don't Always Say What They Think", @milesaturpin et al. now have an intervention that dramatically reduces the problem! 📄🚨 It's not a perfect solution, but it's a simple method with few assumptions and it generalizes *much* better than I'd expected.

Miles Turpin @milesaturpin

a month ago

Neel Nanda @NeelNanda5

2 months ago

Really great post on how to think about doing mech interp research, and how it requires a very different mindset to normal ML

Adam Jermyn @AdamSJermyn

2 months ago

Amanda Askell @AmandaAskell

2 months ago

I suppose this is a good time to mention that I'm looking for a research prompt engineer, in case you want to be my promptégé. (Look, you may wildly out-prompt me but I couldn't resist that portmanteau.) jobs.lever.co/Anthropic/a2c8…

Jack Clark @jackclarkSF

2 months ago

Want to work at the frontier of AI policy with the most technical policy team in the business? You do? Excellent. Please consider applying - Special Projects Lead jobs.lever.co/Anthropic/5752… - Policy Analyst, Product jobs.lever.co/Anthropic/6ecd… - Outreach Lead jobs.lever.co/Anthropic/df58…

Helen Toner @hlntnr

2 months ago

5 years! It's been unbelievable to see how CSET's team and reputation has grown. To celebrate, here are 5 papers/products, 1 from each year of CSET's existence, that I love (and that exemplify the work we do).