AI alignment + LLMs at NYU & Anthropic. Views not employers'. No relation to @s8mb. I think you should join @givingwhatwecan.
cims.nyu.edu/~sbowman/
San Francisco (mostly)
Joined July 2011
I very distinctly remember, while I was in the thick of making GPQA, telling @rgblong that “I knew the project was going to be ambitious/hard, but I didn’t appreciate what that actually meant.”
In retrospect I probably still would’ve done it, but we basically had to restart the…
I’m super excited to release our 100+ page collaborative agenda - led by @usmananwar391 - on “Foundational Challenges In Assuring Alignment and Safety of LLMs” alongside 35+ co-authors from NLP, ML, and AI Safety communities!
Some highlights below...
One of our most crisp findings was that in-context learning usually follows simple power laws as a function of number of demonstrations.
We were surprised we didn’t find this stated explicitly in the literature.
Soliciting pointers: have we missed anything?
This is the most effective, reliable, and hard to train away jailbreak I know of. It's also principled (based on in-context learning) and predictably gets worse with model scale and context length.
Interesting and concerning new results from @cem__anil et al.: Many-shot prompting for harmful behavior gets predictably more effective at overcoming safety training with more examples, following a power law.
I'm incredibly excited to have Craig joining us on the Anthropic Interpretability team!
I've been a huge fan of @GoogleColab for nearly a decade (I used it internally at Google!) and have really admired Craig's work on it.
I'll be a research supervisor for MATS this summer. If you're keen to collaborate with me on alignment research, I'd highly recommend filling out the short app (deadline today)!
Past projects have led to some of my papers on debate, chain of thought faithfulness, and sycophancy
Despite the constant arguments on p(doom), many agree that *if* AI systems become highly capable in risky domains, *then* we ought to mitigate those risks. So we built an eval suite to see whether AI systems are highly capable in risky domains.
x.com/tshevl/status/…
We’re hiring for the adversarial robustness team @AnthropicAI!
As an Alignment subteam, we're making a big effort on red-teaming, test-time monitoring, and adversarial training. If you’re interested in these areas, let us know! (emails in 🧵)
Today we're releasing Claude 3 Haiku, the fastest and most affordable model in its intelligence class.
Haiku is now available in the API and on claude.ai for Claude Pro subscribers.
🚨📄 Following up on "LMs Don't Always Say What They Think", @milesaturpin et al. now have an intervention that dramatically reduces the problem! 📄🚨
It's not a perfect solution, but it's a simple method with few assumptions and it generalizes *much* better than I'd expected.
I suppose this is a good time to mention that I'm looking for a research prompt engineer, in case you want to be my promptégé.
(Look, you may wildly out-prompt me but I couldn't resist that portmanteau.) jobs.lever.co/Anthropic/a2c8…
5 years! It's been unbelievable to see how CSET's team and reputation have grown.
To celebrate, here are 5 papers/products, 1 from each year of CSET's existence, that I love (and that exemplify the work we do).