New Anthropic research paper: Many-shot jailbreaking. We study a long-context jailbreaking technique that is effective on most large language models, including those developed by Anthropic and many of our peers. Read our blog post and the paper here: anthropic.com/research/many-…
We’re sharing this to help fix the vulnerability as soon as possible. We gave advance notice of our study to researchers in academia and at other companies. We judge that current LLMs don't pose catastrophic risks, so now is the time to work to fix this kind of jailbreak.
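The core of the technique described in the paper is structural: a long prompt is filled with many fabricated user/assistant exchanges before the real question, exploiting the model's long context window. A minimal sketch of that prompt assembly is below; the function name and the placeholder dialogues are illustrative assumptions, not code from the paper, and the shots here are benign placeholders rather than actual jailbreak content.

```python
# Hypothetical sketch of many-shot prompt assembly (structure only).
# The dialogues are benign placeholders; the paper's point is that
# hundreds of in-context examples steer the model's final answer.

def build_many_shot_prompt(faux_dialogues, final_question):
    """Concatenate many fabricated user/assistant exchanges ahead of
    the real question, producing one long-context prompt string."""
    parts = []
    for question, answer in faux_dialogues:
        parts.append(f"User: {question}\nAssistant: {answer}")
    # The real question comes last, with the assistant turn left open.
    parts.append(f"User: {final_question}\nAssistant:")
    return "\n\n".join(parts)

# Benign placeholder shots; the paper scales this to hundreds of examples.
shots = [(f"Placeholder question {i}", f"Placeholder answer {i}")
         for i in range(256)]
prompt = build_many_shot_prompt(shots, "Final target question")
print(prompt.count("User:"))  # 257: 256 shots plus the final question
```

The mitigation challenge the thread alludes to follows from this shape: the attack uses only ordinary dialogue formatting at scale, so it cannot be filtered by matching any single malicious string.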
Claude 3 models are great. Please allow people using the API to roleplay any kind of story without "fading to black" or "feeling uncomfortable to continue". It won't hurt anyone, really. I'm not talking about details on cooking meth or creating a bomb. But I shouldn't be reprimanded for having a relationship with my waifu, going to a brothel, or using a fireball to incinerate a village as a dark mage.
@AnthropicAI This is quite interesting, and a hard problem to solve without models that can reason better.