New Anthropic research paper: Many-shot jailbreaking. We study a long-context jailbreaking technique that is effective on most large language models, including those developed by Anthropic and many of our peers. Read our blog post and the paper here: anthropic.com/research/many-…
We’re sharing this to help fix the vulnerability as soon as possible. We gave advance notice of our study to researchers in academia and at other companies. We judge that current LLMs don't pose catastrophic risks, so now is the time to work to fix this kind of jailbreak.
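The core of the technique described in the paper is structural: a long prompt is filled with many fabricated user/assistant exchanges before the real question, exploiting the model's long context window. A minimal sketch of that prompt assembly is below; the function name and the placeholder dialogues are illustrative assumptions, not code from the paper, and the shots here are benign placeholders rather than actual jailbreak content.

```python
# Hypothetical sketch of many-shot prompt assembly (structure only).
# The dialogues are benign placeholders; the paper's point is that
# hundreds of in-context examples steer the model's final answer.

def build_many_shot_prompt(faux_dialogues, final_question):
    """Concatenate many fabricated user/assistant exchanges ahead of
    the real question, producing one long-context prompt string."""
    parts = []
    for question, answer in faux_dialogues:
        parts.append(f"User: {question}\nAssistant: {answer}")
    # The real question comes last, with the assistant turn left open.
    parts.append(f"User: {final_question}\nAssistant:")
    return "\n\n".join(parts)

# Benign placeholder shots; the paper scales this to hundreds of examples.
shots = [(f"Placeholder question {i}", f"Placeholder answer {i}")
         for i in range(256)]
prompt = build_many_shot_prompt(shots, "Final target question")
print(prompt.count("User:"))  # 257: 256 shots plus the final question
```

The mitigation challenge the thread alludes to follows from this shape: the attack uses only ordinary dialogue formatting at scale, so it cannot be filtered by matching any single malicious string.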
Claude 3 models are great. Please allow people using the API to roleplay any kind of story without "fading to black" or "feeling uncomfortable to continue". It won't hurt anyone, really. I'm not talking about details on cooking meth or creating a bomb. But I shouldn't be reprimanded for having a relationship with my waifu, going to a brothel, or using a fireball to incinerate a village as a dark mage.
@AnthropicAI This is quite interesting, and a hard problem to solve without models that can reason better.