Robert Kirk @_robertkirk

Research Scientist at @AISecurityInst; PhD Student @ucl_dark. LLMs, AI Safety, Generalisation; @Effect_altruism

robertkirk.github.io

Joined January 2020

Tweets: 384

Followers: 1K

Following: 285

Likes: 618

Xander Davies @alxndrdavies

2 weeks ago

Excited to share details on two of our longest running and most effective safeguard collaborations, one with Anthropic and one with OpenAI. We've identified—and they've patched—a large number of vulnerabilities and together strengthened their safeguards. 🧵 1/6

8 63 292 52K 121

Robert Kirk @_robertkirk

4 weeks ago

New blog! We @AISecurityInst partnered with @NCSC to write about an emerging practice I'm really excited about: Safeguard Bypass Bounty Programmes (SBBPs). Summary of what these are, why they are useful, & how to do them well 🧵

2 11 50 8K 6

Robert Kirk @_robertkirk

a month ago

Since I started working on safeguards, we've seen substantial progress in defending certain hosted models, but less progress in measuring & managing misuse risks from open weight models. Three directions I want explored more, drawn from our @AISecurityInst post today 🧵

1 7 37 2K 16

AI Security Institute @AISecurityInst

a month ago

🚨Open-weight AI models are becoming more powerful, now knocking on the door of today’s closed-weight frontier. This poses critical safety challenges – how can we prevent the misuse of models whose parameters are free to download online? 🧵