🧵 New paper from @AISecurityInst x @AiEleuther that I led with Kyle O’Brien:
Open-weight LLM safety is both important & neglected. But we show that filtering dual-use knowledge from pre-training data improves tamper resistance *>10x* over post-training baselines.
I am very excited that AISI is announcing over £15M in funding for AI alignment and control, in partnership with other governments, industry, VCs, and philanthropists!
Here is a 🧵 about why it is important to bring more independent ideas and expertise into this space.
Short background note about relativisation in debate protocols: if we want to model AI training protocols, we need results that hold even if our source of truth (humans for instance) is a black box that can't be introspected. 🧵
New alignment theory paper! We present a new scalable oversight protocol (prover-estimator debate) and a proof that honesty is incentivised at equilibrium (with large assumptions, see 🧵), even when the AIs involved have similar available compute.
Come work with me!!
I'm hiring a research manager for @AISecurityInst's Alignment Team.
You'll manage exceptional researchers tackling one of humanity’s biggest challenges.
Our mission: ensure we have ways to make superhuman AI safe before it poses critical risks.
1/4
22K Followers 321 Following
Globally ranked top 20 forecaster 🎯
AI is not a normal technology. I'm working at @IAPSai to shape AI for global prosperity and human freedom.
50K Followers 3K Following
AI alignment + LLMs at Anthropic. On leave from NYU. Views not employers'. No relation to @s8mb. I think you should join @givingwhatwecan.
20K Followers 9K Following
Programme Director @ARIA_research | accelerate mathematical modelling with AI and categorical systems theory » build safe transformative AI » cancel heat death
6K Followers 608 Following
Claude says I process my emotions out loud & my girlfriend has a job, so I put my feelings & thoughts here ✨ working on the EA Global team @ CEA (views my own)
18K Followers 4K Following
AI professor. Deep Learning, AI alignment, ethics, policy, & safety. Formerly Cambridge, Mila, Oxford, DeepMind, ElementAI, UK AISI. AI is a really big deal.
62K Followers 12K Following
AI policy researcher, wife guy in training, fan of cute animals and sci-fi, Substack writer, stealth-ish non-profit co-founder
1K Followers 8K Following
AI inference, speculative decoding, open source. Built novel decoding algorithms – default in Hugging Face Transformers (150+ ⭐). Making AI faster + cheaper
340 Followers 3K Following
Researcher in math + formal methods + ML. Working on using formal verification to train models for mathematics and reasoning @harmonicmath
2K Followers 1K Following
Co-Executive Director @MATSprogram, Co-Founder @LondonSafeAI, Regrantor @Manifund | PhD in physics | Accelerate AI alignment + build a better future for all
271 Followers 652 Following
Thinks AI risk is somewhat likely, and AI benefits huge, if we can align AIs to someone willing to promote human thriving even when humans are useless.
402 Followers 2K Following
Getting there like the tortoise. Jesus is all, his being, his Father, his Holy Spirit. The only Rock required in the universe.
15K Followers 5K Following
Senior AI reporter @Verge. 5+ years covering the industry's power dynamics, societal implications & the AI arms race. Previously @CNBC.
Signal: haydenfield.11
209K Followers 102 Following
The original AI alignment person. Understanding the reasons it's difficult since 2003.
This is my serious low-volume account. Follow @allTheYud for the rest.
10K Followers 322 Following
Official Unofficial EA mascot. I'm here to make friends and maximise utility, and I'm all out of neglected altruistic opportunities
32K Followers 123 Following
Mechanistic Interpretability lead DeepMind. Formerly @AnthropicAI, independent. In this to reduce AI X-risk. Neural networks can be understood, let's go do it!
24K Followers 10K Following
Former Quant Investor, now building @lumeraprotocol (formerly called Pastel Network) | My Open Source Projects: https://t.co/9qbOCDlaqM
25K Followers 851 Following
The leading presenter of Japanese cinema for more than 50 years | Classics & premieres @japansociety in NYC. Organizers of #JAPANCUTS since 2007.
12K Followers 184 Following
Post-training co-lead at Google DeepMind, focusing on safety, alignment, post-training capabilities • associate professor at UC Berkeley EECS
1K Followers 789 Following
Assistant Professor in Psychology at Stony Brook University. I'm interested in how people interact with LLMs and the impact they might have on our psychology.
18K Followers 4K Following
Associate Professor at UC Berkeley. Former Research Scientist at Google DeepMind. ML/AI Researcher working on foundations of LLMs and deep learning.
1K Followers 397 Following
Towards the logic of conceptuality; the canvas of experience; the ideatic science; the end of suffering. Friend to bots and animals.
https://t.co/PYPYxGNnK4