Erik Jenner @jenner_erik
Research scientist @ Google DeepMind working on AGI safety & alignment · ejenner.com · Joined April 2016
Tweets 183 · Followers 918 · Following 152 · Likes 549
MATS is a great opportunity to start your career in AI safety! For MATS 9.0 I'll be running a research stream together with @emmons_scott @jenner_erik and @zimmerrol If you want to do research on AI oversight and control, apply now!
🚀 Applications now open: Constellation's Astra Fellowship 🚀 We're relaunching Astra — a 3-6 month fellowship to accelerate AI safety research & careers. Alumni @eli_lifland & Romeo Dean co-authored AI 2027 and co-founded @AI_Futures_ with their Astra mentor @DKokotajlo!
Prior work has found that Chain of Thought (CoT) can be unfaithful. Should we then ignore what it says? In new research, we find that the CoT is informative about LLM cognition as long as the cognition is complex enough that it can’t be performed in a single forward pass.
The fact that current LLMs have reasonably legible chain of thought is really useful for safety (as well as for other reasons)! It would be great to keep it this way
We stress-tested Chain-of-Thought monitors to see how well they hold up as a defense against risks like scheming! I think the results are promising for CoT monitoring, and I'm very excited about this direction. But we should keep stress-testing defenses as models get more capable
As models advance, a key AI safety concern is deceptive alignment / "scheming" – where AI might covertly pursue unintended goals. Our paper "Evaluating Frontier Models for Stealth and Situational Awareness" assesses whether current models can scheme. arxiv.org/abs/2505.01420
Can frontier models hide secret information and reasoning in their outputs? We find early signs of steganographic capabilities in current frontier models, including Claude, GPT, and Gemini. 🧵
My @MATSprogram scholar Rohan just finished a cool paper on attacking latent-space probes with RL! Going in, I was unsure whether RL could explore into probe bypassing policies, or change the activations enough. Turns out it can, but not always. Go check out the thread & paper!
New episode with @davlindner, covering his work on MONA! Check it out - video link in reply.
New episode with @davlindner, covering his work on MONA! Check it out - video link in reply. https://t.co/CStTMbO83z
We’ve just released the biggest and most intricate study of AI control to date, in a command line agent setting. IMO the techniques studied are the best available option for preventing misaligned early AGIs from causing sudden disasters, e.g. hacking servers they’re working on.
IMO, this isn't much of an update against CoT monitoring hopes. They show unfaithfulness when the reasoning is minimal enough that it doesn't need CoT. But, my hopes for CoT monitoring are because models will have to reason a lot to end up misaligned and cause huge problems. 🧵
UK AISI is hiring, consider applying if you're interested in adversarial ML/red-teaming. Seems like a great team, and I think it's one of the best places in the world for doing adversarial ML work that's highly impactful
Consider applying for MATS if you're interested in working on an AI alignment research project this summer! I'm a mentor, as are many of my colleagues at DeepMind
🚨 New @iclr_conf blog post: Pitfalls of Evidence-Based AI Policy Everyone agrees: evidence is key for policymaking. But that doesn't mean we should postpone AI regulation. Instead of "Evidence-Based AI Policy," we need "Evidence-Seeking AI Policy." arxiv.org/abs/2502.09618…
🧵 Announcing @open_phil's Technical AI Safety RFP! We're seeking proposals across 21 research areas to help make AI systems more trustworthy, rule-following, and aligned, even as they become more capable.
🚀 Applications for the Global AI Safety Fellowship 2025 are closing on 31 December 2025! We're looking for exceptional STEM talent from around the world who can advance the safe and beneficial development of AI. Fellows will get to work full-time with leading organisations in…

David Krueger @DavidSKrueger
18K Followers 4K Following AI professor. Deep Learning, AI alignment, ethics, policy, & safety. Formerly Cambridge, Mila, Oxford, DeepMind, ElementAI, UK AISI. AI is a really big deal.
Miles Brundage @Miles_Brundage
62K Followers 12K Following AI policy researcher, wife guy in training, fan of cute animals and sci-fi, Substack writer, stealth-ish non-profit co-founder
Eugene Vinitsky (@RLC... @EugeneVinitsky
20K Followers 2K Following This is the site where I talk about the attacks on science and immigration. Science is on the other site. Lab website: https://t.co/vrtbcqRyRn
Taco Cohen @TacoCohen
27K Followers 3K Following Post-trainologer at FAIR. Into codegen, RL, equivariance, generative models. Spent time at Qualcomm, Scyfer (acquired), UvA, Deepmind, OpenAI.
davidad 🎇 @davidad
20K Followers 9K Following Programme Director @ARIA_research | accelerate mathematical modelling with AI and categorical systems theory » build safe transformative AI » cancel heat death
Lauro @laurolangosco
1K Followers 700 Following European Commission (AI Office). PhD student @CambridgeMLG. Here to discuss ideas and have fun. Posts are my personal opinions; I don't speak for my employer.
Dylan HadfieldMenell @dhadfieldmenell
4K Followers 2K Following Associate Prof @MITEECS working on value (mis)alignment in AI systems; @[email protected]; he/him
Cas (Stephen Casper) @StephenLCasper
6K Followers 4K Following AI technical gov & risk management research. PhD student @MIT_CSAIL, fmr. @AISecurityInst. I'm on the CS faculty job market! https://t.co/r76TGxSVMb
Nora Belrose @norabelrose
11K Followers 119 Following AI, philosophy, spirituality. Blending Deleuze and Dōgen. Head of interpretability research at @AiEleuther, but tweets are my own views, not Eleuther’s.
Rianne van den Berg @vdbergrianne
8K Followers 2K Following Principal research manager at Microsoft Research Amsterdam. Formerly at Google Brain and University of Amsterdam. PhD in condensed matter physics.
Tom Lieberum 🔸 @lieberum_t
1K Followers 196 Following Trying to reduce AGI x-risk by understanding NNs Interpretability RE @DeepMind BSc Physics from @RWTH 10% pledgee @ https://t.co/Vh2bvwhuwd
Stanislav Fort @stanislavfort
14K Followers 7K Following Building in AI + security | Stanford PhD in AI & Cambridge physics | ex-Anthropic and DeepMind | progress + growth | 🇺🇸🇨🇿
Sabouhi @rjsabouhi
20 Followers 768 Following The architecture hasn’t changed. The foundation; the symbolic manifold it runs on? That’s another thing entirely… ψ(t) ⊗ ψ(t+τ) ⇒ Φ_coherence
Vijay Bolina @vijaybolina
4K Followers 6K Following Hacker. Engineer. Leader. Dad. Former @GoogleDeepMind, @Mandiant, @BoozAllen, USG. Tweets my own.
Jenny Qu @GuanniQu
149 Followers 391 Following just learning to be hardcore @Caltech building AI to solve hard math problems she/they
Gabriel Quevedo 💯 @gabrielq111
11 Followers 103 Following
Simple Magic @Simplemagic_ai
4 Followers 139 Following
Quant Hustle @QuantHustle
0 Followers 4K Following
Sandeep More @sandyonmars
93 Followers 416 Following Decipher the complex. Connect the dots. ॥ॐ 🌏 🦋🪐🌌 ॥
Sandra Binoy @SandraBinoy2
0 Followers 45 Following
Andrew Bempah @KrumDonDada
25 Followers 3K Following Proud Kotobabian by way of Chicago & Stanford University
Rauno Arike @RaunoArike
30 Followers 286 Following Working on AI safety at Aether Research Previously @MATSprogram
Teetaj Pavaritpong @TeetajP
35 Followers 1K Following Software Engineer | AI/ML | UIUC ‘24 B.S. in Statistics & Computer Science @SiebelSchool
Anna Soligo @anna_soligo
166 Followers 128 Following PhD Student @imperialcollege // MATS Scholar with Neel Nanda // Sometimes found on big hills ⛰️
M Emre Özkan @memreozkan_
7 Followers 148 Following
Good1 @MLMusings
37 Followers 869 Following Angel Investor | 2x start up founder | Silicon Valley | Drinking the coolaid!
Evan Russek @evanrussek
943 Followers 2K Following Postdoc in cognitive science at Princeton. Interested in computational models of thinking and decision making.
Harshal Nandigramwar @hnanacc
413 Followers 849 Following ai research @intel, prev @cariad_tech, @Uni_Stuttgart • n&w @todacklabs (https://t.co/LPf3PWfbnK, https://t.co/WESRemr9Ev, corpus, marrow)
Hugo @Hugo007600
31 Followers 307 Following
Damon Falck @DamonFalck
18 Followers 106 Following AI safety & alignment @MATSprogram. Prev ML @UniofOxford, quant @DRWTrading.
Anaïs Berkes @anais_berkes
1 Followers 123 Following
Ali Larian @ali_larian
11 Followers 232 Following PhD student @ University of Utah @KahlertSoC | Robotics, Reinforcement Learning
Kyle @kylesuffolk
794 Followers 4K Following
Ed Henderson @edlucidstates
41 Followers 340 Following | Founder @ Lucid States - exploring what it would take to build creative artificial genius. | ex-@itsjustvow | Aussie
Cezar @realcezarc
511 Followers 6K Following Software engineer, failed startup founder. Used to work @google, @onepeloton. 
Sam @belkalevin84
108 Followers 2K Following
Shashank Kirtania @5hv5hvnk
464 Followers 2K Following pre doc research fellow @prosemsft | interested in AI & Formal Methods
Raymond Ng @Raymondng_aisg
8 Followers 1K Following
Evan Ellis @evandavidellis
141 Followers 766 Following Alignment and coding agents. AI @ BAIR | Scale | Imbue
monedula @velarus3
76 Followers 602 Following cooking AI stuff 👨🍳 | MD CS 💻 | innovation enthusiast | start-ups lover | eu/acc 🇪🇺
Yasaman Ansari @Yasaman_Ansari
6 Followers 191 Following
Sahar Abdelnabi 🕊 @sahar_abdelnabi
2K Followers 785 Following Researcher @Microsoft | Next: Faculty @ELLISInst_Tue & @MPI_IS | ex. CISPA | Neurodiv. 🦋 | AI safety & security | life & peace for all ☮️🍉 Opinions my own.
Sachit Malik @isachitmalik
167 Followers 4K Following Hola | Security Engineering at Apple | Alum: Carnegie Mellon; IIT Delhi
thomas @thomazvu
449 Followers 2K Following guy who makes MMOs in the browser 🌎 https://t.co/hxuPqjv8IE 🤖 https://t.co/paQs8tsyHe
Wenx @imwendering
51 Followers 743 Following Independent AI Safety Researcher. Formerly @Meta Integrity Seasoned engineer and budding researcher. Occasionally appears in galleries with my paintings
Sharmake Farah (sharm... @SharmakeFarah14
270 Followers 604 Following Thinks AI risk is somewhat likely, and AI benefits huge if we can align AIs to someone that is willing to promote human thriving even when humans are useless.
𝐼𝓃𝒹𝑜𝓂... @de_indomitus
1 Followers 400 Following Neural Nets Whisperer | Weaver of Vision, Language, Sound, & Thought | Hunter of Deep Emergent Truth | Binding Order from Chaos | Designing Steerable Minds
Tomas Tulka 💙💛 @tomas_tulka
135 Followers 877 Following Programmer and technical author. Also, I am interested in math, nature, and fine art.
Sarang S @saaarangs
24 Followers 178 Following
Samyak @sams_jain
267 Followers 836 Following PhD @Berkeley_EECS | Previous @MSFTResearch,@_FiveAI, @kasl_ai, @val_iisc, CS @IITBHU_Varanasi. Interested in foundations of AI and AI Safety.
Richard Ngo @RichardMCNgo
62K Followers 2K Following studying AI and trust. ex @openai/@googledeepmind
Andrej Karpathy @karpathy
1.4M Followers 1K Following Building @EurekaLabsAI. Previously Director of AI @ Tesla, founding team @ OpenAI, CS231n/PhD @ Stanford. I like to train large deep neural nets.
Eliezer Yudkowsky ⏹... @ESYudkowsky
207K Followers 101 Following The original AI alignment person. Missing punctuation at the end of a sentence means it's humor. If you're not sure, it's also very likely humor.
David Krueger @DavidSKrueger
18K Followers 4K Following AI professor. Deep Learning, AI alignment, ethics, policy, & safety. Formerly Cambridge, Mila, Oxford, DeepMind, ElementAI, UK AISI. AI is a really big deal.
Neel Nanda @NeelNanda5
30K Followers 123 Following Mechanistic Interpretability lead DeepMind. Formerly @AnthropicAI, independent. In this to reduce AI X-risk. Neural networks can be understood, let's go do it!
Amanda Askell @AmandaAskell
54K Followers 655 Following Philosopher & ethicist trying to make AI be good @AnthropicAI. Personal account. All opinions come from my training data.
Google DeepMind @GoogleDeepMind
1.2M Followers 279 Following We’re a team of scientists, engineers, ethicists and more, committed to solving intelligence, to advance science and benefit humanity.
davidad 🎇 @davidad
20K Followers 9K Following Programme Director @ARIA_research | accelerate mathematical modelling with AI and categorical systems theory » build safe transformative AI » cancel heat death
Grant Sanderson @3blue1brown
411K Followers 362 Following Pi creature caretaker. Contact/faq: https://t.co/brZwdQfdif
Lauro @laurolangosco
1K Followers 700 Following European Commission (AI Office). PhD student @CambridgeMLG. Here to discuss ideas and have fun. Posts are my personal opinions; I don't speak for my employer.
Kelsey Piper @KelseyTuoc
47K Followers 942 Following We're not doomed, we just have a big to-do list.
Cas (Stephen Casper) @StephenLCasper
6K Followers 4K Following AI technical gov & risk management research. PhD student @MIT_CSAIL, fmr. @AISecurityInst. I'm on the CS faculty job market! https://t.co/r76TGxSVMb
Nora Belrose @norabelrose
11K Followers 119 Following AI, philosophy, spirituality. Blending Deleuze and Dōgen. Head of interpretability research at @AiEleuther, but tweets are my own views, not Eleuther’s.
Rob Wiblin @robertwiblin
44K Followers 754 Following Host of the 80,000 Hours Podcast. Exploring the inviolate sphere of ideas one interview at a time: https://t.co/2YMw00bkIQ
Tom Lieberum 🔸 @lieberum_t
1K Followers 196 Following Trying to reduce AGI x-risk by understanding NNs Interpretability RE @DeepMind BSc Physics from @RWTH 10% pledgee @ https://t.co/Vh2bvwhuwd
Connor Leahy @NPCollapse
26K Followers 573 Following CEO @ConjectureAI - Ex-Head of @AiEleuther - Leave me anonymous feedback: https://t.co/OJWQWKNrHk - I don't know how to save the world, but dammit I'm gonna try
Ziyue Wang @ZyWang25
48 Followers 219 Following Research Engineer @ Google DeepMind working on AGI Safety.
Victoria Krakovna @vkrakovna
10K Followers 503 Following Research scientist in AI alignment at Google DeepMind. Co-founder of Future of Life Institute @flixrisk. Views are my own and do not represent GDM or FLI.
Rohan Gupta @RohDGupta
13 Followers 93 Following
Joshua Clymer @joshua_clymer
2K Followers 114 Following Turtle hatchling trying to make it to the ocean. I work at Redwood Research.
Xander Davies @alxndrdavies
2K Followers 715 Following safeguards lead @AISecurityInst | PhD student w @yaringal at @OATML_Oxford | prev @Harvard (https://t.co/695XYMKqjI)
Jasmine @j_asminewang
6K Followers 1K Following alignment @OpenAI. past @AISecurityInst @verses_xyz @kernel_magazine @readtrellis @copysmith_ai
Lighthaven PR Departm... @LighthavenPR
867 Followers 17 Following The official twitter of the Lighthaven PR Department.
bilal @bilalchughtai_
808 Followers 662 Following interpretability & ai safety @ google deepmind | cambridge mmath
Olivia Jimenez @ ICML @oliviagjimenez
146 Followers 319 Following Talent & special projects lead at the UK Gov's AI Security Institute
DeepSeek @deepseek_ai
973K Followers 0 Following Unravel the mystery of AGI with curiosity. Answer the essential question with long-termism.
Nathan Labenz @labenz
16K Followers 3K Following AI Scout, building text-2-video @Waymark, host of The Cognitive Revolution podcast
Ryan Greenblatt @RyanPGreenblatt
6K Followers 4 Following Chief scientist at Redwood Research (@redwood_ai), focused on technical AI safety research to reduce risks from rogue AIs
Davis Brown @davisbrownr
449 Followers 977 Following Research in science of {deep learning, AI security, safety}. PhD student at UPenn & RS at @PNNLab
Matej Jusup @MatejJusup
201 Followers 180 Following A PhD in multi-agent RL at ETH Zurich and a chess enthusiast (2585 Elo @Chesscom) who developed an LM @GoogleDeepMind capable of playing the game (3200 Elo).
Alex Serrano @sertealex
26 Followers 234 Following AI research | Prev. Research Intern @CHAI_Berkeley @Google
Luke Bailey @LukeBailey181
365 Followers 274 Following CS PhD student @Stanford. Former CS and Math undergraduate @Harvard.
SoLaR @ NeurIPS2024 @solarneurips
270 Followers 0 Following NeurIPS2024 workshop for Socially Responsible Language Modelling Research
Impact Academy @aisafetyfellows
201 Followers 31 Following We're a startup that runs cutting-edge fellowships to enable global talent to use their careers to contribute to the safe and beneficial development of AI.
Patrick McKenzie @patio11
184K Followers 802 Following I work for the Internet and am an advisor to @stripe. These are my personal opinions unless otherwise noted.
Jesse Smith @JesseTayRiver
1K Followers 602 Following BJJ brown belt and effective altruist torn between x-risk & global dev. Building diagnostics, collapse resilience & post-apocalyptic skills.
Daniel Filan 🔎 @freed_dfilan
789 Followers 587 Following This is my personal / non-professional account. My professional account is @dfrsrchtwts.
Micah Carroll @MicahCarroll
1K Followers 689 Following AI PhD student @berkeley_ai /w @ancadianadragan & Stuart Russell. Working on AI safety ⊃ preference changes/AI manipulation.
The Midas Project Wat... @SafetyChanges
1K Followers 1 Following We monitor AI safety policies and web content for unannounced changed. Anonymous submissions: https://t.co/5Ke9mIqh3e Run by @TheMidasProj
OpenAI Newsroom @OpenAINewsroom
109K Followers 3 Following The official newsroom for @OpenAI. Tweets are on the record. If you like this account, you’ll love our blog: https://t.co/nEYf8Iq3C0
Adrià Garriga-Alonso @AdriGarriga
1K Followers 595 Following Research Scientist at FAR AI (@farairesearch), making friendly AI.
carl feynman @carl_feynman
4K Followers 256 Following I’ve spent a lifetime switching my Special Interest every year or two. By now I’m surprisingly knowledgeable in a lot of fields— a skill now obsoleted by AI.
Cem Anil @cem__anil
3K Followers 2K Following Machine learning / AI Safety at @AnthropicAI and University of Toronto / Vector Institute. Prev. @google (Blueshift Team) and @nvidia.
Alex Mallen @alextmallen
384 Followers 275 Following Redwood Research (@redwood_ai) Prev. @AiEleuther
Andrew Lampinen @AndrewLampinen
9K Followers 2K Following Interested in cognition and artificial intelligence. Research Scientist @DeepMind. Previously cognitive science @StanfordPsych. Tweets are mine.
Jordan Taylor @JordanTensor
368 Followers 1K Following Working on new methods for understanding machine learning systems and entangled quantum systems.
Emmett Shear @eshear
118K Followers 957 Following CEO of Softmax: Applied developmental cybernetics research
Kayo Yin @kayo_yin
15K Followers 696 Following PhD student @berkeley_ai @berkeleynlp. AI alignment & signed languages. Prev @carnegiemellon @polytechnique, intern @msftresearch @deepmind. 🇫🇷🇯🇵
Trenton Bricken @TrentonBricken
12K Followers 2K Following Trying to figure out what makes minds and machines go "Beep Bop!" @AnthropicAI
Grace (cross posting ... @kindgracekind
5K Followers 2K Following ideonomist, ai navel gazer, skyborg https://t.co/UGyhDIKCaj
Fred Zhang @FredZhang0
1K Followers 501 Following research scientist @googledeepmind, prev phd @berkeley_eecs, DM open
The Cultural Tutor @culturaltutor
1.7M Followers 69 Following I've written a book, and you can get it here:
Jascha Sohl-Dickstein @jaschasd
24K Followers 706 Following Member of the technical staff @ Anthropic. Most (in)famous for inventing diffusion models. AI + physics + neuroscience + dynamics.
John Schulman @johnschulman2
65K Followers 1K Following Recently started @thinkymachines. Interested in reinforcement learning, alignment, birds, jazz music
Yong Zheng-Xin (Yong) @yong_zhengxin
2K Followers 2K Following safety and reasoning @BrownCSDept || ex-intern/collab @AIatMeta @Cohere_Labs || sometimes write on https://t.co/cXhbz6Fx3t