Jordan Taylor @JordanTensor

Working on new methods for understanding machine learning systems and entangled quantum systems.
sites.google.com/view/jordanten…
Brisbane · Joined December 2009

Tweets

438
Followers

368
Following

1K
Likes

29K

Xander Davies @alxndrdavies

3 weeks ago

Excited to share details on two of our longest running and most effective safeguard collaborations, one with Anthropic and one with OpenAI. We've identified—and they've patched—a large number of vulnerabilities and together strengthened their safeguards. 🧵 1/6

8 63 291 52K 121

Amanda Askell @AmandaAskell

a month ago

I'm learning the true Hanlon's razor is: never attribute to malice or incompetence that which is best explained by someone being a bit overstretched but intending to get around to it as soon as they possibly can.

14 60 965 46K 163

Neel Nanda @NeelNanda5

3 months ago

It was great to be part of this statement. I wholeheartedly agree. It is a wild lucky coincidence that models often express dangerous intentions aloud, and it would be foolish to waste this opportunity. It is crucial to keep chain of thought monitorable as long as possible

Mikita Balesni 🇺🇦 @balesni

3 months ago

38 113 447 222K 257

4 14 147 12K 38

Mikita Balesni 🇺🇦 @balesni

3 months ago

A simple AGI safety technique: AI’s thoughts are in plain English, just read them. We know it works, with OK (not perfect) transparency! The risk is fragility: RL training, new architectures, etc threaten transparency Experts from many orgs agree we should try to preserve it:…

38 113 447 222K 257

AI Security Institute @AISecurityInst

3 months ago

Can we leverage an understanding of what’s happening inside AI models to stop them from causing harm? At AISI, our dedicated White Box Control Team has been working on just this🧵

2 9 34 3K 14

Joseph Bloom @JBloomAus

3 months ago

🧵 1/13 My new team at UK AISI - the White Box Control Team - has released progress updates! We've been investigating whether AI systems could deliberately underperform on evaluations without us noticing. Key findings below 👇

AI Security Institute @AISecurityInst

3 months ago

1 0 20 6K 9

2 4 60 10K 24

Owain Evans @OwainEvans_UK

8 months ago

New paper: We train LLMs on a particular behavior, e.g. always choosing risky options in economic decisions. They can *describe* their new behavior, despite no explicit mentions in the training data. So LLMs have a form of intuitive self-awareness 🧵

45 151 950 151K 573

Stephen McAleer @McaleerStephen

9 months ago

Ok sounds like nobody knows. Blocked off some time on my calendar Monday.

492 56 1K 655K 179

69 11 354 53K 61

Yo Shavit @yonashav

9 months ago

Observation 5: Technical alignment of AGI is the ballgame. With it, AI agents will pursue our goals and look out for our interests even as more and more of the economy begins to operate outside direct human oversight. Without it, it is plausible that we fail to notice as the…