Joe @joemkwon

Thinking about what good futures might look like! Currently @GovAI_ Fall Fellow. Previously @aipolicyus, @LG_AI_Research, @MATSprogram, @MITCoCoSci

Washington, DC

Joined March 2019

Tweets

632
Followers

814
Following

2K
Likes

3K

Joe @joemkwon

9 hours ago

predictions: Sora feed will be dominated by obvious limbic-system-exploiting content by June 2026

0 0 2 66 0

Joe @joemkwon

3 years ago

I should revisit this soon!

1 0 0 0 0

0 0 0 222 0

I didn't think it would happen in just over a year, but funny to look back on this because it sounds so ridiculous (in hindsight, as is often the case) :p Only had 5 poll votes, but IIRC all CS PhDs at top programs!

Joe @joemkwon

2 years ago

2 0 4 937 1

0 0 3 364 0

Tyler Brooke-Wilson @T_BrookeWilson

2 months ago

How do people reason while still staying coherent – as if they have an internal ‘world model’ for situations they’ve never encountered? A new paper on open-world cognition (preview at the world models workshop at #ICML2025!)

4 26 144 18K 90

xuan (ɕɥɛn / sh-yen) @xuanalogue

3 months ago

At NUS, I'll be starting the Cooperative Systems & Intelligence (CoSI) lab to scale rational approaches to cooperative AI that are safe+reliable by design - for both individual AI assistance & the cooperative infrastructure we need for an increasingly automated future.

8 9 130 17K 12

Joe @joemkwon

3 months ago

AI consciousness won’t necessarily move through time like ours does. We’re in sequential moments: breakfast, then lunch, then dinner. An AI with the same weights and context can talk to you today and your descendant in 2050, experiencing both conversations as equally “present.”…

1 0 4 245 0

Raphaël Millière @raphaelmilliere

4 months ago

Despite extensive safety training, LLMs remain vulnerable to “jailbreaking” through adversarial prompts. Why does this vulnerability persist? In a new paper published in Philosophical Studies, I argue this is because current alignment methods are fundamentally shallow. 1/13

3 25 105 9K 79

Philipp Schoenegger @SchoeneggerPhil

5 months ago

New preprint out with an amazing 40-person team! We find that Claude 3.5 Sonnet outperforms incentivised human persuaders in a >1000-participant live quiz-chat in deceptive and truthful directions!

4 33 151 23K 61

Yoshua Bengio @Yoshua_Bengio

8 months ago

Today, we are publishing the first-ever International AI Safety Report, backed by 30 countries and the OECD, UN, and EU. It summarises the state of the science on AI capabilities and risks, and how to mitigate those risks. 🧵 Link to full Report: assets.publishing.service.gov.uk/media/679a0c48… 1/16

50 526 1K 394K 766

Samuel Marks @saprmarks

9 months ago

What can AI researchers do *today* that AI developers will find useful for ensuring the safety of future advanced AI systems? To ring in the new year, the Anthropic Alignment Science team is sharing some thoughts on research directions we think are important.

10 67 328 107K 323

Joschka Braun @BraunJoschka

10 months ago

1/ New Blog Post: "A Sober Look at Steering Vectors for LLMs" We identify 3 key challenges: 1. Steering vectors are unreliable for many concepts & tasks 2. Steering harms overall model performance 3. Metrics overestimate steering effectiveness We propose 4 recommendations 🧵👇

3 13 91 10K 62

xuan (ɕɥɛn / sh-yen) @xuanalogue

a year ago

Should AI be aligned with human preferences, rewards, or utility functions? Excited to finally share a preprint that @MicahCarroll @FranklinMatija @hal_ashton & I have worked on for almost 2 years, arguing that AI alignment has to move beyond the preference-reward-utility nexus!

35 171 1K 281K 868

William Fedus @LiamFedus

a year ago

Happy to release a couple of our reasoning models today (🍓)! At @OpenAI , these new models are becoming a larger contributor to the development of future models. For many of our researchers and engineers, these have replaced a large part of their ChatGPT usage.…

57 167 2K 217K 272

Joe @joemkwon

a year ago

Just thought it was fascinating that with ChatGPT 4o, I got this response "It appears that the list of values you provided is quite extensive. Unfortunately, due to the length of the list, I encountered difficulties processing the entire set of data within the provided time."…

0 0 2 428 0

Joe @joemkwon

a year ago

wait I've been typing out smile.amazon.com this entire time and it's been shut down for over a year : /

0 0 4 402 0

Joe @joemkwon

2 years ago

One thing about SOTA LLMs like GPT and Claude that I'm impressed with is how well they handle user input that's low quality (typos, poor grammar/spelling, lack of specificity). 2 points: 1) What are your thoughts on how likely this is because they include finetuning data with…