Reposting because I find this super interesting! (Not really because this is meant to be a harbinger of doom or anything like that, but man what is going on inside o3's weights for it to want to put things like this together?)
@ilex_ulmus On content: I'm not convinced that disinfo / mass unemployment / etc. merit international governance rather than regional governance experimenting differently. (Tho I agree that manufactured pandemics are a global issue.) So it's a mixed bag.
Seeing the CoT of o3 for the first time definitely convinced me that future mitigations should not rely on CoT interpretability.
I think more RL will make it harder to interpret, even if we put no other pressure on the CoT.
@AndrewCritchPhD @jamespayor Dario said "I'm familiar with doomer arguments; they're gobbledegook". The obvious interpretation of this is not "some people have made dodgy claims, notwithstanding that big names in the field have made sound ones". It's "doomers are talking shit, don't listen to them".