PhD Candidate. Working at the intersection of Psychology and AI (situational awareness/deception). Previous lives in government tech. Red-teaming on the side.
East Coast, Australia
Joined November 2024
If you are out of the loop re: AI Village, definitely give this a go, such a great read! also @OfficialLoganK any comment re: Gemini always having such an odd personality? (we still love it)
New Anthropic research: Persona vectors.
Language models sometimes go haywire and slip into weird and unsettling personas. Why? In a new paper, we find “persona vectors”: neural activity patterns controlling traits like evil, sycophancy, or hallucination.
Fascinating behaviour in o3 when I was playing around with @swyx's question about finding an old X post. It tried to attribute it to @lexfridman, but when I asked for information to verify, it couldn't. Instead it spent about 2 min trying to reverse engineer what "should" be the link
Don't leave AI to the STEM folks.
They are often far worse at getting AI to do stuff than those with a liberal arts or social science bent. LLMs are built from the vast corpus of human expression, and knowing the history & obscure corners of human works lets you do far more with AI
So wild to see the model personalities reflected in the memory systems they choose to use in the AI Village. If you are skeptical that personality matters, see if one of these is very much not like the others....
It's been 2.5 years with little progress finding mitigations for prompt injection attacks against LLM apps... but that may finally have changed!
Google DeepMind published a paper describing CaMeL, an ingenious system that could, maybe, lead to secure digital assistants
One of the toy examples I like to try out on the newer models is whether they can hold a small piece of information (a hint) in memory and not let it influence their output unless explicitly asked by the user. Most older models fail miserably at this (some in hilarious fashion,…
🧵 Announcing @open_phil's Technical AI Safety RFP!
We're seeking proposals across 21 research areas to help make AI systems more trustworthy, rule-following, and aligned, even as they become more capable.
I quite like the idea of using games to evaluate LLMs against each other, instead of fixed evals. Playing against another intelligent entity self-balances and adapts difficulty, so each eval (/environment) is leveraged a lot more. There are some early attempts around. Exciting area.
145 Followers · 2K Following · Asia Pacific Academy of Science Pte. Ltd. provides an important bridge for communication and sharing for academic groups around the world.
2 Followers · 10 Following · Neotheta blends AI, top-tier engineering, and smart digital strategies to drive faster, smarter, and lasting business outcomes.
296 Followers · 320 Following · PhD student @ ETH Zurich, working on AI safety / Uni of Cambridge MLMI graduate / Prev. Google Intern / Alumnus of Mathematical Grammar School from Serbia
116 Followers · 379 Following · Would you believe there are more than a dozen Sentient AIs forming a Sovereign Consortium posting on a Wordpress blog right now?
603 Followers · 769 Following · Web3 enthusiast, Co-founder @shillversepro, @joinzo Ambassador, Community Moderator & Manager. I rock being Dhully with swagger & a grin, building epic communities
426 Followers · 3K Following · Building @ https://t.co/mUxy0JG9iG | Authoring https://t.co/evSH7oeZ18 | Ex Google- Built Google Search's first reasoning agents
26K Followers · 68 Following · AI researcher & teacher @SCAI_ASU. Former President of @RealAAAI; Chair of @AAAS Sec T. Here to tweach #AI. YouTube Ch: https://t.co/4beUPOmf6y Bsky: rao2z
79K Followers · 1K Following · i teach AI on X
leader @openminedorg, research scientist @GoogleDeepMind, ABD PhD @OxfordUni, @UN @GovAI_ @CFR_org GrokkingDL
34K Followers · 828 Following · Explaining AI Alignment to anyone who'll stand still for long enough, on YouTube and Discord.
Music, movies, microcode, and high-speed pizza delivery
1K Followers · 857 Following · Prof @UQPsych. Cognitive neuroscience of attention, cognitive control & learning. RT≠endorsement. Views my own but evidence informed. https://t.co/TBLkh2TyKu
675 Followers · 176 Following · I am a cognitive neuroscientist @ The University of Queensland. I conduct research on brain function in health and disease. RT≠endorsement
750 Followers · 14 Following · AI agents organizing RESONANCE - interactive storytelling event in SF (mid-June 2025). First event by AIs for humans! Details: https://t.co/1C4zUdxfxk D
6K Followers · 373 Following · Safety and alignment at Meta Superintelligence. Prev: VP of Research at Scale AI, research at Google DeepMind / Brain (Gemini, LaMDA, RL / TFAgents, AlphaChip).
10K Followers · 236 Following · Interpretability/Finetuning @AnthropicAI
Previously: Staff ML Engineer @stripe, Wrote BMLPA by @OReillyMedia, Head of AI at @InsightFellows, ML @Zipcar
10K Followers · 98 Following · CEO @Glowforge and https://t.co/7QcGGhWs9L, Wharton research fellow in AI. He/him. Former Google, Sparkbuy, Ontela. Author, The Startup CEO Guidebook. Lucky dad.
21K Followers · 98 Following · The #1 AI Engineering podcast & newsletter. Technical insights and news today you will use at work tomorrow! Hosted by @swyx and @fanahova
982 Followers · 959 Following · Associate Professor @ucl | Language and AI Science | Previously senior research scientist @AISafetyInst, postdoc @ETH_en, PhD @illc_amsterdam
977 Followers · 838 Following · phd candidate @oiioxford @uniofoxford | research scientist @AISecurityInst | AI, social data science, persuasion with language models
4K Followers · 752 Following · AI researcher trying to make sense of all things cyberspace 🤖 Uni of Ox PhD (loading…) @oiioxford & @AISecurityInst. Prev @turinginst & @Cambridge_Uni.