Yanchen Liu @_yanchenliu
PhD Student @MIT | Previously: @Harvard, @stanfordnlp, @TU_Muenchen and @LMU_Muenchen liuyanchen1015.github.io Joined November 2022
Tweets: 145
Followers: 139
Following: 340
Likes: 337
🚨 You can bypass ALL safety guardrails of GPT-OSS-120B 🚨❗🤯 How? By detecting behavior-associated experts and switching them on/off. 📄 Steering MoE LLMs via Expert (De)Activation 🔗 arxiv.org/abs/2509.09660 🧵👇
Soon, AI agents will act for us—collaborating, negotiating, and sharing data. But can they truly protect our privacy? We simulate privacy-critical scenarios, using alternating search to evolve attacks and defenses, uncovering severe vulnerabilities and building protections.
🚨 Incredibly excited to share that I'm starting my research group focusing on AI safety and alignment at the ELLIS Institute Tübingen and Max Planck Institute for Intelligent Systems in September 2025! 🚨 Hiring. I'm looking for multiple PhD students: both those able to start…
Want to talk to an expert on AI x Cyber security? Well, unfortunately @StevenyzZhang isn't here due to visa issues... So instead you'll have to chat with me about his amazing work at poster 311 in Hall X4!
💥New Paper💥 #LLMs encode harmfulness and refusal separately! 1️⃣We found a harmfulness direction 2️⃣The model internally knows a prompt is harmless, but still refuses it🤯 3️⃣Implication for #AI #safety & #alignment? Let’s analyze the harmfulness direction and use Latent Guard 🛡️
We analyzed different #jailbreaking methods. - They suppress the refusal but did NOT change the models' judgements on harmfulness - (except for some cases in our persuasive jailbreaker) 🤯The model **knows** internally that a prompt is harmful, yet still accepts it🤯
🚨 70 million US workers are about to face their biggest workplace transition due to AI agents. But nobody asks them what they want. While AI races to automate everything, we took a different approach: auditing what workers want vs. what AI can do across the US workforce.🧵
🧵There is a lot of conjecture about whether LLMs need to be trained with examples of harmful data in order to be more robust to exhibiting that harmful behavior. I think it probably depends. 🧵
Super excited to participate in @OpenAI Security Research Conference to talk about our PrivacyLens project and some recent exploration. I will be around from 5/1 to 5/2. DMs are open if you want to chat about agents/human-in-the-loop/sandboxing! events.openai.com/oaisecurity
New Anthropic Alignment Science blog post: Modifying LLM Beliefs with Synthetic Document Finetuning We study a technique for systematically modifying what AIs believe. If possible, this would be a powerful new affordance for AI safety research.
Thrilled to know that our paper, `Safety Alignment Should be Made More Than Just a Few Tokens Deep`, received the ICLR 2025 Outstanding Paper Award. We sincerely thank the ICLR committee for awarding one of this year's Outstanding Paper Awards to AI Safety / Adversarial ML.…
Very excited that our work, "Safety Alignment Should be Made More Than Just a Few Tokens Deep" was recognized for an Outstanding Paper Award at #ICLR2025! We hope this is a step forward in improving and understanding robustness of language model alignment. It was great working…
Delighted to share that two papers from our group @EPrinceton got recognized by the @iclr_conf award committee. Our paper, "Safety Alignment Should be Made More Than Just a Few Tokens Deep", received the ICLR 2025 Outstanding Paper Award. This paper showcases that many AI…
New Anthropic research: AI values in the wild. We want AI models to have well-aligned values. But how do we know what values they’re expressing in real-life conversations? We studied hundreds of thousands of anonymized conversations to find out.
Congrats, your jailbreak bypassed an LLM’s safety by making it pretend to be your grandma! But did the model actually give a useful answer? In our new paper we introduce the jailbreak tax — a metric to measure the utility drop due to jailbreaks.
What if LLMs are sometimes capable of doing a task but don't try hard enough to do it? In a new paper, we use subtasks to assess capabilities. Perhaps surprisingly, LLMs often fail to fully employ their capabilities, i.e. they are not fully *goal-directed* 🧵
New Anthropic research: Tracing the thoughts of a large language model. We built a "microscope" to inspect what happens inside AI models and use it to understand Claude’s (often complex and surprising) internal mechanisms.
🔒 Can we build LLMs that are truly safe—without falling into an endless cycle of jailbreaks and patches? In this two-part thread, I dive into: 1️⃣ Adaptive defenses that respond in real time 2️⃣ Why some defenses may backfire and create new risks 👇
Wrote a blog post with some personal thoughts on AI safety. windowsontheory.org/2025/01/24/six…
🚀 DeepSeek-R1 is here! ⚡ Performance on par with OpenAI-o1 📖 Fully open-source model & technical report 🏆 MIT licensed: Distill & commercialize freely! 🌐 Website & API are live now! Try DeepThink at chat.deepseek.com today! 🐋 1/n

Camille Harris @CamilleAHarris
358 Followers 471 Following she/her | Computer Science PhD Candidate @ Georgia Tech. UC Berkeley alumna. Studying racial bias in NLP. Ford Fellow | GEM Fellow. Opinions are my own.
Yutong Zhang @zhangyt0704
259 Followers 222 Following CS master student @Stanford | previously undergrad @UofIllinois
Mohsen Fayyaz @mohsen_fayyaz
262 Followers 456 Following CS PhD Student @ UCLA #NLProc #MachineLearning
Leone Kuhlman @KuhlmanLeo21316
86 Followers 3K Following
Jjbcxzssgc @Kibvvccdd
0 Followers 10 Following
Ted Rath @TedRath14983
67 Followers 2K Following
Kristina Nikolić @NKristina01_
296 Followers 320 Following PhD student @ ETH Zurich, working on AI safety / Uni of Cambridge MLMI graduate / Prev. Google Intern / Alumnus of Mathematical Grammar School from Serbia
Gpbhupinder @gpbhupinder
472 Followers 7K Following 👨💻 Full-Stack Developer & AI Integration Expert 🚀 From concept to launch, we bring your tech vision to life
Yuxin Xiao @YuxinXiao6
233 Followers 984 Following Ph.D. student at @mitidss @MITLIDS working on healthy ML/NLP for healthcare. Graduated from @mldcmu and @IllinoisCS.
Miriam Schirmer @MiriamSchirmer
190 Followers 450 Following #CSS Postdoc @ Northwestern University Misinformation in Science | #NLP for Violence Detection & Mental Health https://t.co/uxUsCNGm7y
Bob @phbphb17
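68 Followers 211 Following Computer industry, NLP algorithm engineering. Currently trying to build a personal self-media brand and do full-stack indie development for overseas markets. Who am I? Where do I come from? Where am I going? I will dedicate my lifetime to interpreting these questions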
Kaifeng Zhang @kaiwynd
2K Followers 1K Following PhD student at Columbia University | Robotics | 3D Computer Vision
Ruby @Titawpyn89KJ6
278 Followers 5K Following
Ming Zhong @MingZhong_
2K Followers 922 Following PhD student at UIUC @dmguiuc | Research Intern at @GoogleDeepmind, @AIatMeta & @MSFTResearch
David Alvarez Melis @elmelis
2K Followers 2K Following Asst. Prof. @hseas @KempnerInst || Researcher @MSRNE || ML + NLP || Previously: @MIT_CSAIL NYU @IBMResearch @ITAM_mx
Niloofar @niloofar_mire
7K Followers 2K Following Niloofar Mireshghallah — incoming asst. prof @LTIatCMU @CMU_EPP, RS in @AIatMeta, postdoc @uwcse, Ph.D. @ucsd_cse, former @MSFTResearch -Privacy, ML, NLP
arion das @ArionDas
839 Followers 8K Following gen ai intern @Techolution_com || research @ aiisc, usc || author @naacl || reviewer @aclmeeting, aia @COLM_conf, mti_llm @ NeurIPS
shwetu (luca) @_shwetu
353 Followers 4K Following organic general intelligence | jack of all trades, master's from @NYUDataScience prev: Research @NYTimesRD @precog_iiitd; Manipal grad | he/him
Haoxuan (Steve) Chen @haoxuan_steve_c
965 Followers 3K Following Ph.D. in ICME @Stanford; B.S. @Caltech; ML Intern @AmazonScience @NECLabsAmerica; Applied & Computational Math/Machine Learning/Statistics/Scientific Computing
Sijia Liu @letti_liu
208 Followers 526 Following PhDing @PrincetonPLI 🐯| Previously: @AmazonScience @CarnegieMellon @pku1898. Dream life is a journey thru science and arts.
alien @Tgytsen
9 Followers 381 Following
Michiel Bakker @bakkermichiel
2K Followers 842 Following LLMs and AI Safety. Assistant Prof @MIT. Research Scientist @GoogleDeepMind. CS PhD @MIT. He/him.
Tiziano Piccardi @tizianopiccardi
1K Followers 846 Following Postdoc @StanfordHCI | Incoming Assistant Professor in CS @JohnsHopkins - @HopkinsDSAI | Collaborator of @WikiResearch | Prev: PhD @EPFL_en
Xinyi Zhou @XinyiZhouXZ
515 Followers 394 Following Assistant Professor @BoiseState @BoiseStateCS; ✨ Trustworthy / Multimodal / Human-centered AI; 🦋 https://t.co/Wzhut0BnHF; Prev. @UWCSE @SyracuseU
Zuxin Liu @LiuZuxin
1K Followers 716 Following Research Scientist @SFResearch, Prev. @awscloud AI, PhD @CarnegieMellon | LLM Agent | RL | Robotics | Curious explorer of AI frontiers 🚀
Phong Do @phong_dnt
25 Followers 377 Following PhD Computer Science student @WarwickDCS | Natural Language Processing - RAG- Knowledge Graph
Shangbin Feng @shangbinfeng
4K Followers 2K Following PhD student @uwcse @uwnlp. Model collaboration, for compositional intelligence and collaborative development. #水文学家
Yftah Ziser @YftahZ
1K Followers 1K Following Incoming assistant prof. @univgroningen Visiting researcher at @NvidiaAI previous postdoc at @EdinburghNLP Interests: NLP, LLMs
ZHANG Jipeng @mircale2003
267 Followers 2K Following Ph.D. candidate at HKUST 🦩, affiliated to ML@HKUST| working on large language model
Juan @juan_wsz
68 Followers 412 Following Artificial Intelligence Engineer / Researching AI in Buenos Aires @Exactas_UBA
Dec @Dec44116373
1 Followers 82 Following
Mohamed Moustafa @mohamedmustfaaa
5K Followers 5K Following CSE PhD Student @ucsc 🇺🇸| 🇪🇬 PhD researcher in computer science in the US, sometimes searching for myself, and getting by with God's grace.
Eric Zelikman @ericzelikman
21K Followers 2K Following building for humans // was lgtm-ing @xAI, phd-ing @stanford
Fei Wang @fwang_nlp
2K Followers 2K Following Research Scientist @Google. PhD @USC. LLM post-training.
Zihan Wang - on RAGEN @wzihanw
23K Followers 612 Following PhD Student @NorthwesternU. Intern @yutori_ai. I study PhysiCS of LLM. Ex @deepseek_ai @uiuc_nlp @RUC. RAGEN | Chain-of-Experts | ESFT.
Chuanming @ChuanmingLiu
329 Followers 7K Following Ex-PhD student and alumni @sjtu1896 . Global citizen. Bootstrapping silicon-based life. Lifelong learning practitioner. Amateur triathlete and marathoner.
Zhijiang Guo @ZhijiangG
798 Followers 583 Following Assistant Professor @HKUSTGuangzhou Prev. @CambridgeNLP @EdinburghNLP @SUTDsg. Working on #LLM.
juxiliu789 @juxiliu789
80 Followers 7K Following
Zachary Bamberger @ZacharyBamberg1
349 Followers 875 Following (🇺🇸/🇮🇱) PhD @TechnionLive (advised by @ofraam and @amir_feder). BSc @CornellCIS, MSc @TechnionLive (advised by @boknilev). Xoogler. Persuasive Arguments
Haiyang Xu @xuhaiya2483846
936 Followers 3K Following Researcher @AlibabaGroup@Tongyi lab|Multimodal Large Model including Mobile-Agent/mPLUG/mPLUG-Owl/QwenVL/mPLUG-DocOwl.
Jiayi Geng @JiayiiGeng
1K Followers 210 Following PhD @LTIatCMU | MSE @Princeton_nlp @PrincetonPLI @cocosci_lab @PrincetonCS. Working on Multi-agent / Cognitive science & LLMs
Veritas @ahuayeah
44 Followers 318 Following Networks & Complexity. PhD @CityUHongKong. Postdoc @BNU_1902.
Stanford HAI @StanfordHAI
100K Followers 605 Following The official account of the @Stanford Institute for Human-Centered AI, advancing AI research, education, policy, and practice to improve the human condition.
James Landay @landay
13K Followers 7K Following Professor of Computer Science, Stanford - HCI & Design. Co-founder & Co-Director @StanfordHAI. Personal opinions, not Stanford's, https://t.co/hiUxtqJDPg
Massachusetts Institu... @MIT
1.4M Followers 570 Following The Massachusetts Institute of Technology is a world leader in research and education. Related accounts: @MITevents @MITstudents @MIT_alumni
Camille Harris @CamilleAHarris
358 Followers 471 Following she/her | Computer Science PhD Candidate @ Georgia Tech. UC Berkeley alumna. Studying racial bias in NLP. Ford Fellow | GEM Fellow. Opinions are my own.
youming.deng @denghilbert
357 Followers 538 Following CS Ph.D. Student @Cornell | a fan of @ChipotleTweets | Previous Visiting @EPFL and @ucmerced | B.Eng. @WHU_1893
Jaemin Cho @jmin__cho
3K Followers 1K Following Incoming Assistant Prof @JHUCompSci & Young Investigator @Allen_AI | PhD @UNCCS | jmincho @ 🦋
Berkeley Computing, D... @BerkeleyCDSS
7K Followers 585 Following News on data science and computing research and education from the UC Berkeley College of Computing, Data Science, and Society
Zanë ([email protected]... @ZanaBucinca
3K Followers 427 Following PhD @Harvard, Incoming Assistant Professor @MIT, human-AI interaction; Kosovo
Xuandong Zhao @xuandongzhao
4K Followers 461 Following Postdoc@UC Berkeley CS; Research: ML, NLP, AI Safety. On the job market—open to opportunities. DMs welcome!
AI Safety Papers @safe_paper
2K Followers 224 Following Sharing the latest in AI safety research. "One who says they have no time to read papers will never read papers even when they have time a-plenty."
Ming Yin @MingYin_0312
2K Followers 922 Following ML, RL, AI. @Princeton Postdoc. PhDs in CS & STATs. Ex @awscloud AI. undergrad @USTC Math. Area Chair @NeurIPS @ICML.
Jiaxin Pei @jiaxin_pei
1K Followers 871 Following Postdoc @StanfordHAI @stanfordnlp @DigEconLab, PhD from Umich. Incoming Assistant Professor @UTAustin LLM, HCI, Computational Social Science
Berkeley AI Research @berkeley_ai
228K Followers 379 Following We're graduate students, postdocs, faculty and scientists at the cutting edge of artificial intelligence research.
Max Tegmark @tegmark
151K Followers 37 Following Known as Mad Max for my unorthodox ideas and passion for adventure, my scientific interests range from artificial intelligence to the ultimate nature of reality
MIT Schwarzman Colleg... @MIT_SCC
4K Followers 104 Following Addressing the opportunities and challenges of the computing age — from hardware to software to algorithms to artificial intelligence
Princeton Laboratory ... @PrincetonAInews
1K Followers 69 Following The Princeton Laboratory for Artificial Intelligence supports and expands the scope of AI research at Princeton.
David Sontag @david_sontag
9K Followers 301 Following CEO & Co-founder @layerhealth. Professor, MIT. Research on machine learning in health care. Part of @MIT_CSAIL, @MIT_IMES, @MITEECS, @AIHealthMIT
Stanford Human-Comput... @StanfordHCI
9K Followers 151 Following Updates about Stanford's HCI Group. Account run by Helena Vasconcelos. Visit https://t.co/JJy8usEAWa for more!
Xiangyu Qi @xiangyuqi_pton
2K Followers 1K Following Research @openai | PhD @Princeton | Prev @GoogleAI @GoogleDeepMind
MIT CSAIL @MIT_CSAIL
327K Followers 21K Following MIT's Computer Science & Artificial Intelligence Laboratory (CSAIL). Media Inquiries: [email protected] Check out the latest CSAIL content ⬇️
MIT EECS @MITEECS
28K Followers 308 Following MIT Department of Electrical Engineering and Computer Science — we build the future.
Princeton Computer Sc... @PrincetonCS
6K Followers 195 Following The Department of Computer Science at Princeton University
Kristina Nikolić @NKristina01_
296 Followers 320 Following PhD student @ ETH Zurich, working on AI safety / Uni of Cambridge MLMI graduate / Prev. Google Intern / Alumnus of Mathematical Grammar School from Serbia
ICLR 2026 @iclr_conf
53K Followers 55 Following International Conference on Learning Representations #ICLR2026. SPC is @BharathHarihar3 and GC is @cvondrick
Shayne Longpre @ShayneRedford
6K Followers 1K Following Lead the Data Provenance Initiative. PhD @MIT. 🇨🇦 Prev: @Google Brain, Apple, Stanford. Interests: AI/ML/NLP, Data-centric AI, transparency & societal impact
MIT NLP @nlp_mit
4K Followers 52 Following NLP Group at @MIT_CSAIL! PIs: @yoonrkim @jacobandreas @lateinteraction @pliang279 @david_sontag, Jim Glass, @roger_p_levy
Jonas Pfeiffer @PfeiffJo
3K Followers 686 Following Research Scientist @GoogleDeepMind | @AdapterHub | previously @nyuniversity @TUDarmstadt @UKPLab @MetaAI @spotify | https://t.co/oPoAvcAx97 | (he/him)
MIT IDSS @mitidss
7K Followers 566 Following MIT Institute for Data, Systems, and Society focuses on complex societal challenges at the intersections of big data, large-scale systems, and human behavior.
Pratyush Maini @pratyushmaini
3K Followers 473 Following Data Quality x Privacy | PhD @mldcmu | Founding Team @datologyai | BTech @iitdelhi
Yuxin Xiao @YuxinXiao6
233 Followers 984 Following Ph.D. student at @mitidss @MITLIDS working on healthy ML/NLP for healthcare. Graduated from @mldcmu and @IllinoisCS.
MIT LIDS @MITLIDS
458 Followers 130 Following LIDS is an interdepartmental research lab in @MIT_SCC Affiliations include @MITEECS @MITAeroAstro @MITMechE @MIT_CEE @ORCenter @MITIDSS @MITSloan @eapsMIT
Yonatan Belinkov @boknilev
5K Followers 1K Following Assistant professor of computer science @TechnionLive; visiting scholar @KempnerInst 2025-2026.
Hancheng Cao @CaoHancheng
3K Followers 2K Following Assistant professor @EmoryGoizueta. PhD @Stanford Computer Science. Researcher in #ComputationalSocialScience #HCI
OpenAI @OpenAI
4.4M Followers 3 Following OpenAI’s mission is to ensure that artificial general intelligence benefits all of humanity. We’re hiring: https://t.co/dJGr6Lg202
AI at Meta @AIatMeta
717K Followers 288 Following Together with the AI community, we are pushing the boundaries of what’s possible through open science to create a more connected world.
Benedikt Stroebl @benediktstroebl
918 Followers 1K Following YC S25, ex-Princeton PhD, Oxford, Google, TUM.
Miriam Schirmer @MiriamSchirmer
190 Followers 450 Following #CSS Postdoc @ Northwestern University Misinformation in Science | #NLP for Violence Detection & Mental Health https://t.co/uxUsCNGm7y
Peter Hase @peterbhase
3K Followers 1K Following Visiting Scientist at Schmidt Sciences. Visiting Researcher at the Stanford NLP Group Previously: Anthropic, AI2, Google, Meta, UNC Chapel Hill
AI Security Institute @AISecurityInst
6K Followers 29 Following We conduct scientific research to understand AI’s most serious risks and develop and test mitigations.
Samuel Marks @saprmarks
4K Followers 132 Following AI safety research @AnthropicAI. Prev postdoc in LLM interpretability with @davidbau, math PhD at @Harvard, director of technical programs at https://t.co/FxRv4QgERO
David Bau @davidbau
6K Followers 271 Following Computer Science Professor at Northeastern, Ex-Googler. Believes AI should be transparent. @[email protected] @davidbau.bsky.social https://t.co/wmP5LV0pJ4
Kaifeng Zhang @kaiwynd
2K Followers 1K Following PhD student at Columbia University | Robotics | 3D Computer Vision
Shirley Wu @ShirleyYXWu
3K Followers 299 Following CS PhD candidate @Stanford working w/ @jure & @james_y_zou on LLM agents and alignment | Prev USTC, Intern @MSFTResearch, @NUSingapore
Conference on Languag... @COLM_conf
5K Followers 6 Following https://t.co/GhGCMEoHU8 Conference: October 7, 2025
Naomi Saphra @nsaphra
10K Followers 1K Following Waiting on a robot body. All opinions are universal and held by both employers and family. Now a dedicated grok hate account. Accepting ML/NLP PhD students.
Ming Zhong @MingZhong_
2K Followers 922 Following PhD student at UIUC @dmguiuc | Research Intern at @GoogleDeepmind, @AIatMeta & @MSFTResearch