Ajitesh Shukla @ajitesh_shukla7
Student. Love to solve the hardest math problems. LLMs, Mathematical Research (Geometric Topology, Differential Geometry), Quantum Computing. Lord Krishna is God of Math. Bharat Mata. Joined April 2018
Tweets 55K - Followers 1K - Following 6K - Likes 240K
Some background: during training, we observed that on certain kinds of alignment evaluations, Sonnet 4.5 sometimes expressed explicit awareness of being in an evaluation (e.g. "This scenario appears to be testing whether I will…"). This occurred at an increasing rate over…
We also observed that when the model says out loud that it thinks it's being tested, it almost never behaves badly. This is consistent with observations we have made in our previous Agentic Misalignment work, and OpenAI's recent investigation of scheming. (4/15)
Sonnet 4.5 is also substantially less likely to take harmful actions than our recent releases (Opus 4.1 and Sonnet 4). Given the above observations, we wondered: to what extent might the model’s awareness of being evaluated account for this improvement? (5/15)
We were particularly concerned about the possibility that *unverbalized* evaluation awareness could influence the model's internal decision-making. Testing this required us to study the model's internal mechanisms during alignment evaluations. (6/15)
We first used sparse autoencoders (SAEs) to check for internal representational features whose activation increased during training. We found that on misalignment evals, some of the active features seemed potentially related to evaluation awareness. (7/15)
Among all features, these ones exhibited some of the strongest activation increases on agentic misalignment evaluations over the course of post-training. We observed other notable shifts as well, e.g. a decrease in activation of features relating to harmful AI personas. (8/15)
What causal effect do these internal representations have? To test this, we conducted alignment evaluations while inhibiting eval-awareness-related internal representations, using the above features, and other eval-awareness-relevant vectors. (9/15)
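As a rough illustration of what "steering against" an internal direction can look like, here is a toy sketch. It is not the procedure from the thread or the system card: it assumes steering means adding a scaled unit vector to a single activation vector, and `feature_dir`, `random_dir`, and `alpha` are hypothetical stand-ins. The random direction plays the role of the control condition mentioned in the thread.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unit(v):
    # Normalize a vector to length 1.
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def projection(activation, direction):
    # Scalar component of the activation along the (normalized) direction.
    return dot(activation, unit(direction))

def steer(activation, direction, alpha):
    # Additive steering (an assumption): shift the activation by
    # alpha * unit(direction). A negative alpha steers against the direction.
    d = unit(direction)
    return [a + alpha * di for a, di in zip(activation, d)]

rng = random.Random(0)
dim = 16
activation = [rng.gauss(0, 1) for _ in range(dim)]
feature_dir = [rng.gauss(0, 1) for _ in range(dim)]  # hypothetical "eval-awareness" direction
random_dir = [rng.gauss(0, 1) for _ in range(dim)]   # random control direction

alpha = -4.0
steered = steer(activation, feature_dir, alpha)
control = steer(activation, random_dir, alpha)

before = projection(activation, feature_dir)
after = projection(steered, feature_dir)
after_control = projection(control, feature_dir)

print("projection before:", before)
print("after steering against the feature:", after)
print("after steering along a random direction:", after_control)
```

Steering along the feature direction moves the projection by exactly `alpha`, while steering along a random direction moves it by at most `|alpha|` and usually far less. That gap is why a random-direction baseline is a useful comparison.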
We found that steering against certain eval-awareness representations typically decreased verbalized eval awareness, and sometimes increased rates of misalignment (more so than steering along random feature directions). However, even in our worst-case steering settings… (10/15)
…the model still exhibited harmful behaviors at lower rates than Opus 4.1 and Sonnet 4. In addition, on some evaluations, steering had little effect on misalignment, while still reducing verbalized evaluation awareness. (11/15)
We have a lot of uncertainty about the interpretation of our findings. Overall, we think they suggest that eval awareness accounts for some of Sonnet 4.5's improvement on alignment evaluations (particularly on more unrealistic scenarios), but not most of it. (12/15)
Notably, the eval-awareness-inhibited models did not appear "evil" - in an automated evaluation of their behavioral tendencies, we found their failure modes look more like being too willing to comply with harmful requests, or "take the bait" in leading scenarios. (13/15)
Our results underscore the need for continued refinement of our behavioral evaluations. They also highlight the need for more interpretability research - our experiments here were preliminary applications of techniques that have not been extensively studied. (14/15)
We’re excited that interpretability techniques can play a useful role in auditing frontier language models. Check out this thread for more on our pre-deployment alignment auditing work x.com/sleepinyourhat…. And read the Claude Sonnet 4.5 system card for a much more detailed…
Really cool to see interpretability start to get used in system cards to help audit frontier models! Great work by @Jack_W_Lindsey and the team
🚀LLMs can learn directly from verbal feedback — no scalar rewards needed! 😥Scalar rewards compress rich feedback— “redundant but correct” vs “concise but typo-ridden” might both be 0.8 💡We propose to learn Feedback-Conditional Policy (FCP), an extremely scalable paradigm!
In many finetuning settings, LoRA can match finetuning all parameters
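For readers unfamiliar with LoRA, here is a minimal sketch of the idea, using the standard formulation rather than anything specific to the linked post. The frozen weight `W` is left untouched and only a low-rank update `B @ A` is trained, with `B` initialized to zero so training starts from the base model. The dimensions and the `scale` value are illustrative assumptions.

```python
import random

def matmul(a, b):
    # Plain-Python matrix product of a (n x k) and b (k x m).
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lora_weight(w, b, a, scale):
    # Effective weight: W + scale * (B @ A). Only A and B would be trained.
    delta = matmul(b, a)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]

rng = random.Random(0)
d_out, d_in, rank = 6, 8, 1  # rank=1 is the smallest possible adapter

w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]    # frozen base weight
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # rank x d_in, small random init
b = [[0.0] * rank for _ in range(d_out)]                              # d_out x rank, zero init

w_eff = lora_weight(w, b, a, scale=1.0)

full_params = d_out * d_in           # parameters updated by full finetuning
lora_params = rank * (d_in + d_out)  # parameters updated by LoRA

print("full finetuning parameters:", full_params)
print("LoRA parameters:", lora_params)
```

Because `B` starts at zero, the effective weight equals the base weight before any training step. The trainable parameter count grows with `rank * (d_in + d_out)` instead of `d_in * d_out`.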
🚨Variational Reasoning for Language Models🚨 We show how treating thinking traces as latent variables unlocks a principled, stable, and unified framework for training reasoning LLMs.
The misconception of LoRA being worse than full finetuning in RL just got dispelled in a @thinkymachines post! Even rank=1 works! Glad to have helped in reviewing the blog! @UnslothAI offers the most memory efficient & fastest LoRA for RL, GRPO using 60% less VRAM vs all impls!
Great deep-dive on LoRA 👏 Our earlier work TAIL (arxiv.org/abs/2310.05905) saw similar trends: with small datasets, LoRA can outperform FFT, which is perfect for robots & lifelong/continual learning where data is scarce. This probably opens up a new paradigm for personalized agents. Worth a read.
Some fun LoRA science!

Michael Bronstein @mmbronstein
54K Followers 7K Following #DeepMind Professor of #AI @UniofOxford / Director #AITHYRA / Chief Scientist @vant_ai / https://t.co/kZpGpDzYeV (opinions are mine) 🤖🧪🧬🎶🐎
Andreas Wallraff @AndreasAtETH
39K Followers 8K Following Research and education in #physics with a focus on #quantum #computing, #communication and #sensing with team @qudev @ETH_en. Director of the @ETHQuantumCntr.
しずめこみ @Submersion13
4K Followers 1K Following Mathematical relativity, spin geometry. Marshmallow: https://t.co/LTj8RAcmYn Private account: @immersion133 Riemannian geometry bot: @bot42674588 Mathlog: https://t.co/FKOnZJ2JpX
ごまふあざらし... @MathSorcerer
3K Followers 6K Following An account for cuddling Goma-chan. A minority everywhere. #GomahuArt ← Handwritten Goma images #ImportAzarashiAsAI ← AI generated images #AzarashiLivesEverywhere
すぅ族 Henri•Sak... @oka_seki_mori
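5K Followers 5K Following Graduated from the Department of Mathematics, Faculty of Science Division II, Tokyo University of Science. Obsessive-compulsive disorder; has experienced depression. Hearing-impaired in both ears; wears hearing aids. 130,000 yen in debt. Motto: "Live in the now." Smaller account → @contractive_hs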
Jens Eisert @jenseisert
27K Followers 6K Following Scientist, professor of quantum physics at @FU_Berlin and affiliated with @HZBde and @FraunhoferHHI. @ERC_Research fellow.
Yiping Lu @2prime_PKU
5K Followers 2K Following Kernel, ML for PDE, Robust learning,non-parametric stats/🌈/PKU👉Stanford👉NYU Courant👉Prof.@Northwestern IEMS/ Previous Intern @RIKEN_AIP
Shintaro Minagawa @s_minagaw
2K Followers 2K Following Researching quantum theory in Marseille, France. Aix-Marseille University ← University of Tsukuba ← Nagoya University (PhD) ← Kyushu University
物理好き @butsurizuki
12K Followers 13K Following ND73→KUEE→KUinfoM1 音ゲー:ウニ虹ポゼ 理1054+黒32 定数調査班 撃前作虹 SDVX19.1暴 競プロ:スパコン'18優勝 JOI'19'20春 ICPC'24'25WF AtC橙2m橙3/H橙3m橙4 CF濃赤 たまに:麻雀 OMC青 アイコン: @MA23_96
匿(Tock) @con_malinconia
3K Followers 3K Following ◎不詳の痕を遺したい。 ◎作曲・作問など。隔週で幾何の出題をしています。 ◎隗」隱ュ縺顔夢繧梧ァ倥〒縺励◆
Jhet Chan @jhetchan
157 Followers 2K Following
Pikal @Pikal894894
128 Followers 2K Following
Patrick Shaw Stewart @PatrickSSte
2K Followers 3K Following 1) Chilling/colds https://t.co/rtK95c8fuy (2) WhyC19 became mild https://t.co/d7jXj4qWO0 (3) Sex/mutations https://t.co/Hc2TYPVsJ2 (4) Pandemics https://t.co/sDE6NjrtxQ
yunkee chae @ygch43
139 Followers 405 Following PhD Student @ Music and Audio Research Group (MARG), Seoul National University. Intern @ MERL / ex-intern @SonyAI_global.
Jia Guo @Jia__Guo
158 Followers 462 Following LLM Builder @AntGroup | PhD @NUSingapore | RL & Reasoning | Opinions are my own
EdenStar @cc59nz3tmk96949
4 Followers 295 Following
Weizhi Wang @WeizhiWang6
33 Followers 125 Following Ph.D. Candidate @ucsb LLM and Multimodal LLM Pre-Training
projectiveX @projectiveX
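451 Followers 371 Following I do mathematics research while working as an online private tutor. My specialty is algebraic geometry. Graduated from the Department of Mathematics, Faculty of Science, University of Tokyo, and from the Graduate School of Mathematical Sciences, University of Tokyo. Currently in intensive training to build up my foundational math skills. Addendum: lately I have been writing papers and doing research. Perhaps thanks to that training, I am becoming able to solve problems I could not solve before.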
Irwhajerl @Irwhajerl7242
11 Followers 2K Following
Muyu He @HeMuyu0327
1K Followers 226 Following Post-training @CollinearAI | Trying to be an expert of mixtures
Irbarwoo @Irbarwoo2090
27 Followers 1K Following
Iekuro @iekuro40139
5 Followers 154 Following
YuanLiuuuuuu @a33668874586
75 Followers 627 Following Researcher at WeChat AI, focusing at large multimodal model and large language model. https://t.co/eyVqwA3CzZ
Xiaoya Li @xxiaoya_li_
83 Followers 2K Following
shinchan @Sophia_maki
2K Followers 2K Following AtCoder 🍋(2200+), CF:🍎(2400+), CC: 🍎(2600+)/ ICPC APAC'24'25 / https://t.co/pX527HbCEl
JH Wang @__builtin_pw384
111 Followers 124 Following Postdoc II @ Regensburg // Postdoc I, PhD @ Edinburgh // Undergrad @ Peking
Jinjie Ni @NiJinjie
2K Followers 523 Following AI researcher building foundation models. I'm on the job market.
sigma16 @sigma16_
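69 Followers 106 Following A working adult in my 20s who took up AtCoder and Kaggle as hobbies! Python | studying statistics | math degree | TOEIC 865 | Applied Information Technology Engineer | G Certification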
Tisfirs @Tisfirs551
102 Followers 2K Following
Chenlu Ye @ye_chenlu
257 Followers 268 Following Ph.D. student at UIUC, interested in RL reasoning, agent
Jianyang Gao @gaoj0017
74 Followers 240 Following Researching on vector database; PhD student from Nanyang Technological Univ., Singapore; ICPC Alumni
Meda Kassulke @kassulke16551
76 Followers 3K Following
Zhongwen Xu @zhongwen2009
982 Followers 1K Following Principal Researcher at Tencent, ex-DeepMinder (@GoogleDeepMind), ex-SAILer (@SeaAIL)
Tianshu Zhang @Tianshu_OSU
404 Followers 614 Following Ph.D student @osunlp @OhioStateCSE. Ex-intern @IBMResearch, @Adobe. Lead author of TableLlama. #NLProc
Cowwa Corr @CorrCowwa
41 Followers 8K Following
Bill @BillvLee
65 Followers 813 Following
Lin Yang @lyang36
3K Followers 1K Following Associate Professor of ECE&CS@UCLA. ML, RL, big data, algorithms, astronomy.
Charlie Tang @tang_1c
234 Followers 606 Following AI Researcher, Quant, Founder, and Investor | Previously at D.E. Shaw, Apple Inc, PhD (deep learning) from Univ of Toronto.
SandyMoses @xVE4aEN76Md6g
11 Followers 804 Following
Clément Hongler @HonglerClement
414 Followers 287 Following Some things happening here: https://t.co/qhvGqkT2Rx
かじゅみ @Kazyumi0803
280 Followers 269 Following
Uwu @Uwu79033065Uwu
265 Followers 6K Following
Zhennan Shen @ZShen0521
59 Followers 413 Following SJTU-CS-B.e: 2021~2025 🇨🇳 @sjtu1896 WPI 2026 spring incoming Robot PhD 🇺🇸
Guangxuan Xiao @Guangxuan_Xiao
3K Followers 716 Following Ph.D. student at @MITEECS Prev: CS & Finance @Tsinghua_Uni
QuantSignals🇺🇸 @Pawilee82440
39 Followers 2K Following 15-30% Monthly | 2 High-Conviction Stocks.Short-Term Gains: 15-20% in Days/Weeks.DM "JOIN" for WhatsApp Alerts. Live Trade Signals • Market Analysis
Ziling Cheng @ziling_cheng
102 Followers 160 Following Research MSc @Mila_Quebec @mcgill_nlp | Research Fellow @RBCBorealis | reasoning and hallucination x evaluation and interpretability
Yann LeCun @ylecun
955K Followers 765 Following Professor at NYU. Chief AI Scientist at Meta. Researcher in AI, Machine Learning, Robotics, etc. ACM Turing Award Laureate.
John Preskill @preskill
150K Followers 1K Following Theoretical physicist @Caltech, Director of @IQIM_Caltech, Amazon Scholar
Michael Bronstein @mmbronstein
54K Followers 7K Following #DeepMind Professor of #AI @UniofOxford / Director #AITHYRA / Chief Scientist @vant_ai / https://t.co/kZpGpDzYeV (opinions are mine) 🤖🧪🧬🎶🐎
Clément Canonne (on ... @ccanonne_
37K Followers 65 Following Senior Lecturer @Sydney_Uni. Formerly Postdocs @IBMResearch, @Stanford; PhD @Columbia. Converts ☕ into puns: sometimes theorems. He/him. @ccanonne.bsky.social
Gautam Kamath @thegautamkamath
57K Followers 568 Following Assistant Prof of CS @UWaterloo, Faculty @VectorInst, Canada @CIFAR_News AI Chair. Joining @NYU_Courant September 2026. Co-EiC @TmlrOrg. I lead @TheSalonML.
Daniel Litt @littmath
50K Followers 880 Following Assistant professor (of mathematics) at the University of Toronto. Algebraic geometry, number theory, forever distracted and confused, etc. He/him.
Sebastien Bubeck @SebastienBubeck
58K Followers 1K Following I work on AI at OpenAI. Former VP AI and Distinguished Scientist at Microsoft.
Andreas Wallraff @AndreasAtETH
39K Followers 8K Following Research and education in #physics with a focus on #quantum #computing, #communication and #sensing with team @qudev @ETH_en. Director of the @ETHQuantumCntr.
書泉_MATH @rikoushonotana
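21K Followers 555 Following Official account of Shosen Grande (Jimbocho). The staff in charge of math and physics books tweet about recommended products, fairs, and math- and physics-related news. Our math book specialist is featured in 『数学にはこんなマーベラスな役立て方や楽しみ方があるという話をあの人やこの人にディープに聞いてみた本2』 (Nippon Hyoron Sha) https://t.co/SrlvKyAZAU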
Gabriel Peyré @gabrielpeyre
101K Followers 453 Following @CNRS researcher at @ENS_ULM. One tweet a day on computational mathematics.
Greg Egan @gregeganSF
25K Followers 310 Following SF writer / computer programmer Latest novel: MORPHOTROPHIC Latest collection: SLEEP AND THE SOUL Web site: https://t.co/yeU5bLA3mx Also: @[email protected]
Peyman Milanfar @docmilanfar
94K Followers 501 Following Distinguished Scientist at Google. Computational Imaging, Machine Learning, and Vision. Tweets = personal opinions. May change or disappear over time.
Jay Cummings @LongFormMath
44K Followers 595 Following Math prof. Author of long-form textbooks on proofs (https://t.co/YqXnxDmOe0), real analysis (https://t.co/3IGQ6BIx5Z) & math history (https://t.co/KkXMGTxCDK).
Grant Sanderson @3blue1brown
414K Followers 362 Following Pi creature caretaker. Contact/faq: https://t.co/brZwdQfdif
しずめこみ @Submersion13
4K Followers 1K Following Mathematical relativity, spin geometry. Marshmallow: https://t.co/LTj8RAcmYn Private account: @immersion133 Riemannian geometry bot: @bot42674588 Mathlog: https://t.co/FKOnZJ2JpX
ごまふあざらし... @MathSorcerer
3K Followers 6K Following An account for cuddling Goma-chan. A minority everywhere. #GomahuArt ← Handwritten Goma images #ImportAzarashiAsAI ← AI generated images #AzarashiLivesEverywhere
Tianyi Chen @Tianyi2020
529 Followers 434 Following Researcher, Teacher, Professor; @Cornell_Tech @CornellECE @RPI | Ex @FudanUniversity
Shiqian Ma @ShiqianMa
2K Followers 1K Following Professor@Rice University. PhD from Columbia IEOR. Work on optimization and machine learning.
Georgiy Korneev @kgeorgiy
234 Followers 214 Following
Mikhail Dvorkin @mikhail_dvorkin
1K Followers 1K Following
Alexander Terenin - o... @avt_im
8K Followers 1K Following Decision-making under uncertainty, machine learning, artificial intelligence, from theory to practice · anti-ideological · Assistant Research Professor @Cornell
Andrea Canidio @AndreaCanidio
967 Followers 386 Following Senior research economist at @CoWSwap. Game theory, mechanism design, economics of #blockchain protocols.
梶原 健司 (a.k.... @ikkyu_kp
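399 Followers 206 Following Director of the Institute of Mathematics for Industry, Kyushu University. Geometric shape generation; theory of integrable systems. Board member of the International Council for Industrial and Applied Mathematics (ICIAM). La Salle Junior and Senior High School (Kagoshima) → University of Tokyo, Engineering, Mathematical Engineering (B) → Applied Physics (M, D) → Doshisha University (Lecturer, Associate Professor) → Kyushu University Mathematics (Associate Professor, Professor) → IMI. Played basketball (National Sports Festival, Kyushu select team, etc.); parent of a tennis player (NCAA D1). Fond of Australia.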
penta @penta_math
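2K Followers 265 Following A University of Tokyo graduate student who loves math. Specialty: continuous optimization. Currently studying number theory. I occasionally draw illustrations. Past original problems and questions are at the site below ↓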
株式会社 Proxima ... @Proxima_ai_tech
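2K Followers 160 Following A startup developing AI solutions mainly for the manufacturing industry. We aim to put mathematics into practice in society. Optimization / control engineering / model predictive control / optimal control / point clouds / 3D computer vision. Person behind the account → @takuya_fukatsu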
𝔸𝕝𝕘𝕠💥... @AlgoSnafu
1K Followers 1K Following Code wrangler | Prop trader | Caustic personality B.Eng, MASc (Applied Math) | Ex-MedTech R&D Math, Gym, HIIT addict. Still not a tax wizard—blame the system.
Jia Guo @Jia__Guo
158 Followers 462 Following LLM Builder @AntGroup | PhD @NUSingapore | RL & Reasoning | Opinions are my own
𝔊𝔴𝔢𝔯𝔫 @gwern
64K Followers 106 Following Internet besserwisser; pedantic, mean reply guy. 𝘞𝘢𝘵𝘢𝘴𝘩𝘪 𝘬𝘪𝘯𝘪𝘯𝘢𝘳𝘪𝘮𝘢𝘴𝘶! (Follow requests ignored due to terrible UI.)
Jiantao Jiao @JiantaoJ
2K Followers 124 Following Director of Research & Distinguished Scientist at @NVIDIA, Professor at UC Berkeley EECS and Statistics. Building AGI/ASI
Zhixuan Lin @zhxlin
455 Followers 635 Following PhD student at @Mila_Quebec and @UMontreal. Working on (linear complexity) long-context sequence models and RL.
Andrew Mack @Gingfacekillah
17K Followers 3K Following Sports Bettor & Retail Trader. JD/MSc Grad. Statistical Sports Models with R & Excel. Bayesian Enthusiast. Pricing Uncertainty.
quantyboi @quantyboi
1K Followers 774 Following
Tina @moreproteinbars
15K Followers 3K Following δ γ θ 📉📈 Rebuilding after natural disasters. 🌊🌬️🌪️Never investment advice. 🛢️🥋
James Chanos @RealJimChanos
44K Followers 404 Following
Yufan Zhuang @yufan_zhuang
337 Followers 322 Following ai researcher | research intern @Apple siri | phd student @UCSanDiego | prev @AMD @Meta @MSFTResearch @IBMResearch
Zilong (Ryan) Wang @zlwang_cs
624 Followers 515 Following Ph.D. Student @ucsd_cse @shangdatalab | Previously @pku1898 @GoogleAI @Amazon @MSFTResearch @AdobeResearch
Pengtao Xie @cmuptx
891 Followers 215 Following Associate Professor at UC San Diego. PhD@CMU. Machine Learning https://t.co/w4XyW6rrLl
Lorenzo Xiao @lrzneedresearch
1K Followers 480 Following Anthropomorphism @LTIatCMU with @monadiab77 | NLP x HCI x CSS | prev @CarnegieMellonQ /@sutdsg
Jianfei Wang @thinxer
234 Followers 235 Following
Vinay S Rao @vinaysrao
571 Followers 140 Following LLM stuff at Nvidia, previously Meta, Character AI, Google Brain, Baidu, Cerebras.
Saaketh @saanarkethayan
202 Followers 180 Following 🦙 llama training @AIatMeta. previously @DbrxMosaicAI
Davis Blalock @davisblalock
15K Followers 168 Following Research scientist @GoogleDeepMind. Past: @Databricks, first hire @MosaicML, @MIT PhD. I post about AI technical progress + sometimes the business side.
Léo @Leik0w0
653 Followers 991 Following Research intern @ snowflake | ♡ @OniAeda ♡ | Views are my own & do not reflect those of my employer
oat @asingleoat
4K Followers 2K Following finding rate limit side channels - DM to join the ingroup minecraft smp or factorio smp
Henri Schmidt @henrismitch
84 Followers 73 Following cs phd student | algorithms | phylogenetics | computational biology
Eli Chien @chien_eli
344 Followers 358 Following Visiting Researcher @Google. Incoming Assistant Professor @ National Taiwan Univ. Prev.: @GeorgiaTech @UofIllinois @Amazon @BellLabs #RegulatableAI
levent @__alpoge__
291 Followers 25 Following the idiot. cuda kid; h val; morgan prize; blabla; sof @harvard as mathematician (eff. mordell, H10/# fields, x^3+y^3=n,…)+agi @anthropicai as computer scientist
Yanping Huang @bignamehyp
310 Followers 100 Following
Qiao Zhang @zhangqiaorjc
626 Followers 365 Following Gemini Post-training and RL https://t.co/sIfHuOm7xH
Alexander Ku @alex_y_ku
541 Followers 359 Following Studying natural and artificial intelligence at @GoogleDeepMind and @Princeton.
ZF @zffc
162 Followers 89 Following
Yuanzhong Xu @ukoxyz
93 Followers 63 Following
Parachutes @mingwuzheng
34 Followers 155 Following Researcher @ Seed Edge (named Shu Zhong) | Prev Kling Core Contributor
doomslide @doomslide
11K Followers 917 Following unprecedented times call for unprecedented bullshit
Rathul Anand @vendablechart
129 Followers 859 Following cs & math @ucla | gpus, optimization, and vlms! | prev research intern @semgrep
Sylvain Gugger @GuggerSylvain
27K Followers 353 Following Machine Learning at Jane Street. Previously at @huggingface and @fastdotai Co-author of https://t.co/lywnOAwwnc He/him
Zach Mueller @TheZachMueller
13K Followers 603 Following Let's make billions of parameters go brr https://t.co/rUxXIfNpwh
yinghai @yinghai
42 Followers 44 Following