Shuvendu Roy @ShuvenduBikash
Generalizability in AI | Machine Learning Research Intern @RBCBorealis | Ph.D. Candidate (AI) @queensu | Former: Student Researcher @google; @VectorInst shuvenduroy.github.io Toronto, Canada Joined December 2014
Tweets: 2K
Followers: 94
Following: 854
Likes: 1K
by the way. recently wrote a paper on this! for transformers, the number is about 3.6 bits-per-parameter so you would need 25GB ÷ 3.6 bits ≈ 56.9B parameters to exactly memorize Wikipedia that’s a pretty big model actually https://t.co/CJXFMAOieC
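A quick check of the arithmetic in the post above, as a minimal sketch. The function name `params_to_memorize` and both unit conventions are assumptions for illustration; the post states only the 25GB size and the 3.6 bits-per-parameter figure.

```python
BITS_PER_PARAM = 3.6  # capacity figure quoted in the post


def params_to_memorize(size_bytes: float, bits_per_param: float = BITS_PER_PARAM) -> float:
    """Return the parameter count needed to store size_bytes at the given capacity."""
    return size_bytes * 8 / bits_per_param


# Assumption: 1 GB = 10^9 bytes (decimal convention).
decimal_gb = 25 * 10**9
# Assumption: 1 GB = 1024 MB and 1 MB = 10^6 bytes (mixed convention).
mixed_gb = 25 * 1024 * 10**6

print(f"{params_to_memorize(decimal_gb) / 1e9:.1f}B")
print(f"{params_to_memorize(mixed_gb) / 1e9:.1f}B")
```

Under the decimal convention this prints 55.6B, and under the mixed convention it prints 56.9B, which matches the figure in the post. Either way the estimate lands in the mid-50-billion range.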
RLPT: Reinforcement Learning on Pre-Training Data • RL directly on pre-train data (no human labels) • Next-segment reasoning objective (ASR + MSR tasks) → self-supervised rewards • Gains on Qwen3-4B: +3.0 MMLU, +8.1 GPQA-Diamond, +6.6 AIME24, +5.3 AIME25
Thinking Augmented Pre-training "we propose Thinking augmented Pre-Training (TPT), a universal methodology that augments text with automatically generated thinking trajectories. Such augmentation effectively increases the volume of the training data and makes high-quality tokens…
How do language models actually develop their capabilities during pre-training? We need mechanistic insights into what's happening inside! We used crosscoders to track linearly interpretable features across 32 training snapshots, revealing a surprising two-phase learning process.
This MIT paper just broke my brain. Everyone keeps saying LLMs can't do real logical reasoning. Turns out we've just been teaching them wrong this whole time. These researchers built something called PDDL-INSTRUCT that actually teaches models to think through planning problems…
Meta Superintelligence Labs presents MetaEmbed: Scalable multimodal retrieval • Flexible late interaction via Meta Tokens • Test-time scaling: trade off retrieval accuracy vs efficiency • SOTA on MMEB + ViDoRe, robust up to 32B models • Matryoshka training → coarse-to-fine…
Ever wondered how Energy-Based Models (EBMs) work and how they differ from normal neural networks? ☕️We go over EBMs and then dive into the Energy-Based Transformers paper to make LLMs that refine guesses, self-verify, and could adapt compute to problem difficulty. (link👇)
Top AI Papers of The Week (September 15-21): - K2-Think - DeepDive - AgentScaler - Shutdown Resistance in LLMs - Is In-Context Learning Learning? - Towards a Physics Foundation Model - Retrieval and Structuring Augmented Generation with LLMs Read on for more:
🚨Brilliant New @AIatMeta Superintelligence Labs Paper. It asks a simple question: "Can inference compute substitute for missing supervision?" And the big deal is that this paper shows you don’t need humans to provide labels or feedback in reinforcement learning anymore.…
🇨🇳China unveils world's first brain-like AI Model SpikingBrain1.0 Up to 100X faster while being trained on less than 2% of the data typically required. Designed to mimic human brain functionality, uses much less energy. A new paradigm in efficiency and hardware independence.… https://t.co/8NA1PE4fkU
HUGE AI breakthrough from META. This can change everything (in AI industry) 30x Faster LLMs, 16x Bigger Contexts, Zero Accuracy Loss 👀 Meta Superintelligence Labs is clearly already cooking. "The core problem with long context is simple: making a document 2x longer can make… https://t.co/fLO7gWQGOE
10 latest Preference Optimization techniques ▪️ Pref-GRPO ▪️ PVPO (Policy with Value PO) ▪️ DCPO (Dynamic Clipping PO) ▪️ ARPO (Agentic Reinforced PO) ▪️ GRPO-RoC (Resampling-on-Correct) ▪️ TreePO ▪️ DuPO ▪️ TempFlow-GRPO ▪️ MixGRPO ▪️ MaPPO (Maximum a Posteriori PO) Save the…
This is probably one of THE most important papers of the last few months. Small language models are sufficiently powerful, operationally suitable, and economical for agentic tasks. - Phi-2 matches 30-billion-parameter models while running 15x faster. - Serving a 7 billion parameter small language…
Universal Deep Research NVIDIA recently published another banger tech report! The idea is simple: allow users to build their own custom, model-agnostic deep research agents with little effort. Here is what you need to know:
Fine-tuning LLM Agents without Fine-tuning LLMs Catchy title and very cool memory technique to improve deep research agents. Great for continuous, real-time learning without gradient updates. Here are my notes:
NVIDIA research just made LLMs 53x faster. 🤯 Imagine slashing your AI inference budget by 98%. This breakthrough doesn't require training a new model from scratch; it upgrades your existing ones for hyper-speed while matching or beating SOTA accuracy. Here's how it works:…
A Deep Dive into RL for LLM Reasoning Provides a roadmap for practitioners applying RL for LLM reasoning. Nice to have some of the latest techniques in one place.
Efficient Agents This is a great study full of insights on how to build efficient agents. If you are looking to reduce costs with AI agents, don't miss it. Pay attention to this one, devs! Here are my notes:
🚨This week's top AI/ML research papers: - Mixture-of-Recursions - Scaling Laws for Optimal Data Mixtures - Training Transformers with Enforced Lipschitz Constants - Reasoning or Memorization? - How Many Instructions Can LLMs Follow at Once? - Chain of Thought Monitorability -…

Brittany @BoldAsBrittany
197 Followers 3K Following Sushi or pizza? Or me? 😈 Scorpio seductress ♏️ Alberta’s hottest craving. Slide into my DMs… if you dare. 🔥
Lush Ⓥoid @LushVoid
343 Followers 2K Following Product designer, biotech, full stack dev, technical artist 🌳🌳
Rami Awar @iamramiawar
5K Followers 3K Following 🇳🇱 Simplifying data analysis and visualization 🚀 Try it out: https://t.co/nIuHLC3Wkg 🚀 @iamrami.bsky.social
Kory Mathewson @korymath
11K Followers 4K Following @GoogleDeepMind working on Veo + Flow -- getting great generative AI into the hands of great creative people
Teatouqueyd @TeatouqueydNDc
48 Followers 973 Following
Tirtharsh @Tirtharsh1qSLR
55 Followers 4K Following
Seasmason @SeasmasonlpIaQ
60 Followers 3K Following
TrudaClemens @UhoQo1OV95nEp
59 Followers 6K Following
Amelia @N0p68Ds1UH0ROl
96 Followers 7K Following Building on the XRP ledger. ♥️ Wife, kids, 🦜 & programming (TS, nodejs, Linux ..)
betty @betty2228592707
24 Followers 518 Following
Sorgleel @Sorgleelcypb
133 Followers 2K Following
MaxineDan @6a8tizG7UN2ebz
71 Followers 7K Following
VeromcaBecher @syNjo1oLKY6AP
72 Followers 7K Following
Hriday039 @FHriday039
18 Followers 509 Following Multimodality, Continual Learning | AI Consultant | Anime Freak
GillBrook @97YEFpqc7A7Ih
68 Followers 7K Following
MarinaPater @QVhjHOjzo8r518
18 Followers 1K Following
Trending News by Pock... @triptotomorrow
1K Followers 3K Following
Alberto Hojel @AlbyHojel
6K Followers 4K Following dreamer of dreams @LucidSimCorp (YC w25) // uc berkeley ‘24
Ahmad Beirami @abeirami
10K Followers 4K Following sth new // ex Gemini RL+Inference @GoogleDeepMind // Chat AI @Meta // RL Agents @EA // ML+Information Theory @MIT+@Harvard+@GeorgiaTech // زن زندگی آزادی
Danial Namazifard @IamDanialNamazi
675 Followers 6K Following Graduate Researcher at University of Tehran #NLProc #MachineLearning
Ahnaf Mozib Samin @im_samin
142 Followers 844 Following PhD student @ Queen's University | MSc in @em_lct @ummalta & @univgroningen | Learning Speech Processing | Brain-Computer Interface | NLP
Mojtaba @m_kolahdouzi
70 Followers 508 Following Ph.D. in AI | Postdoctoral Fellow | Interested in fairness in both real world and deep neural networks
Lisa Alazraki @LisaAlazraki
1K Followers 838 Following PhD student @ImperialCollege. Research Scientist Intern @AIatMeta prev. @Cohere, @GoogleAI. Interested in generalisable learning and reasoning. She/her
Ali Etemad @Ali_Etemad1
108 Followers 263 Following Researcher, Professor in AI, Deep Learning, Wearables, Human-Centered AI @ Queen's University | زن زندگی آزادی | Opinions my own
Ferdous Bin Ali @Hriday014
5 Followers 270 Following
Robert Scoble @Scobleizer
543K Followers 24K Following The best from ML/AI community | Ex-Microsoft, Rackspace, Fast Company | Wrote eight books about the future | Silicon Valley robots, holodecks, BCIs, & startups.
USMAN SHAH @usmanshah6677
4 Followers 34 Following
Pritam @pritam94
164 Followers 787 Following PhD candidate at @queensu and affiliate at @VectorInst. Ex: @Google, @BorealisAI, @Infosys, @tech_mahindra. opinions are my own.
Farhan Hasin @0pangktey0
30 Followers 374 Following
AAAC @officialAAAC
644 Followers 496 Following Association for the Advancement of #AffectiveComputing
Jonathan Lorraine @jonLorraine9
7K Followers 6K Following Research scientist @NVIDIA | PhD in machine learning @UofT. Opinions are my own. 🤖 💻 ☕️
Divij Gupta @divijgupta21
4 Followers 64 Following
Rahul Ku @RahulKu60647565
6 Followers 367 Following
Yaron @yaronsinger
493 Followers 76 Following CEO at Robust Intelligence and tenured machine learning professor at Harvard on leave to secure machine learning with a team of superhumans.
David Krueger @DavidSKrueger
18K Followers 4K Following AI professor. Deep Learning, AI alignment, ethics, policy, & safety. Formerly Cambridge, Mila, Oxford, DeepMind, ElementAI, UK AISI. AI is a really big deal.
Fahima Akter @fahimaakter1207
29 Followers 211 Following
Mahim Haque @mahimpantho
145 Followers 1K Following AI Enthusiast Ms in CS@Vtech, Research Scientist @Luminary-Ai
Vitalii Bilokon @NewgroundAI
1K Followers 3K Following We believe that the next big thing in AI will be related to the bio-inspired evolutionary algorithms. #Neuroevolution Algorithms.
bamboolib - a GUI for... @bamboolib_
133 Followers 71 Following Stop googling pandas commands. More infos at https://t.co/YyjCjs9HFe
API World @APIWorld
4K Followers 5K Following The Center of the API Economy – World's Largest API & Microservices Conference & Expo | #APIWorld | Sept 3-12, 2025 | LIVE & Virtual for 2025
Raúl Gombru @gombru
458 Followers 1K Following Telecos UPC. PhD in Computer Vision UAB. Data Scientist @ Shutterstock. Generative. Barcelona - Dublin Raul Gomez Bruballa
Jiaxuan You @youjiaxuan
3K Followers 211 Following Assist. Prof. @ UIUC (@siebelschool), direct ULab on Graph & LLM Agent @Stanford CS PhD Sr. Scientist @NVIDIA
Kimi.ai @Kimi_Moonshot
53K Followers 100 Following Built by Moonshot AI to empower everyone to be superhuman.
Transactions on Machi... @TmlrOrg
6K Followers 5 Following Transactions on Machine Learning Research (TMLR) is a new venue for dissemination of machine learning research
Alexandr Wang @alexandr_wang
333K Followers 838 Following chief ai officer @meta, founder @scale_ai. rational in the fullness of time
Mark Carney @MarkJCarney
546K Followers 776 Following Prime Minister of Canada and Leader of the Liberal Party | Premier ministre du Canada et chef du Parti libéral
Yong Zheng-Xin (Yong) @yong_zhengxin
2K Followers 2K Following safety and reasoning @BrownCSDept || ex-intern/collab @AIatMeta @Cohere_Labs || sometimes write on https://t.co/cXhbz6Fx3t
Xuandong Zhao @xuandongzhao
4K Followers 461 Following Postdoc@UC Berkeley CS; Research: ML, NLP, AI Safety. On the job market—open to opportunities. DMs welcome!
Jon Richens @jonathanrichens
1K Followers 320 Following Research scientist in AI safety @GoogleDeepMind
Unsloth AI @UnslothAI
32K Followers 460 Following Open source LLM fine-tuning & RL! 🦥 https://t.co/2kXqhhvLsb
TTC Service Alerts @TTCnotices
480K Followers 41 Following Service alerts from TTC Transit Control. Follow @TTChelps for customer service.
Daya Guo @Guodaya
6K Followers 15 Following AI researcher @deepseek_ai. Interested in reasoning ability of LLMs. The long-term research goal is to develop artificial general intelligence.
Koch Institute at MIT @kochinstitute
46K Followers 1K Following The Koch Institute brings interdisciplinary approaches together to advance the fight against cancer. More places to follow us: https://t.co/fPPRkN0PSj
DeepSeek @deepseek_ai
972K Followers 0 Following Unravel the mystery of AGI with curiosity. Answer the essential question with long-termism.
Arnaud Doucet @ArnaudDoucet1
6K Followers 811 Following Senior Staff Research Scientist @GoogleDeepMind, previously Oxford Stats Prof - interested in Comp Stats, Generative Modeling, Monte Carlo, Optimal Transport
PapersAnon @papers_anon
2K Followers 50 Following Just a fan of acceleration. I read and post interesting papers. Let's all make it through.
The AI Timeline @TheAITimeline
24K Followers 1 Following covering the latest AI & LLM research /// see "highlights" for all previous weekly threads /// building the best AI paper search engine @findmypapersai
Tim Dettmers @Tim_Dettmers
39K Followers 994 Following Creator of bitsandbytes. Research Scientist @allen_ai and incoming professor @CarnegieMellon. I blog about deep learning and PhD life at https://t.co/Y78KDJJFE7.
Yujia Qin @TsingYoga
5K Followers 335 Following ByteDancer, Agent, THU (16-20 BS in EE, 20-24 PhD in CS)
Rohan Paul @rohanpaul_ai
97K Followers 8K Following Compiling in real-time, the race towards AGI. The Largest Show on X for AI. 🗞️ Get my daily AI analysis newsletter to your email 👉 https://t.co/6LBxO8215l
bioRxiv @biorxivpreprint
137K Followers 23 Following Daily summaries of preprints posted in each bioRxiv Subject Area. Find ways to get social media alerts to individual papers at https://t.co/wrlEW1phDV
Neuralink @neuralink
1.7M Followers 1 Following Creating a general-purpose, high-bandwidth interface to the brain
Tri Dao @tri_dao
33K Followers 632 Following Asst. Prof @PrincetonCS, Chief Scientist @togethercompute. Machine learning & systems.
Cohere Labs @Cohere_Labs
24K Followers 251 Following @Cohere's research lab and open science initiative that seeks to solve complex machine learning problems. Join us in exploring the unknown, together.
CuspAI @cusp_ai
5K Followers 47 Following Pioneering breakthrough materials to accelerate human progress.
Pranav Shyam @recurseparadox
2K Followers 665 Following research scientist at google deepmind. ಕನ್ನಡಿಗ. past: generative models @openai
Salman Khan @KhanSalmanH
1K Followers 372 Following Faculty at MBZUAI. Past: Inception, ANU, Data61, NICTA, UWA.
Stephanie Zhan @stephzhan
25K Followers 2K Following GP @Sequoia, early stage AI. On superintelligence quest. 1st partner & board of @linear @middeskhq @reflection_ai @skildai. seeded: @heytavus @mach_industries
Ego4D @ego4_d
1K Followers 462 Following Massive-scale (but accessible) datasets and benchmark suites for human activity understanding https://t.co/swnxuaCth1 & https://t.co/ajYTwb7yPb
Barsee 🐶 @heyBarsee
274K Followers 862 Following Daily tweets on the latest AI and Tech developments to stay ahead of the curve | Founder of https://t.co/bpf7Dytcqj
Reiner Pope @reinerpope
3K Followers 448 Following CEO and founder, @MatXComputing, developing high throughput chips tailored for LLMs
Petar Veličković @PetarV_93
41K Followers 555 Following Senior Staff Research Scientist @GoogleDeepMind | Affiliated Lecturer @Cambridge_Uni | Assoc @clarehall_cam | GDL Scholar @ELLISforEurope. Monoids. 🇷🇸🇲🇪🇧🇦
Daniel Han @danielhanchen
28K Followers 2K Following Building @UnslothAI. Faster RL / training. LLMs bug hunter. OSS package https://t.co/aRyAAgKOR7. YC S24. Prev ML at NVIDIA. Hyperlearn used by NASA.
fly51fly @fly51fly
8K Followers 2K Following BUPT prof | Sharing latest AI papers & insights | Join me in embracing the AI revolution! #MachineLearning #AI #Innovation
Nataniel Ruiz @natanielruizg
10K Followers 2K Following research @GoogleDeepMind | Veo team | author of DreamBooth | personalization of generative models
Matei Zaharia @matei_zaharia
45K Followers 1K Following CTO at @Databricks and CS prof at @UCBerkeley. Working on data+AI, including @ApacheSpark, @DeltaLakeOSS, @MLflow, https://t.co/94gROE5Xa0. https://t.co/nmRYAKG0LZ
Marques Brownlee @MKBHD
6.3M Followers 481 Following Web Video Producer | ⋈ | Pro Ultimate Frisbee Player | Host of @WVFRM @TheStudio
Karsten Roth @confusezius
1K Followers 473 Following RS @GoogleDeepMind | Prev. PhD @ELLISforEurope 🇪🇺 w/ @zeynepakata & @OriolVinyalsML; Large Models × {Lifelong, Data, Multimodal}
Joelle Pineau @jpineau1
15K Followers 447 Following Chief AI Officer, @cohere Professor of Computer Science, @mcgillu Core academic member, @Mila_Quebec Ex-Meta (FAIR team)
Accepted papers at TM... @TmlrPub
4K Followers 5 Following