Couldn't resist.
Here's a pure PyTorch from-scratch re-implementation of Gemma 3 270M in a Jupyter Notebook (uses about 1.49 GB RAM): github.com/rasbt/LLMs-fro…
what are large language models actually doing?
i read the 2025 textbook "Foundations of Large Language Models" by tong xiao and jingbo zhu and for the first time, i truly understood how they work.
here’s everything you need to know about llms in 3 minutes↓
As promised, my SOP draft is here:
algoroxyolo.github.io/assets/pdf/lrz…
Please lmk if you have any suggestions or you have any recommendations where you think I should apply or what I should do in my future research.
As always RT appreciated!!
#PhDApplication #NLP #HCI
New lecture recordings on RL+LLM! 📺
This spring, I gave a lecture series titled **Reinforcement Learning of Large Language Models**. I have decided to re-record these lectures and share them on YouTube. (1/7)
Despite theoretically handling long contexts, existing recurrent models still fall short: they may fail to generalize past the training length. We show a simple and general fix which enables length generalization in up to 256k sequences, with no need to change the architectures!
Are attention heads the right units to mechanistically understand Transformers' attention behavior? Probably not, due to attention superposition!
We extracted interpretable attention units in LMs and found finer grained versions of many known and novel attention behaviors.
🧵1/N
Excited to share a new project! 🎉🎉
doi.org/10.1101/2024.0…
How do we navigate between brain states when we switch tasks? Are dynamics driven by control, or passive decay of the prev task?
To answer, we compare high-dim linear dynamical systems fit to EEG and RNNs🌀
⏬
Announcing MatMamba - an elastic Mamba2🐍architecture with🪆Matryoshka-style training and adaptive inference.
Train a single elastic model, get 100s of nested submodels for free!
Paper: sca.fo/mmpaper
Code: sca.fo/mmcode
🧵(1/10)
Excited to share a blog series I've been working on, diving deep into CUDA programming! Inspired by the #PMPP book & #CUDA_MODE!!
Check out the links below...
[VAE] by Hand ✍️
A Variational Auto Encoder (VAE) learns the structure (mean and variance) of hidden features and generates new data from the learned structure.
In contrast, GANs only learn to generate new data to fool a discriminator; they may not necessarily know the…
EVLM
An Efficient Vision-Language Model for Visual Understanding
In the field of multi-modal language models, the majority of methods are built on an architecture similar to LLaVA. These models use a single-layer ViT feature as a visual prompt, directly feeding it into the
Just got around to trying ColPali arxiv.org/abs/2407.01449 but for more general extraction tasks than poorly formatted/scanned documents with complicated SEC tables*. Pretty impressive! VLMs for efficient indexing and late interaction matching gives a sizeable boost.
18K Followers 366 Following The top education and research institution in the 🌎 for #AI and #machinelearning | Research
→ https://t.co/jUD0hZ8SFx | Learn more ↓
20K Followers 1K Following Researcher @MSFTResearch, AI Frontiers Lab; Prof @UWMadison (on leave); learning in context; thinking about reasoning; babas of Inez Lily.
2K Followers 722 Following A diverse and collaborative community on the cutting edge of computing and technology within @HopkinsEngineer at @JohnsHopkins. https://t.co/3hwXFTdGyw
358K Followers 1K Following ML/AI researcher & former stats professor turned LLM research engineer. Author of "Build a Large Language Model From Scratch" (https://t.co/O8LAAMRzzW).
15K Followers 51 Following EMNLP 2025 - The 2025 Conference on Empirical Methods in Natural Language Processing, 2025
Hashtag: #EMNLP2025
Dates: November 5-9
Submission Deadline: May 19th
1K Followers 2K Following Research Scientist at @SalesforceAI | Ph.D. from @UCLA | B.S. from @Tsinghua_Uni | Foundation Model, Theory, Reinforcement Learning | Opinions are my own
199 Followers 252 Following Assistant Professor @HDSIUCSD. Previously Research Assistant Professor @TTIC_Connect and PhD in Statistics & Data Science @Yale.
2K Followers 155 Following Developing efficient algorithms for real-time reinforcement learning. Research Scientist at Keen, a startup led by John Carmack.
Prev ~ PhD with Richard Sutton
8K Followers 1K Following Decision-making under uncertainty, machine learning, artificial intelligence, from theory to practice · anti-ideological · Assistant Research Professor @Cornell
97K Followers 8K Following Compiling in real-time, the race towards AGI.
The Largest Show on X for AI.
🗞️ Get my daily AI analysis newsletter to your email 👉 https://t.co/6LBxO8215l
1K Followers 973 Following Incoming Assistant Professor @UTCompSci, Senior Researcher @togethercompute, PhD @UCBerkeley. Working on building cooler things with fewer cost 😊
37K Followers 564 Following Assistant professor at Stanford; Co-founder of Voyage AI (https://t.co/wpIITHLgF0);
Working on ML, DL, RL, LLMs, and their theory.
10K Followers 4K Following sth new // ex Gemini RL+Inference @GoogleDeepMind // Chat AI @Meta // RL Agents @EA // ML+Information Theory @MIT+@Harvard+@GeorgiaTech // زن زندگی آزادی
30K Followers 93 Following Founded in 1979, AAAI is an international, nonprofit, scientific society devoted to promoting research in, and responsible use of, Artificial Intelligence.
1K Followers 831 Following Currently interning with Llama @Meta, PhD Candidate @UMDCS. Past @AmazonScience, @IITKgp.
A brick in the creation of Artificial General Intelligence.
4.5M Followers 460 Following Cutting-edge research, news, commentary, and visuals from the Science family of journals. Follow @NewsfromScience for stories from our News team.