Nguyễn Đức Ánh @duc_anh2k2

Joined May 2024

Tweets: 43
Followers: 1
Following: 98
Likes: 33

Sanjeev Arora @prfsanjeevarora

2 weeks ago

Nice cheat sheet for LLM terminology!

Ahmad @TheAhmadOsman

2 weeks ago

Nice cheat sheet for LLM terminology!

44 278 3K 248K 6K

1 3 20 6K 17

Sebastian Raschka @rasbt

a month ago

Couldn't resist. Here's a pure PyTorch from-scratch re-implementation of Gemma 3 270M in a Jupyter Notebook (uses about 1.49 GB RAM): github.com/rasbt/LLMs-fro…

Sebastian Raschka @rasbt

2 months ago

Couldn't resist. Here's a pure PyTorch from-scratch re-implementation of Gemma 3 270M in a Jupyter Notebook (uses about 1.49 GB RAM): github.com/rasbt/LLMs-fro… https://t.co/9vF9U29pWh

26 202 1K 469K 786

61 552 4K 344K 4K

Rohan Paul @rohanpaul_ai

2 months ago

🎯 Andrej Karpathy on how to learn.

93 558 5K 407K 4K

Graham Neubig @gneubig

2 months ago

Summary of GPT-OSS architectural innovations:
1. sliding window attention (ref: arxiv.org/abs/1901.02860)
2. mixture of experts (ref: arxiv.org/abs/2101.03961)
3. RoPE w/ Yarn (ref: arxiv.org/abs/2309.00071)
4. attention sinks (ref: streaming llm arxiv.org/abs/2309.17453)

11 358 2K 117K 2K

what are large language models actually doing? i read the 2025 textbook "Foundations of Large Language Models" by tong xiao and jingbo zhu and for the first time, i truly understood how they work. here’s everything you need to know about llms in 3 minutes↓

77 940 7K 1.1M 18K

Lorenzo Xiao @lrzneedresearch

3 months ago

As promised, my SOP draft is here: algoroxyolo.github.io/assets/pdf/lrz… Please lmk if you have any suggestions or you have any recommendations where you think I should apply or what I should do in my future research. As always RT appreciated!! #PhDApplication #NLP #HCI

1 4 23 3K 15

Ernest Ryu @ErnestRyu

3 months ago

New lecture recordings on RL+LLM! 📺 This spring, I gave a lecture series titled **Reinforcement Learning of Large Language Models**. I have decided to re-record these lectures and share them on YouTube. (1/7)

11 159 1K 134K 2K

Ricardo Buitrago @rbuit_

3 months ago

Despite theoretically handling long contexts, existing recurrent models still fall short: they may fail to generalize past the training length. We show a simple and general fix which enables length generalization in up to 256k sequences, with no need to change the architectures!

6 34 194 41K 118

Zhengfu He @ZhengfuHe

5 months ago

Are attention heads the right units to mechanistically understand Transformers' attention behavior? Probably not, due to attention superposition! We extracted interpretable attention units in LMs and found finer-grained versions of many known and novel attention behaviors. 🧵1/N

5 81 521 42K 453

Probability and Statistics @probnstat

6 months ago

Statistical Learning Theory by Percy Liang web.stanford.edu/class/cs229t/n…

0 85 594 28K 523

Zhuang Liu @liuzhuang1234

7 months ago

New paper - Transformers, but without normalization layers (1/n)

76 598 4K 1.3M 2K

אגי-e/acc @murage_kibicho

9 months ago

I've been reading this book alongside Deepseek. The math is mathing. The code is coding. The Deepseek is deepseeking! @deepseek_ai you made god!

10 26 422 25K 308

Math Cafe @Riazi_Cafe_en

10 months ago

Stanford “Statistics and Information Theory” lecture notes PDF: web.stanford.edu/class/stats311…

Math Cafe @Riazi_Cafe_en

11 months ago

Stanford “Statistics and Information Theory” lecture notes PDF: web.stanford.edu/class/stats311… https://t.co/bJ097Zg52K

2 104 716 128K 542

1 141 914 85K 1K

Harrison Ritz @harrison_ritz

12 months ago

Excited to share a new project! 🎉🎉 doi.org/10.1101/2024.0… How do we navigate between brain states when we switch tasks? Are dynamics driven by control, or passive decay of the prev task? To answer, we compare high-dim linear dynamical systems fit to EEG and RNNs🌀 ⏬

7 108 521 42K 357

Abhinav Shukla @Abhinav95_

12 months ago

Announcing MatMamba - an elastic Mamba2🐍architecture with🪆Matryoshka-style training and adaptive inference. Train a single elastic model, get 100s of nested submodels for free! Paper: sca.fo/mmpaper Code: sca.fo/mmcode 🧵(1/10)

2 55 231 35K 128

Rohan Paul @rohanpaul_ai

a year ago

A cool Github repo collecting LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.

16 395 3K 198K 3K

Khushi Agrawal @khushi__411

a year ago

Excited to share a blog series I've been working on, diving deep into CUDA programming! Inspired by the #PMPP book & #CUDA_MODE!! Check out the links below...

9 64 393 39K 480

Tom Yeh @ProfTomYeh

a year ago

[VAE] by Hand ✍️ A Variational Auto Encoder (VAE) learns the structure (mean and variance) of hidden features and generates new data from the learned structure. In contrast, GANs only learn to generate new data to fool a discriminator; they may not necessarily know the…

10 174 947 48K 604

AK @_akhaliq

a year ago

EVLM An Efficient Vision-Language Model for Visual Understanding In the field of multi-modal language models, the majority of methods are built on an architecture similar to LLaVA. These models use a single-layer ViT feature as a visual prompt, directly feeding it into the

3 39 155 15K 64

Shubhendu Trivedi @_onionesque

a year ago

Just got around to trying ColPali arxiv.org/abs/2407.01449 but for more general extraction tasks than poorly formatted/scanned documents with complicated SEC tables*. Pretty impressive! VLMs for efficient indexing and late interaction matching give a sizeable boost.