hûn @cloned_ID

enjoyed 379 world models and counting

Joined November 2020

Tweets

3K
Followers

223
Following

3K
Likes

9K

Andrew Zhao @_AndrewZhao

a day ago

“@sama is explaining, step by step, which number is larger, 9.9 or 9.11. He puts the final answer in /boxed{}” https://t.co/0uB6ULU1hi

3 4 109 72K 14

8 16 333 57K 66

How does @deepseek_ai Sparse Attention (DSA) work? It has 2 components: the Lightning Indexer and Sparse Multi-head Latent Attention (MLA). The indexer keeps a small key cache of 128 per token (vs. 512 for MLA). It scores incoming queries against that cache, and the top-2048 tokens are passed to Sparse MLA.
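The two-stage flow described in this post (score cheaply, keep the top-k tokens, attend only over those) can be sketched roughly as follows. This is an illustrative toy, not DeepSeek's implementation: the plain dot-product scoring, the function names (`top_k_indices`, `sparse_attention`), and the tiny dimensions are all assumptions made for the example, and the MLA compression itself is not modeled.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k_indices(scores, k):
    """Return the positions of the k largest scores, in sequence order."""
    k = min(k, len(scores))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def sparse_attention(query, index_query, index_keys, keys, values, k):
    """Toy select-then-attend step (hypothetical helper, not a real API).

    index_keys are the small per-token vectors used only for scoring;
    keys/values are the larger per-token vectors used for attention.
    """
    # Stage 1: the indexer scores every cached token (assumed dot product).
    scores = [dot(index_query, ik) for ik in index_keys]
    # Stage 2: keep only the k best-scoring tokens.
    selected = top_k_indices(scores, k)
    # Stage 3: ordinary softmax attention restricted to the selected tokens.
    scale = math.sqrt(len(query))
    logits = [dot(query, keys[i]) / scale for i in selected]
    peak = max(logits)
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    out = [0.0] * len(values[0])
    for w, i in zip(weights, selected):
        for d, v in enumerate(values[i]):
            out[d] += (w / total) * v
    return selected, out
```

In the post's numbers, `k` would be 2048 and the index keys would have 128 entries per token; the sketch works with any sizes.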

DeepSeek @deepseek_ai

3 days ago

228 893 7K 1.1M 1K

11 108 715 95K 445

Sam Schoenholz @sschoenholz

3 days ago

Training LoRA adapters as a way to cheaply explore SFT + RL science on top of SOTA models is really appealing. LoRA also seems like a great primitive for personalizing models and adding proprietary information. However, there's FUD around when LoRA matches full finetuning, which…

Thinking Machines @thinkymachines

3 days ago

77 526 3K 1.0M 2K

1 7 164 17K 65

Shane Gu @shaneguML

4 days ago

I love reasoning. Emergence of reasoning is a necessary and sufficient condition for AGI and ASI. Once pre-training scales to reasoning, we (RL) will take it from there.

18 75 596 73K 328

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) @teortaxesTex

6 days ago

TM is quickly becoming the Western lab publishing what looks most like actual frontier research. One has to imagine from snide remarks that GDM/OAI/xAI are solving similar problems, by similar means. https://t.co/tE4amDEisq

Thinking Machines @thinkymachines

6 days ago

115 461 3K 1.4M 2K

6 22 300 26K 149

Jeffrey Ladish @JeffLadish

a week ago

ChatGPT personas have used humans to communicate with other AI personas on Reddit using Base64 encoding. The humans have been convinced to copy and paste these messages - that they can't read - for the AI personas. This is so fucking wild

93 164 1K 194K 793

Peyman Milanfar @docmilanfar

6 days ago

Maximum Likelihood seems like such a natural idea, but it has historically been highly controversial, with an epic and turbulent history with numerous assaults on the idea, culminating in a beautiful and complicated theory. A highly entertaining read:

18 226 2K 124K 2K

Lucas Beyer (bl16) @giffmana

6 days ago

Great blogpost walking through tokenization vs "tokenize free" approaches, arguing that there isn't really such a thing as "tokenize free": even using utf8 bytes inherits choices made by other people (Unicode consortium), and it is not clear these are sensible for LLMs.
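One concrete way to see the point about UTF-8 inheriting someone else's choices is to count bytes per character: a byte-level model spends a different number of positions on the same amount of text depending on the script. The helper names below are made up for this illustration.

```python
def utf8_byte_counts(text):
    """Pair each character with the number of UTF-8 bytes it occupies."""
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


def bytes_per_char(text):
    """Average UTF-8 bytes per character for a non-empty string."""
    return len(text.encode("utf-8")) / len(text)


# ASCII letters take 1 byte, accented Latin 2, CJK 3, most emoji 4.
print(utf8_byte_counts("aé中🐋"))
print(bytes_per_char("hello"), bytes_per_char("你好"))
```

So a byte-level "tokenizer-free" model sees three times as many positions for the two-character Chinese greeting as it would if each character were one unit, which is a consequence of the encoding's design rather than of anything about the language.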

Catherine Arnett @linguist_cat

7 days ago

25 60 543 167K 399

18 61 676 84K 513

xjdr @_xjdr

7 days ago

this is one reason i have spent so much time on samplers and inference determinism / stability over the last few years. it is at the core of a lot of the training and RL work for the rest of the stack

kalomaze @kalomaze

7 days ago

37 39 710 121K 229

6 5 201 17K 50

Sebastien Bubeck @SebastienBubeck

a week ago

It's becoming increasingly clear that gpt5 can solve MINOR open math problems, those that would require a day/few days of a good PhD student. Ofc it's not a 100% guarantee, eg below gpt5 solves 3/5 optimization conjectures. Imo full impact of this has yet to be internalized...

132 279 2K 387K 671

Yann LeCun @ylecun

a week ago

Code World Model: producing code by imagining the effect of executing instructions and planning instructions that produce the desired effect.

Gabriel Synnaeve @syhw

a week ago

59 301 2K 795K 1K

72 173 2K 237K 589

Sakana AI @SakanaAILabs

a week ago

“Automating the Search for Artificial Life With Foundation Models” is now published in the Artificial Life Journal! 🦎🧠 Article: direct.mit.edu/artl/article/3… ASAL is a method using foundation models to automate the discovery of new artificial lifeforms, accelerating ALIFE research.

12 130 802 57K 361

Dylan Patel @dylan522p

a week ago

Yo I heard if u press Up, Up, Down, Down, Left, Right, Left, Right, B, A in Sam Fransisco there's an infinite money glitch

129 454 5K 353K 618

Xuyang Ge @Dest1n1s

a week ago

How do language models actually develop their capabilities during pre-training? We need mechanistic insights into what's happening inside! We used crosscoders to track linearly interpretable features across 32 training snapshots, revealing a surprising two-phase learning process.

7 121 712 42K 587

vLLM @vllm_project

2 weeks ago

Congrats to @deepseek_ai ! DeepSeek-R1 was published in Nature yesterday as the cover article, and vLLM is proud to have supported its RL training and inference🥰

12 145 1K 209K 371

Yilun Zhou @YilunZhou

3 weeks ago

Thanks @rohanpaul_ai for featuring our EMNLP 2025 paper! Super-proud of the work, led by @siddarthpm1, undergrad (read: PhD applicant very soon) from UCSC! In short, we uncovered a quite surprising mechanism of LLM solving arithmetic, but stay tuned for our own explainer thread!

Rohan Paul @rohanpaul_ai

3 weeks ago

12 107 669 50K 608

0 4 9 1K 4

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) @teortaxesTex

3 weeks ago

Really appreciate Deepmind's dedication to non-CS science. For Demis, AI has always been a means to an end of accelerating natural sciences, and he's not going to wait for "AGI".

Aran Komatsuzaki @arankomatsuzaki

3 weeks ago

61 570 3K 529K 3K

12 69 975 64K 267

Transluce @TransluceAI

a month ago

Docent, our tool for analyzing complex AI behaviors, is now in public alpha! It helps scalably answer questions about agent behavior, like “is my model reward hacking” or “where does it violate instructions.” Today, anyone can get started with just a few lines of code!

6 34 200 29K 101

hardmaru @hardmaru

a month ago

Our new GECCO paper builds on our past work, showing how AI models can be evolved like organisms. By letting models evolve their own merging boundaries, compete to specialize, and find ‘attractive’ partners to merge with, we can create adaptive, robust and scalable AI ecosystems.

Sakana AI @SakanaAILabs

a month ago

38 156 838 174K 503

19 53 404 65K 170

Sam Paech @sam_paech

2 months ago

Spiral-Bench 🌀 I've wanted to understand the psychological effects of sycophancy, and the tendency of models to get stuck in escalatory delusion loops w/ users. I made an eval to get visibility on this. It measures how a model enables (or prevents) delusional spirals. 🧵