- Tweets: 105
- Followers: 226
- Following: 40
- Likes: 66
Many people assume that LRM reasoning breaks down past a certain "complexity" or "number of steps" threshold. This is incorrect. It breaks down past an unfamiliarity threshold. And that threshold is very low. There is no limit to the complexity of tasks you can solve with these…
Whitespace-ignoring tokenization is a fundamental feature of SentencePiece, implemented since its early stages (around 2017). Using whitespace yielded better results on MT. It would be helpful if you could mention this. github.com/google/sentenc…
They totally didn't compile it
Ok, so I can finally talk about this! We spent the last year (actually a bit longer) training an LLM with recurrent depth at scale. The model has an internal latent space in which it can adaptively spend more compute to think longer. I think the tech report ...🐦⬛
Distributed training is notoriously hard to learn - knowledge is scattered across papers and complex codebases. Enter picotron: implementing all 4D parallelism concepts in separate, readable files totaling just 1988 LoC!
sneak preview 🍿 of our new embedding model: cde-small-v1 cde-small-v1 is the text embedding model that we (@srush_nlp and i) have been working on at Cornell for about a year tested the model yesterday on MTEB, the text embeddings benchmark; turns out we have state-of-the-art…
From Deep Learning with Python: - Deep learning research is an evolution process driven by poorly-understood empirical results - Math in DL papers is usually worthless and was placed there purely as a signal of seriousness - The key to good research is understanding what…
It’s Sunday morning and we have some time over coffee, so let me tell you about our recent surprising journey in synthetic data and small language models. This post is prompted by the coming release of an instant, in-browser model called SmolLM360 (link at the end) The…
There's a big difference between solving a problem from first principles vs applying a solution template you previously memorized. It's like the difference between a senior software engineer and a script kiddie that can't code. A script kiddie that has a gigantic bank of scripts…
When I left Meta, I told Mark that using AI, the Infra org could easily be reduced to 1/10 of their current size while doing a better job (*). Large companies are like fast-food joints. You need lots of processes to create a bland product with consistent quality with average…
one of the most important things I know about deep learning I learned from this paper: "Pretraining Without Attention" this is what I found so surprising: these people developed an architecture very different from Transformers called BiGS, spent months and months optimizing it and…
This take on the FineWeb release is one of the most interesting pieces of feedback and also a reason FineWeb is very different from even larger datasets like RedPajama-V2 (which is double its size!) Surprisingly, the size of the dataset of 15T tokens is not very important, what is much…
> superhuman performance > more than 20x cheaper than a human This is the state of AI, and it’s just getting started
this 30-min-read blog post on how to craft and generate a 25B+ tokens synthetic text dataset distills more information and alphas than a typical NeurIPS best paper
Today we are announcing the release of a new model, Genstruct 7B. Genstruct 7B is an instruction-generation model, designed to create valid instructions given a raw text corpus. This enables the creation of new, partially synthetic instruction finetuning datasets from any…
Love this post more than other DL ones

xbott @xungm189
44 Followers 613 Following
Mwalni @Mwalni13526
0 Followers 91 Following
AudreyRuskin @a3Yf22KM49tkn
0 Followers 234 Following
EnidGarcia @g7UtVOP94juno
0 Followers 343 Following
Woufoot @Woufoot1694119
9 Followers 1K Following
SupportResist🇺🇸 @Eageaho0566250
47 Followers 2K Following 15-30% Monthly | 2 High-Conviction Stocks.Short-Term Gains: 15-20% in Days/Weeks.DM "JOIN" for WhatsApp Alerts. Live Trade Signals • Market Analysis
Zhiwei Li @lzwjava
404 Followers 7K Following Zhiwei Li | Full-Stack AI Software Engineer | Read 320+ Books | Java Spring MySQL Redis JavaScript iOS Android Vue | Azure AWS GCP Alibaba Cloud | PyTorch CUDA
Mr Chart @Khollov23
117 Followers 4K Following I play game of probability where risk management plays the key role “not a financial advisor
Buvi @Buvi660
7 Followers 194 Following 💸 stock investing lover, independent girl! open to investment advice. DM me about market indices! 🌟 #Trading #Finance
3 stones @gaoleilucky
5 Followers 274 Following
DividendMillennial @Kho1ov23
360 Followers 5K Following Stocks, Dividends and Passive Income. Building a community where we can retire early together! Financial Information, Not financial advice
runes780 @runes780
78 Followers 1K Following
Raychan🌹🌏🤖 @AuroraRaychan
50 Followers 237 Following Cybernetist, Cosmopolite, Luxemburgism, Democratic Socialism, Studying @ Harbin Institute of Technology, Majoring in Information Management & Information System.
Alexander Doria @Dorialexander
19K Followers 4K Following Reasoning models to come. Co-founder @pleiasfr
Yanqun Sun @sunhanlp
0 Followers 10 Following
Luanyu_X @luanyu_x
4 Followers 127 Following
Xiaopeng Li @XiaopengLi2024
8 Followers 572 Following
Huizhi Xu @Audrey_Shanghai
0 Followers 12 Following
Johnson St. @st395205
1 Followers 65 Following
鹏大侠 @impenghero
10 Followers 193 Following
IceiceLi @af1ash
0 Followers 45 Following
Chen Lin @ChenLin99777854
8 Followers 25 Following AI in security at Bytedance; Ph.D. in Computer Science & Informatics.
fu lin @xulinfu361
4 Followers 257 Following
好好技术 @haohaojishu
49 Followers 548 Following
frank @spaghettiengine
197 Followers 2K Following
elonzh @elon2h
7 Followers 246 Following
John Smith @daidai85359059
0 Followers 166 Following
ligo surf @ligosurf
149 Followers 1K Following
Kiara MH Liu @kiara_mh_liu
27 Followers 171 Following Year 1 Information Science PhD @ Cornell | '23 BA CS/East Asian Studies @ Wellesley | *still* obsessed with ancient chinese fantasy gays, 6 years and counting
Miranda Zhu 筱萌 @XiaomengMZhu
142 Followers 292 Following PhD student in Linguistics @Yale @WuTsaiYale | @Wellesley ‘23 she/她/ella; go by Miranda
darthy @geekDarthy
152 Followers 1K Following Machine learning researcher in deep learning, computational probability, inference, and causality.
SVT @swim_vs_tide
133 Followers 1K Following What else do you need to know, you knew already everything...
v1otusc @v1otusc_zzH
82 Followers 3K Following
Elvins @elvins_yang
3 Followers 45 Following
HygeiaCC @CcHygeia
17 Followers 38 Following
哦! @wdlwdlwdl
6 Followers 36 Following
Mr Chart Norris @kholov23
17K Followers 149 Following I play the game of probability where risk management plays the key role “not a financial advisor “ @23wallstreet @xnews23
AlienOvichO @AlienOvichO
5K Followers 1K Following 📈 Technical Analyst at EWF 🔗14-Days Trial https://t.co/DuQgn2BRGd
Wicked Stocks @wickedstocks
6K Followers 159 Following Classic Chart Analysis | +25 Years of #StockMarket Experience Follow us for stock market tips & trade ideas
StockWhale @thestockwhale
78K Followers 99 Following Top Leaderboard Trader. Join Discord • https://t.co/GaBnArAAKe
Alexander Doria @Dorialexander
19K Followers 4K Following Reasoning models to come. Co-founder @pleiasfr
Andrej Karpathy @karpathy
1.4M Followers 1K Following Building @EurekaLabsAI. Previously Director of AI @ Tesla, founding team @ OpenAI, CS231n/PhD @ Stanford. I like to train large deep neural nets.
Wakana_official @Wakana1210staff
16K Followers 0 Following Official X account of Wakana. Updated by staff and Wakana herself. Personal Instagram → wakana1210_ official 💿 Original 3rd album 「そのさきへ」 released Wednesday, May 31, 2023 🎤
DAIR.AI @dair_ai
79K Followers 1 Following Democratizing AI research, education, and technologies. Learn how to build with AI in our new AI Academy: https://t.co/zQXQt0Pem8
Technology Innovation... @TIIuae
16K Followers 156 Following Do you believe in a better tomorrow? We do. Our team of expert researchers live the dream and work to build it every day.
Fate/strange Fake @FateSF_Anime_EN
12K Followers 12 Following The official English Twitter account of Fate/strange Fake
Together AI @togethercompute
50K Followers 387 Following AI pioneers train, fine-tune, and run frontier models on our GPU cloud platform.
Miranda Zhu 筱萌 @XiaomengMZhu
142 Followers 292 Following PhD student in Linguistics @Yale @WuTsaiYale | @Wellesley ‘23 she/她/ella; go by Miranda
キギノビル @sinsin08051
483K Followers 783 Following Illustrator / concept art, etc. For work inquiries, contact here → https://t.co/3H4Vc41DnB
『魔法使いの夜... @mahoyo_movie
62K Followers 2 Following 『魔法使いの夜』 theatrical anime adaptation confirmed. Original work: Kinoko Nasu / TYPE-MOON. Animation production: ufotable #魔法使いの夜
松田るか @imrukaM
193K Followers 184 Following
EMNLP 2025 @emnlpmeeting
15K Followers 50 Following EMNLP 2025 - The 2025 Conference on Empirical Methods in Natural Language Processing, 2025 Hashtag: #EMNLP2025 Dates: November 5-9 Submission Deadline: May 19th
Emory University @EmoryUniversity
63K Followers 784 Following Official account for Emory, a private research university of international reach where courageous ideas achieve positive transformation in the world.
Hang Jiang @hjian42
2K Followers 2K Following 𝘽𝙪𝙞𝙡𝙙𝙞𝙣𝙜 𝙂𝙚𝙣𝙚𝙧𝙖𝙩𝙞𝙫𝙚 𝘼𝙄 𝙛𝙤𝙧 𝙖 𝙃𝙚𝙖𝙡𝙩𝙝𝒚 𝒂𝒏𝒅 𝘾𝙤𝙣𝙣𝙚𝙘𝙩𝙚𝙙 𝙎𝙤𝙘𝙞𝙚𝙩𝙮 | PhD @MIT ’25 | Incoming Asst. Prof. @Northeastern
ReoNa @xoxleoxox
317K Followers 621 Following 1998/10/20 (Lv.26) ▶︎ #ReoNa #神崎エルザ ▷ #声日記 #こえにっき ▷ Acoustic tour from 7/24 #ふあんぷらぐど ▶︎ 8/6 (Wed) 11th Single「End of Days」
[email protected] @ddvd233
20K Followers 1K Following PhD Student @MIT Media Lab | Multimodal LLMs | MS in Computer Science @Stanford, RA at @StanfordSVL supervised by @drfeifei | Honorary member of the Emory go-home club | Really bad at Japanese
Hejie Cui @HennyJieCC
2K Followers 1K Following Postdoc Scholar @Stanford; CS PhD @EmoryUniversity; EECS Rising Star; previously @MSFTResearch, @AmazonScience; #machinelearning #datamining #AI4Health; She/her
PyTorch @PyTorch
451K Followers 77 Following Tensors and neural networks in Python with strong hardware acceleration. PyTorch is an open source project at the Linux Foundation. #PyTorchFoundation
Thomas Wolf @Thom_Wolf
95K Followers 6K Following Co-founder at @HuggingFace - open-source and open-science
Jiaying Lu @LuJiaying_AI
95 Followers 100 Following CS Ph.D. from Emory University. Former ASU visiting student @ApgAsu.
Emory LGS @laneygradschool
2K Followers 345 Following Programs across the full range of arts & sciences, biomedical sciences, public health sciences, business and nursing.
Fate/stay night USA @FateStayNightUS
16K Followers 4 Following The official U.S. Twitter of Fate/stay night anime series
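STEINS;GATE TVアニ... @SG_anime
88K Followers 41 Following Bringing you the latest news on the 「STEINS;GATE」 anime series! TV anime 「シュタインズ・ゲート」 (Steins;Gate) and 「シュタインズ・ゲート ゼロ」 (Steins;Gate 0) / theatrical anime 「シュタインズ・ゲート 負荷領域のデジャヴ」 / the fully animated ADV 「STEINS;GATE ELITE」 is also on sale now!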
shuyo @shuyo
2K Followers 150 Following 『図解即戦力 ChatGPTのしくみと技術がこれ1冊でしっかりわかる教科書』『わけがわかる機械学習』『ぺたスクリプト』 / Literature club member at Cybozu Labs / Natural language processing / Iwanami Data Science / iVoca
Fate/stay night @Fate_SN_Anime
480K Followers 18 Following Official account of Project 「Fate/stay night」. Official hashtag ⇒ #fate_sn_anime https://t.co/CUyTicm62S HF final chapter BD & DVD on sale March 31, 2021.