Jason Lee @jasondeanlee
Associate Professor at Princeton and Research Scientist at Google DeepMind. ML/AI Researcher working on foundations of LLMs and deep learning
jasondlee88.github.io
Princeton, NJ
Joined October 2018
Tweets 1K
Followers 10K
Following 3K
Likes 3K
Looks neat!
I'm thrilled to join Princeton's faculty as an assistant professor in the ECE department starting Fall 2025 🐯 Stay tuned for the launch of my lab. We will develop generally helpful robots that learn and plan 🤖
LLaMA-3 is a prime example of why training a good LLM is almost entirely about data quality… TL;DR. Meta released LLaMA-3-8B/70B today and 95% of the technical info we have so far is related to data quality: - 15T tokens of pretraining data - More code during pretraining…
Bigger and better data!
Great analysis, approach 3 is finally in agreement! The loss scale was too low in our paper, resulting in premature termination of L-BFGS, and leading to bad fits. After fixing this we can reproduce your findings! We're also open sourcing the data in the paper, stay tuned :)
I'm thrilled to see that our work has apparently unified the Chinchilla scaling laws. What's even more exciting is that they're making the data open source!
Now that it is official, my amazing student Chris Harshaw is joining the statistics department @Columbia as an Assistant Professor. Super proud of him. chrisharshaw.com
@RuiqiZhang0614 @yubai01 Advertise here again for the postdoc position!
Treat dataset need to forget as negative data in preference data. Smart application of DPO
While digging up some historical numbers, it hit me that LLM training is now ~10X faster than same time last year from tons of improvements like H100 availability, Flash Attention2, new kernels, torch.compile, CUDA graphs, FP8 etc! That's just past 12 months!!
10, duh
how rude of them to contradict other people's research publicly like that (interesting work! for those who thought "scaling laws" are a thing)
To all the defeatists who think there is nothing else but scale: * 5 years between Self-Attention Is All You Need and FlashAttention * Transformers still require warmup. Researchers: get back to work! The future is bright :)
The Chinchilla scaling paper by Hoffmann et al. has been highly influential in the language modeling community. We tried to replicate a key part of their work and discovered discrepancies. Here's what we found. (1/9)
MEDUSA is a framework designed to accelerate inference in Large Language Models (LLMs) by: - Introducing multiple decoding heads. - Predicting subsequent tokens in parallel. It addresses the inefficiency in LLM inference caused by memory bandwidth limitations. 1/5
Learning Transformer Programs (arxiv.org/abs/2306.01128 from Princeton NLP) - This paper is neat. Modify transformer arch to be disentangled (concat not add, -residuals), anneal training to be discrete, convert to python code. Doesn't really scale yet but very fun.
New findings: We just evaluated Gemini 1.5 Pro on our recent benchmark that tests the impact of context size on reasoning performance - it is much better than 1.0 in long contexts! Though still falls behind GPT4. Also, CoT prompting now improves accuracy (unlike with 1.0). (1/4)
About 15% of all Stanford undergrads (of all majors) are learning how to build LLMs. Stats for @chrmanning's CS224N Natural Language Processing with Deep Learning below. That makes sense. LLM's are becoming a basic systems component like compute, networking and storage.
Is this not just saying that performance on LLM benchmarks improves with perplexity (you fit the data better)? Generative models are already trained to essentially compress their training set.
Clément Canonne @ccanonne_
31K Followers 925 Following Senior Lecturer @Sydney_Uni. Postdocs @IBMResearch, @Stanford; PhD @Columbia. Converts ☕ into puns: sometimes theorems. He/him. @[email protected]
Gautam Kamath @thegautamkamath
44K Followers 502 Following Assistant Prof of CS @UWaterloo, Faculty @VectorInst, Canada @CIFAR_News AI Chair. Co-EiC @TmlrOrg. I lead @TheSalonML. Privacy, robustness, machine learning.
Yi Ma @YiMaTweets
71K Followers 120 Following Chair Professor in AI, Director of IDS, Head of CS, HKU; Professor of EECS, Berkeley; Author of Book: High-Dim Data Analysis, https://t.co/gwaqMJp8av.
Dan Roy @roydanroy
45K Followers 2K Following Research Director, @VectorInst. Canada CIFAR AI Chair. Associate Professor of Stats/CS @UofT. I study machine learning and AI, emphasis on theory.
Percy Liang @percyliang
49K Followers 408 Following Associate Professor in computer science @Stanford @StanfordHAI @StanfordCRFM @StanfordAILab @stanfordnlp | cofounder @togethercompute | Pianist
Amin Karbasi @aminkarbasi
8K Followers 2K Following Associate Professor at Yale University, staff research scientist at Google.
Behnam Neyshabur @bneyshabur
18K Followers 689 Following Senior Staff Research Scientist @GoogleDeepMind, Interested in reasoning w. LLMs, traveling & backpacking
Ben Recht @beenwrekt
26K Followers 361 Following optimization. machine learning. uc berkeley. I blog at https://t.co/fkJujOPsJb The world won't end.
Kyunghyun Cho @kchonyc
60K Followers 2K Following a combination of a mediocre scientist, a mediocre manager, a mediocre advisor & a mediocre PC at @nyuniversity (@CILVRatNYU) & @genentech (@PrescientDesign).
Csaba Szepesvari @CsabaSzepesvari
8K Followers 699 Following "If there is not folly in the world, then the world itself is folly. You must understand that mistakes are not always regrets." - Paul Tobin, Bandette🤠
Dimitris Papailiopoul.. @DimitrisPapail
11K Followers 957 Following prof @ wisconsin; thinking about transformers; learning in context; babas of Inez Lily
Maxim Raginsky @mraginsky
8K Followers 2K Following father, academic, raconteur, aging wannabe hipster blog: https://t.co/akk6LCvKw6
Boaz Barak @boazbaraktcs
17K Followers 415 Following Computer Scientist. See also https://t.co/EXWR5k634w, https://t.co/SEVX6it6z3 ( @[email protected] , boaz.barak in threads ). Opinions my own.
Rosanne Liu @savvyRL
32K Followers 965 Following Cofounded & running @ml_collective. Host of Deep Learning Classics & Trends. Research at Google DeepMind. DEI/DIA Chair of ICLR & NeurIPS. Writing https://t.co/IbycyGfnDR
Alex Dimakis @AlexGDimakis
13K Followers 2K Following UT Austin Professor. Researcher in Machine Learning and Information Theory. National AI Institute on the Foundations of Machine Learning (IFML) Co-director.
Tom Goldstein @tomgoldsteincs
23K Followers 2K Following Professor at UMD. AI security & privacy, algorithmic bias, foundations of ML. Follow me for commentary on state-of-the-art AI.
Sam Power @sp_monte_carlo
16K Followers 7K Following Lecturer in Maths & Stats at Bristol. Interested in probabilistic + numerical computation, statistical modelling + inference. (he / him)
Peter Richtarik @peter_richtarik
6K Followers 591 Following Federated Learning Guru. Tweeting since 20.5.2020. Lived in 🇸🇰🇺🇸🇧🇪🇬🇧🇸🇦
Eugene Vinitsky @EugeneVinitsky
13K Followers 2K Following Anti-cynic. Artificial narrow intelligence. Autonomous vehicles, multi-agent learning, and transportation. RS at Apple, Asst. Prof at @nyutandon. He/him.
Mingwei Liu @mliu918351
199 Followers 717 Following Software Engineering Researcher, Fudan University, Postdoctoral Researcher
Information MDPI @InformationMDPI
2K Followers 2K Following Information (ISSN 2078-2489, #Scopus, #ESCI, #EI Compendex) is an open access journal of information science and technology, data, knowledge and communication.
Klee @xiaonando
1 Followers 22 Following
JHU CLSP @jhuclsp
5K Followers 661 Following Center for Language and Speech Processing at @JohnsHopkins #NLProc #MachineLearning #AI https://t.co/6IXR5OSiDY @[email protected]
Mufan (Bill) Li @mufan_li
791 Followers 489 Following Postdoc @Princeton ORFE | Prev: PhD @UofTStatSci @VectorInst
Timothy C.Bohen.. @tbhoen__
235 Followers 1K Following Dedicated family man, entrepreneur, Bullider, financial contrarian and writer.
Jian-tao Xiao @aabrielian
242 Followers 1K Following ⁰00000000000⁰⁰0000000000000000000000000000000000000000000⁰0000000000000000⁰⁰⁰00000000000000000000000000000000000000000000⁰00000000000000⁰000000¹0000000000¹0001
Rag @Beingsissyphus1
0 Followers 63 Following
ZzZzZ @zdmc23
183 Followers 5K Following 🌱🧬 investing in longevity... in the metaverse 👾 and also the universe 🌌 1st non-longevity check @fluffyvectors bc they are doing cool things w agent memory
JJ McCammon @jjmccammon
357 Followers 4K Following Formerly @Microsoft. Host of the #1 AI podcast in Harlem, NY according to my mom. Liked tweets ≠ endorsement. Newsletter: https://t.co/rwxxpe4HZO.
Vikram Dutt @vd_
782 Followers 6K Following
Tatsuya Shirakawa @s_tat1204
1K Followers 785 Following Human / Representative of nouu LLC. I live on the periphery of machine learning, mathematical optimization, and data science. I also do advisory work and technical support. Feel free to reach out with work requests or inquiries.
Khalid Al-Otaibi @KhaledAlShybani
39 Followers 438 Following
Anshuman @anshuman_kalra
8 Followers 253 Following
Manish kumar @Manishk34591491
79 Followers 922 Following
Bhavin Jawade @BhavinJawade
397 Followers 3K Following Ph.D Candidate at University at Buffalo @UBuffalo | Research Scientist Intern @Yahoo | Ex. Research Scientist Intern at Adobe Research @Adobe
Kirito (e/acc) 🏴.. @bronzeagepapi
3K Followers 5K Following engineer scientist artist –– moloch disrespectoor // qualia connoisseur // tensor whisperer // epistemology enjoyer // kardashev mechanic // bounty hunter
Anubrata Das @d_anubrata
579 Followers 2K Following Ph.D. Candidate @UTiSchool I think about Human-Centered NLP | #NLP + #HCI
Sasha Rush @srush_nlp
51K Followers 463 Following Professor, Programmer in NYC. Cornell Tech, Hugging Face 🤗 https://t.co/cZl0wTfqGz
Hyeonbin Hwang @ronalhwang
143 Followers 197 Following M.S. Student @kaist_ai https://t.co/bQW6mlGzDN
Jason M @JasonM944
3 Followers 42 Following
rambalo987 @rambalo987
7 Followers 60 Following
Ives Macedo @ivesmacedo
141 Followers 3K Following
ララどり f/acc @presklux49
132 Followers 545 Following ISTP-T / A philosophically minded owl who keeps thinking about the singularity. I carry important things and exciting things in my beak. Hoping for a future where we can live singing la-la-la.
Xiyang Wu @wu_xiyang
255 Followers 869 Following Ph.D. Student at @gammaumd @eceumd @umiacs @UofMaryland. Previous: @EmoryUniversity @GeorgiaTech @TJU1895.
boluo @neobyd
50 Followers 378 Following
shunni654 @shunni65492418
20 Followers 817 Following
us_Meadow_ @UMeadow51770
8 Followers 1K Following
Peter @Jolene45697641
132 Followers 3K Following
AI Papers Podcast @aipaperspodcast
820 Followers 2K Following A digestible daily update on the latest AI Research Papers. Brought to you by @pocketpodapp
Wenhao Zhan @zhan_wenhao
7 Followers 21 Following PhD Student @ Princeton University • Theoretical Reinforcement Learning • Previously BS @ Tsinghua University
hua @james1024y
0 Followers 136 Following
Pratyush Maini @pratyushmaini
1K Followers 335 Following Trustworthy ML | PhD student @mldcmu | Founding Member @datologyai | Prev. Comp Sc @iitdelhi
idol2vec @deugledeugle
127 Followers 1K Following
Clément Canonne @ccanonne_
31K Followers 925 Following Senior Lecturer @Sydney_Uni. Postdocs @IBMResearch, @Stanford; PhD @Columbia. Converts ☕ into puns: sometimes theorems. He/him. @[email protected]
Gautam Kamath @thegautamkamath
44K Followers 502 Following Assistant Prof of CS @UWaterloo, Faculty @VectorInst, Canada @CIFAR_News AI Chair. Co-EiC @TmlrOrg. I lead @TheSalonML. Privacy, robustness, machine learning.
Yi Ma @YiMaTweets
71K Followers 120 Following Chair Professor in AI, Director of IDS, Head of CS, HKU; Professor of EECS, Berkeley; Author of Book: High-Dim Data Analysis, https://t.co/gwaqMJp8av.
Dan Roy @roydanroy
45K Followers 2K Following Research Director, @VectorInst. Canada CIFAR AI Chair. Associate Professor of Stats/CS @UofT. I study machine learning and AI, emphasis on theory.
Percy Liang @percyliang
49K Followers 408 Following Associate Professor in computer science @Stanford @StanfordHAI @StanfordCRFM @StanfordAILab @stanfordnlp | cofounder @togethercompute | Pianist
Amin Karbasi @aminkarbasi
8K Followers 2K Following Associate Professor at Yale University, staff research scientist at Google.
Behnam Neyshabur @bneyshabur
18K Followers 689 Following Senior Staff Research Scientist @GoogleDeepMind, Interested in reasoning w. LLMs, traveling & backpacking
Ben Recht @beenwrekt
26K Followers 361 Following optimization. machine learning. uc berkeley. I blog at https://t.co/fkJujOPsJb The world won't end.
Kyunghyun Cho @kchonyc
60K Followers 2K Following a combination of a mediocre scientist, a mediocre manager, a mediocre advisor & a mediocre PC at @nyuniversity (@CILVRatNYU) & @genentech (@PrescientDesign).
Jelani Nelson @minilek
22K Followers 184 Following Professor @Berkeley_EECS. Research Scientist (part-time) @GoogleAI. Founder @addiscoder. 🇻🇮🇺🇸🇪🇹
Csaba Szepesvari @CsabaSzepesvari
8K Followers 699 Following "If there is not folly in the world, then the world itself is folly. You must understand that mistakes are not always regrets." - Paul Tobin, Bandette🤠
Dimitris Papailiopoul.. @DimitrisPapail
11K Followers 957 Following prof @ wisconsin; thinking about transformers; learning in context; babas of Inez Lily
Maxim Raginsky @mraginsky
8K Followers 2K Following father, academic, raconteur, aging wannabe hipster blog: https://t.co/akk6LCvKw6
Jia-Bin Huang @jbhuang0604
51K Followers 285 Following Associate Professor @umdcs; Part-time Research Scientist @Meta. I like pixels.
Boaz Barak @boazbaraktcs
17K Followers 415 Following Computer Scientist. See also https://t.co/EXWR5k634w, https://t.co/SEVX6it6z3 ( @[email protected] , boaz.barak in threads ). Opinions my own.
Rosanne Liu @savvyRL
32K Followers 965 Following Cofounded & running @ml_collective. Host of Deep Learning Classics & Trends. Research at Google DeepMind. DEI/DIA Chair of ICLR & NeurIPS. Writing https://t.co/IbycyGfnDR
NeurIPS Conference @NeurIPSConf
111K Followers 35 Following New Orleans, Dec 10-16, 23. https://t.co/ga8aOw615g Tweets to this account are not monitored. Please send feedback to [email protected].
Alex Dimakis @AlexGDimakis
13K Followers 2K Following UT Austin Professor. Researcher in Machine Learning and Information Theory. National AI Institute on the Foundations of Machine Learning (IFML) Co-director.
Mufan (Bill) Li @mufan_li
791 Followers 489 Following Postdoc @Princeton ORFE | Prev: PhD @UofTStatSci @VectorInst
Sebastian Borgeaud @borgeaud_s
938 Followers 259 Following Research Engineer at DeepMind with a focus on Large Language Models and large scale Deep Learning
Brenden Lake @LakeBrenden
7K Followers 195 Following Assistant Professor of Psychology and Data Science @ NYU. Co-Director of the NYU Minds, Brains, and Machines Initiative.
Div Garg @DivGarg9
17K Followers 99 Following Working on breaking things @MultiON_AI | RL + AI researcher | Adjunct Lecturer @Stanford CS | worked @nvidia Research, @apple SPG, @GoogleAI, @UberATG
George @georgejrjrjr
2K Followers 792 Following The timeline vibetimes pipeline to things still more strange and enticing.
Tamay Besiroglu @tamaybes
3K Followers 718 Following Thinking about economics, computing and machine learning @EpochAIResearch @MIT_CSAIL
Arjun Panickssery is .. @panickssery
1K Followers 2K Following Researching scalable oversight @MATSprogram | prev @METR_Evals @ai_risks | spaced repetition | AI safety | https://t.co/p887k6EsFs
Guido Appenzeller @appenz
7K Followers 198 Following At a16z investing in AI & Infra. 2x founder & CEO. CTO at Intel & VMware. CPO at Yubico. Tweets are my own.
Luca Soldaini 🎀 @soldni
6K Followers 1K Following I like tokens! Lead for OLMo data team at @allen_ai, open source science fan, @QueerInAI organizer 🤖☕️🍕they/them
Hyeonbin Hwang @ronalhwang
143 Followers 197 Following M.S. Student @kaist_ai https://t.co/bQW6mlGzDN
Will Merrill @lambdaviking
2K Followers 563 Following Ph.D. student @ NYU🗽 Theoretical aspects of NLP and LMs /nætʃɹəl/🇮🇸 + formal🤵 languages + TCS🧮
Matthew Green @matthew_d_green
143K Followers 1K Following I teach cryptography at Johns Hopkins. Mastodon at [email protected] and BlueSky at https://t.co/GI4QlxYTdk.
Adnan Masood @adnanmasood
1K Followers 1K Following Machine Learning PhD, Microsoft Regional Director & AI MVP, Software Engineer, Parent, Runner, Author, Speaker, Hacker, STEM Coach, Visiting scholar @ Stanford
Ahmad Al-Dahle @Ahmad_Al_Dahle
2K Followers 35 Following #Girldad of twins. Leading GenAI @ Meta (llama, imagine, meta ai and more)
Interconnects @interconnectsai
2K Followers 1 Following What you need to know about AI research trends, from @natolambert Wednesday mornings weekly, sometimes extra posts.
Mikel Artetxe @artetxem
6K Followers 220 Following Co-founder @RekaAILabs and Honorary Researcher @IxaGroup (University of the Basque Country) | Past: Research Scientist @AIatMeta (FAIR)
Nathan Godey @nthngdy
520 Followers 839 Following 3rd year PhD student @InriaParisNLP Working on the representations of language models, architectures, and pretraining methods
Lewis @ctjlewis
27K Followers 15K Following Founder @SpellcraftAI. Techno-optimist, e/acc ally. Prev @Walmart, @McDonalds. Almae matres @Wikipedia, @YouTube, @Coursera.
Reka @RekaAILabs
10K Followers 13 Following An AI research and product company 🌟. We are a team of scientists and engineers building state-of-the-art multimodal language models 🪄.
David Krueger @DavidSKrueger
13K Followers 4K Following Cambridge faculty - AI alignment, deep learning, and existential safety. Formerly Mila, FHI, DeepMind, ElementAI, AISI.
Wen Sun @WenSun1
239 Followers 33 Following Assistant Professor at Cornell CS. Machine Learning and Reinforcement Learning; check out the RL Algorithm and theory book here https://t.co/HROGwaflCn
H.-S. Philip Wong @HSPhilipWong
93 Followers 23 Following Willard R. and Inez Kerr Bell Professor at @StanfordEng
Matthew Carrigan @carrigmat
3K Followers 351 Following @huggingface engineer. I'm the reason your LLM frontend has a jinja2cpp dependency. Sometimes yells about housing and trans rights instead of working He/him
Daniel de Kadt @dandekadt
2K Followers 1K Following Social and data science at @LSEnews Democracy, behaviour, meta-science, 🇿🇦🇺🇲
Andrew Gao @itsandrewgao
22K Followers 2K Following currently: @nomic_ai @stanford; prev @LangChainAI; Z Fellow 🇺🇸
Ezra Klein @ezraklein
2.6M Followers 1K Following Columnist, @NYTOpinion Author, "Why We're Polarized" Host of "The Ezra Klein Show" podcast
Max Marion @maxdoesresearch
312 Followers 98 Following my machine learning research account where i tell you abt all my sick experiments | pfp: me w/ https://t.co/XWwMkEg1a1 | personal account: @maxisawesome538
Raunak Farhaz @Farhazraunak
92 Followers 183 Following Ph.D. @HumboldtUni @HumboldtChem| Theoretical Chemistry | Multiresolution Analysis in Quantum Chemistry, Strong Magnetic Field Stellar Chemistry
Typst @typstapp
4K Followers 112 Following Compose papers faster: Focus on your text and let Typst take care of layout and formatting.
Natalie Wolchover @nattyover
45K Followers 1K Following Physics journalist @QuantaMagazine. Pulitzer Prize for Explanatory Reporting. She/her. Kindness, integrity.
Kyle🤖🚀🦭 @KyleMorgenstein
14K Followers 5K Following Full of childlike wonder. Teaching robots manners. Now: Boston Dynamics AI Institute, UT Austin PhD candidate. Past: NASA JPL, MIT ‘20.
Pratyush Maini @pratyushmaini
1K Followers 335 Following Trustworthy ML | PhD student @mldcmu | Founding Member @datologyai | Prev. Comp Sc @iitdelhi
Sachin Goyal @goyalsachin007
761 Followers 714 Following PhD student @ CMU MLD || Microsoft Research || UG @ IIT Bombay
Thomas Lin @7homaslin
10K Followers 1K Following Quanta Books; @QuantaMagazine founder / 1st EIC; former @nytimes, @ScienceWriting; books: (science) https://t.co/ZHAR1Pn4Bi + (math) https://t.co/5WSMip02t1
Raj Dabre @prajdabre1
3K Followers 749 Following NLP/Machine Translation/NLG/Deep Learning. Researcher-@NICT_Publicity. Adjunct Faculty-@iitmadras. Visiting Professor-@iitbombay. Ex-@KyotoU_News. #nlproc
Zhibin Gou @zebgou
119 Followers 436 Following A second-year M.S. student at Tsinghua University; Research Intern @MSFTResearch. Recent works: CRITIC, ToRA, Rho-1
Arnab Sen Sharma @arnab_api
149 Followers 83 Following Ph.D. student @KhouryCollege, working to make LLMs interpretable
Omar Khattab @lateinteraction
11K Followers 2K Following CS PhD candidate @StanfordNLP. 2022 Apple Scholar in AI/ML. Author of ColBERT (https://t.co/2ZtgXoa1np), DSPy (https://t.co/BH7WmMKDXR), & various retrieval & LM systems.
OlivierD @OlivierDehaene
101 Followers 9 Following
George Grigorev @iamgrigorev
2K Followers 524 Following formerly generative ml @ snap, global talent interested in llms
Theoretical Foundatio.. @tf2m_workshop
78 Followers 15 Following Workshop on Theoretical Foundations of Foundation Models @icmlconf 2024.
Mariam Aly @mariam_s_aly
11K Followers 393 Following I study brains and sometimes use one. Faculty at @BerkeleyPsych starting July 2024.
Michal Feldman @MichalFeldman9
2K Followers 241 Following Professor of Computer Science, @TelAvivUni | @AcmSIGecom Chair | Research areas: Econ&CS, Algorithmic Game Theory, Market Design
I have a really stupid question: Why is Zuck spending so much on gen AI? What’s his long term benefit from this?
Early 1K votes are in and Llama-3 is on FIRE!🔥The New king of OSS model? Vote now and make your voice heard! Leaderboard update coming very soon.
Big congrats to @AIatMeta on Llama 3 release🔥 A huge week for open-source AI! Both Llama-3 70B & 8B are now in the Arena thanks to @togethercompute fast support. Let's see how well it does in real-world tests by Arena power users, come challenge Llama-3🧩!
According to the license, you must name all models that use llama 3 in any way “LLaMa 3 XXX” llama.meta.com/llama3/license/ They don't say that you can't give your models nicknames though... "LLaMa 3 Robert Archibald Percival Fortescue Language Model" aka "BobLM"
I’m excited to announce that in July 2025 I will be joining @UWaterloo as an Assistant Professor in the Department of Statistics and Actuarial Science! Until then, I will continue at Princeton as a DataX Postdoc Fellow, working with Boris Hanin. I have many exciting projects…
#LLaMA3 is out! It's the same architecture as Llama-2, except for some differences: 1. 128K Tiktoken vocab vs 32K vocab of Llama-2 2. 15 Trillion tokens instead of 2T 3. 8 billion model uses GQA (unlike Llama 7b) 4. 8K Context Length 5. Chinchilla scaling laws - log linear gains!…
How do model components (conv filters, attn heads) collectively transform examples into predictions? Is it possible to somehow dissect how *every* model component contributes to a prediction? w/ @harshays_ @andrewilyas, we introduce a framework for tackling this question!…
Congrats to @AIatMeta on Llama 3 release!! 🎉 ai.meta.com/blog/meta-llam… Notes: Releasing 8B and 70B (both base and finetuned) models, strong-performing in their model class (but we'll see when the rankings come in @ @lmsysorg :)) 400B is still training, but already encroaching…
fixed the fixed fix for llama3
It’s here! Meet Llama 3, our latest generation of models that is setting a new standard for state-of-the art performance and efficiency for openly available LLMs. Key highlights • 8B and 70B parameter openly available pre-trained and fine-tuned models. • Trained on more…
I got tenure! It was fitting that I got to celebrate with the lab right after the news. Working together for the last 6.5 years has been a blast.
The interesting part of the Mistral plot that everyone keeps sharing with their own models is equating activated params to cost as someone mentioned to me. Picking my hill for today!
educate me: how/why did MMLU become the de-facto standard for measuring LLM performance?