- Tweets 74
- Followers 541
- Following 109
- Likes 1K
We are hiring research interns in the Llama team at Meta GenAI: metacareers.com/jobs/841036594… Drop me an email if you're interested in working on LLMs together! 🙂
Myth: open foundation models are antithetical to AI safety. Fact: open foundation models are critical for AI safety. Here are three reasons why:
@tegmark @RishiSunak @vonderleyen Altman, Hassabis, and Amodei are the ones doing massive corporate lobbying at the moment. They are the ones who are attempting to perform a regulatory capture of the AI industry. You, Geoff, and Yoshua are giving ammunition to those who are lobbying for a ban on open AI R&D. If…
How can we trust LM evals when LMs might be pretraining on the test set? We show you can prove suspected test set contamination on black-box models with false positive rate guarantees. An audit of 5 open LMs shows little evidence of strong contamination. arxiv.org/abs/2310.17623
Introducing CatGPT - cat-gpt-alpha.vercel.app. A bot to provide expert answers to all your cat-related questions. Something I've been doing as a learning exercise. Would love to hear any feedback!
We’re releasing the Anticipatory Music Transformer: a controllable generative model for symbolic music (like MIDI). Read about the model on the CRFM blog: crfm.stanford.edu/2023/06/16/ant… 🧵👇
Fine-tuned performance without a step of SGD? Excited to share TART, which transplants transformer-based reasoning modules on arbitrary foundation models to improve in-context learning performance! 📜 arxiv.org/abs/2306.07536 💻 github.com/HazyResearch/T… ✍️ hazyresearch.stanford.edu/blog/2023-06-1…
OUT. OF. THIS. WORLD. 🤯 @janniksin @carlosalcaraz #MiamiOpen
We know that language models (LMs) reflect opinions - from internet pre-training, to developers and crowdworkers, and even user feedback. But whose opinions actually appear in the outputs? We make LMs answer public opinion polls to find out: arxiv.org/abs/2303.17548
Language models are becoming the foundation of language technologies, but when do they work or don’t work? In a new CRFM paper, we propose Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of LMs. Holistic evaluation includes three elements:
New paper with Peter Bartlett and @obousquet called "The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima": arxiv.org/abs/2210.01513.
LLMs can do in-context learning, but are they "learning" new tasks or just retrieving ones seen during training? w/ @shivamg_13, @percyliang, & Greg Valiant we study a simpler Q: Can we train Transformers to learn simple function classes in-context? 🧵 arxiv.org/abs/2208.01066
Does language supervision (as in CLIP) help vision models transfer better? You might expect a clear-cut answer: 'captions always help' or 'not at all'. But w/ @yanndubs @rtaori13 @percyliang @tatsu_hashimoto, we find that the picture is nuanced.🧵 arxiv.org/abs/2207.07635
🧵Fields medalist June Huh shares an early math experience: a chess puzzle in the game "The 11th Hour." Story and figures from nytimes.com/live/2022/07/0…. Can you swap the positions of the black and white knights? Seems hard, right? A new perspective makes it almost trivial! 1/n
Interpolation (train to zero loss) often does well in high dim, yet may still be undesirable (e.g., security/privacy concerns). So is interpolation necessary for optimal generalization? In our COLT paper, we surprisingly find the answer is yes! arxiv.org/abs/2202.09889 (1/n)
WHAT. JUST. HAPPENED. @alcarazcarlos03 @steftsitsipas #MiamiOpen
How can neural nets trained by gradient descent manage to interpolate noisy training data and simultaneously generalize near-optimally? In new work with @niladrichat and Peter Bartlett, we characterize a 'benign overfitting' phenomenon in 2-layer nets: arxiv.org/abs/2202.05928
Percy Liang @percyliang
49K Followers 408 Following Associate Professor in computer science @Stanford @StanfordHAI @StanfordCRFM @StanfordAILab @stanfordnlp | cofounder @togethercompute | Pianist
Jim Fan @DrJimFan
229K Followers 3K Following @NVIDIA Sr. Research Manager & Lead of Embodied AI (GEAR Lab). Creating foundation models for Humanoid Robots & Gaming. @Stanford Ph.D. @OpenAI's first intern.
Dan Roy @roydanroy
45K Followers 2K Following ML / AI researcher, emphasis on theory. Research Director and Canada CIFAR AI Chair, @VectorInst Professor, @UofT (Statistics/CS)
Ananya Kumar @ananyaku
4K Followers 469 Following Researcher at @openai Previously PhD at Stanford University (@StanfordAILab) advised by Percy Liang and Tengyu Ma
Behnam Neyshabur @bneyshabur
18K Followers 690 Following Senior Staff Research Scientist @GoogleDeepMind, Interested in reasoning w. LLMs, traveling & backpacking
rishi @RishiBommasani
4K Followers 2K Following Stanford CS PhD @StanfordCRFM @StanfordNLP @StanfordAILab @StanfordHAI Advisers: @percyliang @jurafsky Previous: @CornellCIS @clairecardie #FoundationModels
Maxim Raginsky @mraginsky
8K Followers 2K Following father, academic, raconteur, aging wannabe hipster blog: https://t.co/akk6LCvKw6
Yann Dubois @yanndubs
4K Followers 1K Following PhD student @stanfordAILab | Prev: AI resident @metaai, @vectorinst, @CambridgeMLG
Jason Lee @jasondeanlee
10K Followers 3K Following Associate Professor at Princeton and Research Scientist at Google DeepMind. ML/AI Researcher working on foundations of LLMs and deep learning
Tengyu Ma @tengyuma
25K Followers 512 Following Assistant professor at Stanford; Co-founder of Voyage AI (https://t.co/wpIITHLgF0) ; Working on ML, DL, RL, LLMs, and their theory.
Jeremy Cohen @deepcohen
4K Followers 867 Following PhD student in machine learning at Carnegie Mellon. The goal of my research is to turn deep learning into a real engineering discipline.
Gautam Goel @gautamcgoel
4K Followers 413 Following Postdoc studying ML at the Simons Institute at UC Berkeley.
Preetum Nakkiran @PreetumNakkiran
10K Followers 2K Following ML research @Apple. @sh_reya’s fiancé | PhD @Harvard, postdoc @UCSanDiego, EECS @Berkeley_EECS, "AI" @OpenAI, @GoogleAI
Sachin Goyal @goyalsachin007
764 Followers 714 Following PhD student @ CMU MLD || Microsoft Research || UG @ IIT Bombay
Sam Power @sp_monte_carlo
17K Followers 7K Following Lecturer in Maths & Stats at Bristol. Interested in probabilistic + numerical computation, statistical modelling + inference. (he / him)
Aldo Pacchiano @aldopacchiano
1K Followers 414 Following AI research at Broad Institute and Boston University 🇲🇽
Hossein Mobahi @TheGradient
6K Followers 691 Following Senior Research Scientist @GoogleAI. I ∈ Optimization ∩ Machine Learning. Fan of @IronMaiden🤘.Here to discuss research 🤓
Andrej Risteski @risteski_a
3K Followers 2K Following Machine learning researcher. Assistant Professor, ML department at CMU (@mldcmu).
tangbinh @tangbinh2
38 Followers 118 Following
lovish @louvishh
313 Followers 603 Following phding @ucl and @aiatmeta (llama team). mostly random tweets here.
Martin Fan @perfectoid_ai
394 Followers 7K Following
Abhigyan Bhusal @AbhigyanBhusal
41 Followers 160 Following
Bhavya Vasudeva @bhavya_vasudeva
38 Followers 230 Following PhD candidate @CSatUSC | @iitroorkee'20 | Interested in theory of deep learning, optimization and robustness/generalization | she/her
Sophia @sopharicks
675 Followers 1K Following Former ballerina turned AI writer& communicator. OpenAI alumni. Fan of astrophysics, open-source, conversations about singularity. Founder of BuzzRobot.
Archit Sharma @archit_sharma97
4K Followers 340 Following Final-year CS PhD student @Stanford. Previously, AI Resident @Google Brain, undergraduate @IITKanpur, research intern @MILAMontreal.
Xinliang (Frederick) .. @FrederickXZhang
175 Followers 190 Following AI PhD @UMichCSE (@launchnlp & @michigan_AI). Research in #NLProc. Intern @Adobe & @Bloomberg. Prev. @OhioStateCSE (@osunlp). Language Learner. From @Hangzhou.
Zeev @ze3ev
81 Followers 308 Following
Wanqiao Xu @wanqiao_xu
143 Followers 371 Following PhD student @stanford RL Group 🌲| formerly @UMich Math 〽️ | interested in RL and Finetuning LLM | Previously @MetaAI
S A @SunilAc17453471
44 Followers 136 Following
STL @CarlRee65462108
13 Followers 130 Following
Harsh Desai @dreamerharsh
1 Followers 3K Following
Olioli @Oliolilyx
126 Followers 2K Following
Arbaaz Qureshi @arbaaz__qureshi
328 Followers 2K Following Data Scientist @Lowes | Previously @Google and @MSFTResearch| CS grad @UMassAmherst and undergrad @IITPat
Shital Shah @sytelus
10K Followers 8K Following Deep learning research and code. If universe is an optimizer, what is the loss function? All opinions are my own.
Alessandro Favero @alesfav
287 Followers 566 Following Physics/ML PhD candidate @EPFL working on the foundations of deep learning. Former applied scientist intern @AWSCloud AI Labs.
CLS @ChengleiSi
2K Followers 3K Following vibing @stanfordnlp | real AGI is the friends we made along the way
Blaze (Balázs Galamb.. @gblazex
1K Followers 967 Following A Smooth Guy; Developer of SmoothScroll for macOS, Windows & Google Chrome.
Dyah Adila🦄 @dyhadila
976 Followers 880 Following PhD-ing @WisconsinCS w/ @fredsala working on reliable ML | prev. AWS AI Labs @awscloud | pretty unserious but i share cool ML stuffs sometimes
Huy Nguyen @huynm99
253 Followers 318 Following Ph.D Student in Statistics and Data Sciences at UT Austin, working on #MachineLearning #Mixture-of-Experts, and #OptimalTransport
Mohamed Sobhy @mohamedsobhi777
48 Followers 478 Following pursuing creativity via technology exploring 3D & virtual production addicted to GenAI github: https://t.co/JONq3niFMC creator of https://t.co/ctbiHM4xyB
Pierfrancesco Beneven.. @PierBeneventano
211 Followers 479 Following PhD student @Princeton | Exploring how to train AIs and their interaction with the world, while brewing my espresso.
Mert Ali Gölbaşı @MertGolbasii
627 Followers 209 Following Be water my friend ✈︎ || full time cpt. pilot
Sahil Verma @Sahil1V
489 Followers 1K Following PhD student @uwcse. Robustness and Interpretability in ML. Former intern at @amazon, @itsArthurAI, @ETH_en, @MIT, @NUSingapore. Undergrad @IITKanpur
Tzu-Heng Huang @zihengh1
152 Followers 662 Following CS Ph.D. Student @WisconsinCS @UWMadison. Focusing on foundation models and data-centric AI.
Peya Mowar @peyajm29
207 Followers 501 Following Grad student @CMU_Robotics | Prev @MSFTResearch | #AI, #HCI, #a11y
Shivam Agarwal @Shivamag12
79 Followers 340 Following Researcher@UIUC | @SiebelScholars class of 2024 | CS Grad Student #ML #NLP #DataMining
Riyasat Ohib @OhibRiyasat
108 Followers 621 Following Ph.D. Student at @GeorgiaTech. Prev: RS intern FAIR - @MetaAI. Making neural networks do more with less. #sparseNN #EfficientAI
bagofwords.ai @bagofwordsai
285 Followers 4K Following All About NLP and Its Applications #safenlp #NLProc #ai #ml
Tessa @tessa1157
0 Followers 119 Following
Humza Naveed @humza909
48 Followers 488 Following
Biswaksen Patnaik @BiswaksenPat
171 Followers 600 Following PhD student in CS (@umdcs) at UMD. Member of HCIL(@hcil_umd). | HCI, Visualization, Physical Computing | amateur cyclist🚴, drummer 🥁, and maker⚒
Aondofa @alfa_aondofa
109 Followers 1K Following Data science, Machine learning, Research student(Applied Mathematics) @NDefenceAcademy
Percy Liang @percyliang
49K Followers 408 Following Associate Professor in computer science @Stanford @StanfordHAI @StanfordCRFM @StanfordAILab @stanfordnlp | cofounder @togethercompute | Pianist
Yann LeCun @ylecun
709K Followers 718 Following Professor at NYU. Chief AI Scientist at Meta. Researcher in AI, Machine Learning, Robotics, etc. ACM Turing Award Laureate.
Clément Canonne @ccanonne_
31K Followers 926 Following Senior Lecturer @Sydney_Uni. Postdocs @IBMResearch, @Stanford; PhD @Columbia. Converts ☕ into puns: sometimes theorems. He/him. @[email protected]
Ananya Kumar @ananyaku
4K Followers 469 Following Researcher at @openai Previously PhD at Stanford University (@StanfordAILab) advised by Percy Liang and Tengyu Ma
Behnam Neyshabur @bneyshabur
18K Followers 690 Following Senior Staff Research Scientist @GoogleDeepMind, Interested in reasoning w. LLMs, traveling & backpacking
Jelani Nelson @minilek
22K Followers 184 Following Professor @Berkeley_EECS. Research Scientist (part-time) @GoogleAI. Founder @addiscoder. 🇻🇮🇺🇸🇪🇹
rishi @RishiBommasani
4K Followers 2K Following Stanford CS PhD @StanfordCRFM @StanfordNLP @StanfordAILab @StanfordHAI Advisers: @percyliang @jurafsky Previous: @CornellCIS @clairecardie #FoundationModels
Gabriel Peyré @gabrielpeyre
92K Followers 450 Following @CNRS researcher at @ENS_ULM. One tweet a day on computational mathematics.
Maxim Raginsky @mraginsky
8K Followers 2K Following father, academic, raconteur, aging wannabe hipster blog: https://t.co/akk6LCvKw6
Ben Recht @beenwrekt
26K Followers 363 Following optimization. machine learning. uc berkeley. I blog at https://t.co/fkJujOPsJb The world won't end.
Yann Dubois @yanndubs
4K Followers 1K Following PhD student @stanfordAILab | Prev: AI resident @metaai, @vectorinst, @CambridgeMLG
Jason Lee @jasondeanlee
10K Followers 3K Following Associate Professor at Princeton and Research Scientist at Google DeepMind. ML/AI Researcher working on foundations of LLMs and deep learning
Tengyu Ma @tengyuma
25K Followers 512 Following Assistant professor at Stanford; Co-founder of Voyage AI (https://t.co/wpIITHLgF0) ; Working on ML, DL, RL, LLMs, and their theory.
Aldo Pacchiano @aldopacchiano
1K Followers 414 Following AI research at Broad Institute and Boston University 🇲🇽
Hossein Mobahi @TheGradient
6K Followers 691 Following Senior Research Scientist @GoogleAI. I ∈ Optimization ∩ Machine Learning. Fan of @IronMaiden🤘.Here to discuss research 🤓
Andrej Risteski @risteski_a
3K Followers 2K Following Machine learning researcher. Assistant Professor, ML department at CMU (@mldcmu).
tangbinh @tangbinh2
38 Followers 118 Following
lovish @louvishh
313 Followers 603 Following phding @ucl and @aiatmeta (llama team). mostly random tweets here.
derek guy @dieworkwear
814K Followers 963 Following Menswear writer. Editor at @putthison. Creator of @RLGoesHard. Bylines at The New York Times, The Washington Post, The Financial Times, Esquire, and Mr. Porter
EmbassyIndiaDC Ppt @IndiaPassportDC
2K Followers 24 Following Official Twitter handle of Embassy of India Washington DC for Passport issues
David Hall @dlwh
2K Followers 1K Following Research Engineering Lead at @StanfordCRFM . Previously co-founder at Semantic Machines ⟶ MSFT. Lead developer of Levanter, Breeze. he/him @[email protected]
Sanjana Srivastava @sanjana__z
261 Followers 450 Following AI PhD student @StanfordAILab @StanfordSVL | Previously @DeepMind @MITCSAIL @mitbrainandcog
Archit Sharma @archit_sharma97
4K Followers 340 Following Final-year CS PhD student @Stanford. Previously, AI Resident @Google Brain, undergraduate @IITKanpur, research intern @MILAMontreal.
Viswanathan Anand @vishy64theking
672K Followers 110 Following ♟️Known for Pawns; Unknown for Puns! ✉️[email protected]
Mike Lewis @ml_perception
6K Followers 227 Following Llama3 pre-training lead. Partially to blame for things like the Cicero Diplomacy bot, BART, RoBERTa, kNN-LM, top-k sampling & Deal Or No Deal.
VCs Congratulating Th.. @VCBrags
242K Followers 4K Following They're adding value™ And they're very proud of it. @BragsVentures
Dieuwke Hupkes @_dieuwke_
2K Followers 238 Following
Devi Parikh @deviparikh
23K Followers 152 Following Former Sr. Director, GenAI @Meta. Prof @GeorgiaTech. Generative artist https://t.co/z4n9IRQ3s5. Co-founded Caliper. @CarnegieMellon @RowanUniversity alum.
C-Squared Podcast @CSQpod
9K Followers 24 Following The chess podcast. Hosted by @CristianChirila & @FabianoCaruana | https://t.co/OWumBxydBe & https://t.co/sVfjxlK5Fv | Are You TEAM CARUANA? 👇
Sharan Narang @sharan0909
2K Followers 254 Following LLMs and AI Research (Llama 2 & 3 lead) @Meta | ex @Google (PaLM lead, T5), ex @Baidu (Deep Speech 2, Sparse Neural Networks), ex @Nvidia
Nelson Liu @nelsonfliu
4K Followers 841 Following @stanfordnlp PhD student. tweets auto-deleted periodically.
Tianyi Zhang @Tianyi_Zh
1K Followers 611 Following iterating ... I used to train more language models but am working on agents now
Paul Graham @paulg
1.9M Followers 772 Following
Fabiano Caruana @FabianoCaruana
204K Followers 171 Following Chess Grandmaster, World Chess Championship challenger, 3rd highest ranked in history, 3x U.S. Chess Champion. Hosting @csqpod https://t.co/pnXnc0cHax
Cristian Chirila @CristianChirila
4K Followers 165 Following ♟Chess Grandmaster | Coach @MizzouChess | 📺 Co-Host @CSQPod | 📧 business : [email protected] |
Rose @rose_e_wang
2K Followers 238 Following NLP & Education @stanfordnlp 🌲 Prev: 2020 MIT 🦫, Google Brain 🧠, Google Brain Robotics 🤖
Gary Cheng @garydient
317 Followers 580 Following @Stanford PhD candidate. prev: @GoogleAI @MPI_IS research intern. @Berkeley_EECS alum. Machine learning theory. Fast-food enthusiast.
Roshni Sahoo @roshni714
84 Followers 112 Following cs phd @Stanford | causal inference + ml | mit '20 | she/her
Andrew Ilyas @andrew_ilyas
2K Followers 166 Following Machine Learning PhD student at MIT, advised by Aleksander Madry and Costis Daskalakis.
Karan Goel @krandiash
3K Followers 882 Following Founder @cartesia_ai, Machine Learning PhD at @StanfordAILab, CMU / IIT-Delhi alum.
Dalalyan Arnak @ArnakDalalyan
608 Followers 111 Following
Simon Shaolei Du @SimonShaoleiDu
6K Followers 2K Following Assistant Professor @uwcse. Postdoc @the_IAS. PhD in machine learning @mldcmu.
The Ocean Cleanup @TheOceanCleanup
351K Followers 124 Following The Largest Cleanup in History. Follow the latest updates on our mission to rid the world's oceans of plastic. #TheOceanCleanup
Porcupine Tree @PorcupineTree
96K Followers 25 Following The official X/Twitter for Porcupine Tree. New live album Closure/Continuation.Live out 8th December!
Rohith Kuditipudi @rckpudi
254 Followers 116 Following PhD student @StanfordAILab advised by John Duchi and @percyliang
no context memes @weirddalle
1.7M Followers 461 Following memes and weird ai generations | dm for promo | @hardaipics | follow my IG
Project Football @ProjectFootball
278K Followers 80 Following The home of football on TikTok: https://t.co/2PCoCCh0gQ 📩 Get in touch: [email protected]
Kaylee Burns @kaylburns
821 Followers 303 Following PhD Student working on 🤖🧠 @StanfordAILab she/her
wta @WTA
1.0M Followers 641 Following Home of the Hologic WTA Tour 🎾 NOW PLAYING 📍 Madrid 🇪🇸 | April 23 - May 5 | Social media policy: https://t.co/4cuHKuQMq9
poorly drawn lines @PDLComics
319K Followers 320 Following Comics by Reza Farazmand | Watch "Poorly Drawn Lines" on Hulu | Prints, books, and merch:
Chara Podimata @charapod
748 Followers 367 Following Assistant Professor of OR/Stat at MIT. Past: postdoc UC Berkley, PhD Harvard. Interested in incentive-aware ML, bandits, mechanism design and public policy.
Joon Sung Park @joon_s_pk
5K Followers 1K Following CS Ph.D. student @StanfordHCI + @StanfordNLP. Previously @MSFTResearch, @IllinoisCS & @Swarthmore. Oil painter. HCI, NLP, generative agents, human-centered AI
Tennis TV @TennisTV
648K Followers 846 Following The best seat in tennis 🎾 Official streams: https://t.co/qhEoY7DtTv📱Get help: https://t.co/G6sWebIvYv 🤖 Buy merch: https://t.co/MvWFr5CDZF 👕
Xuechen Li @lxuechen
2K Followers 900 Following Building intelligence @xai. PhD @Stanford. Undergrad @UofT. Worked at @GoogleAI @MSFTResearch @Vectorinst. I go by Chen.
Kristy Choi @kristychoi_
510 Followers 628 Following CS @Stanford. Previously CS-Stats @Columbia. Machine Learning, Bayesian statistics, generative models.
Mina Lee @MinaLee__
3K Followers 452 Following Postdoc at @MSFTResearch | Assistant Professor at @UChicagoCS (2024) | PhD at @Stanford | Language models, AI-assisted writing, Human-AI interaction ✍️
Judy Shen @judyhshen
3K Followers 1K Following Ph.D. Student @Stanford CS working on algorithmic fairness, privacy, and explainability. Formerly @mit @medialab, @UofT
Draymond Green @Money23Green
2.1M Followers 892 Following Forward for the Golden State Warriors by way of Michigan State and Saginaw Michigan... Owner of Performance Inspired Nutrition
In addition to data and model optimization, stability, efficiency and fault tolerance of the entire infra stack is extremely crucial when we scale LLM training to tens of thousands of GPUs. Really glad to be working with @reducescatter and the brilliant AI Infra team @AIatMeta,…
@karpathy's insights on the complexities of training LLMs really hit home. Keeping the #llama3 training alive was a journey filled with hard challenges across the entire tech stack. Reading it kept me sane, knowing others have a similar experience.
Nice read on the rarely-discussed-in-the-open difficulties of training LLMs. Mature companies have dedicated teams maintaining the clusters. At scale, clusters leave the realm of engineering and become a lot more biological, hence e.g. teams dedicated to "hardware health". It…
Having a massive fleet of GPUs is only a part of the story, you absolutely need to have an amazing team who can make it work.
@deliprao I don’t think so. If we had used the same compute with chinchilla optimal params/tokens, it would result in a better model, but a very large model. To keep inference costs lower, it is better to train small models on more tokens. Interestingly we see they still keep improving 📈.
LLMs keep improving with more data! This also means we need better benchmarks
I'm seeing a lot of questions about the limit of how good you can make a small LLM. tldr; benchmarks saturate, models don't. LLMs will improve logarithmically forever with enough good data.
@deliprao If you are using different tokenizers (for Mistral and Llama models), then train ppl might not be comparable. Might also be good to validate on some eval tasks.
Llama-3 is looking very good on LMsys! Huge shoutout to the Llama 3 post training team! Amazing job 🦙🦙🦙
Wow, nearly 3K votes overnight -- A huge shoutout to our amazing community! Confidence intervals are narrowing, and Llama-3 remains strong! Big congrats to @AIatMeta for this incredible launch & contribution to open community. Full result coming out soon.
Yes, both the 8B and 70B are trained way more than is Chinchilla optimal - but we can eat the training cost to save you inference cost! One of the most interesting things to me was how quickly the 8B was improving even at 15T tokens.
Excited to share a preview of Llama3, including the release of an 8B and 70B (82 MMLU, should be the best open weights model!), and preliminary results for a 405B model (still training, but already competitive with GPT4). Lots more still to come... ai.meta.com/blog/meta-llam…
fixed the fixed fix for llama3
Congrats to @AIatMeta on Llama 3 release!! 🎉 ai.meta.com/blog/meta-llam… Notes: Releasing 8B and 70B (both base and finetuned) models, strong-performing in their model class (but we'll see when the rankings come in @ @lmsysorg :)) 400B is still training, but already encroaching…
thinking about how I blasted “we don’t need no education” in grade 8 pink floyd phase and now I will do postdoc
AI: Actually, Indians
David (@dlwh) is an incredible friend and mentor. Highly recommend following his work — he not only dives deep into understanding *all* the parts of the systems he works with, but also cares about sharing these insights in a way that’s accessible. Levanter is just one example!
Recommend following David Hall (@dlwh) and the Levanter project from @StanfordCRFM . Just no nonsense details about fixing the pain-points of scaling LLM training, one at a time.
AlpacaEval is now length-controlled (LC)! ✅ highest correlation with Chat Arena (0.98) ✅ no reannotation ✅ simple interpretation: win rate if model length = baseline length ✅ robust to length gamification 0.98 that’s essentially evaluation on Arena but in 3min and <$10.
So happy to win the 🏆 in Indian Wells, but above all with my level this week against the best in the world! It means so much to me! 🤩 Your support has been vital, once again! VAMOS! 🧠❤️🥚🥚 📸 Getty
Lots of talk about chess boxing lately. The boys are staying ready 🥊
“One needs to learn to love and enjoy the little things in life. One also needs to discover one’s true calling and then should do everything to pursue the selected path,” - wise words @archit_sharma97 tribuneindia.com/news/amritsar/…
# on technical accessibility One interesting observation I think back to often: - when I first published the micrograd repo, it got some traction on GitHub but then somewhat stagnated and it didn't seem that people cared much. - then I made the video building it from scratch,…
I'm very excited about this release: Gemini 1.5 Pro - A highly capable multimodal model with a 10M(!!!) token context length!
Gemini 1.5 Pro - A highly capable multimodal model with a 10M token context length Today we are releasing the first demonstrations of the capabilities of the Gemini 1.5 series, with the Gemini 1.5 Pro model. One of the key differentiators of this model is its incredibly long…