Sachin Goyal @goyalsachin007
PhD student @ CMU MLD || Microsoft Research || UG @ IIT Bombay
saching007.github.io · Bengaluru, India · Joined December 2015
323 Tweets · 765 Followers · 714 Following · 1K Likes
🐾 We pretrained & finetuned Transformers w dummy <pause> 🐾 tokens to see what happens. Come to #ICLR posters to chat w @goyalsachin007 to learn more! arxiv.org/abs/2310.02226 (✨ new ✨): We also theoretically analyze when/why this can help enhance the *expressive* power! 1/
Check out Stylus, our #AI tool that mines online databases, including @HelloCivitai and @huggingface, and automatically finds and adds the best LoRAs and Textual Inversions for your prompt. Arxiv: arxiv.org/abs/2404.18928 Website: stylus-diffusion.github.io
One week away from @iclr_conf in Vienna 🤩 I will be presenting two spotlights: why big foundation models generalize so well under the self-supervised setting, and how to leverage massive unlabeled data using a base kernel that encodes inter-sample similarity. Details 👇 (1/3)
1/What does it mean for an LLM to “memorize” a doc? Exactly regurgitating a NYT article? Of course. Just training on NYT? Harder to say. We take big strides in this discourse w/ *Adversarial Compression* w/ @A_v_i__S @zhilifeng @zacharylipton @zicokolter 🌐:locuslab.github.io/acr-memorizati…🧵
Gukesh is the Champion of Candidates 🔥❤️🇮🇳 Edit: @ram_abhyudaya #chess #chessbaseindia #gukesh #candidates2024
Cool results!
To get the reference model, you finetuned on a math dataset. I see you get faster learning on math tasks with curation, but will the final accuracy be higher than simply finetuning on the math dataset alone for more epochs, or, for that matter, than the reference model's math accuracy? @zebgou
1/ 🥁Scaling Laws for Data Filtering 🥁 TLDR: Data Curation *cannot* be compute agnostic! In our #CVPR2024 paper, we develop the first scaling laws for heterogeneous & limited web data. w/@goyalsachin007 @zacharylipton @AdtRaghunathan @zicokolter 📝:arxiv.org/abs/2404.07177
I like this takeaway! Especially when the whole internet is your train and test set.
Been preaching (not writing) this for a while: ANY data filter is a decision to not learn about sth. Think carefully: really want that? For small runs? For large runs? What else could you be inadvertently removing? Do you really know your filter? Looked at examples it removes?
Super cool insights for curating data under constraints! Strong connections also to arxiv.org/abs/2305.16264
How do you balance repeat training on high quality data versus adding more low quality data to the mix? And how much do you train on each type? @pratyushmaini and @goyalsachin007 provide scaling laws for such settings. Really excited about the work!
This is a really important point that I think a lot of the "high-quality" data papers/posts miss: When determining whether to include a new datapoint in a dataset, its quality is NOT measured in isolation, but w.r.t. the existing dataset.
Danish Pruthi @danish037
7K Followers 628 Following Faculty at Indian Institute of Science, Bangalore. PhD from @LTIatCMU.
Shaily @shaily99
5K Followers 2K Following PhD @LTIatCMU Prev: @GoogleAI @MSFTResearch. Working on ethics and evaluation in #NLProc. Usually ranting, often about research & DEI. 📚 @readsndrants
Pratyush Maini @pratyushmaini
1K Followers 340 Following Trustworthy ML | PhD student @mldcmu | Founding Member @datologyai | Prev. Comp Sc @iitdelhi
Jeremy Cohen @deepcohen
4K Followers 869 Following PhD student in machine learning at Carnegie Mellon. The goal of my research is to turn deep learning into a real engineering discipline.
Christina Baek @_christinabaek
782 Followers 230 Following PhD student @mldcmu | Past: intern @GoogleAI
Ananya Kumar @ananyaku
4K Followers 472 Following Researcher at @openai Previously PhD at Stanford University (@StanfordAILab) advised by Percy Liang and Tengyu Ma
Yiding Jiang @yidingjiang
1K Followers 469 Following PhD student @mldcmu @SCSatCMU. Formerly intern @MetaAI, AI resident @GoogleAI. BS from @Berkeley_EECS. Trying to understand stuff.
Divy Thakkar @divy93t
5K Followers 2K Following Strategy, Programs & Product @GoogleAI, HCI Researcher. Ph.D @CityUniLondon Alumni @iift1963 @daiictofficial. Personal views.
Zico Kolter @zicokolter
15K Followers 499 Following Associate professor at Carnegie Mellon, VP and Chief Scientist at Bosch Center for AI. Researching (deep) machine learning, robustness, implicit layers.
Aditya Kusupati @adityakusupati
3K Followers 2K Following 🔬PhD.. @uwcse: @RAIVNLab; Been places..... Done things....
Andreas Kirsch.. @BlackHC
9K Followers 5K Following Past: 🧑‍🎓 DPhil @AIMS_oxford @ExeterCollegeOx @UniofOxford (4.5yr) 🧙‍♂️ RE @DeepMind (1yr) 📺 SWE @Google (3yrs) 🎓 @TU_Muenchen 👤 Fellow @nwspk
Jason Lee @jasondeanlee
10K Followers 3K Following Associate Professor at Princeton and Research Scientist at Google DeepMind. ML/AI Researcher working on foundations of LLMs and deep learning
Gargi Balasubramaniam @gargi_balasu
2K Followers 1K Following Research Engineer @GoogleDeepMind, @SiebelScholars '23, MS CS UIUC @IllinoisCS, Gold Medalist CS'20 BITS Pilani Goa, Prev @Meta, @AmazonScience, @Microsoft, 🎶
Harshita Diddee @ihsrahedid
642 Followers 698 Following LTI PhD @SCSatCMU | Prev: RF at @MSFTResearch | Interested in Data Quality Estimation
Dylan Sam @dylanjsam
427 Followers 353 Following phd student @mldcmu | past: intern @AmazonScience, BS @BrownCSDept
Satwik Bhattamishra @satwik1729
406 Followers 643 Following CS PhD student at Oxford | Ex - Research fellow at Microsoft Research India, Undergrad at BITS Pilani
Valerie Chen @valeriechen_
982 Followers 418 Following phd student @mldcmu @SCSatCMU | previously @MSFTResearch @yale @CMU_Robotics @IBMResearch
Nathan Benaich @nathanbenaich
51K Followers 32K Following solo member of investment staff @airstreet, brewing ambition @airstreetcafe, next token predictor @airstreetpress
sumeet singh @sumeetsingh90
41 Followers 239 Following All my posts merely reflect my personal opinion.
Aaditya ; @Aaditya26082004
535 Followers 7K Following CS'26 • Machine Learning • Open-Source • Web Dev. • Algorithms • Jai Shree Krishna 🦚🪈
Arif Ahmad @arif_ahmad_py
285 Followers 7K Following All things AI, Computer Science and Circuits! Prev. @GoogleAI
Rishabh Agarwal @agarwl_
6K Followers 549 Following Senior Research Scientist, @GoogleDeepMind, ex-🧠. Agents that make decisions. NeurIPS Best Paper (RLiable). Mila, IIT Bombay.
Celent @DorioNomo
3 Followers 9 Following
jordiae @jordiae
988 Followers 2K Following Transformers, NLP, ML4Code, HPC. PhD student @EdinburghUni. Previously @Bloomberg @MILAMontreal @BSC_CNS @la_upc. Opinions are my own.
Dev Patel @nothingbutnett6
102 Followers 496 Following momma raised a soldier, not a bitch, not a bitch
Yu Bai @yubai01
3K Followers 2K Following Sr Research Scientist @SFResearch. PhD @Stanford. Researcher on foundation models, RL/games, deep learning, uncertainty quantification, and their theory.
Akshay Goindani @AkshayGoindani1
19 Followers 213 Following
polymorphism @polymorphism16
8 Followers 183 Following
Vamsi Bedapudi @wamsib
68 Followers 134 Following Serial Procrastinator. On my way to be an unsuccessful author.
Dan Velasco @danjohnvelasco
96 Followers 2K Following Your lowly computer guy. Playing with computers at DLSU.
Mayank @techsalts
72 Followers 1K Following “Everybody wants to save the world. Nobody wants to help mum in the kitchen”
Joel Ye @_JoelYe
345 Followers 540 Following NeuroAI PhD student @CarnegieMellon. Being a brain-computer interface.
Wen-Ding Li @xu3kev
2K Followers 5K Following Program Synthesis & ML. Previously Student Researcher at @google. Previously intern at @theteamatx. Mastodon: [email protected]
Matthew Leavitt @leavittron
2K Followers 778 Following Chief Science Officer, Co-Founder @datologyai. Former: Head of Data Research @MosaicML; FAIR. 🧠 and 🤖 intelligence // views are from nowhere
Philippe 🇪🇺 @FdsPhilippe
194 Followers 2K Following You miss 100% of the shots you don’t take. Software engineer
Ankush Jain @schwifty50
337 Followers 530 Following Storage Systems graduate student @ Parallel Data Lab. Tweets not (very) academic.
James Bradbury @jekbradbury
11K Followers 8K Following Compute at @AnthropicAI! Previously JAX, TPUs, and LLMs at Google, MetaMind/@SFResearch, @Stanford Linguistics, @Caixin.
Соломатин Р.. @r_samoed
13 Followers 199 Following
Michael Fine @fine_whines
350 Followers 2K Following ML Privacy Researcher @Apple | previously @Harvard @TwoSigma @UberATG @HFA
Jim Winkens @jimwinkens
538 Followers 524 Following Building new ML products @Google Labs. Prev @GoogleDeepMind.
Rulin Shao @RulinShao
615 Followers 396 Following PhD @UWNLP | MS @SCSatCMU | ex-Applied Scientist @AWS
James Parsloe @jamesparsloe
195 Followers 5K Following ML Engineer. Trying to increase the FLOPs I have access to. Used to make computers talk at Spotify/Sonantic.
christian cch @chris_cch_
185 Followers 3K Following
Mayank Jain @DeshlahraMayank
196 Followers 4K Following
@ @zYu0WLJWJK3lxFx
0 Followers 2K Following
Jim Bohnslav @jbohnslav
1K Followers 4K Following ML engineer @ Zoox. Previously Cobot, neuroscience PhD -- making intelligent machines
Kevin @kevin__dave
163 Followers 237 Following In an abusive relationship with Mathematics. Strong opinions, loosely held. Prev: @ISIKolkata, @BristolUni
Denis Bykov @denis_bykov
145 Followers 608 Following Search @ Yango Maps. Leading talented minds in crafting innovative search engines.
Eva Louise Marie Gabr.. @e681554349
9 Followers 3K Following
Dori @drvbvr
21 Followers 582 Following
Sai Vignan @vignan_sai
92 Followers 2K Following ML Engineering @sprinklr, CS @iitdelhi, Interested in ML, Bio Informatics
AK @_akhaliq
310K Followers 3K Following AI research paper tweets, ML @Gradio (acq. by @HuggingFace 🤗) dm for promo follow on Hugging Face: https://t.co/q2Qoey80Gx
Andrej Karpathy @karpathy
980K Followers 905 Following 🧑‍🍳. Previously Director of AI @ Tesla, founding team @ OpenAI, CS231n/PhD @ Stanford. I like to train large deep neural nets 🧠🤖💥
Yann LeCun @ylecun
712K Followers 719 Following Professor at NYU. Chief AI Scientist at Meta. Researcher in AI, Machine Learning, Robotics, etc. ACM Turing Award Laureate.
Percy Liang @percyliang
49K Followers 408 Following Associate Professor in computer science @Stanford @StanfordHAI @StanfordCRFM @StanfordAILab @stanfordnlp | cofounder @togethercompute | Pianist
Zachary Lipton @zacharylipton
59K Followers 2K Following Professor: CMU/@acmi_lab, CTO / CSO: @AbridgeHQ, Creator: @d2l_ai & https://t.co/QQt98VNLUp, Relapsing 🎷
Gautam Kamath @thegautamkamath
44K Followers 507 Following Assistant Prof of CS @UWaterloo, Faculty @VectorInst, Canada @CIFAR_News AI Chair. Co-EiC @TmlrOrg. I lead @TheSalonML. Privacy, robustness, machine learning.
Behnam Neyshabur @bneyshabur
18K Followers 690 Following Senior Staff Research Scientist @GoogleDeepMind, Interested in reasoning w. LLMs, traveling & backpacking
Jia-Bin Huang @jbhuang0604
51K Followers 285 Following Associate Professor @umdcs; Part-time Research Scientist @Meta. I like pixels.
Peyman Milanfar @docmilanfar
67K Followers 264 Following Distinguished Scientist at Google Research. Computational Imaging, Machine Learning, and Vision. Tweets = personal opinions. May change or disappear over time.
Clément Canonne @ccanonne_
31K Followers 928 Following Senior Lecturer @Sydney_Uni. Postdocs @IBMResearch, @Stanford; PhD @Columbia. Converts ☕ into puns: sometimes theorems. He/him. @[email protected]
Jeff Dean (@🏡) @JeffDean
297K Followers 6K Following Chief Scientist, Google DeepMind and Google Research. Co-designer/implementor of things like @TensorFlow, MapReduce, Bigtable, Spanner, Gemini .. (he/him)
Google DeepMind @GoogleDeepMind
944K Followers 275 Following We’re a team of scientists, engineers, ethicists and more, committed to solving intelligence, to advance science and benefit humanity.
Vladimir Kramnik @VBkramnik
11K Followers 28 Following Vladimir Kramnik, Chess Grandmaster & World Champion ('00-'07). Advocate for fair play in chess.
Jason Levin @iamjasonlevin
24K Followers 998 Following I help startups/VCs go viral. Read about my adventures and strategies here: https://t.co/UWaWrsgQMk. Growing @producthunt, @jamdotdev, and more.
Anna Bair @annaebair
121 Followers 374 Following CMU PhD student in machine learning https://t.co/3vzZEGbXc4
Alex Robey @AlexRobey23
621 Followers 849 Following Ph.D. student at @Penn studying robust machine learning. Formerly @GoogleAI, @Livermore_Lab | B.S. & B.A. from @swarthmore
RVCJ Media @RVCJ_FB
909K Followers 2K Following India's Largest Digital Publisher. Official Handle of RVCJ Digital Media Private Ltd. Business Enquiry:- [email protected]
Ari Morcos @arimorcos
6K Followers 2K Following CEO and Co-founder @datologyai working to make it easy for anyone to make the most of their data. Former: RS @AIatMeta (FAIR), RS @DeepMind, PhD @PiN_Harvard.
Akanksha Sachan @sachan1akanksha
147 Followers 466 Following PhD student @CMUCompBio | 3D epigenomics enthu | ChemEng @iitbombay
Jovina Vaswani @VaswaniJovina
13 Followers 111 Following
Anshul Nasery @anshulnasery
316 Followers 1K Following PhD student at @uwcse | Previously Pre-Doctoral Researcher at @GoogleAI, undergrad at @iitbombay
Maksym Andriushchenko.. @maksym_andr
3K Followers 932 Following phd student at @EPFL🇨🇭 // google & open phil phd ai fellow // past @adoberesearch @uni_tue // best way to support 🇺🇦 https://t.co/fxomgJ7NU9
Ashish Mittal @AshishM451
11 Followers 67 Following
Prakhar @kyuuurius
22 Followers 153 Following AI Engineer at https://t.co/VhhjBXGYIg | FPV Drone Pilot | IIT-B 2020 | All things AI and tech
Simi Karan, IAS @_SimiKaran_
12K Followers 86 Following IAS '20 | Assam-Meghalaya cadre | EE, IITB '19 | Book nerd, Odissi dancer, Sports buff
Devendra Chaplot @dchaplot
8K Followers 365 Following Building next-gen AI at @MistralAI. Past: Research Scientist at Facebook AI Research. Ph.D. @SCSatCMU, BTech @iitbombay CS.
Avi Schwarzschild @A_v_i__S
270 Followers 183 Following Postdoc at CMU. Trying to learn about deep learning faster than deep learning can learn about me.
Sasha Rush @srush_nlp
52K Followers 464 Following Professor, Programmer in NYC. Cornell Tech, Hugging Face 🤗 https://t.co/cZl0wTfqGz
Yuchen Li @_Yuchen_Li_
592 Followers 518 Following PhD Student in Machine Learning at Carnegie Mellon University @mldcmu @SCSatCMU. Intern @GoogleAI. Previously @Quora, @Illinois_Alma. Math of deep learning, LMs
Sergey Levine @svlevine
80K Followers 122 Following Associate Professor at UC Berkeley Co-founder, Physical Intelligence
Ansh Khurana @AnshKhurana11
2K Followers 656 Following ML @Apple, MS CS @Stanford. Previously, Research @GoogleAI; CS @iitbombay. Views are personal.
Santiago Cortés @sancortes_95
284 Followers 956 Following Mathematician/engineer/AI researcher. @mldcmu/@SCSatCMU CS PhD student. Proud Colombian 🇨🇴. he/him
Romain Beaumont @rom1504
2K Followers 389 Following Doing multimodal deep learning + scale. I like packaging, transformers, knn and compute estimations.
Yongchao Zhou @Yongchao_Zhou_
539 Followers 301 Following Build Intelligence @xai | ML PhD @UofT @VectorInst | Prev. @GoogleAI @GoogleDeepMind | Working on LLMs
Helen Zhou @helenz1235
449 Followers 465 Following Machine learning for the ever-changing healthcare landscape. ML PhD Student at Carnegie Mellon University.
Victor Akinwande @aknvictor
1K Followers 4K Following I like thinking about Technology's role in society. I work towards Artificial Intelligence that is reliable. Ph.D. student @SCSatCMU, Prev: @IBMResearch.
Ashwini Pokle @ashwini1024
275 Followers 436 Following PhD student at CMU (@mldcmu). Prev. @Stanford, @bitspilaniindia | interested in generative models and deep equilibrium models
Yutong (Kelly) He @electronickale
316 Followers 299 Following PhD student @mldcmu, I’m so delusional that doing generative modeling is my job
shubhang @s_bhatnagar_tw
37 Followers 193 Following Computer Vision PhD Student @ECEILLINOIS, Undergrad @iitbombay
Fahim Tajwar @FahimTajwar10
169 Followers 243 Following PhD Student @mldcmu @SCSatCMU BS/MS from @Stanford
Abhishek Gupta @abhishekunique7
5K Followers 639 Following Assistant Professor at University of Washington. I like robots, and reinforcement learning. Previously: post-doc at MIT, PhD at Berkeley
David Held @davheld
3K Followers 604 Following Associate Professor at Carnegie Mellon University | he/him
Aviral Kumar @aviral_kumar2
2K Followers 338 Following Research Scientist at Google DeepMind. Incoming Assistant Professor of CS & ML at CMU (Fall 2024). PhD from UC Berkeley.
Vaishaal Shankar @Vaishaal
807 Followers 337 Following ML research @ apple. Trying to find artificial intelligence. Opinions are my own.
Anish Madan @anishmadan23
345 Followers 2K Following MS @CMU_Robotics | Prev: Associate ML Scientist @WadhwaniAI | @IIITDelhi '20
Deepak Pathak @pathak2206
16K Followers 316 Following I study topics in AI (machine learning, robotics & computer vision).
Gaurav Ghosal @gaurav_ghosal
79 Followers 138 Following Incoming Ph.D. Student @mldcmu | Former Undergraduate Student @berkeley_eecs and Researcher @berkeley_ai |
Zachary Novack @zacknovack
374 Followers 324 Following Controllable Audio/Music generation | PhD Student @ucsd_cse | Research Scientist Intern @AdobeResearch | previously @acmi_lab | teaching drums @pulsepercussion
samir gadre @sy_gadre
440 Followers 488 Following phd @columbia | formerly intern @allen_ai x2, ugrad @brownuniversity | pre-training | Black Lives Matter | he/him
Thao Nguyen @thao_nguyen26
573 Followers 181 Following PhD student @uwcse & visiting researcher @MetaAI. Formerly @GoogleAI Resident, @Stanford'19, @twosigma.
Shreekant Dashora @ShreekantDasho2
1 Followers 2 Following
Runtian Zhai @RuntianZhai
345 Followers 219 Following PhD @SCSatCMU. I study representation learning, why big models generalize, and out-of-distribution(OOD).
Yewen Fan @YewenF
86 Followers 167 Following PhD @mldcmu, Previously Senior MLE @Meta, CS / Math @IllinoisCS
Go Medusa pretraining🤩
Wow, Medusa can be used for pre-training and leads to a better and faster generation! 😍
Talk: "OLMo: Findings of Training an Open LM" from Hanna Hajishirzi at AI2 from OSGAI. Extremely interesting overview of the 4 parts (Data, Training, Adaptation, Eval) of the OLMo open LLM project. Rare insight into how these processes work at scale. youtube.com/watch?v=qFZbu2…
As much as I love India & always have, I find upper-middle class Indians, especially elders, to be some of the most entitled & bigoted people that one will ever come across. Took forever and living in a gated community to realise this strongly. There, I said it.
📢 Our new 𝗿𝗲𝘃𝗶𝗲𝘄 𝗼𝗻 𝗱𝗲𝗲𝗽 𝗹𝗲𝗮𝗿𝗻𝗶𝗻𝗴 𝗶𝗻 𝗠𝗥𝗜 is out! arxiv.org/abs/2404.15692 Co-authors @HeckelReinhard @Dr_ASChaudhari @PerlmanOr Mathews Jacob It covers supervised, self-supervised, generative AI, pulse-seq optim., uncertainty est., qMRI, pitfalls & more!
It's a big week for high quality training data
Microsoft announces Phi-3: A Highly Capable Language Model Locally on Your Phone. We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing,
What an evening 🫣 Congratulations to Gukesh and special award for both Fabi, Ian for their incredible performance today. One of the most interesting games I ever saw. Bravo, REAL FIGHTERS, for giving it all. Most important, more than anything in chess in fact. Full respect
Congratulations to @DGukesh for becoming the youngest challenger. The @WacaChess family is so proud of what you have done. I’m personally very proud of how you played and handled tough situations. Enjoy the moment
@ManishGuptaMG1 Was wondering if I was the only one on my twitter feed watching it! Vishy would be so proud.
LLM unlearning was mostly based on variants of gradient ascent (GA), susceptible to catastrophic forgetting. We propose Negative Preference Optimization (NPO), demonstrating efficient unlearning on TOFU benchmark. w/ @RuiqiZhang0614 @ Licong Lin, @yubai01. arxiv.org/abs/2404.05868
🚀 Introducing Pile-T5! 🔗 We (EleutherAI) are thrilled to open-source our latest T5 model trained on 2T tokens from the Pile using the Llama tokenizer. ✨ Featuring intermediate checkpoints and a significant boost in benchmark performance. Work done by @lintangsutawika, me…
Google presents Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies arxiv.org/abs/2404.08197
Best Practices and Lessons Learned on Synthetic Data for Language Models Great overview by Google DeepMind on synthetic data research. It covers applications, challenges, and future directions This is an important paper given the significant advancements we are seeing from the…
9/9: KEY TAKEAWAY: If you overtrain, go back n check the curation strategy of these *static* sets, and reduce their aggressiveness. Otherwise, as we show, even the raw common crawl can outperform famous curated subsets like LAION. Why: because data loses utility with repetitions.
@goyalsachin007 Your paper is definitely interesting and there is more nuance there but my point is rather that "Otherwise, as we show, even the raw common crawl can outperform famous curated subsets like LAION." Is not right. It would be amazing if it were true, lot more data that way
Let me correct my statement: I REALLY LIKE THIS TAKEAWAY! Great work @goyalsachin007 and @pratyushmaini! psst synthetic data & diversity do not go well hand in hand. What is the marginal utility of a synthetic datapoint when it is sampled from the same echo chamber of filtered data!
Glad to have made a small contribution to it!
Our new GPT-4 Turbo is now available to paid ChatGPT users. We’ve improved capabilities in writing, math, logical reasoning, and coding. Source: github.com/openai/simple-…