Romain Beaumont @rom1504
Doing multimodal deep learning + scale. I like packaging, transformers, knn and compute estimations. rom1504.fr Paris, France Joined July 2008
Tweets 657
Followers 2K
Following 388
Likes 7K
At 5000 tons, Starship is the largest flying object ever made. Thrust is more than double the Saturn V moon rocket. It is the first spaceship design capable of making life multiplanetary. Goal of the next mission is to make it through the meteorically extreme heat of reentry.
The future CAN be better than the present, but believing that it WILL, without effort, is naïve. True optimism is active, not passive.
Good morning. Gentle reminder that everything around you was made up by people that were no smarter than you
Before we have a million robots in the physical world, we will first see a billion embodied agents in virtual worlds. Gaming is the second major area I'm dedicated to in 2024. AI and Gaming are born for each other, and their happy marriage is just getting started. On one hand,…
The Techno-Optimist Manifesto -- please read and Ask Me Anything! Post questions as replies to this xeet. a16z.com/the-techno-opt…
Be an Open Source Absolutist! It is hard to overstate how much value Open Source Software has added to the world, and how broadly empowering it is. Operating systems, development tools, core libraries, and critical applications – a great many of the software tools used by the…
My brother is building an AI product for health. Looks great, check it out!
Humanity is going to make all parts of the world touched by humans beautiful. We are going to create beauty too cheap to meter. And not just an enforced-from-above standard of beauty either, everyone will be able to make their own domain beautiful in the manner of their choosing.
Why do people fear AI? I hear three reasons: 1. Cynicism — the belief that it is rational not to cooperate 2. Humanism/racism — systematic bias against machines, denial of their potential moral worth and personhood 3. Conservatism — fear of change, fear of the other tribe None…
there hasn’t been a time to be this optimistic about the future of the world since 1970. we're in it for real. infinite abundance
🎥Introducing video2dataset - a simple tool for large scale video dataset curation. Easily turn a set of links into a video (or audio) dataset. More details below 🧵 GitHub: github.com/iejMac/video2d… Blog: laion.ai/blog/video2dat…
Why AI Will Save The World -- my new megapost on why you should be excited, not scared, about AI. Enjoy! a16z.com/2023/06/06/ai-…
This makes me glad we've been maintaining mineflayer for the last 10 years. Minecraft can be a great environment for AI. I had tried back then to have commands like this with regular parsing and it worked a little: github.com/rom1504/rbot. LLMs bridge the gap nicely!
Introducing DataComp, a new benchmark for multimodal datasets! We release 12.8B image-text pairs, 300+ experiments and a 1.4B subset that outcompetes compute-matched CLIP runs from OpenAI & LAION 📜 arxiv.org/abs/2304.14108 🖥️ github.com/mlfoundations/… 🌐 datacomp.ai
With Starship's first orbital attempt slated for tomorrow, good time for a refresher on @SpaceX's big picture. waitbutwhy.com/2015/08/how-an…
I am happy to share that together with @kilian_maciej, @rom1504, @wightmanr and the rest of the github.com/mlfoundations/… community we added a new model architecture to the repository, Contrastive Captioners (CoCa), for all the info check laion.ai/blog/coca/. Briefly 👇
Today I am happy to announce that I have released an #OpenCLIP ViT-H-14 based index of #laion5b! This means that users can now search through billions of samples quickly and easily using the same clip model used to train #StableDiffusion2. demo at knn.laion.ai/?back=https%3A…
We've trained a new ViT-G/14 CLIP model with OpenCLIP on LAION-2B which achieves 80.1% zero-shot accuracy on ImageNet and 74.9% zero-shot image retrieval (R@5) on MSCOCO. As of Jan 2023 this is the best open source CLIP code: github.com/mlfoundations/… blog: laion.ai/blog/giant-ope…
AK @_akhaliq
309K Followers 3K Following AI research paper tweets, ML @Gradio (acq. by @HuggingFace 🤗) dm for promo follow on Hugging Face: https://t.co/q2Qoey80GxAndrej Karpathy @karpathy
977K Followers 904 Following 🧑🍳. Previously Director of AI @ Tesla, founding team @ OpenAI, CS231n/PhD @ Stanford. I like to train large deep neural nets 🧠🤖💥Jim Fan @DrJimFan
229K Followers 3K Following @NVIDIA Sr. Research Manager & Lead of Embodied AI (GEAR Lab). Creating foundation models for Humanoid Robots & Gaming. @Stanford Ph.D. @OpenAI's first intern.Rivers Have Wings @RiversHaveWings
31K Followers 225 Following AI/generative artist. Writes her own code. Absolute power is a door into dreaming.Suhail @Suhail
295K Followers 468 Following Founder: @playground_ai, @mixpanel Pizzatarian, programmer, music makerRoss Wightman @wightmanr
18K Followers 1K Following Computer Vision @ 🤗. Ex head of Software, Firmware Engineering at a Canadian 🦄. Currently building ML, AI systems or investing in startups that do it better.near @nearcyan
45K Followers 884 Following https://t.co/IdaJwZJCXm partner @ https://t.co/9g1MIgjiqc dms openclem 🤗 @ClementDelangue
90K Followers 5K Following Co-founder & CEO @HuggingFace 🤗, the open and collaborative platform to build machine learningSharif Shameem @sharifshameem
53K Followers 3K Following founder @LexicaArt • in pursuit of good explanationsJustin Pinkney @Buntworthy
10K Followers 1K Following Playing with deep learning, computer vision and generative art. Co-creator of https://t.co/sYtHB9e5Dj, ML Researcher @ MidJourney @[email protected]Jeremy Howard @jeremyphoward
221K Followers 5K Following 🇦🇺 Co-founder: @AnswerDotAI & @FastDotAI ; Hon Professor: @UQSchoolITEE ; Digital Fellow: @StanfordTomLikesRobots🤖 @TomLikesRobots
33K Followers 5K Following AI Artist at Metaphysic working with AI and VFX. All views my own. Experienced Web Dev and Artist. Early explorer of Artificial Creativity.Eric Jang @ericjang11
69K Followers 3K Following physical AGI at 1X. Author of "AI is Good for You" https://t.co/eFg4WXhg0pStella Biderman @BlancheMinerva
15K Followers 749 Following Open source LLMs and interpretability research at @BoozAllen and @AiEleuther. My employers disown my tweets. She/herpharmapsychotic @pharmapsychotic
18K Followers 7K Following Ai generative artist. code @StabilityAI fan of tacos and cats. #aiart #generativeart Ai tools: https://t.co/uyr9NjvLBaapolinario (multimoda.. @multimodalart
10K Followers 377 Following ML for Art and Creativity, working @HuggingFace ([email protected])David Marx || digthat.. @DigThatData
4K Followers 2K Following Generative AI MLE, FOSS toolmaker, innovation catalyst @CoreWeave + @AiEleuther. AI enhanced creativity, philosophy of mind/science/probabilityIrina Rish @irinarish
9K Followers 995 Following prof UdeM/Mila; Canada Excellence Research Chair; AAI Lab head https://t.co/UzlrC7ZrGF; INCITE project PI https://t.co/0rV7szd7rH; CSO https://t.co/XDhj6MEtUjDelip Rao e/σ @deliprao
46K Followers 5K Following Busy inventing the shipwreck. @Penn. Past: @johnshopkins, @UCSC, @Amazon, @Twitter ||Art: #NLProc, Vision, Speech, #DeepLearning || Life: 道元, improv, running 🌈Alan Penichet @AlanPenichet
128 Followers 271 Following Texas ex | dm me if you have a Taco Bell discountArif Ahmad @ArifAhm92263086
210 Followers 6K Following All things AI, Computer Science and Circuits! Prev. @GoogleAITeele @Teele1866180
0 Followers 222 FollowingRdn_HrtSong @RHrtsong5189
2 Followers 663 Followingqnguyen3 @stablequan
3K Followers 1K Following Multimodal | Synthetic Data | Multimodal Lead at Ontocord AIAryan Pandey (Look fo.. @AryanPa66861306
1K Followers 2K Following Half Machine Learning Engineer || DevOps and Machine Learning || Open Source at OpenVINOzigmengineer @zigmengineer
0 Followers 11 FollowingSeshesh @Seshesh115921
4 Followers 605 FollowingIlia Azizi @ilia_azizi
22 Followers 321 Following PhD Candidate @unil - #MachineLearning - Opinions are my own.Matt McClean @matthewmcclean
618 Followers 4K Following Kiwi bloke. Into Cloud Computing, DevOps and Agile/Lean for a living and Philosophy for the soul. Solutions Architect at Amazon Web Servicesus_Hazel_ @HazelUs27977
3 Followers 461 Following Nice to meet you. My hobbies are reading, food and sports. I like cats😘 I like to meet new friends while traveling🎉🎉🎉Kai @kdmalc
162 Followers 701 Following EE PhD Candidate @Rice working on federated learning for personalized and privacy-preserving human-computer interfacesKen aka Frosty 🔜 D.. @KenAKAFrosty
2K Followers 1K Following 💻 Web developer, 🔬 applied deep learning & AI researcher. 🔨Building cool stuff for streamers.Lily_Eleanor @LilyEleano81021
8 Followers 1K FollowingTwltHrmy_67 @Twlthrmy690567
1 Followers 788 FollowingAaria Goehringer @aar_goehring
71 Followers 5K FollowingDouwe Egberts @DouweEgberts00
177 Followers 1K FollowingElon Musk @_Elon_Musk220
133 Followers 730 FollowingHabib Zhman/حبیب .. @HabibZhman357
164 Followers 1K Following Offical Twitter Account of Spokesman Of Islamic Emirate Of Afghanistan.Kartikeya @leeecheeeeee
33 Followers 391 Following Flaneur. Deep Learning | Cricket | Music | Philosophy.Michael Randall @MichaelRan35424
10 Followers 256 Following No Multimillionaire has ever made it through salary, However joining the Great Illuminati society and acquiring Wealth, Fame, Power, and Knowledge🔺AI Derviş'i @AiDervis
24 Followers 277 Following I study open-source knowledge and share it with the people! #aidervisiCédric Limousin @Maeelk
273 Followers 1K FollowingEric @RealEricD
9K Followers 8K Following Questionable questioner. Thinking out loud on the good life, tech, and whatever captures my interest. Hacking on insurtech ideas. Views mine. Persuadable.Teteal @Teteal639095
196 Followers 2K Followingsamir gadre @sy_gadre
437 Followers 488 Following phd @columbia | formerly intern @allen_ai x2, ugrad @brownuniversity | pre-training | Black Lives Matter | he/himUnbox Research @unboxresearch
747 Followers 648 Following We write articles about AI! https://t.co/YfdaFwMP2d We’re a machine learning research & development company. We find answers in data. Founded by @tylerneylon.Laura Misrachi @laurajrmisrachi
0 Followers 48 FollowingMrinal Deo @mdeo_deo
93 Followers 2K Following Computer engineer in his 40s. Loves computer architecture and rendering.Gabriel Misrachi @gabmis_
5 Followers 130 FollowingAlessandro Abluton @ablueai
21 Followers 169 FollowingYves Junqueira @cetico
936 Followers 675 Following Software Engineer, starting something new. Prev: co-founder @yourbaseio, engineer @google 🇧🇷🇨🇭🇺🇸🇵🇹Janet Hammerman @HammermanJ7009
25 Followers 2K Following 💰Janet , 21 , Biggest crypto casino presale👇🔑Aus Alzubaidi @AlzubaidiAus
701 Followers 5K Following Father of two, off-road lover, cybersecurity and AI enthusiast佳禹 陈 @jiychn35257501
10 Followers 620 FollowingSheesee @Sheesee159258
230 Followers 6K FollowingAK @_akhaliq
309K Followers 3K Following AI research paper tweets, ML @Gradio (acq. by @HuggingFace 🤗) dm for promo follow on Hugging Face: https://t.co/q2Qoey80GxAndrej Karpathy @karpathy
977K Followers 904 Following 🧑🍳. Previously Director of AI @ Tesla, founding team @ OpenAI, CS231n/PhD @ Stanford. I like to train large deep neural nets 🧠🤖💥Yann LeCun @ylecun
709K Followers 718 Following Professor at NYU. Chief AI Scientist at Meta. Researcher in AI, Machine Learning, Robotics, etc. ACM Turing Award Laureate.Stability AI @StabilityAI
189K Followers 31 Following We are building the foundation to activate humanity's potential.François Chollet @fchollet
469K Followers 770 Following Deep learning @google. Creator of Keras. Author of 'Deep Learning with Python'. Opinions are my own.Jim Fan @DrJimFan
229K Followers 3K Following @NVIDIA Sr. Research Manager & Lead of Embodied AI (GEAR Lab). Creating foundation models for Humanoid Robots & Gaming. @Stanford Ph.D. @OpenAI's first intern.Rivers Have Wings @RiversHaveWings
31K Followers 225 Following AI/generative artist. Writes her own code. Absolute power is a door into dreaming.Lucas Beyer (bl16) @giffmana
56K Followers 445 Following Researcher (Google DeepMind/Brain in Zürich, ex-RWTH Aachen), Gamer, Hacker, Belgian. Mostly gave up trying mastodon as [email protected]Suhail @Suhail
295K Followers 468 Following Founder: @playground_ai, @mixpanel Pizzatarian, programmer, music makerRoss Wightman @wightmanr
18K Followers 1K Following Computer Vision @ 🤗. Ex head of Software, Firmware Engineering at a Canadian 🦄. Currently building ML, AI systems or investing in startups that do it better.AI at Meta @AIatMeta
530K Followers 255 Following Together with the AI community, we are pushing the boundaries of what’s possible through open science to create a more connected world.Google DeepMind @GoogleDeepMind
942K Followers 275 Following We’re a team of scientists, engineers, ethicists and more, committed to solving intelligence, to advance science and benefit humanity.clem 🤗 @ClementDelangue
90K Followers 5K Following Co-founder & CEO @HuggingFace 🤗, the open and collaborative platform to build machine learningAI Pub @ai__pub
72K Followers 343 Following AI papers and AI research explained, for technical people. Get hired by the best AI companies: https://t.co/MySVjUGOQ3Vaibhav (VB) Srivasta.. @reach_vb
11K Followers 169 Following GPU poor @Huggingface | F1 fan | Here for @at_sofdog’s wisdom | *opinions my ownAgustin Mautone @agusit
171 Followers 401 FollowingLogan Kilpatrick @OfficialLoganK
92K Followers 2K Following Lead product for @Google AI Studio and working on the Gemini API, helping developers build with AI, my views!Sholto Douglas @_sholtodouglas
15K Followers 856 Following Scaling Gemini @Deepmind - working towards intelligence too cheap to meterHannu Rajaniemi @hannu
6K Followers 1K Following Co-founder and CEO of @HelixNano. Author of The Quantum Thief, The Fractal Prince, The Causal Angel, Invisible Planets and Summerland https://t.co/o6RsVIiYj3Devon Eriksen @Devon_Eriksen_
37K Followers 283 Following Author, Engineer, Sharpshooter, part-time Daemon Prince of Tzeentch. Not a cat. If you see hell everywhere you look, then perhaps hell is inside your eyes.Databricks Mosaic Res.. @DbrxMosaicAI
30K Followers 115 Following We remove the barriers to state-of-the-art generative AI model development and make data + AI available to all.samir gadre @sy_gadre
437 Followers 488 Following phd @columbia | formerly intern @allen_ai x2, ugrad @brownuniversity | pre-training | Black Lives Matter | he/himRobert Yang @GuangyuRobert
3K Followers 185 Following Co-founder, CEO at @Altera_AL, Computational Neuroscientist, former Assistant Professor @mitbrainandcog & @MITEECSCognition @cognition_labs
123K Followers 19 Following Makers of Devin, the first AI software engineer. We are an applied AI lab focused on reasoning, and code is just the beginning. Join us: https://t.co/tpfZwEwGiqBowen Cheng @bowenc0221
2K Followers 265 Following Research scientist @OpenAI | Ex-@Tesla | Ph.D. @ECEILLINOIS. | Ex-intern @MetaAI, @GoogleAI, @MSFTResearch.Christian Keil @pronounced_kyle
20K Followers 1K Following VP @Astranis, building internet satellites ◦ host of @1stPrinciplesFM ◦ investor and believer in deep tech startupsNikolay Savinov 🇺�.. @SavinovNikolay
1K Followers 0 Following Research Scientist at @GoogleDeepMind Work on LLM pre-training in Gemini ♊ 10M context length in Gemini 1.5 Pro 📈Dr. Phil Metzger @DrPhiltill
96K Followers 499 Following Director, Center for Microgravity Research & Education @UCF. Previous: co-founder of NASA KSC Swamp Works. Space Mining. Space Settlement.Yuke Zhu @yukez
15K Followers 464 Following Assistant Professor @UTCompSci | Co-Leading GEAR @NVIDIAAI | CS PhD @Stanford | Building generalist robot autonomy in the wild | Opinions are my ownJack Rae @drjwrae
9K Followers 353 Following Principal Scientist @ Google DeepMind Work on Gemini 💎♊ Compression is all you need LLMs (e.g. Gopher, Chinchilla, Gemini) 💼 Past: OpenAI, QuoraMax Bain @maxhbain
2K Followers 497 Following multimodal @RekaAILabs | prev: phd @Oxford_VGG hardwork-pilledOmer Bar Tal @omerbartal
2K Followers 108 Following Founding Scientist @pika_labs | ex @WeizmannScience @GoogleAIGabor Cselle @gabor
20K Followers 2K Following I try all the new AI products so you don't have to. Before: 3x startup founder (T2, Namo Media, reMail). Director at Google, PM at Twitter.Hao Liu @haoliuhl
4K Followers 155 Following machine learning, neural networks. phd student @berkeley_ai. https://t.co/ZNJawlrerSAndrew C - Rocket Fut.. @TheRocketFuture
3K Followers 683 Following Rocket Future - Documenting and sharing the most exciting spaceflight moments | Member of The Inspired 24 | Licensing Inquiries: [email protected]Starship Gazer @StarshipGazer
40K Followers 590 Following STARBASE, TX Pro Photography, Video Production and Live Streaming. Go SpaceX!! Download Pro Full Res photos and help support my work: https://t.co/4wJ77JvztqThomas H. Chapin IV @tomchapin
4K Followers 4K Following AI Engineer and advocate for human healthspan, longevity, and overall quality of life. Let’s use AI to slow aging and beat disease!Tomas Pueyo @tomaspueyo
332K Followers 548 Following Understand deeply how the world works today to navigate the world of tomorrow. Join 80k ppl in my free newsletter:George Hotz 🌑 @realGeorgeHotz
248K Followers 172 Following President @comma_ai. Founder @__tinygrad__Mikey @MikeyShulman
533 Followers 8 Following Aspiring mediocre athlete. Co-founder of https://t.co/DEMHM3kA1T.koray kavukcuoglu @koraykv
8K Followers 84 Following VP of Research and Technology at Google DeepMindOriol Vinyals @OriolVinyalsML
166K Followers 82 Following VP of Research & Deep Learning Lead, Google DeepMind. Gemini co-lead. Past: AlphaStar, AlphaFold, AlphaCode, WaveNet, seq2seq, distillation, TF.Nora Belrose @norabelrose
8K Followers 124 Following Working toward a free and fair future powered by friendly AI. Head of interpretability research at @AiEleuther, but tweets are my own views, not Eleuther’s.Denny Zhou @denny_zhou
9K Followers 420 Following @GoogleDeepMind founder & lead of Reasoning Team. Build LLMs to reason. Opinions my own.Thibaud Zamora ∞ @thibaudz
10K Followers 1K Following Working on an unannounced project | A.I. Films https://t.co/UGF9SiIIqp 1M views | OpenSource Contributor | ex-CEO of 2 games studios - 100M of playersPika @pika_labs
116K Followers 53 Following Video on command. Website: https://t.co/G5bjmrMQsx Discord: https://t.co/bX68ThPTQH About: https://t.co/atvdcgbe9SDuckAI @TheDuckAI
622 Followers 8 Following An open-source ML research community at Discord: https://t.co/7YDTo6Mo1GZachary Nado @zacharynado
5K Followers 648 Following Research engineer @googlebrain. Past: software intern @SpaceX, ugrad researcher in @tserre lab @BrownUniversity. All opinions my own.Sam Zeloof @szeloof
27K Followers 106 FollowingSharif Shameem @sharifshameem
53K Followers 3K Following founder @LexicaArt • in pursuit of good explanationsGreg Yang @TheGregYang
53K Followers 660 Following Cofounder https://t.co/SpHbO7FZNV. Morgan Prize Honorable Mention 2018. Developing the theory of #TensorPrograms and the practice of scaling #neuralnetworks.Neil Houlsby @neilhoulsby
4K Followers 317 Following Professional AI researcher; amateur athlete. Senior Staff RS in the Google Deepmind, Zürich. Attempts triathlons.Saurabh Garg @saurabh_garg67
866 Followers 576 Following Robustifying LLMs and VLMs | PhD student @mldcmu | prev/ CS @iitbombay (undergrad); Collab @GoogleAI @awscloud @appleNiels Rogge @NielsRogge
10K Followers 690 Following ML Engineer @ML6team, part-time at @huggingface. @KU_Leuven grad. General interest in machine learning, deep learning. Making AI more accessible for everyone!David Beniaguev @DavidBeniaguev
2K Followers 1K Following For fun and work, I build Generative Models. Comp neuroscience PhD. Was trying to understand the brain to help build AI, but it appears it's no longer necessary
@cHHillee But now the teams are better aligned, clarity is increased, and we are setup to deliver real value! For the third time in the year.
One thing I really enjoy about working on an OSS facing project like PyTorch is that OSS really cuts through a lot of "politics" and "fake work". Within a company, people are incentivized to do all sorts of things other than "build the right thing". Unfortunately, no amount…
True Story! One of the many reasons I love open source is it doesn't give a damn about the org chart or "managing up." If people outside of FB/Meta didn't use or like our OSS then something was wrong with it. PyTorch succeeded because of the hyper focus on developer…
1.5 Pro is a very, very good model 🚀🚀 but even more excited for what we have in store 🕺
More exciting news today -- Gemini 1.5 Pro result is out! Gemini 1.5 Pro API-0409-preview now achieves #2 on the leaderboard, surpassing #3 GPT4-0125-preview to almost top-1! Gemini shows even stronger performance on longer prompts, in which it ranks joint #1 with the latest…
llms this, llms that! why aren't people releasing more audio stuff 😭 i want tts, asr, speech translation, voice cloning, text to audio, text to music, anything..
@Thom_Wolf The 3 key elements of a good dataset: 1. quality 2. diversity 3. quantity You can only easily measure the last one but the performance is a sensitive function of all three. Super interesting topic ty for #longread :)!
Gemini 1.5 Pro is now ranked #2 on @lmsysorg chat arena (and #1 for long context). More work to do, but excited we put this model into the hands of developers. The era of truly multimodal models has arrived 🚀
Try Gemini 1.5 Pro for free in AI Studio with 1M context + video + audio: aistudio.google.com/app/prompts/ne…
People seem to over-index on the 15T number after Llama 3. While the number matters, what is even more important is the quality and diversity of those tokens. If there was a good way to measure those, that would have been an impressive result to report.
Llama3 was trained on 15 trillion tokens of public data. But where can you find such datasets and recipes?? Here comes the first release of 🍷Fineweb. A high quality large scale filtered web dataset out-performing all current datasets of its scale. We trained 200+ ablation…
This take on the FineWeb release is one of the most interesting pieces of feedback, and also a reason FineWeb is very different from even larger datasets like RedPajama-V2 (which is double its size!). Surprisingly, the size of the dataset of 15T tokens is not very important, what is much…
Meta has 350,000 H100 equivalents and is using approx. 35,000 of them to train Llama models; the rest are for running them (and other models, but you get it). Open sourcing Llama means that if the community gets Llama 10% more efficient, then the training is basically free. #AImath
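The arithmetic behind the tweet above can be checked in a few lines. This is only a sketch of one reading of it: it assumes that every GPU not used for training is used for inference, and that a 10% efficiency gain frees 10% of that inference fleet. The variable names are illustrative.

```python
# Figures quoted in the tweet.
total_gpus = 350_000     # H100 equivalents
training_gpus = 35_000   # approx. number used to train Llama models

# Assumption: everything not used for training is used for running models.
inference_gpus = total_gpus - training_gpus

# Assumption: a 10% efficiency gain frees 10% of the inference fleet.
gpus_freed = inference_gpus * 10 // 100

# Compare the freed capacity with the size of the training fleet.
coverage = gpus_freed / training_gpus

print(f"inference GPUs: {inference_gpus}")
print(f"GPUs freed by a 10% gain: {gpus_freed}")
print(f"share of the training fleet covered: {coverage:.0%}")
```

Under these assumptions the freed capacity is close to, but slightly below, the training fleet, which is consistent with the tweet's "basically free".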
Better Synthetic Data by Retrieving and Transforming Existing Datasets repo: github.com/neulab/prompt2… abs: arxiv.org/abs/2404.14361
Summing up Parquet: - efficient for streaming one row group at a time (choose the row group size wisely !) - efficient for parallel processing - efficient for filtering -> Great for DataLoaders -> Also a good input format for data processing frameworks 🤗
6/ (another bonus) Weighting Some subsets of FineWeb are of better quality than others, you can load multiple subsets separately and interleave them together using sampling probabilities: huggingface.co/docs/datasets/…
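The idea of interleaving subsets with sampling probabilities can be illustrated with the standard library alone. This is a minimal sketch, not the Hugging Face implementation (the `datasets` library offers `interleave_datasets` with a `probabilities` argument for this purpose); the function name `interleave`, the toy subsets, and the stopping rule (stop when a picked source has run out) are assumptions made for the example.

```python
import random
from typing import Iterable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def interleave(
    sources: Sequence[Iterable[T]],
    probabilities: Sequence[float],
    seed: int = 0,
) -> Iterator[T]:
    """Yield items by repeatedly picking a source at random, weighted by
    `probabilities`. Stops as soon as the picked source has run out."""
    if len(sources) != len(probabilities):
        raise ValueError("need exactly one probability per source")
    rng = random.Random(seed)  # seeded for reproducible mixing
    iterators = [iter(source) for source in sources]
    indices = list(range(len(iterators)))
    while True:
        (picked,) = rng.choices(indices, weights=probabilities, k=1)
        try:
            yield next(iterators[picked])
        except StopIteration:
            return


# Hypothetical subsets standing in for a higher and a lower quality dump.
high_quality = [f"hq-{n}" for n in range(1000)]
low_quality = [f"lq-{n}" for n in range(1000)]

mixed = list(interleave([high_quality, low_quality], probabilities=[0.8, 0.2], seed=42))
hq_share = sum(item.startswith("hq") for item in mixed) / len(mixed)
print(f"{len(mixed)} samples, {hq_share:.0%} from the high-quality subset")
```

Each source is consumed lazily through an iterator, so the same pattern also applies to streamed data that does not fit in memory.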
4/ Why it's amazing - Crazy fast download thanks to high, column-specific compression ratios - Load only one row group at a time in memory - Stream multiple files in parallel using multiple workers - Only load the columns you want - Efficient filtering using Parquet metadata
I'm honored to share that Criteo has been recognized at the SBR Technology Excellence Awards 2024, winning in the AI - Advertising category for our DeepKNN technology that is powering the Commerce Media Platform! Heartfelt thanks to all my colleagues and ex-colleagues at Criteo!
Criteo Singapore wins the AI - Advertising category for its Commerce Media Platform. Powered by advanced AI and DeepKNN technology, the platform enables precision targeting and optimisation. Read: bitly.ws/3iirG #SBRTechExcellenceAwards
It's a great week for open source AI! Data is among the highest impact work to push the field forward. Bravo to 🤗
You thought LLM chatbots required a lot of compute? That's cute. It's when fully-generative TikTok/YouTube hits the mainstream that you'll start needing a *lot* of GPUs. Orders of magnitude more compute, both because the medium is more intensive and because the audience will be…
15T tokens DataLoader, you're welcome
We have just released 🍷 FineWeb: 15 trillion tokens of high quality web data. We filtered and deduplicated all CommonCrawl between 2013 and 2024. Models trained on FineWeb outperform RefinedWeb, C4, DolmaV1.6, The Pile and SlimPajama!
Pretty cool work! They create data to learn a model that scores tileability of textures, and then use that to improve tileable texture generation among other things. Looks like a nice, pretty complete work on the topic👌 (Making tileable textures is a pain!!)
‼️ New paper update ➡️ TexTile: A Differentiable Metric for Texture Tileability 📜arxiv.org/abs/2403.12961 🇺🇸 Accepted @CVPR 2024! 🧑💼Coauthored with @dancasas @ele_g2 @RENDR3D