Pedro Ortiz Suarez @pjox13
Senior Research Scientist at the Common Crawl Foundation. Weird coffee person ☕️, runner 🏃🏻♂️. (he/him) 🇫🇷🇪🇺🇨🇴 portizs.eu Paris, France Joined August 2009
Tweets 831
Followers 632
Following 785
Likes 5K
Come and join the new @CommonCrawl Discord server! 🧑💻
Are you a fan of both Common Crawl and Discord? If so, join the Common Crawl Foundation's new Discord server! discord.com/invite/njaVFh7…
Ran the Paris Marathon yesterday. It was an amazing experience. Getting into running was probably the best decision I’ve made recently. It has helped massively with both physical and mental health. I highly recommend any type of physical activity, especially for researchers 🏃🏻♂️
Following the release of the main crawl, the Web Graphs for September/October, November/December 2023 and February/March 2024 are out! 🥳🚀🕸️ Once again, doing this with my colleague @thomvaughan was an amazing learning experience. 🤓 Let us know if you have any feedback! 📝
The February/March 2024 main crawl is out! 🥳🚀📚 It was an amazing learning experience to do this bimonthly crawl with my colleague @thomvaughan. Please do let us know if you have any feedback! Happy data crunching! 🧑💻
Meet Occiglot: A Large-Scale Research Collective for Open-Source Development of Large Language Models by and for Europe marktechpost.com/2024/03/08/mee… #ArtificialIntelligence #DataScience #LLMs @occiglot
Join our new initiative, 🚨Occiglot🚨 a large-scale research collective for open-source development of LLMs by & for 🇪🇺. huggingface.co/occiglot occiglot.github.io/occiglot/posts… @BMBF_Bund @BMWK @HMWK_Hessen @DFKI @Hessian_AI
In this new blog post, we take a brief look at the archiving file formats used by the Common Crawl Foundation, provide some examples, and suggest some tools that you can use to work with them! 📚🤓📝
A very interesting post about ML Opt-Out Protocols by my colleagues Alex Xue and Julien Nioche. Definitely worth a read! 📄🤓
Common Crawl Foundation is happy to join NIST's new U.S. AI Safety Institute Consortium in support of efforts to create safe and trustworthy AI. Learn more: nist.gov/artificial-int… x.com/nist/status/17…
Common Crawl is hiring! linkedin.com/jobs/view/3813…
Sasha Luccioni, PhD .. @SashaMTL
19K Followers 4K Following AI & Climate @HuggingFace, Board Member of @WiMLworkshop and @ClimateChangeAI. @techreview 35 Innovators under 35, @TEDTalks speaker. She/her/Dr/ 🦋
Thomas Wolf @Thom_Wolf
68K Followers 4K Following Co-founder and CSO @HuggingFace - open-source and open-science
Luca Soldaini 🎀 @soldni
6K Followers 1K Following I like tokens! Lead for OLMo data team at @allen_ai (makin Dolma 🍇), open source science fan, @QueerInAI organizer 🤖☕️🍕they/them
Julien Chaumond @julien_c
47K Followers 1K Following Co-founder and CTO at @huggingface 🤗. ML/AI for everyone, building products to propel communities fwd. @Stanford + @Polytechnique
merve @mervenoyann
56K Followers 4K Following open-sourceress at @huggingface 🧙🏻♀️ proud mediterranean 🍋 I do TL;DR on ML papers sometimes. RTs != endorsements
Clémentine Fourrier .. @clefourrier
3K Followers 301 Following Leaderboards & evals research @HuggingFace 🐍✨ "The future is already here, it’s just not very evenly distributed" (Gibson)
Djamé.. @zehavoc
6K Followers 3K Following Associate professor in NLP, engaged citizen. Tweeting about work, life and stuffs that I care about. All my tweets can be used freely. Personal account.
MMitchell @mmitchell_ai
80K Followers 1K Following Interdisciplinary researcher focused on shaping AI towards long-term positive goals. ML & Ethics. Same content in the Sky, Threads, & the Prehistoric Elephant
Sara Hooker @sarahookr
39K Followers 7K Following I lead @CohereForAI. Formerly Research @Google Brain @GoogleDeepmind. ML Efficiency at scale, LLMs, @trustworthy_ml. Changing spaces where breakthroughs happen.
Omar Sanseviero @osanseviero
31K Followers 2K Following Chief Llama Officer @huggingface 🦙 Founder @AI_Learners. Xoogler (SWE @Google Assistant, 20% PM TF Graphics). 100% Hacker Llama🇵🇪🇲🇽
Kelly Marchisio (St. .. @cheeesio
1K Followers 558 Following Member of Technical Staff @cohere. Formerly: PhD @jhuclsp Alexa Fellow @amazon dev @Google MPhil @cambridgenlp EdM @hgse 🔑🔑¬🧀 (@kelvenmar20)
Tristan Thrush @TristanThrush
3K Followers 761 Following PhD-ing @StanfordAILab @stanfordnlp. Advisor @PlaytestAI. Past: @ContextualAI, @huggingface, @Meta FAIR, @mitbrainandcog, @MIT_CSAIL, @NASAJPL
Antonis Anastasopoulo.. @anas_ant
3K Followers 2K Following Assist. Prof at George Mason CS #nlproc MT, ASR, and documentation of endangered languages.
Albert Villanova @avillanovamoral
2K Followers 5K Following ML Engineer @huggingface. Data Scientist, PhD Theoretical Particle Physics, BSc Computer Science. Always learning. he/him
Inria Paris NLP (ALMA.. @InriaParisNLP
2K Followers 218 Following Twitter account of ALMAnaCH, the Inria Paris NLP research team. @[email protected]
Shaily @shaily99
5K Followers 2K Following PhD @LTIatCMU Prev: @GoogleAI @MSFTResearch. Working on ethics and evaluation in #NLProc. Usually ranting, often about research & DEI. 📚 @readsndrants
Saulnier Lucile @LucileSaulnier
4K Followers 432 Following AI Specialist @ Mistral AI | Former ML @ Hugging Face | ENS Paris-Saclay (MVA) | Centrale Paris
Miryam de Lhoneux/ @m.. @mdlhx
2K Followers 1K Following #NLProc assistant prof @CW_KULeuven, PI @lagom_nlp. I like syntax more than most people. Also multilingual NLP, interpretability, mountains and beer. (She/her)
Bonaventure F. P. Dos.. @bonadossou
3K Followers 662 Following PhD Student @mcgillu @McGill_NLP | 🪑@WiNLPWorkshop | Research @Mila_Quebec @MasakhaneNLP @lelapaai | Co-founder @lanfrica | ex @GoogleAI @RocheCanada
Zeerak@{mastodon,bsky.. @ZeerakTalat
3K Followers 3K Following Past: @SFU_DDI @SheffieldNLP @CoastalCPH. Reluctant Machine Learner. Researching hate online & AI ethics. Organising @woahworkshop.
Pensé FFun @inftyCategory
113 Followers 6K Following
Nathan Schneider @complingy
4K Followers 1K Following Computational Linguist and Professional Nerd at Georgetown University he/him pronouns, ALL the prepositions @[email protected] @complingy.bsky.social
Chuanming @ChuanmingLiu
234 Followers 4K Following Ex-PhD student and alumni @sjtu1896 . Global citizen. Bootstrapping silicon-based life.
Jackqueline Debrecht @JDebrecht9243
10 Followers 3K Following
Neta Caruthers @CaruthersN67060
67 Followers 5K Following
AI World News🤖💕 @TheWorldNews
56K Followers 9K Following The first rule of AI club is, We Don't Talk About AI Club! The second rule of AI Club is......
Hyun Handon @hy_hand
41 Followers 5K Following
Eiso Kant @eisokant
7K Followers 1K Following Co-founder & CTO @poolsideai w/ @jasoncwarner “The best way to predict the future is to invent it.” - Alan Kay Prev: Athenian & source{d}
Laurent Ach @ach3d
441 Followers 562 Following CTO, https://t.co/S7FTM8wYNA, interested in machine learning and the essential differences between artificial and human intelligence
Nédey Oriane @NedeyOriane
6 Followers 75 Following
Jordan Swanstrom @jord_swanstr
32 Followers 5K Following
mehdi cherti @mehdidc
293 Followers 739 Following PostDoc at Jülich Supercomputing Center (JSC), Germany / LAION.
Bobbye Pontoriero @BobbyePont75392
71 Followers 5K Following
Joann Klages @JoaKlag
17 Followers 2K Following
Ailish Voelz @voe_ail
42 Followers 5K Following
Antoine Bourgois @Antoine_Brgs
21 Followers 172 Following
Mary thomas @janetthomaary
11 Followers 446 Following
Palma camara @CamaraPalm44122
41 Followers 3K Following Be the reason someone smiles. Be the reason someone feels loved and believes in the goodness in people be kind to others treat them with kindness😭🙏
Christopher Klamm @ck.. @chklamm
747 Followers 1K Following 👨🔬 NLProc & CompPolSci @dwsunima (UMannheim) 🚀 https://t.co/srZLDZrCas co-organizer 🎓 MSc CS & MA PolSci @TUDarmstadt 👣 prev. @CompPolCologne, @UKPlab & @COSS_eth
DFKI @DFKI
14K Followers 1K Following Deutsches Forschungszentrum für Künstliche Intelligenz / German Research Center for Artificial Intelligence. #KI #AI 👩💻🤖👾 Authors: https://t.co/phgUaAewkF
Kristian Kersting @kerstingAIML
5K Followers 2K Following #AI prof @TUDarmstadt, co-director @Hessian_AI, @DFKI, @RealAAAI Councilor, @vision_claire, @ELLISforEurope, AI Columnist @WELTAMSONNTAG
Enio Fernandes @bob_123456789__
42 Followers 677 Following
Kai Stpeters @stpeter_k
44 Followers 5K Following
Harsh Desai @dreamerharsh
1 Followers 3K Following
PEDRO ORTIZ TAMAYO @PEDROORTIZ1729
0 Followers 22 Following
Alexandra Chronopoulo.. @alexandraxron
1K Followers 357 Following Research Scientist @Google (Bard/Gemini team) Previously PhD student @LMU_Muenchen | intern @GoogleDeepMind @allen_ai @AmazonScience
Thom Vaughan @thomvaughan
60 Followers 173 Following OSS Software Engineer working in the field of Open Web Data
Geralyn Coltey @geraly_col
43 Followers 5K Following
Norah Fahnestock @fahnesto_nor
53 Followers 5K Following
Marcell Krumwiede @KrumwiedMarce
35 Followers 5K Following
Noemie Solazar @n_solaz
30 Followers 5K Following
@jbarre.bsky.social @JeanBarre_
88 Followers 140 Following PhD student at @ENS_ULM in the CLS field. Working on literary evolution of novel subgenres & canonization process + Fr-BookNLP Implementation w/ @Lab_LATTICE
لابه @be_labe_goftam
19 Followers 432 Following
Gil Elbaz @gilelbaz
6K Followers 2K Following @TenOne10 @CommonCrawl @Foursquare - Previously @Factual, ASI @AdSense, @Google
Hugo Laurençon @HugoLaurencon
563 Followers 182 Following ML research engineer @huggingface Eyes glued to the loss
Simon Ostermann @Simon_Ostermann
55 Followers 144 Following Lab Manager & Senior Researcher @DFKI. Musician. #NLProc Interested in Explainable and Efficient NLP... and Guitars.
Rajkumar @RajKumar_RRK
331 Followers 1K Following Research Scientist. Working in the intersection of Reinforcement Learning and Natural Language Processing
Aviya Skowron @aviskowron
333 Followers 477 Following they/them. Head of Policy and Ethics @AiEleuther. Find me in the EleutherAI Discord to chat. Always looking for ways to weave philosophy into my job.
Jaan Lı 李 PhD (e/d.. @thejaan
2K Followers 2K Following Large language models for health @ https://t.co/4P45Rx3687 Visiting Prof @unitartu 🎓 @Princeton @McGill ex-@DeepMind & @GoogleAI ✉️: [email protected]
Lalith Manjunath @lalith140
13 Followers 189 Following
(((ل()(ل() 'yoav))).. @yoavgo
46K Followers 2K Following
Sasha Luccioni, PhD .. @SashaMTL
19K Followers 4K Following AI & Climate @HuggingFace, Board Member of @WiMLworkshop and @ClimateChangeAI. @techreview 35 Innovators under 35, @TEDTalks speaker. She/her/Dr/ 🦋
Thomas Wolf @Thom_Wolf
68K Followers 4K Following Co-founder and CSO @HuggingFace - open-source and open-science
Luca Soldaini 🎀 @soldni
6K Followers 1K Following I like tokens! Lead for OLMo data team at @allen_ai (makin Dolma 🍇), open source science fan, @QueerInAI organizer 🤖☕️🍕they/them
Julien Chaumond @julien_c
47K Followers 1K Following Co-founder and CTO at @huggingface 🤗. ML/AI for everyone, building products to propel communities fwd. @Stanford + @Polytechnique
merve @mervenoyann
56K Followers 4K Following open-sourceress at @huggingface 🧙🏻♀️ proud mediterranean 🍋 I do TL;DR on ML papers sometimes. RTs != endorsements
Clémentine Fourrier .. @clefourrier
3K Followers 301 Following Leaderboards & evals research @HuggingFace 🐍✨ "The future is already here, it’s just not very evenly distributed" (Gibson)
Djamé.. @zehavoc
6K Followers 3K Following Associate professor in NLP, engaged citizen. Tweeting about work, life and stuffs that I care about. All my tweets can be used freely. Personal account.
MMitchell @mmitchell_ai
80K Followers 1K Following Interdisciplinary researcher focused on shaping AI towards long-term positive goals. ML & Ethics. Same content in the Sky, Threads, & the Prehistoric Elephant
clem 🤗 @ClementDelangue
90K Followers 5K Following Co-founder & CEO @HuggingFace 🤗, the open and collaborative platform for AI builders
Sara Hooker @sarahookr
39K Followers 7K Following I lead @CohereForAI. Formerly Research @Google Brain @GoogleDeepmind. ML Efficiency at scale, LLMs, @trustworthy_ml. Changing spaces where breakthroughs happen.
Omar Sanseviero @osanseviero
31K Followers 2K Following Chief Llama Officer @huggingface 🦙 Founder @AI_Learners. Xoogler (SWE @Google Assistant, 20% PM TF Graphics). 100% Hacker Llama🇵🇪🇲🇽
Hugging Face @huggingface
343K Followers 189 Following The AI community building the future. https://t.co/VkRPD0VKaZ #BlackLivesMatter #stopasianhate
@emilymbender@dair-co.. @emilymbender
58K Followers 2K Following Prof, Linguistics, UW // Faculty Director, CLMS // she/her // @[email protected] & bsky // rep by @ianbonaparte
Kyunghyun Cho @kchonyc
61K Followers 2K Following a combination of a mediocre scientist, a mediocre manager, a mediocre advisor & a mediocre PC at @nyuniversity (@CILVRatNYU) & @genentech (@PrescientDesign).
Stella Biderman @BlancheMinerva
15K Followers 748 Following Open source LLMs and interpretability research at @BoozAllen and @AiEleuther. My employers disown my tweets. She/her
Graham Neubig @gneubig
31K Followers 586 Following Associate professor at CMU, studying natural language processing and machine learning.
Sam Bowman @sleepinyourhat
35K Followers 3K Following AI alignment + LLMs at NYU & Anthropic. Views not employers'. No relation to @s8mb. I think you should join @givingwhatwecan.
LISN @LisnLab
3K Followers 161 Following Laboratoire Interdisciplinaire des Sciences du Numérique • CNRS • Université Paris-Saclay • Inria • CentraleSupelec • Founded in 2021 by LIMSI and LRI
Aniello De Santo @ani.. @AnyDs
1K Followers 1K Following The mystery of the Universe is its comprehensibility. he\him. Not really here anymore, email or find me on the sky app. (pfp cc. Dustin Nguyen).
Nathan Schneider @complingy
4K Followers 1K Following Computational Linguist and Professional Nerd at Georgetown University he/him pronouns, ALL the prepositions @[email protected] @complingy.bsky.social
Desmond Elliott @delliott
3K Followers 440 Following Assistant Professor at the University of Copenhagen working on multimodal machine learning.
Nédey Oriane @NedeyOriane
6 Followers 75 Following
Asad Sayeed @asayeed@.. @asayeed
2K Followers 832 Following Computational psycholinguist in @OfClasp, Senior Lecturer at the University of Gothenburg. Decided to become thought leader. Ready to lead your thoughts.
Laurent Ach @ach3d
441 Followers 562 Following CTO, https://t.co/S7FTM8wYNA, interested in machine learning and the essential differences between artificial and human intelligence
mehdi cherti @mehdidc
293 Followers 739 Following PostDoc at Jülich Supercomputing Center (JSC), Germany / LAION.
Conference on Languag.. @COLM_conf
2K Followers 6 Following https://t.co/GhGCMEoa4A Abstract submission: March 22, 2024
Benjamin Minixhofer @bminixhofer
716 Followers 354 Following PhD Student @CambridgeLTL / prev @jkulinz @cohere @huawei @h2oai
Leonie Weissweiler @LAWeissweiler
788 Followers 314 Following Visiting Researcher with @adelegoldberg1 at @Princeton | prev. @cislmu @LTIatCMU @CambridgeLTL
Christopher Klamm @ck.. @chklamm
747 Followers 1K Following 👨🔬 NLProc & CompPolSci @dwsunima (UMannheim) 🚀 https://t.co/srZLDZrCas co-organizer 🎓 MSc CS & MA PolSci @TUDarmstadt 👣 prev. @CompPolCologne, @UKPlab & @COSS_eth
Kristian Kersting @kerstingAIML
5K Followers 2K Following #AI prof @TUDarmstadt, co-director @Hessian_AI, @DFKI, @RealAAAI Councilor, @vision_claire, @ELLISforEurope, AI Columnist @WELTAMSONNTAG
Myrthe Reuver // sigm.. @myrthereuver
1K Followers 2K Following PhD candidate #NLProc @CLTLVU 📚 😊 || Computational Arguments, ethics, interdisciplinarity, cats || I express my own views
Astral @astral_sh
3K Followers 0 Following High-performance developer tools for the Python ecosystem, starting with Ruff, an extremely fast Python linter, written in Rust.
Thom Vaughan @thomvaughan
60 Followers 173 Following OSS Software Engineer working in the field of Open Web Data
Firefox 🔥 @firefox
2.5M Followers 529 Following The only non-profit-backed, people-first browser. Help: @FirefoxSupport
Mozilla @mozilla
279K Followers 4K Following We work to ensure the internet remains a public resource that is open and accessible to all. The nonprofit behind @Firefox. #BlackLivesMatter
Ubuntu @ubuntu
624K Followers 1K Following Ubuntu is an open source software operating system that runs from the desktop, to the cloud, to all your internet connected things.
Canonical @Canonical
99K Followers 569 Following Secure enterprise open source for the world. We're also the publisher of @Ubuntu, the most popular and reliable Linux OS. #AI #MLOps #Robotics #IoT #Cloud
The Linux Foundation @linuxfoundation
542K Followers 10K Following A nonprofit organization enabling mass innovation through open source. #linux #kubernetes #riscv #hyperledger #anuket #openssf #openjs #o3de and more!
DuckDuckGo @DuckDuckGo
2.8M Followers 4 Following Independent internet privacy company. Download our browser on mobile and desktop. Unlike Chrome, it has privacy built in, including our private search engine.
Alexandra Chronopoulo.. @alexandraxron
1K Followers 357 Following Research Scientist @Google (Bard/Gemini team) Previously PhD student @LMU_Muenchen | intern @GoogleDeepMind @allen_ai @AmazonScience
RWKV @RWKV_AI
2K Followers 3 Following AI model built by the community, for everyone in this world Part of the Linux Foundation, Apache 2 licensed An RNN scaled to 14B params with GPT-level of perf
David Kanter @TheKanter
4K Followers 199 Following Executive Director @MLCommons making machine learning better for everyone. @MLPerf CPUs, computer architecture, semiconductors, graphics, economics, writes @RWT
MLPerf @MLPerf
581 Followers 2 Following Building fair, useful, industry-standard benchmarks for ML training and inference performance of hardware, software, and services from TinyML to supercomputers
Max Bartolo @max_nlp
2K Followers 787 Following I lead the Command modelling team at @Cohere and co-chair the @DynabenchAI @MLCommons working group. Prev @DeepMind, @MetaAI / FAIR & @BloomsburyAI.
Mediapart @Mediapart
3.1M Followers 2K Following Independent, participatory digital news outlet. Only our readers can buy us! Also follow @MediapartBlogs.
Gil Elbaz @gilelbaz
6K Followers 2K Following @TenOne10 @CommonCrawl @Foursquare - Previously @Factual, ASI @AdSense, @Google
Hugo Laurençon @HugoLaurencon
563 Followers 182 Following ML research engineer @huggingface Eyes glued to the loss
Simon Ostermann @Simon_Ostermann
55 Followers 144 Following Lab Manager & Senior Researcher @DFKI. Musician. #NLProc Interested in Explainable and Efficient NLP... and Guitars.
Carlos Andrés Osorio @CarlosAOsorioA
268 Followers 714 Following This biography is under construction.
Lalith Manjunath @lalith140
13 Followers 189 Following
Thomas Wang @thomas_wang21
112 Followers 78 Following Research Engineer @MistralAI | previously @huggingface @nablatech
Nora Belrose @norabelrose
8K Followers 124 Following Working toward a free and fair future powered by friendly AI. Head of interpretability research at @AiEleuther, but tweets are my own views, not Eleuther’s.
htmx.org / w4c enthus.. @htmx_org
44K Followers 230 Following high power tools for html - ʕ •ᴥ•ʔ made in montana
Naomi Saphra @nsaphra
7K Followers 1K Following Waiting on a robot body. ML/NLP. All opinions are universal and held by both employers and family. Same username on every lifeboat off this sinking ship.
Yuval Pinter @yuvalpi
2K Followers 491 Following #NLProc researcher at @bengurionu, visiting academic at @amazonscience, PhD from @gtcomputing. 🇮🇱. Subtweets are not endorsements. en/he/him
ExtraLouise @ExtraLouise
6K Followers 95 Following Follow here the journey of Extra Louise to change the face of #downsyndrome
Ethan Gotlieb Wilcox @weGotlieb
924 Followers 413 Following Postdoc at ETH Zurich. Formerly PhD student at Harvard Linguistics, affiliated with MIT Brain & Cog Sci. Language, Computers, Cognition.
Katherine Maher @krmaher
32K Followers 2K Following Web Summit boss lady. Previously CEO @Wikimedia. Now 🤔 various places. Human curiosity, generosity, and dignity. She/her.
Philipp Schmid @_philschmid
16K Followers 651 Following Tech Lead and LLMs at @huggingface 👨🏻💻 🤗 AWS ML Hero 🦸🏻 | Cloud & ML enthusiast | 📍Nuremberg | 🇩🇪 https://t.co/l1ppq3q3hk
Quanta Magazine @QuantaMagazine
323K Followers 657 Following Illuminating math and science. Supported by @SimonsFdn. 2022 Pulitzer Prize in Explanatory Reporting.
**Please share!!** Women in Math and Computer Science Day on May 21, for all students, organized by me at @UnivParisSaclay with my lab @LisnLab and the GS Informatique ⏬⏬⏬ lisn.upsaclay.fr/evenements/jou…
I'm sorry to add fuel to the fire here, but this price is really ridiculous for students, and I can imagine it will prevent many from attending when they are the sole author who can go. Why not raise the price for industry attendees instead? #NLProc
NAACL 2024 seems to charge $750 for students to register if they're a presenter (every paper requires at least one registered presenter). @naacl am I reading this right? Seems like a major burden on students, especially if (as is common) only a paper's student authors attend.
@BoseShamik @naaclmeeting @naacl i’m reading this as paper needs full registration ($750), but can be presented by a student ($250).
Seems wrong to charge students full price if they have a paper accepted, but 1/3 of that if not. I know this is a common way to make sure conference recoups costs, but tough pill to swallow. Perhaps @naaclmeeting @naacl can reconsider this? 🙏
Excited to introduce MAD Speech: a new set of metrics to measure acoustic diversity in speech. Work done @GoogleDeepMind w/ @_andrea_agos, @MTagliasacchi, @neilzegh and @n0mad_0 Paper link: arxiv.org/abs/2404.10419 1/5
It's the third day of my postdoc and the third day of Codemas! What's in the box today?
gpu poor but luxury rich
This take on the FineWeb release is one of the most interesting pieces of feedback, and also a reason FineWeb is very different from even larger datasets like RedPajama-V2 (which is double its size!) Surprisingly, the size of the dataset of 15T tokens is not very important, what is much…
People seem to over-index on the 15T number after Llama 3. While the number matters, what is even more important is the quality and diversity of those tokens. If there was a good way to measure those, that would have been an impressive result to report.
The GPT4 of datasets took down Hugging Face, sorry all 😅😅😅
How does streaming a 15T-token DataLoader work? small 🧵
In honor of Passover, I'd like to share my Google Maps review of the Sinai Peninsula from five years ago
Like this is highly unreasonable. What field is this, biology? Replicating the awful megalab structure? Wonderful... tongliang-liu.github.io/groups.html
After going through some of these author names and trying to figure out what's up, I'm pretty positive many of these faculty just have labs that are way too large to actually attend to their students and their projects. ML needs to calm down tbh
I deanonymized it. Some care needs to be taken to disambiguate names, but I think this is mostly correct up to ties. Levine submitted 22 papers; more than half were rejected. A lot of other expected names.
15T tokens DataLoader, you're welcome
We have just released 🍷 FineWeb: 15 trillion tokens of high quality web data. We filtered and deduplicated all CommonCrawl between 2013 and 2024. Models trained on FineWeb outperform RefinedWeb, C4, DolmaV1.6, The Pile and SlimPajama!
Really amazing work by the @huggingface team! Infrastructure work, including dataset work, evaluations work, and building libraries, is the single highest-leverage thing you can do in AI. This will provide dividends for the broader AI community for years to come.
Data is all we need! 👑 Not only since Llama 3 have we known that data is all we need. Excited to share 🍷 FineWeb, a 15T token open-source dataset! Fineweb is a deduplicated English web dataset derived from CommonCrawl created at @huggingface! 🌐 TL;DR: 🌐 15T tokens of cleaned…