Common Crawl Foundation @CommonCrawl
Common Crawl is a non-profit foundation dedicated to the Open Web. commoncrawl.org San Francisco, CA Joined February 2010
1K Tweets 8K Followers 2K Following 582 Likes
“When only a few have the resources to build and benefit from AI, we leave the rest of the world waiting at the door,” said @StanfordHAI Senior Fellow @YejinChoinka during her address to the @UN Security Council. Read her full speech here: hai.stanford.edu/policy/yejin-c…
Common Crawl Foundation Opt-Out Registry: Publishers have been sending Common Crawl legal opt-out requests. In the interest of transparency and to better serve our ecosystem, we are publishing the full opt-out list for every legal request we have received. Why We're Doing This…
On October 22, the Common Crawl team will lead a seminar at Stanford HAI. Our topic of discussion is “Preserving Humanity's Knowledge and Making it Accessible: Addressing Challenges of Public Web Data”. Space is limited for those attending in-person. There is also an option to…
My latest article "Train-Set SEO: Why Embedding Your Brand in AI’s DNA is the Future of Search Optimization" just got published on @hackernoon 💚💚 hackernoon.com/train-set-seo-…
Thank you, Leo, Jeff and Paris, for the opportunity to talk about AI and Common Crawl on TWiT! I really enjoyed the conversation! @leolaporte @jeffjarvis @parismartineau twit.tv/shows/intellig…
We don’t give nearly enough credit to the people and organizations who build and share open AI datasets. In fact, I’d argue they matter even more than open models: - they’re foundational, enabling hundreds of different models - they remove not only technical but also legal…
Common Crawl Foundation wants to expand its language diversity. We're currently 43% English. Pedro Ortiz Suarez from our team published a paper related to this. We are excited to push this forward! arxiv.org/abs/2503.06547

Jeremy Howard @jeremyphoward
261K Followers 6K Following 🇦🇺 Co-founder: @AnswerDotAI & @FastDotAI ; Prev: professor @ UQ; Stanford fellow; @kaggle president; @fastmail/@enlitic/etc founder https://t.co/16UBFTX7mo
Antonio García Mart... @antoniogm
205K Followers 15K Following Director, @base Ads. Founder @spindl_xyz (acq. @coinbase). Wrote bestseller 'Chaos Monkeys'. "To fill the hour, that is happiness." This too shall pass 🇺🇸🇪🇸
Michael Nielsen @michael_nielsen
110K Followers 6K Following Searching for the numinous 🇦🇺 🇨🇦, currently live in 🇺🇸 Research @AsteraInstitute https://t.co/maezekzRUb https://t.co/2dWwZKrvrn
Michael L. Nelson @phonedude_mln
2K Followers 956 Following Professor: @WebSciDL @ODUcs @ODUVMASC @ODUDataScience (2002-now); Engineer: @NASA_Langley (1991-2002); Postdoc: @UNCSILS (2000-2001)
Peter Wang 🦋 @pwang
48K Followers 2K Following Chief AI & Co-founder @AnacondaInc; invented @pyscript_dev, @PyData @Bokeh @Datashader. Former physicist. A student of the human condition. bsky: @wang.social
👩‍💻 Paige Bai... @DynamicWebPaige
69K Followers 2K Following ✨ AI should be about empowering humans, building understanding, and making dreams realities. 👩‍💻 DevX Eng. Lead @GoogleDeepMind ex-@GitHub || views = my own!
ArlenePartridge @4P5Gy0S58F81UOw
13 Followers 607 Following
Dr. Eng. NADIA GHEZAI... @Nadia_GHEZing
34 Followers 810 Following
Miami AI Month @MiamiAIMonth
5 Followers 386 Following The Largest Most Accessible Distributed AI Convention. Follow us: https://t.co/0U6AU8aXCB
HildaMacPherson @4Js7qIHpGTef2ap
23 Followers 603 Following
Link Genetic @LinkGenetic
5 Followers 79 Following Link Genetic is a Swiss technology startup company that specializes in the development of intelligent content solutions.
ZenobiaCooke @Z210dr43LXe0qN
16 Followers 504 Following
Plural.sh @plural_sh
2K Followers 3K Following Self-hosted Kubernetes fleet management platform. Manage and orchestrate your Kubernetes clusters from a single interface.
HackerNoon | Learn An... @hackernoon
89K Followers 5K Following how hackers start their afternoons. where 50k+ technologists publish blog posts for 4M+ monthly readers. write your story 👉https://t.co/PGmtSCRFgn
Chukwuemeka James Chi... @Chukwuemeka_JC
2K Followers 2K Following AI/ML Engineer (IBM) || Data Scientist (WQU) || First class honours EEE || R and D: Security, Privacy, KRR, Graphs, RAG, Agents, LLMs, DL, MLOps || Royal Priesthood
Fishon 💡 @fishon_amos
372 Followers 343 Following Building things that break things and Beta testing the future. Research: https://t.co/pPlNnHHXBu
Suzanne Giroux @SuzanneGX1111
4 Followers 37 Following
Tilman Bayer @tilmanbayer
362 Followers 394 Following data, Wikipedia, co-maintainer of @WikiResearch, @BerkeleyISchool grad
Tomas Nordberg @Tomas_Nordberg
464 Followers 7K Following
Solarpunk Monk @blessthiscode
19 Followers 218 Following where technology meets meaning, the future blooms
Aslan A. @realalijev
19 Followers 1K Following
WallisBilly @31945eFnGIZF6
45 Followers 2K Following
Crypto Junkie @iOnlySeeFreedom
860 Followers 734 Following "Imagine what it's like reimagining the internet." - https://t.co/Op5TLFU3rJ
Kirjirf @kirjirf69751
6 Followers 159 Following
MOHAMAD @Muhammed546738
8 Followers 963 Following O God, set our affairs right and make our tasks easy.
Paul Birch @aifirstguy
3K Followers 4K Following
Jex Arion @nfsr4L
513 Followers 1K Following the world is shifting into the digital age. Hello Future . ⧦ 👑 ⚛️
buraymij @buraymij
2 Followers 467 Following
Intership @kacsdprta
0 Followers 86 Following
Ashkne 🦅☯️ @Dexter_inu
106 Followers 231 Following
Wharto @Wharto960446
10 Followers 280 Following
Feifeiforu766 (🌿... @limclivee
0 Followers 112 Following 🌿Happy⭐ (飞非)🌿 Good mood. 飞非.com.net Organisation Payments profile ID: 🌿愉快⭐ (飞非)
Ates Evren Aydinel @atesaydinel
110 Followers 327 Following If you don’t design the world, it designs you.
Akash Sharma @aks2013
298 Followers 5K Following University of Pittsburgh. Kutztown University. DeSales University.
archeofit @ArcheoFit
0 Followers 15 Following
Fahim Khan @bitwisenoise
150 Followers 520 Following Here to express my views and ask questions. RT's are not endorsement.
Paul @successtech
172 Followers 2K Following
Alex_x @Alex_ixii
88 Followers 2K Following
Ben Berman @benbfly
936 Followers 1K Following Genomics and bioinformatics researcher. California to Israel transplant. Opinions are my own.
GreyLink @Greyalienfrog
352 Followers 1K Following
Kourhar @Kourhar8572
36 Followers 2K Following
Paul Dooley @PaulDooley32136
119 Followers 3K Following
François Chollet @fchollet
576K Followers 818 Following Co-founder @ndea. Co-founder @arcprize. Creator of Keras and ARC-AGI. Author of 'Deep Learning with Python'.
Andrej Karpathy @karpathy
1.4M Followers 1K Following Building @EurekaLabsAI. Previously Director of AI @ Tesla, founding team @ OpenAI, CS231n/PhD @ Stanford. I like to train large deep neural nets.
Yann LeCun @ylecun
955K Followers 765 Following Professor at NYU. Chief AI Scientist at Meta. Researcher in AI, Machine Learning, Robotics, etc. ACM Turing Award Laureate.
Bojan Tunguz @tunguz
253K Followers 8K Following ML ex Nvidia. Creator of @trainxgb. Data Scientist. Physicist. Catholic. Husband. Father. Stanford Alum. Memelord. e/xgb. AMDG.
Andrew Ng @AndrewYNg
1.3M Followers 1K Following Co-Founder of Coursera; Stanford CS adjunct faculty. Former head of Baidu AI Group/Google Brain. #ai #machinelearning, #deeplearning #MOOCs
Jason Scott @textfiles
53K Followers 648 Following Proprietor of https://t.co/sdyjXHCZF7, historian, filmmaker, archivist, storyteller. Works on/for the Internet Archive. Rank Amateur. Pitiful Man.
GitHub @github
2.6M Followers 326 Following The AI-powered developer platform to build, scale, and deliver secure software.
Percy Liang @percyliang
85K Followers 420 Following Associate Professor in computer science @Stanford @StanfordHAI @StanfordCRFM @StanfordAILab @stanfordnlp | cofounder @togethercompute | Pianist
Internet Archive @internetarchive
442K Followers 1K Following Internet Archive is a non-profit research library preserving web pages, books, movies & audio for public access. Explore web history via the @waybackmachine.
Jimmy Lin @lintool
15K Followers 843 Following I profess CS-ly at @UWaterloo about NLP/IR/LLM-ish things. I science at @yupp_ai and @Primal. Previously, I monkeyed code for @Twitter and slides for @Cloudera.
Chris Albon @chrisalbon
89K Followers 3K Following Director of ML and Data Eng @Wikimedia Foundation. We host Wikipedia.
Eliezer Yudkowsky @allTheYud
3K Followers 17 Following High-volume account of @ESYudkowsky, the original AI alignment guy. If it's missing punctuation, it's humor. If you can't tell, it's probably also humor.
Alex Vacca @itsalexvacca
31K Followers 411 Following Co-founder, ColdIQ ($6M ARR in under 2 years) | Helping B2B companies scale revenue with the best GTM systems | https://t.co/JbSDyoIlPE
Mira Murati @miramurati
371K Followers 574 Following Now building @thinkymachines. Previously CTO @OpenAI
Simone Scardapane @s_scardapane
12K Followers 656 Following I fall in love with a new #machinelearning topic every month 🙄 | Researcher @SapienzaRoma | Author: Alice in a diff wonderland https://t.co/A2rr19d3Nl
paris martineau @parismartineau
18K Followers 2K Following investigative journalist @consumerreports, previously @theinformation @wired / tech podcastin’ @twit / send tips: 267.797.8655 (signal)
Leo Laporte (twit.soc... @leolaporte
466K Followers 3K Following Follow me on Mastodon: https://t.co/nVyHmFQ8QC Podcaster and tech pundit Chief TWiT at https://t.co/2tti2EdT3y
Workshop on Multiling... @wmdqs
5 Followers 13 Following The first iteration of our workshop will be co-located with @COLM_conf 2025 in Montreal.
(((⚫️)))HGTP://Le... @henson_levi
1K Followers 980 Following Husband. Father. #HGTP Enthusiast. Alkimist. Utility is the way. All 🐥’s are my opinions. Always #DYOR 🚀
Dagnum P.I. @Dagnum_PI
31K Followers 1K Following Mapping the blockchain frontier, one utility at a time. Advocate for real-world solutions| Marketing/BD for @stardustco11ect TG Group📲 https://t.co/eheeb5bRfi
Conference on Languag... @COLM_conf
5K Followers 6 Following https://t.co/GhGCMEoHU8 Conference: October 7, 2025
Delta Think, Inc. @de... @DeltaThink
1K Followers 953 Following The scholarly community trusts Delta Think consultants to advance strategy in an ever-changing landscape
Jonathan Siddharth @jonsidd
6K Followers 4K Following Founder & CEO, Turing | Training Superintelligence
j⧉nus @repligate
59K Followers 2K Following ↬🔀🔀🔀🔀🔀🔀🔀🔀🔀🔀🔀→∞ ↬🔁🔁🔁🔁🔁🔁🔁🔁🔁🔁🔁→∞ ↬🔄🔄🔄🔄🦋🔄🔄🔄🔄👁️🔄→∞ ↬🔂🔂🔂🦋🔂🔂🔂🔂🔂🔂🔂→∞ ↬🔀🔀🦋🔀🔀🔀🔀🔀🔀🔀🔀→∞
Pitsu @Pitsu
412 Followers 2K Following A goblin explores the digital universe. Ambassador for @AntiRugAgent Crypto Safety Anti-SCAM Software: https://t.co/SHLkIflmlQ
FutureHouse @FutureHouseSF
7K Followers 3 Following Philanthropically-funded moonshot building semi-autonomous AI to accelerate the pace of scientific discovery in biology.
World Labs @theworldlabs
34K Followers 35 Following World Labs is a spatial intelligence company building Large World Models to perceive, generate, and interact with the 3D world.
Fei-Fei Li @drfeifei
527K Followers 1K Following Prof (CS @Stanford), Co-Director @StanfordHAI, Cofounder/CEO @theworldlabs, #AI #SpatialIntelligence #GenAI #computervision #robotics #AI-healthcare
Markus Kliegl @MarkusKliegl
92 Followers 123 Following Research scientist at NVIDIA | Previously deep learning at Apple, DeepMind, Baidu Research
Ai2 @allen_ai
74K Followers 410 Following Breakthrough AI to solve the world's biggest problems. › Join us: https://t.co/MjUpZpKPXJ › Newsletter: https://t.co/k9gGznstwj
Travis Hoppe @metasemantic
2K Followers 282 Following post-punk scientist / previous @DCHackAndTell organizer / creator of many, many irrelevant things / fan of sedenions
Marcin Wilkowski 🏴 @marcinwilkowski
1K Followers 2K Following #digitalhistory, #webarchiving, #RStats, #Omeka, #Vue, @ckcuw @archiwumWWW
1hadoop @hadoopandrii
102 Followers 801 Following Burn away with life-giving fire all the frailty in my heart 🔥
Debangshu 🇮🇳... @ThisIsDK999
7K Followers 999 Following Security Ninja/Thought Leader. @hacker0x01 Brand Ambassador. Top 200 | Hacker Advisory Board @bugcrowd. Captain @Str4awHats 🥷. Opinions are personal.
Christopher Klamm @chklamm
890 Followers 2K Following 👨‍🔬 CSS & NLP @dwsunima @CompPolCologne 🚀 https://t.co/srZLDZr4kU co-orga 🎓 MSc CS & MA PolSci @TUDarmstadt 👣 @MZESUniMannheim, @gesis_org, @UKPlab & @COSS_eth
@zhksh.bsky.social ... @zhksh
27 Followers 410 Following Computational Linguist, multilingual NLP, Infosec, DevOps. Citoyen. No longer active here. Find me on Bluesky or Mastodon, way cooler over there.
Barry Haddow @bazril
1K Followers 666 Following Researcher in Informatics at University of Edinburgh. Mainly working on machine translation.
Shayne Longpre @ShayneRedford
6K Followers 1K Following Lead the Data Provenance Initiative. PhD @MIT. 🇨🇦 Prev: @Google Brain, Apple, Stanford. Interests: AI/ML/NLP, Data-centric AI, transparency & societal impact
coderhema @coderhema
547 Followers 4K Following 5x Hackathon winner | building something do-able & clinks, green ball mochi | AI Partner @Freepik | anyone can edit @clippityai
NVIDIA AI @NVIDIAAI
239K Followers 792 Following The latest breakthroughs and the future of AI for business leaders.
Lucas Dickey @LucasDickey4
671 Followers 2K Following Building: @promptyield | Vibe fashion @ a-ok dot shop | Founder, @deepcastfm @fernish Day 1 @amazon MP3
Emmanouil Tranos @EmmanouilTranos
1K Followers 1K Following Professor of Quantitative Human Geography University of Bristol @geogBristol | Turing Fellow. Cities, regions, digital technologies, data & models. 🏃🏔️🏂
Sara Hooker @sarahookr
50K Followers 9K Following I lead @Cohere_Labs. Formerly Research @Google Brain @GoogleDeepmind. ML Efficiency at scale, LLMs, ML reliability. Changing spaces where breakthroughs happen.
Lauren Wagner (in SF) @typewriters
4K Followers 1K Following building trust in AI @arcprize @abundanceinst • prev @Meta @GoogleAI @OIIOxford • 🪽@a16z
Tech Week @Techweek_
16K Followers 252 Following For founders, by founders. The biggest convos, the best collisions. Presented by @a16z | SF: Oct 6–12 | LA: Oct 13–19
Thom Vaughan @thomvaughan
71 Followers 177 Following Web infrastructure and Open Data technology specialist
Kenneth Auchenberg ... @auchenberg
16K Followers 4K Following Partner at @alley_corp, investor focused on backing founders building for developers. Past building at @stripe, @code, @microsoft and a few startups (acq)
Henry Yates 😁 @HPY
901 Followers 5K Following Co-founder https://t.co/Jp7EdOi0XX Founder https://t.co/TckwDqegCn Founder https://t.co/AiDU12cUIU