Open Datasets @opendatasets
Public datasets for data science and machine learning projects. Models: @modelzoo
Joined July 2018
Tweets: 129
Followers: 285
Following: 43
Likes: 135
Like 80-90% of AI progress is just getting data. Conversational sets, reasoning chains, diff editing, soon sequence of logs/actions…
TRELLIS-500K: curated 3D dataset for 3D generation tasks, sourced from Objaverse(XL), ABO, 3D-FUTURE, and others. 500k assets, filtered by aesthetic scores. Paper + tools: huggingface.co/datasets/Jeffr…
Explore over 7 billion LHC collision events from home💻 The ATLAS Experiment at @CERN has made public two years' worth of LHC data! This #OpenData release for research is the first of its kind from ATLAS, marking a major milestone for public access. atlas.cern/Updates/News/O…
We heard you like quality tokens 🪙 @ZyphraAI is pumped to release Zyda! - 1.3T filtered/deduplicated pre-training tokens - Outperforms FineWeb, Dolma, Pile, RefinedWeb equitoken - Open/permissive licensing Paper: arxiv.org/abs/2406.01981 Blog: zyphra.com/zyda HF:…
FineWeb: 15 trillion tokens!
🥳🛠️Introducing ToolBench!🤖🎉 🌟Large-scale instruction tuning SFT data to equip LLMs with general tool-use capability 🔖 We release 98k data with 312k real API calls. We also release a capable model ToolLLaMA that matches ChatGPT in tool use Github: github.com/OpenBMB/ToolBe…
We released LAION-5B, a dataset with 5.85B CLIP-filtered image-text pairs, an intuitive search engine for exploration, one click subset creation, CLIP ViT-L/14 image embs, NSFW & watermark scores (+ the models used to compute them), kNN indices, ... laion.ai/laion-5b-a-new…
The Large Labelled Logo Dataset (L3D). 770k of images from the EUIPO open registry. Paper: arxiv.org/pdf/2112.05404… Download: zenodo.org/record/5771006…
SHIFT15M: Multi-objective Large-Scale Fashion Dataset with Distributional Shifts Paper: arxiv.org/abs/2108.12992 Download: github.com/st-tech/zozo-s…
The @mozilla Common Voice initiative has released a new, expanded data set featuring 16 new languages — like Basaa and Kazakh — and 4,622 new hours of speech! 🗣🗣 foundation.mozilla.org/en/blog/mozill…
Today we are releasing the Open Buildings Dataset, a new open-source dataset containing the locations and footprints of >500M buildings with coverage across Africa, which can support numerous scientific and humanitarian applications. Read more at goo.gle/3751HVR
WikiGraphs: A Wikipedia Text - Knowledge Graph Paired Dataset pdf: arxiv.org/pdf/2107.09556… abs: arxiv.org/abs/2107.09556 github: github.com/deepmind/deepm… dataset of Wikipedia articles each paired with a knowledge graph
AGENT: A Benchmark for Core Psychological Reasoning The DARPA “Common Sense AI” dataset is designed to probe an agent’s understanding of key concepts in intuitive psychology much the same way we evaluate an infant’s ability to intuit what others think. arxiv.org/abs/2102.12321
Synthetic Datasets for Research on Bias in Machine Learning Paper: arxiv.org/abs/2107.08928 Data: github.com/williamblanzei…
Shifts: A Dataset of Real Distributional Shift Across Multiple Large-Scale Tasks Paper: arxiv.org/pdf/2107.07455… Data: github.com/yandex-researc…
Project CodeNet: ~14M code samples, 500M lines of code, 55+ programming languages research.ibm.com/blog/codenet-a… #ai #opendatasets
Today, in partnership with @StanfordSVL, we are proud to announce the iGibson Challenge on Interactive and Social Navigation for simulating robot navigation in cluttered and busy environments. Grab the dataset and submit your model solution by May 31st. goo.gle/2PW79pg
Human EEG Dataset for Brain-Computer Interface and Meditation figshare.com/articles/datas… Paper: nature.com/articles/s4159… Code: github.com/bfinl/BCI_Data…
The equivalent of open source in Software 2.0 land are open datasets. But while plenty of former exists little of high quality latter does.
The State of Open Data 2020 digitalscience.figshare.com/articles/repor… #opendata

dd fish @Dd1129Dd
0 Followers 2 Following
Renzo Cáceres Rossi @ARenzoCaceresR
255 Followers 739 Following Lover of #tecnologia, transhumanist, #DataLover, #SSIS specialist, addicted to reading and passion fruit juice. https://t.co/IAYqKoRfy1
Wilhelm Bentsen @WilhelmBentsen
77 Followers 2K Following
Marcelo Romeu Gonçal... @MarceloRomeuGo1
8 Followers 289 Following
Kunal Chakma @kchakma
198 Followers 703 Following Assistant Professor at National Institute of Technology Agartala, India. Research interests include Social Media Text Analytics, IR
Market Data Insightic... @MarketDatasets
2 Followers 78 Following We specialize in sourcing and organizing vast amounts of public web data from diverse sectors to empower businesses to make informed, data-backed decisions.
@Anibal_X @_AnibalG
2K Followers 2K Following Data Scientist 📊 | IA Enthusiast 🤖 | Software & Systems Analyst 💻 | Cybersecurity🛡️ | Pop Culture Geek 🎬 | #DataMeetsPop #Memes
Bernies Soundlab Musi... @MusicPromoterBB
76 Followers 1K Following Bernie Bolivar creator of Bernies Soundlab , The Soundlab (Music Label) Laboratory's creator of WKLAB satfmtv 💯🌐📡⛓️🛰️ [email protected]
ℝ𝕖𝕘𝕘𝕒... @reggaebirthdays
6K Followers 3K Following If this world was mine, I'd play Rub A Dub all the time, and I'd play it in any style 🔊
Thomas E Fleming @thomasefleming
37 Followers 2K Following
Jerome Manceau @jerome_manceau
96 Followers 1K Following Research and development engineer in the VAADER team at INSA Rennes. [email protected]
albertcuy @albertcuy
2 Followers 218 Following
Fernando Perulles @Fer_Perulles
108 Followers 775 Following
Weiran Yao @iscreamnearby
279 Followers 530 Following Research Scientist Manager #SalesforceAI | CMU PhD alumni 🐕🦺 Thoughts are my own
Raul Stefan @doermindset_
0 Followers 15 Following Building the next search engine for datasets + more
Adrian Julius Aluoch @julius68603
22 Followers 119 Following Data Analyst | SQL, Python, R, Excel, Tableau, Power BI, Google BigQuery | Turning Raw Data into Actionable Insights
Alex @Dream_ben_Alex
3 Followers 132 Following
triffsman @triffsman
1 Followers 64 Following
Lara Marie Berger @lariberger
425 Followers 735 Following PhD candidate in econ @UniCologne // interested in how we can improve media markets // tweets are my own
®𝔻r$¥sŦm ʕ(⟁... @Eb3rSonCMonToY4
382 Followers 7K Following S.O.Ex-Agent SFN⟁ (⟁LoGisT⟁) (#N👁) #RealGeOp$¥sŦm🌎° #Bio-M-Ethic-Geπ-Info-Tech-IA-Weapons🌐* #DigVisuAnaly⟁ ®UL71M47UM #NeuPL⛮ 23/9/24 #OpInFac7oRy👁VÎXīŌŅÂŘ¥
Leonie Chapel @Leoniechapel
796 Followers 979 Following People who respond to my sarcasm with sarcasm ..... are my favourite! - Strawberry fields forever
xbzkkxl @xieisndjdj
4 Followers 702 Following
Trương Hải Đăng @TrngHin39664924
0 Followers 34 Following I am a person of ambition, always in pursuit of strength; intellect and physical vigor are two essential components of this strength. Time holds my utmost respe
arion das @ArionDas
833 Followers 7K Following gen ai intern @Techolution_com || research @ aiisc, usc || author @naacl || reviewer @aclmeeting, aia @COLM_conf
Astro Knight @viswanaththi
136 Followers 397 Following Data Alchemist 🔮 | Trading markets 📊 | Entrepreneur 🚀 | Professional lazy binge-watcher 🍿🛋️✨ https://t.co/5mUPIYzclR
Vishal K Ram @VishalKRam1
3 Followers 73 Following
Andaman7 @andaman7
2K Followers 2K Following Next generation ePRO, RWE, decentralized clinical trials, siteless trials to reduce costs & time... all that with a patient-driven connected PHR for patients !
Vincent Keunen @VincentKeunen
1K Followers 1K Following I'm a software engineer, entrepreneur and cancer survivor. Founder & CEO of Andaman7. #myhealthmydata Read my story at https://t.co/YWDU8lAlhs
Rasmus Groth @bliiir
568 Followers 2K Following 5 x founder. Entrepreneur and amateur cypherpunk. science, programming, crypto, electronic music, startups, economics, politics, philosophy
ithagu wa'NJEERI 🌐 @ithagu1
2K Followers 2K Following Live || Love || Entice. We don't make History, it's History that makes us.
Thomas Michael @reThomasMichael
186 Followers 356 Following Mathematician, Data Scientist, AI consultant
Mohamed Saied @dr_hypermind
68 Followers 881 Following
Kimberly Wilson🌈 @anewmewithin
2 Followers 98 Following I'm a Software Engineer and I am committed to learning web development #100Devs
Dr.Lee @Dr_Lee_JZ
40 Followers 365 Following
Mr.Benz @MrBenzWorld
79 Followers 2K Following
Anna Bianchi @AnnaBia17671743
18 Followers 208 Following I take care of promoting books and every artistic manifestation that I consider worthy of interest for its cultural and social value
DL-DS @DLDScience
387 Followers 7K Following Freedom of Religion. The Constitution. Capitalism. When people obey God, they prosper in the land.
Anton Zemlyansky @zemlyansky
164 Followers 163 Following Building open-source tools for data analysis at @statsimcom 🎲
Model Zoo @modelzoo
7 Followers 0 Following Pre-trained machine learning models. Datasets: @opendatasets
Dataverse Project @dataverseorg
5K Followers 2K Following A web application for sharing, citing, analyzing, and preserving research data - created @IQSS, @Harvard. Tweets: @iacus + @bluejeansdiva
bun.bun.🐽 @ds_bun_
19K Followers 6K Following love #pugs, lead data scientist @datafying my tweet = data science, machine learning, ai, deep learning and pug as well.
Mitchell Harley @DocHarleyMD
2K Followers 488 Following Coastal researcher and Scientia Fellow at the University of New South Wales, #CoastSnap founder and occasional jazz trumpeter
Lars Krogmann @LarsKrogmann
2K Followers 1K Following Biodiversity Research at Natural History Museum Stuttgart and University of Hohenheim taxonomy | science communication | nature conservation | parasitoid wasps
Sergey Zakharov @ZakharovSergeyN
241 Followers 171 Following ML Research Scientist @ToyotaResearch. Working on computer vision and machine learning.
/r/datasets @reddit_datasets
346 Followers 1 Following
Scottish Open Data Un... @SODU_Live
114 Followers 156 Following We're back in Aberdeen, sharing the best in Open Data, 18-19 Nov 2023
data.europa.eu @EU_opendata
35K Followers 7K Following 🇪🇺 The official portal for European data. 1+ million data sets, #DataEuropaAcademy, #EUDatathon, studies, use cases, events and news. All things #OpenData.
European Open Science... @euospp
2K Followers 158 Following European Commission High Level Advisory Group on Open Science. Opinions are from OSPP members, not the EC.
CERN Open Data @cernopendata
343 Followers 0 Following Explore more than two petabytes of #opendata from particle physics! Visualise detector collisions, start a Virtual Machine and run your own physics analyses.
Chris Gorgolewski @chrisgorgo
8K Followers 1K Following Member of Technical Staff at @anthropicai. Previously at: @GeminiApp, @GoogleAI, @googleanalytics, @kaggle, @StanfordPsych, and @MPI_CBS. Opinions are my own.
Abbas Cheddad (SMIEEE... @DrAbbasCheddad
11 Followers 22 Following Assoc. Prof. in Sweden. My research interests lie at the intersection of Image processing, Computer vision, Information security, Medical sciences & Applied ML.
Open Data - Toronto @Open_TO
8K Followers 577 Following The #opendata account for the City of Toronto | Not monitored 24/7 | Terms of use https://t.co/O7IayDteyW | email us: [email protected]
modelis AI @modelisAI
11 Followers 52 Following A machine learning algorithm walks into a bar. The bartender asks, "what'll you have?" The algorithm says, "what's everyone else having?"
AI Now @AINow6
561 Followers 1K Following News on #ai #programming #machinelearning #digitalmarketing #dataviz #nlp and more
Elliott 🔜 Neurips ... @ElliottChMiller
838 Followers 1K Following Ex Antimatter physicist, Cosplayer, and Machine Learner All I think about is lifting and VC theory.
Rossano Marcos @rossanomarcos
439 Followers 983 Following
Silvana Acosta @silacos
433 Followers 1K Following She/Her 🇺🇾 in 🇪🇸 via 🇩🇰🇸🇪 • Data Scientist, PhD Econometrics • Interested in inclusion, diversity, interpretability & communication • Views my own
Khajan Pandey @khajanpandey
104 Followers 362 Following Passionate about Learning | Politics | Swimming , checkout my channel https://t.co/rQmr9apdec
福田@次は未定 @fukutriathlon
421 Followers 433 Following I'm an IT engineer and triathlete. I tweet about triathlon and IT-related topics. I ride a SCOTT Speedster25 Japan limited.
Harold W. Taylor @HaroldWTaylor2
372 Followers 4K Following I think in SQL, dream in Python, practice data-driven origami, and imagine myself as a node of high vertex degree in a vast, directed property graph. #FedEx
Nicolas Terpolilli @nterpo
2K Followers 434 Following Head of engineering @Opendatasoft | create the best data experiences
Alberto Forero @donkeykonguk
351 Followers 4K Following UX designer, genetic genealogist (esp. Ireland and Colombia), data visualizer, digital humanist, fan of the obscure ☀️♉︎ 🌙♌︎ ⬆️♋︎
Himanshu Sharma @Himanshu_ekaant
140 Followers 1K Following Building: Ekaant - AI-based wellbeing Platform | Data Analyst| Member Core Committee - BEA | Youth Coordinator - EAI | Project Co-ordinator ( AI-CoE).
Juan Maroñas @jmaronasm
112 Followers 342 Following Probabilistic Machine Learning -- I like growing plants
altdatamine @altdatamine
60 Followers 356 Following I've been a data janitor (data engineer) for the last 20 years. Check out my blog at https://t.co/T0VVdjHpIy
Mike Hayes @hayesm42
190 Followers 415 Following Diag'd w Interstitial Lung Disease (ILD) in May 2010. Rcv'd Double Lung Transplant in Dec 2011. Pro-Allergy awareness.
Tony GUIDA @TonyGUIDA_Quant
1K Followers 4K Following Machine Learning and Quant Expert Lecturer in Quant Finance and ML #MachineLearning #DeepLearning #systematic #Quant #Ai #GraphKnowledge #GeneticAlgo #KNOWLEDGE
James Yang @james_ya_
113 Followers 431 Following Hardcore homebody/coding/Python/machine learning/deep learning/Taylor Swift fan/anime and manga/OCD/chronically indecisive/moderately depressed/just so so
Satya Prakash Sharma @Prakash95Satya
15 Followers 144 Following Mobile App Developer (Android, Flutter) and Interested in ML,Data Science, AI.
Robson Martinho @professeur_x
100 Followers 2K Following Travel far. See beyond. Artificial Intelligence, Analytics, Machine Learning enthusiast | Currently @santander_br | Formerly @itau @citibank @philips
Stephen F. Oladele @stephenoladele_
755 Followers 2K Following Helping AI & deep tech companies 🚀 increase revenue and product adoption. How? Great at communicating technical products and ideas to their users ➕ prospects.
George W. Kennedy @gwkennedyiii
175 Followers 544 Following Engineer. Technologist. Grappler. Working to improve how people live, work, and play.
RoomTemperatureTakes @RoomTakes
112 Followers 2K Following Reasonable, data driven sports takes. https://t.co/wqBnWr5UWi From the minds of @deshrana and @konks7
Leora Wenger @leoraw
4K Followers 4K Following Highland Park, NJ artist, Jewish holiday preparer, mom, wife, web developer 🇺🇸🇮🇱