Small Data SF: Think Small, Build Big @smalldatasf
Rethinking AI & data from the ground up. A community for those building smarter AI and data systems with real workloads, not petabytes.
smalldatasf.com · SF · Joined June 2024
Sometimes you need to crash, burn, and refactor to really see the value in “small first.” Scott Haines will unpack four hard lessons where embracing smallness didn’t just rescue projects, but led to better product, better performance, better dev happiness.
If you want avoidable…
What if instead of chasing larger models, we chased smarter ones—models that do more with less, generalize better, and are easier to deploy? Shelby Heinecke will share what "smaller models" truly mean and how they unlock impact in real settings.
If you're building AI and worried…
We ran a benchmark of DuckDB vs Spark and found that data sets under 20 GB ran about 100 times faster on DuckDB than they did on Apache Spark!
You don't need a multi-node cluster for your smaller data sets!
This benchmark uses plain parquet files and COUNT distinct to truly…
Are you ready to explore when Apache Spark might not be the best tool for your data projects? Join us at Small Data SF on November 4 and 5 in San Francisco for an insightful talk by Holden Karau, a prominent figure in the world of big data.
Small data means breaking existing paradigms, simplifying, speeding things up and lowering costs
This insight from #smalldatasf 2024 is just a taste of what's coming in 2025!
🚀 Small News!
Estuary is a Gold Sponsor of @smalldatasf 2025, happening Nov 4-5!!
Two days of hands-on workshops, talks, and community, all centered around efficient, local-first development and smarter ways to work with data and AI.
See you there!
#smalldatasf
"Don't duck up the numbers: Where AI hype meets BI reality." This is going to be a fun panel with Barr Moses (Monte Carlo), Barry McCardel (Hex), Colin Zima (Omni) and Tristan Handy (dbt). Join us!
Dr Shelby Heinecke, who leads AI research at Salesforce, will speak at Small Data SF November 5th on how small models don't need more parameters, they just need better data. She'll speak about the highly efficient xLAM family of small action models her team built at Salesforce.
Miss Small Data SF in 2024? Check out our highlight reel below and learn why one attendee said: "Small Data SF was on another level. The lineup was unbeatable, the content was razor-sharp, and the people were next-level inspiring."
Small Data SF may be a small conference, but the density of talent amongst the speakers and attendees is unmatched. Learn from industry luminaries, on stage and in the audience, on how and why to make your data and AI stack more efficient.
Last year, Wes McKinney, of pandas fame, spoke about how people have written a lot of Spark code, making our industry sticky to Spark. This year, we're joined by Holden Karau, Spark PMC member and author of a number of Spark books, talking about when *not* to use Spark.
George Fraser, CEO of Fivetran, has been a pioneer in advocacy around small data. Watch his thoughts from a panel at Small Data SF 2024. This year he returns November 5th to share more wisdom in a talk, alongside many other data luminaries.
Guess who’s back… back again at Small Data SF. 🎤 The data Jedi, Benn Stancil is returning 💎 Missed his epic talk last year? Here’s your chance to feel the force in action.
#smalldatasf
🎁Our Small Data, Big Gift to you!
⏳We've extended early bird pricing until Aug 8!
We know summer calendars are chaotic, so we're giving you extra time to snag these savings! Join us for 2 days of practical innovation + insights from top minds in small data & AI.
🚨 SPEAKER DROP! Small Data SF just got even more stacked! 🔥
5 incredible new speakers just joined our Nov 5 lineup, including CEOs from @_hex_tech & @getdbt, a Databricks MVP, and more O'Reilly authors!
⏰ $295 Early bird tickets end in exactly ONE WEEK
smalldatasf.com