Shiyan Xu @_xushiyan
Data Architect | O'Reilly Author | Creator of Hudi-rs | PMC member of @apachehudi
YouTube.com/@Datumagic
Joined September 2010
Tweets: 172 | Followers: 179 | Following: 76 | Likes: 265
[New Blog] Performing append-only write operations is quite easy in #ApacheHudi! Since v0.14 (2023-09), you don't need to set a record key field to start writing to Hudi tables. Auto key generation lowers the barrier for getting started with a data lakehouse, perfect for…
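A minimal PySpark-style sketch of the append-only write described above, assuming Hudi 0.14+ auto-generates record keys when `hoodie.datasource.write.recordkey.field` is simply left unset. The helper names `hudi_append_only_options` and `write_append_only` are hypothetical, and restricting the helper to `insert`/`bulk_insert` is an assumption that auto-generated keys target append-style operations; check the Hudi docs for your version before relying on it.

```python
# Sketch only: assumes Hudi >= 0.14, where omitting the record key field
# lets Hudi auto-generate record keys for append-style writes.

APPEND_OPERATIONS = ("insert", "bulk_insert")  # assumption: append-style ops only


def hudi_append_only_options(table_name: str, operation: str = "insert") -> dict[str, str]:
    """Build Hudi write options for a table with no user-defined record key (hypothetical helper)."""
    if operation not in APPEND_OPERATIONS:
        raise ValueError(f"expected one of {APPEND_OPERATIONS}, got {operation!r}")
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": operation,
        # Deliberately no "hoodie.datasource.write.recordkey.field":
        # leaving it unset is what is assumed to trigger auto key generation.
    }


def write_append_only(df, base_path: str, table_name: str) -> None:
    """Append a Spark DataFrame to a Hudi table (needs Spark with the Hudi bundle on the classpath)."""
    (
        df.write.format("hudi")
        .options(**hudi_append_only_options(table_name))
        .mode("append")
        .save(base_path)
    )
```

For example, `write_append_only(events_df, "s3://bucket/events", "events")`, where the DataFrame, path, and table name are placeholders. Keeping the options in a plain dict makes it easy to inspect or extend them (for instance with partitioning configs) without touching the write call.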
[New Blog] Lakehouses are powerful, but are you doing enough to protect them from both internal and external threats? In this blog, we discussed: • Robust Authentication & Authorization: It's foundational. Ensure only the right users and applications have access to the…
Excited to share that Zhenqiu Huang and @_xushiyan will be speaking at Data Streaming Summit SF 2025 (@DataStreamingSt sponsored by @streamnativeio)! Their talk dives into how @ApacheFlink + @apachehudi power streaming data ingestion at Uber scale, enabling…
I'm delighted to be delivering this talk at Data Streaming Summit SF 2025 hosted by @streamnativeio. Some highlights of the talk are: • Understand the architecture and workflow of NBCC and its alignment with Flink for streaming ingestion. • Learn how the new file layout,…
[Blog] Part 2 of building a RAG-based AI recommender. A reliable, efficient data lakehouse isn't just a prerequisite; it's the core component that enables RAG (Retrieval-Augmented Generation) applications to thrive. While the LLM usually gets the spotlight, the data pipeline…
You can easily migrate to Hudi 1.x to gain a major performance boost. Steps:
A new chapter, "Running Hudi in Production", is now available in the early release of "Apache Hudi™: The Definitive Guide", the first book ever written about @apachehudi, by industry experts: @_xushiyan, Prashant Wason, @SudhaSakthee, and @RebeccaBilbro. Get a free copy of…
Join this month's developer sync call with PMC member @_xushiyan on July 23, 5 PM PT to learn about the Rust implementation of @apachehudi with API bindings in Python and C++! GitHub repo: github.com/apache/hudi-rs | Joining instructions: hudi.apache.org/contribute/dev…
We're happy to share that 7 chapters are now available in the early release of "Apache Hudi™: The Definitive Guide", the first book ever written about @apachehudi, one that's about to change how you think about data lakes. Get a free copy of the early release e-book:…
New Blog: Building a RAG-based AI Recommender (Part 1/2) blog.datumagic.ai/p/building-a-r… What's inside: • How RAG works end-to-end (chunking → embedding → retrieval → generation) • Why 70% of AI success is actually data engineering • How @apachehudi's incremental…
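To make the chunking → embedding → retrieval → generation flow concrete, here is a toy, stdlib-only sketch. It is not the blog's implementation: a bag-of-words term-count vector stands in for a real embedding model, retrieval is brute-force cosine similarity rather than a vector index, and generation is reduced to assembling the prompt an LLM would receive. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Stop once the remaining words are already covered by the previous window's tail.
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase term counts (a real system would call an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (brute force, no index)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Generation step, stubbed: assemble the grounded prompt that would be sent to an LLM."""
    bullets = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{bullets}\n\nQuestion: {query}"


if __name__ == "__main__":
    docs = ("Hudi tables support incremental queries. Incremental queries return only "
            "records changed since a commit. Vector search finds similar embeddings.")
    pieces = chunk(docs, size=6, overlap=1)
    top = retrieve("vector search embeddings", pieces, k=1)
    print(build_prompt("What does vector search do?", top))
```

The overlap between windows keeps a sentence that straddles a chunk boundary retrievable from either side; in a real pipeline the chunks and their embeddings are what the data lakehouse would store and refresh incrementally.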
AWS S3 Tables simplifies Iceberg tables, but we benchmarked and discovered: • 3h compaction delays • Perf degradation • Limited observability • Minimal control • 20–30x costs. Read this blog from our engineering team for the details: onehouse.ai/blog/s3-manage…
Hudi-rs 0.4.0 is released! Another step towards standardizing @apachehudi APIs across broad ecosystem integrations using #rustlang #Python #cpp! github.com/apache/hudi-rs…
Want an expert-guided, systematic way to learn @apachehudi? Look no further! We've launched a new dedicated page on the Hudi website featuring high-quality tutorial series for learning #apachehudi! hudi.apache.org/learn/tutorial…
Today we're announcing that Eventual has raised $30M in Seed and Series A funding from @CRV and @felicis as well as @ycombinator, @M12vc and @Citi and others. The AI era needs data infrastructure built for AI, not retrofitted.
Wondering what cool features are coming in Hudi 1.1 and 1.2? In the last call, we discussed: • Metadata Table flow redesign and improvements • Index abstraction. Join PMC members @ethanyguo and @_xushiyan on June 18 at 5 PM PT for part 2! Use the link from hudi.apache.org/contribute/dev…
I'm super excited to launch a new blog/video series: "Apache Hudi does XYZ"! Blog: blog.datumagic.ai/p/apache-hudi-… Video: youtu.be/hgGu9L0Qzyw About a year ago, I published a well-received blog series called "Apache Hudi: From 0 to 1" (blog.datumagic.ai/p/apache-hudi-…) introducing…
Wondering what cool features are coming in Hudi 1.1 and 1.2? Join us today at 5 PM PT for the monthly #apachehudi developer sync call, where Hudi PMC member @ethanyguo will share some highlights! Check out this page for the Zoom meeting link: hudi.apache.org/contribute/dev…
Meet Quanton, the new query execution engine from Onehouse. • Same Spark & SQL • At least half the cost • 1.6x-3.6x better ETL price-performance • 2.2x-6.5x better ingest price-performance. Read the full blog here: onehouse.ai/blog/announcin… Download our free…

Rodrigo @rouxero
170 Followers 2K Following
Andy Walner @andywalner
272 Followers 1K Following Sharing thoughts on the data and AI landscape. Product @OnehouseHQ . Prev @Google, @DashworksAI, @UMich
xerone rasfat @Xero029
4 Followers 266 Following
a le x @alex38670991021
1 Followers 72 Following Alex@VeloDB Solution Architect@Apache Doris Contributor
SereneLaura @SereneLaur
16 Followers 464 Following No explicit content. No gift cards. US priority. Travel enthusiasts. Cowgirl spirit, barbecue parties. LA warmth and boldness
DinahService @Zn69C4XU2JcDNNg
68 Followers 2K Following
Anh LĆ¢m @lamducanhndgv
49 Followers 980 Following
DoreenSarah @9kIt18k53pe6vG0
35 Followers 2K Following
WandaKelsen @AAd0D9849ds6L4
29 Followers 2K Following
waves @huang73922
9 Followers 141 Following
Meet Shah @Curiousmeet
2 Followers 189 Following
Tao Liu @leven199527
1 Followers 37 Following
JJCo168 @JCo41805
9 Followers 91 Following
Kai @nitorkai
16 Followers 436 Following
Ilya /Space/ Kharlamo... @ilyakharlamov
11K Followers 1K Following ex-Rocket Scientist, now-Software Engineer, anti-patriot
Xu Jiajun @XuJiaju63488352
1 Followers 125 Following
Mahdi Karabiben @MahdiKarabiben
482 Followers 2K Following Product & data @Siffletdata. Ex-Zendesk. I love hearing what the data has to say. Views are my own. he/him.
Amir Marmul @amirmarmul
130 Followers 322 Following
Mr.T @PilosarInc
35 Followers 647 Following Web developer 🇨🇮🇫🇷 | Mobile [email protected]
yespon @yespon_liu
7 Followers 443 Following
Vicky Singh @iVkeySingh
5 Followers 530 Following
AB De @dab4fun
0 Followers 99 Following
Yanming @0xYanming
46 Followers 1K Following
k.s.balaram59 @KSBalaram59
9 Followers 402 Following
Materials @unreal9712
41 Followers 477 Following
Qwerty Azerty @qwertyazerty_17
0 Followers 82 Following
Morris @morristai01
53 Followers 316 Following
FrameIsEverything @_frameframe_
104 Followers 2K Following
Abhay @bothra90
581 Followers 2K Following I am interested in Stream Processing, Distributed Systems and Databases. Software Engineer by profession. Co-founded @FennelAI. Currently @Databricks
Daft @daftengine
688 Followers 68 Following Distributed query engine providing simple and reliable data processing for any modality and scale (https://t.co/IN219tFqrN)
Johnson Lee @Johnson40235768
6 Followers 302 Following
Dev Sharma @devshrm66
177 Followers 758 Following
ButterBright @_butterbright_
5 Followers 70 Following Database developer, Apache SkyWalking committer. Move fast and break things.
Liugddx @liugddx
66 Followers 2K Following I am a Big Data Platform Development Engineer. Focusing on distributed computing system. Focusing on Data integration.
č¾ę£® Essen @essen_ai
52K Followers 815 Following äøč¦åę¢ę¢ē“¢éåäøēļ¼ēŗå¤č®ēęŖä¾å儽ęŗå | ęęč²/åŖä½éå¢åęEdTechē¬č§å ½å ¬åøåé«ē®”ćåę”å¾ę„/é²ēļ¼ęÆēµčŗ«åøēæč ļ¼ē§ęę儽č /AIéčØę“¾ļ¼ęÆē¶ęæäøēę°čŖē±äø»ē¾©č åęæę²»äøēę°äæå®äø»ē¾©č ļ¼ä¹ęÆęÆå¤čåøę“¾ēčæ½éØč https://t.co/sA6tulO8mi
Cloudera @cloudera
108K Followers 3K Following Cloudera is the only data and AI platform company that brings AI to data anywhere: in clouds, data centers, and at the edge.
Community Over Code @ApacheCon
13K Followers 2K Following The events of the projects of The ASF. https://t.co/DwEDKpK1Ko #CommunityOverCode
Andy Walner @andywalner
272 Followers 1K Following Sharing thoughts on the data and AI landscape. Product @OnehouseHQ . Prev @Google, @DashworksAI, @UMich
StreamNative @streamnativeio
2K Followers 35 Following StreamNative was founded by the original creators of Apache Pulsar and offers a fully managed Pulsar solution.
LanceDB @lancedb
3K Followers 52 Following Developer-friendly, open source AI-Native Multimodal Lakehouse https://t.co/wXn4tw66HV
FastAPI @FastAPI
39K Followers 1 Following FastAPI framework, high performance, easy to learn, fast to code, ready for production. Web APIs with Python type hints. By @tiangolo
Qdrant @qdrant_engine
12K Followers 110 Following High-performance Rust-based vector search engine. https://t.co/362gvLXHcw
Open Lakehouse Commun... @open_lakehouse
1K Followers 20 Following A community page to share all things Open Lakehouse ft. Apache Hudi, Iceberg & Delta Lake.
Andrew Lamb @andrewlamb1111
3K Followers 74 Following Apache {DataFusion, Arrow} PMC, Database Engineer
Sammy Sidhu @Sammy_Sidhu
405 Followers 202 Following CEO at https://t.co/8trqylaLqu Building @daftengine: Distributed query engine providing simple and reliable data processing for any modality and scale
Jay Chia - daft.ai @JayChia5
374 Followers 110 Following Cofounder @ Eventual. Works on Daft (https://t.co/f2BxW6m2uo) the data engine for AI #RunModelsOnData
Rocky @Rocky_Bitcoin
125K Followers 3K Following Long term investor #BTC #TAO #SOL #SUI #XRP| MeMe Professional Data Player | Crypto since 2017 | Not financial advice, DYOR
Daft @daftengine
688 Followers 68 Following Distributed query engine providing simple and reliable data processing for any modality and scale (https://t.co/IN219tFqrN)
ray @raydistributed
10K Followers 3 Following The AI framework trusted by OpenAI, Uber, and Airbnb. Created and developed by @anyscalecompute.
velox-lib @velox_lib
28 Followers 9 Following
CelerData @CelerData
312 Followers 453 Following CelerData is the only SQL engine that is fast enough to run the most demanding workloads directly on your data lakehouse. All powered by @StarRocksLabs.
StarRocks @StarRocksLabs
994 Followers 169 Following Visit StarRocks Website: https://t.co/nUo4tk00Yk Slack: https://t.co/huvhB2yaEA
Delta Lake @DeltaLakeOSS
10K Followers 67 Following Delta Lake is an open-source storage framework that enables building a Lakehouse architecture for Spark, Flink, Trino, Hive, Scala, Java, Rust, Python, & more!
DuckDB @duckdb
21K Followers 36 Following DuckDB is an analytical in-process SQL database management system. "DuckDB" and the DuckDB logo are registered trademarks of the DuckDB Foundation.
MotherDuck @motherduck
8K Followers 161 Following Data warehouse for customer-facing and internal analytics, built in collab with @duckdblabs. Join us at Small Data SF, Nov 4th & 5th. https://t.co/TSxedEMVEm
Leonard Xu @Leonardxbj
2K Followers 751 Following Flink PMC Member & Flink CDC Lead, Flink Connector TL @alibaba_cloud, focus on Streaming SQL & Data Integration
fxx @xiangfu0
332 Followers 412 Following Author of @ApachePinot. Data Analytics Infrastructure. https://t.co/YTJr7uttTN
Ratatui @ratatui_rs
3K Followers 8 Following A Rust library that's all about cooking up terminal user interfaces (TUIs) Account run by a rat https://t.co/qGgUTQpWtb
O'Reilly Media @OReillyMedia
105K Followers 20K Following Gain technology and business knowledge and hone skills with learning resources created and curated by O'Reilly experts
äøå²é misaki masa... @sxyazi
3K Followers 217 Following Dreamer, Engineer, Blogger. Opinions are my own. English Alt: @sxyazi_
ApacheDataFusion @ApacheDataFusio
766 Followers 1 Following
Apache Doris @doris_apache
2K Followers 2K Following Fastest Analytics & Search Database in the AI Era Star us on Github: https://t.co/8SplJcHxKH Slack: https://t.co/qOIgHkaZc0
Rust Language @rustlang
153K Followers 2 Following A programming language empowering everyone to build reliable and efficient software. ** This account is no longer active. Follow us on other platforms! **
Rust Trending @RustTrending
36K Followers 1 Following Automated bot tweeting trending Rust repositories on GitHub. Not an official @github or @rustlang product. Made by @pbzweihander_rs, but not curated by.
Apache - The ASF @TheASF
66K Followers 210 Following The global home for open source software, powering some of the world's most ubiquitous software projects in web, big data, Java, IoT, cloud computing, and more.
Jim Dowling @jim_dowling
3K Followers 1K Following Co-founder and CEO @hopsworks. Organizer of the feature store summit. I am writing a book on Building ML Systems for O'Reilly.
Xuanwo @OnlyXuanwo
11K Followers 940 Following ASF Member. @ApacheOpenDAL PMC Chair. VISION: Data Freedom. Working on #RBIR with @LanceDB
Ray @rayyy1024
1K Followers 908 Following Rusty Lakehouse. Coauthor of iceberg-rust. Apache Iceberg Committer.
Anyscale @anyscalecompute
12K Followers 3 Following Modernize your AI capabilities. The best way to run @raydistributed, the AI framework trusted by OpenAI, Uber, and AirBnb.
Vinish Reddy @VinishReddy97
9 Followers 70 Following Software Engineer @Onehousehq Engineer with passion to solve challenging problems.
Dipankar Mazumdar @Dipankartnt
2K Followers 562 Following Director (Data/AI) @Cloudera, Prev. Staff Eng Advocate @OnehouseHQ, DevRel @Dremio, Engineering @Qlik, @OtisElevatorCo | Author | Book: https://t.co/4xSkn6zskp
Apache XTable (Incuba... @apachextable
368 Followers 21 Following Apache XTable is a cross-table interop of table formats Apache Hudi, Apache Iceberg, and Delta Lake. (prev OneTable) https://t.co/SXsyRuZMND
Yongkyun Lee @yongkyun_lee
41 Followers 80 Following Currently @Onehousehq | Previously @GoldmanSachs, @Caltech, @ClovaAiLab
Jark Wu @jarkwu
1K Followers 185 Following Staff Software Engineer at Alibaba Cloud, Apache Flink PMC, Apache Fluss (Incubating) PPMC