DuckLake SDK is a software development kit for working with data lakes in Python and Rust. A data lake is a way to store large amounts of data in organized, queryable files. The SDK provides tools to create tables, write data into them, read data back out, and query historical versions of your data (time travel). It works with common data tools like Polars and DuckDB, and can store data either locally on your computer or in cloud storage like AWS S3. The project is designed to be a lightweight alternative to DuckDB's official data lake extension, giving developers more flexibility in how they build data applications.
How It Works
You have lots of data files scattered around and want a smarter way to organize, query, and version them like a professional data engineer.
You set up a DuckLake instance with a simple SQLite database to track your tables and a folder to store your data files.
You create tables with specific columns and data types, just like setting up a well-organized spreadsheet with rules.
Using Python tools you already know, such as Polars, you write your data into the lake, where it is stored efficiently as Parquet files.
You read the latest version of your tables and get immediate results.
You look back at what your data looked like at any earlier point in time, like having a time machine for your tables.
Everything is neatly cataloged, versioned, and ready to use. You can share your data lake with teammates and build powerful data workflows.
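The steps above can be sketched in a few dozen lines of plain Python. This is a conceptual illustration only, not the DuckLake SDK's API or its catalog layout: the `MiniLake` class, its method names, and the two-table SQLite schema are all hypothetical, and JSON files stand in for Parquet so the example runs with the standard library alone. It shows the general idea of a SQLite catalog that tracks which data files belong to which snapshot, so that reading "as of" an earlier snapshot simply ignores files added later.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


class MiniLake:
    """Toy lake (hypothetical, not the SDK): SQLite catalog plus a data folder."""

    def __init__(self, catalog_path, data_dir):
        self.data_dir = Path(data_dir)
        self.data_dir.mkdir(parents=True, exist_ok=True)
        self.db = sqlite3.connect(str(catalog_path))
        # Assumed schema: one row per table, one row per data file.
        self.db.executescript(
            """
            CREATE TABLE IF NOT EXISTS tables (
                name TEXT PRIMARY KEY,
                schema TEXT NOT NULL
            );
            CREATE TABLE IF NOT EXISTS files (
                table_name TEXT NOT NULL,
                snapshot_id INTEGER NOT NULL,
                path TEXT NOT NULL
            );
            """
        )

    def close(self):
        self.db.close()

    def create_table(self, name, schema):
        # schema maps column name -> type name, stored as JSON text.
        self.db.execute(
            "INSERT INTO tables VALUES (?, ?)", (name, json.dumps(schema))
        )
        self.db.commit()

    def _latest_snapshot(self, name):
        row = self.db.execute(
            "SELECT COALESCE(MAX(snapshot_id), 0) FROM files WHERE table_name = ?",
            (name,),
        ).fetchone()
        return row[0]

    def append(self, name, rows):
        row = self.db.execute(
            "SELECT schema FROM tables WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            raise KeyError(f"unknown table: {name}")
        columns = set(json.loads(row[0]))
        for record in rows:
            if set(record) != columns:
                raise ValueError(f"row does not match schema: {record}")
        # Each append writes one new immutable file and one new snapshot.
        snapshot_id = self._latest_snapshot(name) + 1
        path = self.data_dir / f"{name}-{snapshot_id}.json"  # Parquet stand-in
        path.write_text(json.dumps(rows))
        self.db.execute(
            "INSERT INTO files VALUES (?, ?, ?)", (name, snapshot_id, str(path))
        )
        self.db.commit()
        return snapshot_id

    def read(self, name, as_of=None):
        # Time travel: only files from snapshots <= as_of are visible.
        snapshot_id = self._latest_snapshot(name) if as_of is None else as_of
        paths = self.db.execute(
            "SELECT path FROM files WHERE table_name = ? AND snapshot_id <= ? "
            "ORDER BY snapshot_id",
            (name, snapshot_id),
        ).fetchall()
        rows = []
        for (path,) in paths:
            rows.extend(json.loads(Path(path).read_text()))
        return rows


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        lake = MiniLake(Path(tmp) / "catalog.db", Path(tmp) / "data")
        try:
            lake.create_table("events", {"id": "int", "name": "str"})
            first = lake.append("events", [{"id": 1, "name": "signup"}])
            lake.append("events", [{"id": 2, "name": "login"}])
            return lake.read("events", as_of=first), lake.read("events")
        finally:
            lake.close()


if __name__ == "__main__":
    at_first_snapshot, latest = demo()
    print("as of snapshot 1:", at_first_snapshot)
    print("latest:", latest)
```

Because data files are never rewritten in this sketch, an old version of a table stays readable for as long as its files and catalog rows exist. A real lake would also need to handle deletes, schema changes, and concurrent writers, which this toy version leaves out.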