apache / paimon-cpp

Apache Paimon C++

10
4
100% credibility
Found May 22, 2026 at 10 stars.
CMake
AI Summary

Apache Paimon C++ is a programming library that lets developers work with Apache Paimon lakehouse data directly in C++ applications. It provides tools to read, write, and analyze data stored in the Paimon format, supporting common file types like Parquet, ORC, and Avro. The library is designed for speed and works without Java, making it easy to integrate into existing C++ projects. It also includes modern features for AI applications like vector search and full-text search capabilities.

How It Works

1
🔍 Discovering a faster way to work with data

You hear about Apache Paimon and realize you can use its powerful lakehouse format directly in your C++ projects without any Java setup.

2
📚 Learning what you can do

You explore the features and discover you can read, write, and analyze data in many formats like Parquet, ORC, and Avro all from your C++ code.

3
🔗 Connecting to your existing tools

The library works seamlessly with Arrow, so you can easily move data between your C++ applications and popular data tools.

4
🛠️ Building your first project

You download the code and run the CMake build, which produces a library you can link into your application.

5
Choosing your path

✍️ Writing data

You add new records to your data lake with automatic organization and compaction.

📖 Reading data

You query your data lake for analysis, supporting both batch and streaming reads.

🤖 AI features

You enable smart features like vector search to find similar items or full-text search across your data.

🎉 Your application comes to life

Your C++ application now reads and writes data in the powerful Paimon lakehouse format, all running fast and without any Java dependencies.
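
To make the vector search idea from step 5 concrete, here is a minimal sketch of what "finding similar items" means in its simplest form: score every stored embedding against a query with cosine similarity and keep the best k. This is an illustration of the general technique only. The names `CosineSimilarity` and `TopKSimilar` are hypothetical and are not part of the paimon-cpp API, and a real index would normally avoid scanning every row.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not a paimon-cpp function): cosine similarity of two
// embeddings of equal length. Returns 0 when either vector has zero norm.
double CosineSimilarity(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    double dot = 0.0, norm_a = 0.0, norm_b = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += static_cast<double>(a[i]) * b[i];
        norm_a += static_cast<double>(a[i]) * a[i];
        norm_b += static_cast<double>(b[i]) * b[i];
    }
    if (norm_a == 0.0 || norm_b == 0.0) return 0.0;
    return dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}

// Brute-force top-k search: score every row against the query, then return
// the indices of the k most similar rows, best match first.
std::vector<std::size_t> TopKSimilar(const std::vector<std::vector<float>>& rows,
                                     const std::vector<float>& query,
                                     std::size_t k) {
    typedef std::pair<double, std::size_t> Scored;  // (similarity, row index)
    std::vector<Scored> scored;
    scored.reserve(rows.size());
    for (std::size_t i = 0; i < rows.size(); ++i) {
        scored.push_back(Scored(CosineSimilarity(rows[i], query), i));
    }
    k = std::min(k, scored.size());
    // Only the first k entries need to be ordered.
    std::partial_sort(scored.begin(),
                      scored.begin() + static_cast<std::ptrdiff_t>(k),
                      scored.end(),
                      [](const Scored& x, const Scored& y) { return x.first > y.first; });
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < k; ++i) result.push_back(scored[i].second);
    return result;
}
```

Calling `TopKSimilar(rows, query, 2)` on a handful of embeddings returns the indices of the two rows pointing most nearly in the same direction as the query. The scan is linear in the number of rows, which is why libraries offering vector search typically back it with an index.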

AI-Generated Review

What is paimon-cpp?

Paimon-cpp is a native C++ implementation of Apache Paimon, bringing high-performance lakehouse access to C++ applications without requiring a JVM. It handles the full data lifecycle: reading and writing both append tables and primary key tables, with support for batch and streaming operations. The library integrates directly with Arrow for columnar in-memory access and speaks ORC, Parquet, and Avro formats natively. File system support covers local storage and Jindo out of the box.

Why is it gaining traction?

The data lakehouse space is dominated by Java-first tools like Apache Flink and Apache Iceberg, forcing C++ teams to either spin up JVM processes or miss out on modern table formats. Paimon-cpp flips this by offering native access to Paimon's LSM-based storage with deletion vectors and merge-on-read capabilities. The Arrow integration is particularly valuable for data science pipelines that need to move data between Python and C++ without serialization overhead. Its AI-oriented features like vector search and full-text indexing position it for embedding workloads alongside traditional analytics.
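
For readers unfamiliar with the terms, the sketch below illustrates the two ideas named above in a deliberately simplified form: a deletion vector is a per-file set of flags marking row positions as removed, and merge-on-read resolves multiple versions of the same primary key at read time by keeping the most recently written one. The types `Row` and `DataFile` and the function `MergeOnRead` are hypothetical, invented for this illustration. They do not reflect Paimon's on-disk layout or the paimon-cpp API, and a real LSM reader would merge sorted runs in a streaming fashion rather than collecting rows in a map.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical types for illustration only; not the paimon-cpp API.
struct Row {
    std::int64_t key;       // primary key
    std::int64_t sequence;  // larger value means the row was written later
    std::string value;
};

struct DataFile {
    std::vector<Row> rows;
    // Deletion vector: deleted[i] == true marks rows[i] as removed.
    // An empty vector means no row in this file is deleted.
    std::vector<bool> deleted;
};

// Merge-on-read: scan every file, skip positions flagged in that file's
// deletion vector, and keep only the newest version of each key.
// The result is ordered by key.
std::vector<Row> MergeOnRead(const std::vector<DataFile>& files) {
    std::map<std::int64_t, Row> latest;
    for (const DataFile& file : files) {
        assert(file.deleted.empty() || file.deleted.size() == file.rows.size());
        for (std::size_t i = 0; i < file.rows.size(); ++i) {
            if (!file.deleted.empty() && file.deleted[i]) continue;
            const Row& row = file.rows[i];
            std::map<std::int64_t, Row>::iterator it = latest.find(row.key);
            if (it == latest.end()) {
                latest.insert(std::make_pair(row.key, row));
            } else if (row.sequence > it->second.sequence) {
                it->second = row;  // a later write replaces the earlier one
            }
        }
    }
    std::vector<Row> out;
    for (const auto& entry : latest) out.push_back(entry.second);
    return out;
}
```

The point of the design is that writers only ever add new files or flip deletion flags, so writes stay cheap, while the cost of reconciling versions is paid by the reader (or later by compaction).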

Who should use this?

C++ backend services that need to read and write lakehouse tables without JVM dependencies. Data engineering teams running C++ microservices on S3-compatible storage who want to avoid Kafka or Flink overhead, keeping in mind that built-in file system support currently covers local storage and Jindo and that S3 support is not yet mature. Teams already using Apache Paimon in Java/Flink environments who need a C++ consumer or producer. ML teams that consume feature data in C++, for whom the Arrow integration makes this practical.

Verdict

At 10 stars with 100% credibility, this is extremely early-stage and actively migrating from Alibaba's fork to Apache infrastructure. The feature list is impressive, but documentation and production hardening are minimal. Watch this if you need native C++ lakehouse access; hold off if you need battle-tested tooling with mature S3 support and comprehensive testing. The Apache Paimon community is well-established, which provides credibility even if this specific C++ port is nascent.
