pg_mooncake v0.2: what we're cooking

With v0.1, we proved that Postgres can deliver great analytics performance. Our approach combined two key pieces—DuckDB vectorized engine and a columnstore table—making pg_mooncake the only extension in Postgres that integrates both. The results speak for themselves: pg_mooncake ranks in ClickBench’s top 10, and we built the fastest Parquet-based engine along the way.

Now, with v0.2, we’re tackling the next big challenge, we call it out-of-the-box analytics.

First, why in-Postgres analytics?

Applications often have their first analytics needs far before they have their first data warehouse or analytics system.

We’ve seen this story over and over again. The typical path involves delaying the roadmap, setting up data replication into a separate system, hiring engineers to maintain the pipeline—all while stitching Postgres data with analytics data at the application level.

pg_mooncake eliminates this complexity. With columnstore tables, developers get a native Postgres experience for both transactional and analytical workloads. No separate systems, no complex pipelines—just Postgres.

In v0.2, we are focussed on making columnstore tables easier to get started for more workloads. In this blog, we cover the areas under active development.

1. Full Postgres Table Access Method (TAM) for Columnstore Tables (#80)

Today, all columnstore queries run through DuckDB. This delivered performance, but meant missing Postgres features like data-modifying CTEs (e.g., deleting rows, returning results, and inserting them into another table) and triggers.

In v0.2, we expose columnstore tables through a full Postgres table access method, allowing Postgres to read and write to them directly. DuckDB still serves as a fast execution engine, but queries seamlessly fall back to Postgres when needed.

At the planner stage, DuckDB will be the speedy execution engine, and we’ll push down computation as much as possible to it. Operators that can’t be executed with DuckDB will fall back to Postgres.

2. Logical Replication into columnstore tables. (#90)

Users often adopt pg_mooncake after encountering slow queries on their existing Postgres tables. With logical replication, we make it even easier to accelerate analytics without complex write patterns—simply replicate data from heap tables into columnstore tables.

Postgres' native logical replication receiver primarily executes high-level functions like table_insert and table_insert_multi, which we support through our full table access method (TAM). Initially, logical replication in pg_mooncake will support append-only workloads.

This significantly enhances pg_mooncake's deployability, no matter where your Postgres is hosted. You can run pg_mooncake alongside your existing Postgres deployments and seamlessly replicate data from heap tables into columnstore tables.

3. Small Inserts with a Rowstore Buffer (#89)

Logical replication introduces a challenge: how do we handle frequent small inserts without generating a new Parquet file and Delta Lake log for every transaction?

In v0.2, small inserts go into a rowstore table, while larger inserts write directly to Parquet files. When querying a columnstore table, Postgres scans both the rowstore and Parquet files. Once the rowstore exceeds a threshold, it flushes to Parquet. This approach also decouples Parquet writes from transactions, significantly reducing write amplification.

This is a major unlock, making columnstore tables viable as a primary store without requiring batched inserts—especially for logging and time-series workloads.

So, where does that put us?

pg_mooncake is the easiest way to get fast analytics inside Postgres. With v0.2, key use cases improve even further:

User-facing analytics with Postgres – Full ORM support, joins with regular Postgres tables, and only 1 system to manage.
Logs, chat history, time series – Optimized for append-only data with benefits of Parquet compression and low storage costs.
Complete Data lake out of the box – Postgres + pg_mooncake as both the catalog and engine to write data transactionally to Iceberg (S3 tables) and Delta Lake.

Also, Postgres + pg_mooncake is the fastest parquet-based execution with file skipping/segment elimination.

Get involved?

pg_mooncake v0.2 is actively in development, with strong support from our community—projects like Full Postgres TAM are entirely community-led. Currently slated for mid-April release.

Have feature requests or feedback? Join the discussion in our Slack and help shape the future of pg_mooncake.