pg_mooncake + Neon: serverless analytics in Postgres
pg_mooncake launched in beta two months ago. It adds columnar storage and a vectorized engine to Postgres.
Over the last two months, with our customers, we've realized we accidentally built the best serverless database for analytics. Here's the story:
1. pg_mooncake 🤝 Neon storage engine
Neon innovated significantly on its storage engine.
There's a great thread by Nikita on this. This storage engine is how Neon can achieve true separation of storage & compute. This leads to an exceptional DX around branching, scale-to-zero, read replicas, instant provisioning... All of these are goodies of a true serverless Postgres.
pg_mooncake's architecture also involves true separation of storage and compute. Data is stored in object store (local file system or S3), and only metadata is stored in Postgres.
This is the Lakehouse architecture. The Lakehouse is how data in some of the largest companies in the world is stored and analyzed in a cost efficient and serverless manner.
TLDR on the Lakehouse: store all data in object store in a universal format. Query this data directly with as a pandas dataframe, with Spark, or with DuckDB in a serverless way.
pg_mooncake's goal is to bring this flexibility, scalability, and 'serverless' analytics to every Postgres application.
2. Scale everything to zero.
Combining Neon's storage, and our analytics architecture –– you get complete separation of storage and compute for transactional and analytical workloads. While keeping all table metadata in Postgres.
With this serverless architecture, and combining analytical and transactional tables in one system, we've seen customers do cool things like:
- Separate and scale analytical workloads separately from transactional workloads.
- Sync changes from heap tables into columnstore tables using a separate compute.
- Join columnstore tables with heap tables for BI and dashboards.
- And finally, run entire analytics services that scale to zero.
Customers like Inkeep, ship their application, fast BI dashboards, and their analytics services entirely in Postgres. Each other of their services scale independently, while all data and table metadata stays in Postgres. It's pretty neat.
3. 'scale to zero' analytics with Python.
This is where it gets fun.
pg_mooncake writes data in the standard analytics formats Iceberg and Delta to your object store. These as metadata layers over parquet files that make data files 'feel' like a table to any query engine (Pandas, Polars, Spark, DuckDB).
A common workload we've seen is pg_mooncake customers querying and transforming columnstore tables using Python. It's very simple to create a dataframe from your columnstore table (docs for more info).
We've also seen requests for running semantic query engines like LOTUS on Postgres.
A blog on Postgres + LOTUS coming soon :)
3. Get started today!
pg_mooncake is available on Neon today. Read our docs for creating your first columnstore table in Neon.
hint: it's easy :)
pg_mooncake comes out of beta Jan 15th 2025. Stay tuned.