New PostgreSQL Ecosystem Player: ParadeDB

Table of Contents

Original WeChat Article Link

New PostgreSQL Ecosystem Player: ParadeDB
#

YC S23 invested in a new project called ParadeDB, which is extremely interesting. Their slogan is “Postgres for Search & Analytics — Modern Elasticsearch Alternative built on Postgres”. It’s PostgreSQL for search and analytics, aiming to be an Elasticsearch alternative.

PostgreSQL’s ecosystem is indeed becoming increasingly prosperous. Among PG-based extensions and derivatives, we already have:

MongoDB open-source alternative based on PG — FerretDB
SQL Server open-source alternative — Babelfish
Firebase open-source alternative — Supabase
AirTable open-source alternative — NocoDB
And now an ElasticSearch open-source alternative — ParadeDB

ParadeDB actually consists of three PostgreSQL extensions: pg_bm25, pg_analytics, and pg_sparse. All three extensions can be used independently. I’ve already packaged these extensions (v0.5.6) and will include them by default in Pigsty’s next release, allowing users to use them out of the box.

I’ve translated ParadeDB’s official website introduction and four blog articles to introduce this new star in the PostgreSQL ecosystem. Today’s article is the first one — an overview.

ParadeDB
#

We’re proud to introduce ParadeDB: a PostgreSQL database optimized for search scenarios. ParadeDB is the first Postgres database built to be an Elasticsearch alternative, designed for lightning-fast full-text, semantic, and hybrid search on PostgreSQL tables.

What Problems Does ParadeDB Solve?
#

For many organizations, search remains an unsolved problem — despite the existence of giants like Elasticsearch, most developers who’ve worked with it know how painful it is to run, tune, and manage Elasticsearch. While there are other search engine services available, integrating these external services with existing databases introduces complex challenges and costs related to index rebuilding and data replication.

Developers seeking unified authoritative data sources and search engines have turned to Postgres. PG already provides basic full-text search capabilities through tsvector and vector semantic search capabilities through pgvector. These tools work well for simple use cases and medium-sized datasets, but fall short when tables grow large or queries become complex:

Sorting and keyword search on large tables is very slow
No BM25 scoring support
No hybrid search combining vector search with full-text search techniques
No real-time search — data must be manually re-indexed or re-embedded
Limited support for complex queries like faceting or relevance tuning

So far, we’ve witnessed many engineering teams reluctantly layer Elasticsearch on top of Postgres, only to eventually abandon it because it’s too bloated, expensive, or complex. We wondered: what if Postgres itself had ElasticSearch-level search capabilities? Then developers wouldn’t face this dilemma — use PostgreSQL alone with limited search capabilities, or maintain two separate services for data source and search engine?

Who Is ParadeDB For?
#

Elasticsearch has broad application scenarios, but we don’t attempt to cover all scenarios at once — at least not in the current stage. We prefer to focus on core scenarios — specifically serving users who want to search on PostgreSQL. ParadeDB is ideal for you if:

You want to use a single Postgres as your source of truth, avoiding the hassle of copying data between multiple services
You want full-text search on massive documents stored in Postgres without compromising performance and scalability
You want to combine ANN/similarity search with full-text search for more precise semantic matching

ParadeDB Product Introduction
#

ParadeDB is a fully managed Postgres database with indexing and search capabilities for Postgres tables not found in any other Postgres provider:

Feature	Description
BM25 Full-Text Search	Full-text search supporting boolean, fuzzy, boost, and keyword queries. Search results are scored using the BM25 algorithm.
Faceted Search	Postgres columns can be defined as facets for easy bucketing and metric collection.
Hybrid Search	Search results can be scored considering both semantic relevance (vector search) and full-text relevance (BM25).
Distributed Search	Tables can be sharded for parallel query acceleration.
Generative Search	Postgres columns can be fed into large language models (LLMs) for automatic summarization, classification, or text generation.
Real-time Search	Text indexes and vector columns are automatically kept in sync with underlying data.

Unlike managed services like AWS RDS, ParadeDB is a PostgreSQL extension plugin that requires no setup, integrates with the entire PG ecosystem, and is fully customizable. ParadeDB is open source (AGPLv3) and provides a simple Docker Compose template for developers who need self-built/customized solutions.

How ParadeDB Is Built
#

At its core, ParadeDB is a standard Postgres database with custom extensions written in Rust that introduce enhanced search capabilities.

ParadeDB’s search engine is built on Tantivy, an open-source Rust search library inspired by Apache Lucene. Its indexes are stored natively in PG as native PG indexes, avoiding cumbersome data replication/ETL work while ensuring transactional ACID properties.

ParadeDB provides a new extension for the Postgres ecosystem: pg_bm25. pg_bm25 implements Rust-based full-text search in Postgres using the BM25 scoring algorithm. ParadeDB comes pre-installed with this extension plugin.

What’s Next?
#

ParadeDB’s managed cloud version is currently in Private Beta. Our goal is to launch a self-service cloud platform in early 2024. If you want to access the Private Beta version in the meantime, please join our waiting list.

Our core team’s focus is developing ParadeDB’s open-source version, which will be released in winter 2023.

We build in public and are excited to share ParadeDB with the entire community. Please follow us — in future blog posts, we’ll dive deeper into the interesting technical challenges behind ParadeDB.

FerretDB: PostgreSQL Disguised as MongoDB

2023-10-08·1242 words·6 mins

PostgreSQL PG-Ecosystem MongoDB Extension

FerretDB aims to provide a truly open-source MongoDB alternative based on PostgreSQL.

PostgreSQL Wins 2024 Database of the Year Award! (Fifth Time)

2024-01-05·724 words·4 mins

PostgreSQL PG-Ecosystem

DB-Engines officially announced today that PostgreSQL has once again been crowned “Database of the Year.” This is the fifth time PG has received this honor in the past seven years. If not for Snowflake stealing the spotlight for two years, the database world would have almost become a PostgreSQL solo show.

PostgreSQL, The most successful database

2023-06-28·3065 words·7 mins

PostgreSQL PG-Ecosystem

StackOverflow 2023 Survey shows PostgreSQL is the most popular, loved, and wanted database, solidifying its status as the ‘Linux of Database’.

AI Large Models and Vector Database PGVector

2023-05-10·1941 words·10 mins

PostgreSQL PG-Development Extension Vector

This article focuses on vector databases hyped by AI, introduces the basic principles of AI embeddings and vector storage/retrieval, and demonstrates the functionality, performance, acquisition, and application of the vector database extension PGVECTOR through a concrete knowledge base retrieval case study.

How Powerful is PostgreSQL Really?

2022-08-22·2317 words·11 mins

PostgreSQL PG-Ecosystem Performance

Let performance data speak: Why PostgreSQL is the world’s most advanced open-source relational database, aka the world’s most successful database. MySQL vs PostgreSQL performance showdown and distributed database reality check.

Why PostgreSQL is the Most Successful Database?

2022-07-12·3101 words·15 mins

PostgreSQL PG-Ecosystem

Database users are developers, but what about developers’ preferences, likes, and choices? Looking at StackOverflow survey results over the past six years, it’s clear that in 2022, PostgreSQL has won all three categories, becoming literally the “most successful database”

New PostgreSQL Ecosystem Player: ParadeDB#

ParadeDB#

What Problems Does ParadeDB Solve?#

Who Is ParadeDB For?#

ParadeDB Product Introduction#

How ParadeDB Is Built#

What’s Next?#

Related

New PostgreSQL Ecosystem Player: ParadeDB
#

ParadeDB
#

What Problems Does ParadeDB Solve?
#

Who Is ParadeDB For?
#

ParadeDB Product Introduction
#

How ParadeDB Is Built
#

What’s Next?
#