---
title: Architecture
---

ParadeDB introduces modern query execution paths and data structures, optimized for high-ingest search and analytics workloads, to Postgres.

## Custom Index

In Postgres, indexes provide alternative data structures for accessing the data in a table (which Postgres calls a "heap table") more efficiently. ParadeDB introduces a custom index called the _BM25 index_.

When a table row is inserted or updated, the BM25 index is immediately notified. These changes are recorded as part of the current transaction, ensuring that index updates are real-time.

## Data Model

The BM25 index is laid out as an [LSM tree](#lsm-tree), where each segment in the tree consists of both an inverted index and columnar index. The inverted and columnar indexes optimize for fast reads, while the LSM tree optimizes for high-frequency writes.

### Inverted Index

An inverted index is a structure that maps each term (i.e., tokenized word) to a list of documents that contain that term (called a "postings list") along with metadata like term frequency and document frequency.

[This structure](https://github.com/quickwit-oss/tantivy/blob/main/ARCHITECTURE.md#the-inverted-search-index) allows ParadeDB to efficiently retrieve all documents matching a particular search term or phrase without scanning the entire table.

### Columnar Index

Alongside the inverted index, ParadeDB also maintains a structure that stores fields in a column-oriented format. Columnar formats are standard for analytical (i.e. OLAP) databases because they store values contiguously and enable efficient scans over large datasets compared to Postgres' row-oriented layout.

In ParadeDB these structures are referred to as [fast fields](/documentation/indexing/fast_fields).

### LSM Tree

To support real-time updates, the BM25 index uses a [Log-Structured Merge (LSM) tree](https://en.wikipedia.org/wiki/Log-structured_merge-tree).
An LSM tree is a write-optimized data structure commonly used in systems like RocksDB and Cassandra.

The core idea behind an LSM tree is to turn random writes into sequential ones. Incoming writes are first stored in an in-memory buffer, which is fast to update. Once the buffer fills up or the current statement finishes, it is flushed to disk as an immutable "segment" file.

These segment files are organized by size into layers or levels. Newer data is written to the topmost layer. Over time, data is gradually pushed down into lower levels through a process called merging or compaction, where data from smaller segments is merged, deduplicated, and rewritten into larger segments.

In ParadeDB, every `INSERT`/`UPDATE`/`COPY` statement creates a new segment. Each segment has its own inverted index and columnar index, which means that the BM25 index is actually a collection of many inverted/columnar indexes, each of which allows for very dense intersection queries to rapidly filter matches.

## Query Execution

### Custom Operators

ParadeDB introduces a set of new text search operators to Postgres.


| Operator | Type | Description |
|---|---|---|
| `===` | Term | Finds documents that contain an exact token. |
| `###` | Phrase | Find documents that match a phrase. |
| `&&&` | Match Conjunction | Find documents that contain all tokens in a query. |
| `\|\|\|` | Match Disjunction | Find documents that contain any one of the tokens in a query. |
| `@@@` | Advanced | Find all documents using an advanced query builder function. |
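
To make the semantics in the table concrete, here is a minimal Python sketch of term, phrase, conjunction, and disjunction lookups over a toy inverted index. It is not ParadeDB's or Tantivy's implementation: the sample documents, the whitespace tokenizer, and all function names (`build_index`, `term`, `phrase`, `match_conjunction`, `match_disjunction`) are assumptions made purely for illustration.

```python
from collections import defaultdict

# Toy documents; the IDs and text are made up for illustration.
DOCS = {
    1: "sleek running shoes",
    2: "white running shoes for jogging",
    3: "plastic keyboard",
    4: "shoes for running",
}


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real tokenizer)."""
    return text.lower().split()


def build_index(docs):
    """Map each term to {doc_id: [positions]}: a postings list with positions."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, token in enumerate(tokenize(text)):
            index[token].setdefault(doc_id, []).append(pos)
    return index


INDEX = build_index(DOCS)


def term(token):
    """Term: documents that contain one exact token."""
    return set(INDEX.get(token, {}))


def match_conjunction(query):
    """Match conjunction: documents that contain all tokens in the query."""
    sets = [term(t) for t in tokenize(query)]
    return set.intersection(*sets) if sets else set()


def match_disjunction(query):
    """Match disjunction: documents that contain any token in the query."""
    sets = [term(t) for t in tokenize(query)]
    return set.union(*sets) if sets else set()


def phrase(query):
    """Phrase: documents where the query tokens appear consecutively, in order."""
    tokens = tokenize(query)
    hits = set()
    # Only documents containing every token can contain the phrase.
    for doc_id in match_conjunction(query):
        starts = INDEX[tokens[0]][doc_id]
        if any(
            all(start + i in INDEX[t][doc_id] for i, t in enumerate(tokens))
            for start in starts
        ):
            hits.add(doc_id)
    return hits
```

In this sketch, `match_conjunction("running shoes")` returns documents 1, 2, and 4, while `phrase("running shoes")` returns only 1 and 2, because document 4 contains both tokens but not adjacent and in order. A real engine would also track term and document frequencies for BM25 scoring, which this sketch omits.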