--- title: Joins description: Optimize JOIN queries in ParadeDB canonical: https://docs.paradedb.com/documentation/performance-tuning/joins --- ParadeDB supports all PostgreSQL JOIN types and extends them with BM25-powered full-text search. This guide explains how JOINs behave with search, how to identify sub-optimal query plans, and offers strategies to keep queries fast. ## Supported JOIN Types ParadeDB supports all PostgreSQL JOINs: - `INNER JOIN` - `LEFT / RIGHT / FULL OUTER JOIN` - `CROSS JOIN` - `LATERAL` - Semi and Anti JOINs For the most part you can mix search and relational queries without changing your SQL. ## Scoring in JOINs When using `paradedb.score()` or `paradedb.snippet()` inside JOINs: - Scores and snippets are computed **before the JOIN** at the base table level. - JOIN conditions never change the score, they only determine which rows are combined. This design keeps scores predictable and consistent across queries. ## Performance Characteristics ### Fast Cases Queries are efficient when search filters can be applied directly to the underlying tables. In these cases, PostgreSQL can push down the `|||` operators so that each table does its own filtered index scan before the JOIN runs. That means: - Each table only contributes rows that already match the search condition. - The JOIN operates on much smaller intermediate sets. In this query, both `a.bio` and `b.content` are filtered independently. The planner runs efficient index scans on each table and then joins the results. ```sql SELECT a.name, b.title, paradedb.score(a.id) FROM authors a JOIN books b ON a.id = b.author_id WHERE a.bio ||| 'science fiction' AND b.content ||| 'space travel'; ``` The plan will have this shape: ``` Gather -> Parallel Hash Join Hash Cond: (b.id = a.id) -> Parallel ParadeDB Scan on authors a -> Parallel Hash -> Parallel ParadeDB Scan on books b ``` ### Slower Cases Queries become slower when search conditions span multiple tables in a way that prevents PostgreSQL from pushing them down. The most common example is an `OR` across different tables: ```sql SELECT a.name, b.title FROM authors a JOIN books b ON a.id = b.author_id WHERE a.bio ||| 'science' OR b.content ||| 'artificial'; ``` Because the condition references both `a` and `b`, PostgreSQL cannot apply it until after the join. As a result, both tables must be scanned in full, joined, and only then filtered. The plan will have this shape: ``` Gather -> Parallel Hash Join Hash Cond: (a.id = b.author_id) Join Filter: (a.bio ||| (...) OR b.content ||| (...)) -> Parallel Seq Scan on authors a -> Parallel Hash -> Parallel Seq Scan on books b ``` Note that the `|||` query is in the _Join Filter_, not in the scan. ## Diagnosing Performance Use `EXPLAIN` to check the query plan: ```sql EXPLAIN (ANALYZE, BUFFERS) SELECT a.name, b.title, paradedb.score(a.id) FROM authors a JOIN books b ON a.id = b.author_id WHERE a.bio ||| 'science' OR b.content ||| 'artificial'; ``` Watch for: - `Custom Scan` nodes with large row counts - ParadeDB operators inside JOIN conditions - `Tantivy Query: all` (full index scan) ## Writing Faster JOIN Queries ### Replace Cross-Table OR with UNION If you don’t need scores/snippets and have a simple JOIN, express the OR as a UNION of two separately filtered joins. This lets PostgreSQL push each search predicate down to a Custom Index Scan and avoid a join-time filter. ```sql SELECT a.name, b.title FROM authors a JOIN books b ON a.id = b.author_id WHERE a.bio ||| 'science' UNION SELECT a.name, b.title FROM authors a JOIN books b ON a.id = b.author_id WHERE b.content ||| 'artificial'; ``` ### Use CTEs for Complex Queries Use common table expressions (CTEs) to pre-filter each table with its own search condition, then join the smaller result sets together. If possible, add a `LIMIT` to each CTE to keep the result sets small. ```sql WITH matching_authors AS ( SELECT id, name, paradedb.score(id) AS author_score FROM authors WHERE bio ||| 'science' LIMIT 100 ), matching_books AS ( SELECT id, title, author_id, paradedb.score(id) AS book_score FROM books WHERE content ||| 'artificial' LIMIT 100 ) SELECT COALESCE(ma.name, a.name) AS name, COALESCE(mb.title, b.title) AS title, ma.author_score, mb.book_score FROM matching_authors ma FULL JOIN matching_books mb ON ma.id = mb.author_id LEFT JOIN authors a ON mb.author_id = a.id AND ma.id IS NULL LEFT JOIN books b ON ma.id = b.author_id AND mb.id IS NULL; ``` BM25 scores should not be added, if you want to combine scores then consider using [reciprocal rank fusion (RRF)](https://www.paradedb.com/learn/search-concepts/reciprocal-rank-fusion). ## Roadmap We really want to remove the need to think about the way to do `JOIN`s in ParadeDB. At the moment we are actively working on: - A `CustomScan Join API` for native join handling - Smarter cost estimation for the PostgreSQL planner