--- title: Logical Replication Operational Guide description: Monitor, troubleshoot, and safely operate ParadeDB as a permanent logical replica canonical: https://docs.paradedb.com/deploy/logical-replication/operational-guide --- This guide covers how to operate ParadeDB after logical replication has been set up. Use [Getting Started with Logical Replication](/deploy/logical-replication/getting-started) to create the publication and subscription first. This page focuses on what happens after the link is established and ParadeDB is staying in sync continuously. ## Operating Model When ParadeDB is used as a logical subscriber: 1. Your application writes to tables on the publisher 2. PostgreSQL logical replication applies those row changes to matching tables on ParadeDB 3. ParadeDB maintains BM25 indexes locally on the subscriber 4. Search and analytics queries run against ParadeDB instead of the primary This keeps the source database authoritative while isolating search traffic from OLTP traffic. Logical replication copies row changes into ParadeDB, but it does not copy BM25 indexes from the publisher. For the deployment described in this guide, build the BM25 indexes you plan to query on the ParadeDB subscriber. ## Baseline Workflow ### 1. Wait for the Initial Copy to Finish Let PostgreSQL finish copying the base table data before you build BM25 indexes. This avoids extra indexing work during the bootstrap phase. On ParadeDB, you can check whether the initial copy is still running with: ```sql SELECT subname, worker_type, CASE WHEN relid = 0 THEN NULL ELSE relid::regclass END AS table_name, latest_end_time FROM pg_stat_subscription ORDER BY 1, 2, 3; ``` The initial copy is complete when there are no remaining rows with `worker_type = 'table synchronization'`. If you want a stricter per-table check, run: ```sql SELECT srrelid::regclass AS table_name, srsubstate FROM pg_subscription_rel ORDER BY 1; ``` The initial copy is complete when every replicated table is in state `r` (`ready`). ### 2. Build BM25 Indexes on ParadeDB Once the replicated tables are caught up, create BM25 indexes locally on ParadeDB: ```sql CREATE INDEX mock_items_bm25_idx ON public.mock_items USING bm25 (id, description, category, rating) WITH (key_field='id'); ``` After this, ongoing replicated `INSERT`, `UPDATE`, and `DELETE` operations will keep the BM25 index current automatically. ### 3. Query ParadeDB Your application can now issue search queries to ParadeDB without adding search indexes to the primary database: ```sql SELECT id, description, pdb.score(id) AS score FROM mock_items WHERE description @@@ 'running shoes' ORDER BY score DESC LIMIT 10; ``` ## Day-2 Operations ### Choose Publication and Subscription Boundaries For large or high-churn production tables, use one publication and one subscription per large table, or group only small related tables together. This gives each subscription its own main apply worker and replication slot. In normal steady-state replication, PostgreSQL does not parallelize ordinary change application across tables within a single subscription, so one hot table can delay other tables that share that apply worker. A publication per table alone does not provide that isolation unless it also has its own subscription. If you split replication this way, size the replication worker settings for the number of subscriptions you plan to run: - On the publisher, set `max_replication_slots` to at least the number of subscriptions plus reserve for initial table synchronization workers. During bootstrap, each active table synchronization worker can temporarily consume its own replication slot on the publisher. With the default `max_sync_workers_per_subscription = 2`, leave room for the main subscription plus up to two extra sync slots per bootstrapping subscription, and set `max_wal_senders` high enough to cover the same plus any physical replicas. - On the subscriber, set `max_replication_slots` and `max_logical_replication_workers` to at least the number of subscriptions plus reserve for table synchronization workers. On PostgreSQL 18+, `max_active_replication_origins` controls replication origin tracking separately and should also be sized accordingly. `max_worker_processes` must be high enough to accommodate those logical replication workers and any other background workers used by the system. - `max_sync_workers_per_subscription` controls initial-copy parallelism when a subscription is created or refreshed. The default is `2`, so multi-table publications normally copy at most two tables at a time unless you raise it. ### Add New Tables When you want ParadeDB to index a new table: 1. Apply the new table DDL on the publisher 2. Apply the same DDL on ParadeDB 3. Make sure the publication includes the table 4. Refresh the subscription 5. Build a BM25 index on ParadeDB if the table should be searchable Whether step 3 is manual depends on how the publication was defined. If the publication uses `FOR ALL TABLES`, the new table is included automatically. If it uses `FOR TABLES IN SCHEMA ...`, new tables in those schemas are included automatically. If it was created from an explicit table list, add the table manually. If you do not want the table on ParadeDB, do not include it in the publication. ```sql -- On the publisher ALTER PUBLICATION app_search_pub ADD TABLE public.new_table; -- On ParadeDB ALTER SUBSCRIPTION app_search_sub REFRESH PUBLICATION; ``` ### Change Indexed Columns If you add or remove a column that is part of a BM25 index: 1. Apply the table change on both the publisher and ParadeDB 2. Let replication catch up again 3. Rebuild the BM25 index on ParadeDB See [Reindexing](/documentation/indexing/reindexing) for the BM25 rebuild workflow. ### Roll Out DDL Safely PostgreSQL logical replication does not replicate schema changes. That means the publisher and ParadeDB must be kept in sync manually. In practice, most teams do this through their existing migration runner or framework tooling, whether that is Rails migrations, Django migrations, Prisma Migrate, or another migration system. For additive changes such as `ADD COLUMN`, the safest rollout is usually: 1. Apply the additive DDL on ParadeDB first 2. Apply the same DDL on the publisher 3. Let replication continue normally 4. Rebuild any BM25 indexes whose indexed column list changed This follows PostgreSQL's recommendation to apply additive schema changes on the subscriber first whenever possible, which avoids intermittent apply failures. Logical replication can tolerate extra columns on the subscriber, so adding a column on ParadeDB first will not stop replication by itself. Those extra subscriber-only columns use their local default value, or `NULL` if no default is defined, until the publisher starts sending that column. If the new column must be `NOT NULL`, give it a compatible default on both sides or use a coordinated maintenance window. Otherwise replicated `INSERT` operations can fail before the publisher-side change is in place. If the change is not additive, such as a column rename, drop, or incompatible type change, use a short maintenance window, pause writes to the affected tables if possible, and coordinate both sides explicitly: ```sql -- On Subscriber ALTER SUBSCRIPTION marketplace_sub DISABLE; ALTER TABLE mock_items RENAME COLUMN category TO product_category; -- On Publisher ALTER TABLE mock_items RENAME COLUMN category TO product_category; -- Back on Subscriber ALTER SUBSCRIPTION marketplace_sub ENABLE; ``` Do not leave a disabled subscription in place longer than necessary. The logical slot on the publisher can continue retaining WAL while the subscriber is disabled. ### Handle Tables Without Primary Keys PostgreSQL needs a replica identity to replicate `UPDATE` and `DELETE` operations. A primary key is best. Another suitable unique index can also be used as the replica identity. If a table has no suitable key, you can use the per-table fallback: ```sql ALTER TABLE public.events REPLICA IDENTITY FULL; ``` Do not think of this as a server-wide setting. `REPLICA IDENTITY FULL` is set per published table and should be treated as a fallback rather than the default design. PostgreSQL explicitly warns that subscriber-side `UPDATE` and `DELETE` can become very inefficient under `FULL`, because the subscriber must locate the matching row using the entire old row image rather than a compact key. `FULL` also increases WAL volume and replication traffic on the publisher, since every `UPDATE` and `DELETE` writes the full before-image of the row into WAL instead of just the key columns. ### Monitor the Publisher Permanent logical replication is operationally safe only if you watch the publisher, not just the subscriber. The most important signal is how much WAL a logical slot is retaining. ```sql SELECT slot_name, active, restart_lsn, confirmed_flush_lsn, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal, wal_status, safe_wal_size, inactive_since FROM pg_replication_slots WHERE slot_type = 'logical'; ``` Watch for: - `retained_wal` growing steadily because the subscriber is not acknowledging WAL quickly enough - `inactive_since` becoming non-`NULL` for longer than expected - `wal_status` showing that the slot is under pressure - Filesystem usage on the volume that contains `pg_wal` To reduce blast radius, configure `max_slot_wal_keep_size` on the publisher. This caps how much WAL a slot may retain, but it can also invalidate a lagging subscriber, so it should be paired with alerting and a reseed plan. ### Monitor the Subscriber Use the subscriber to confirm that apply workers are healthy and that errors are not accumulating: ```sql SELECT subname, worker_type, received_lsn, latest_end_lsn, latest_end_time FROM pg_stat_subscription; SELECT subname, apply_error_count, sync_error_count FROM pg_stat_subscription_stats; ``` If `latest_end_time` stops advancing or `apply_error_count` increases, inspect the subscriber logs immediately. ### Troubleshoot Apply Failures One common cause of apply-worker failures is schema drift between the publisher and subscriber. Two common log patterns for schema drift are: ```text logical replication target relation "public.doctor" is missing replicated columns: "personnel_id", "role_function_id" ``` ```text logical replication apply worker for subscription "paradedb_subscription" has started background worker "logical replication apply worker" (PID 2570238) exited with exit code 1 ``` The first message is the root cause. The second means the apply worker crashed after hitting that error and PostgreSQL will try to restart it. When you see these messages: 1. Inspect the subscriber logs for the first schema-mismatch error, not just the worker restart message 2. Compare the affected table definition on the publisher and ParadeDB 3. Apply the missing DDL on ParadeDB 4. Re-enable or refresh the subscription if needed 5. Rebuild any BM25 indexes affected by the schema change Another common cause of apply-worker failures is a logical replication conflict. For example, a duplicate key, a permissions failure on the target table, or row-level security on the subscriber can stop replication even when the schemas match. ```text ERROR: duplicate key value violates unique constraint ... CONTEXT: processing remote data during INSERT for replication target relation ... ``` When you suspect a replication conflict: 1. Inspect the subscriber logs for the first conflict error and note the finish LSN and replication origin if PostgreSQL logged them 2. Resolve the underlying issue on the subscriber, such as conflicting local data, missing privileges, or row-level security policy interference 3. Resume replication normally once the conflict is removed 4. Only if you intentionally want to discard that remote transaction, use `ALTER SUBSCRIPTION ... SKIP` with care Skipping a conflicting transaction can leave the subscriber inconsistent, so it should be treated as a last resort rather than the default fix. For conflict types and the PostgreSQL recovery workflow, see the [PostgreSQL logical replication conflicts documentation](https://www.postgresql.org/docs/current/logical-replication-conflicts.html). ### Emergency: WAL Keeps Accumulating on the Publisher If the logical slot on the publisher is filling disk and ParadeDB cannot catch up quickly enough, the priority is protecting the publisher. 1. First, fix the subscriber if the issue is simple and recent, such as a schema mismatch or networking issue 2. If the publisher is running out of disk and the subscriber can be rebuilt, remove the subscription or drop the logical slot so the publisher can recycle WAL again 3. Recreate the subscription and reseed ParadeDB once the publisher is safe Disabling the subscription is not an emergency fix for WAL buildup. A disabled subscription still leaves the logical slot behind on the publisher, and that slot can continue retaining WAL. If the subscriber is reachable and healthy enough to cleanly tear down, dropping the subscription is the cleanest path: ```sql DROP SUBSCRIPTION paradedb_subscription; ``` To protect the publisher from continued `pg_wal` growth when you are intentionally giving up the current replica state, drop the slot on the publisher: ```sql SELECT pg_drop_replication_slot('paradedb_subscription'); ``` After either step, ParadeDB must be reinitialized from a fresh schema and data copy before it can resume as a logical subscriber. ## Common Pitfalls - Starting with pre-populated subscriber tables while using `copy_data = true` - Applying DDL on only one side of the replication link - Forgetting that new tables must be added to the publication and refreshed on the subscription - Writing directly to subscribed tables on ParadeDB, which can create conflicts with incoming replicated changes - Leaving a broken logical slot unattended on the publisher until `pg_wal` fills disk - Assuming `ALTER SUBSCRIPTION ... DISABLE` relieves publisher-side WAL pressure For schema-change basics, see [Schema Changes](/deploy/logical-replication/configuration). For multiple source databases, see [Multi-Database Replication for Microservices](/deploy/logical-replication/multi-database).