> **Plain-language companion:** [v0.22.0.md](v0.22.0.md)

## v0.22.0 — Production Scalability & Downstream Integration

**Status: ✅ Released.** Driven by [PLAN_OVERALL_ASSESSMENT.md](plans/PLAN_OVERALL_ASSESSMENT.md) P1 items not addressed in v0.21.0 and the highest-value P2 items.

> **Release Theme**
> This release delivers the two highest-impact items from the overall
> assessment deferred from v0.21.0: a minimal-viable in-database parallel
> refresh worker pool (the single largest scalability unlock) and a downstream
> CDC publication so stream table changes can drive Kafka, Debezium, and
> event-sourcing pipelines without a second replication slot. Two P2 items
> ship alongside: a predictive cost model for adaptive refresh and SLA-driven
> tier auto-assignment. The transactional outbox helper moves to v0.24.0,
> where it ships alongside a companion inbox helper as a complete
> transactional messaging solution.

### Downstream CDC Publication (P1 — §9.2)

> **In plain terms:** pg_trickle consumes CDC from source tables but cannot
> *emit* changes downstream. This adds `stream_table_to_publication()` — a
> helper that exposes every row applied to a stream table as a PostgreSQL
> logical replication publication so Kafka Connect, Debezium, and
> event-sourcing pipelines can subscribe with zero code and no second
> replication slot.

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| CDC-PUB-1 | **`stream_table_to_publication(name TEXT)` SQL function.** Creates a logical replication publication for the target stream table using `pgt_inserted_rows`/`pgt_deleted_rows` output from the MERGE step. Catalog column `downstream_publication_name` tracks the association. | 2–3d | [PLAN_OVERALL_ASSESSMENT.md](plans/PLAN_OVERALL_ASSESSMENT.md) §9.2 |
| CDC-PUB-2 | **Lifecycle management.** `drop_stream_table_publication(name)`, auto-drop on `drop_stream_table()`, recreation on schema-change rebuild. | 1d | [PLAN_OVERALL_ASSESSMENT.md](plans/PLAN_OVERALL_ASSESSMENT.md) §9.2 |
| CDC-PUB-3 | **`pg_stat_stream_tables` — `downstream_publication` column.** Surface publication name (or NULL) in the monitoring view. | 0.5d | [PLAN_OVERALL_ASSESSMENT.md](plans/PLAN_OVERALL_ASSESSMENT.md) §9.2 |
| CDC-PUB-4 | **E2E tests.** Create publication; verify subscriber receives insert/update/delete events; drop and verify cleanup. | 1d | [PLAN_OVERALL_ASSESSMENT.md](plans/PLAN_OVERALL_ASSESSMENT.md) §9.2 |
| CDC-PUB-5 | **Documentation.** `docs/SQL_REFERENCE.md` section on downstream publications; tutorial showing Kafka Connect integration pattern. | 1d | — |

> **Downstream CDC publication subtotal: ~1–1.5 weeks**

### In-Database Parallel Refresh Worker Pool — Minimal Viable Slice (P1 — §3.1)

> **In plain terms:** The scheduler today runs one refresh at a time per
> tick. This installs a dynamic bgworker pool — a coordinator owns the DAG,
> workers execute refreshes — so independent stream tables at the same DAG
> level refresh simultaneously. Deployments with 200+ STs or long refresh
> queues get immediate throughput gains. Opt-in via `max_parallel_workers`;
> default 0 preserves existing serial behaviour.

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| PAR-1 | **Coordinator / worker process split.** Coordinator BGW manages the tick; dispatches ready-to-run STs to a `PgLwLock`-protected shared work queue; worker BGWs pop entries and execute refresh transactions. | 1.5–2wk | [plans/sql/PLAN_PARALLELISM.md](plans/sql/PLAN_PARALLELISM.md) §3 |
| PAR-2 | **`pg_trickle.max_parallel_workers` GUC** (default 0 = serial, range 0–32). Gate the entire parallel path so deployments can opt in incrementally. | 1d | [plans/sql/PLAN_PARALLELISM.md](plans/sql/PLAN_PARALLELISM.md) §4 |
| PAR-3 | **DAG level extraction.** Re-use `topological_levels()` already in `dag.rs` to identify STs that can run concurrently (same level, no intra-level edges). | 0.5d | [plans/sql/PLAN_PARALLELISM.md](plans/sql/PLAN_PARALLELISM.md) §3 |
| PAR-4 | **Worker crash recovery.** Coordinator marks the ST `ERROR` in `pgt_refresh_history` on worker crash (same behaviour as serial crash); respawns the worker slot. | 1d | [plans/sql/PLAN_PARALLELISM.md](plans/sql/PLAN_PARALLELISM.md) §5 |
| PAR-5 | **E2E tests: correctness + throughput.** Diamond DAG with concurrent same-level refreshes; verify no partial-consistency window. Benchmark: wall-clock tick latency vs serial at 50-ST scale. | 1d | [plans/sql/PLAN_PARALLELISM.md](plans/sql/PLAN_PARALLELISM.md) §6 |

> **Parallel refresh subtotal: ~3–4 weeks**

### Predictive Refresh Cost Model (P2 — §9.3)

> **In plain terms:** The current adaptive threshold reacts *after* a slow
> differential refresh. This extends self-monitoring to *predict* `duration_ms`
> from `rows_inserted + rows_deleted` via linear regression over the last
> hour. When the forecast exceeds `last_full_ms × 1.5`, pg_trickle switches
> to FULL pre-emptively — eliminating the one-bad-cycle latency spike entirely.

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| PRED-1 | **Linear regression forecaster.** Fit `duration_ms ~ delta_rows` over `pg_trickle.prediction_window` minutes of `pgt_refresh_history` per ST. Expose fitted slope and intercept as columns in `df_threshold_advice`. | 1–2d | [PLAN_OVERALL_ASSESSMENT.md](plans/PLAN_OVERALL_ASSESSMENT.md) §9.3 |
| PRED-2 | **Pre-emptive FULL switch.** If `predicted_diff_ms > last_full_ms × pg_trickle.prediction_ratio` (default 1.5), override strategy to FULL; log `refresh_reason = 'predicted_cost_exceeds_full'`. | 1d | [PLAN_OVERALL_ASSESSMENT.md](plans/PLAN_OVERALL_ASSESSMENT.md) §9.3 |
| PRED-3 | **Cold-start fallback.** When fewer than `pg_trickle.prediction_min_samples` (default 5) history rows exist, fall back to the existing fixed-threshold logic. | 0.5d | — |
| PRED-4 | **E2E test + proptest.** Verify pre-emptive switch fires under synthetic cost spike; proptest checks cold-start fallback boundary (0–4 samples). | 1d | — |

> **Predictive cost model subtotal: ~1 week**

### SLA-Driven Tier Auto-Assignment (P2 — §9.7)

> **In plain terms:** `alter_stream_table(name, sla => interval '30 seconds')`
> lets the scheduler pick the right tier automatically — no manual tier
> tuning required. Removes the expert-knowledge barrier to tiered scheduling.

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| SLA-1 | **`sla` parameter on `create_stream_table` / `alter_stream_table`.** Accepts an `INTERVAL`; stored as `freshness_deadline_ms` in `pgt_stream_tables`. | 0.5d | [PLAN_OVERALL_ASSESSMENT.md](plans/PLAN_OVERALL_ASSESSMENT.md) §9.7 |
| SLA-2 | **Initial tier assignment.** On creation or `alter_stream_table` with `sla` set, assign to the tier whose `dispatch_gap ≤ sla`, considering current queue depth. | 1d | [PLAN_OVERALL_ASSESSMENT.md](plans/PLAN_OVERALL_ASSESSMENT.md) §9.7 |
| SLA-3 | **Dynamic re-assignment.** After each tick, check whether the ST's tier still meets the SLA given measured queue depth; bump one tier up or down if the gap is consistently exceeded or under-utilised by >2×. | 1d | [PLAN_OVERALL_ASSESSMENT.md](plans/PLAN_OVERALL_ASSESSMENT.md) §9.7 |
| SLA-4 | **E2E test.** Create ST with 30 s SLA; inject artificial tick delay; verify tier promotion within 3 cycles. | 0.5d | — |

> **SLA-driven tier subtotal: ~3–4 days**

### Implementation Phases

| Phase | Description | Duration |
|-------|-------------|----------|
| CDC-PUB | Downstream CDC publication: SQL function, lifecycle, monitoring, tests, docs | Days 1–8 |
| PAR | Parallel refresh: coordinator/worker split, GUC, DAG levels, recovery, tests | Days 9–28 |
| PRED | Predictive cost model: regression, pre-emptive switch, cold-start fallback, tests | Days 29–33 |
| SLA | SLA-driven tier: `sla` param, initial assignment, dynamic re-assignment, tests | Days 34–37 |

> **v0.22.0 total: ~5 weeks** (downstream CDC + parallel refresh + predictive cost + SLA tier)

**Exit criteria:**

- [x] CDC-PUB-1: `stream_table_to_publication(name)` creates a working logical publication
- [x] CDC-PUB-2: Publication is dropped automatically when the stream table is dropped
- [x] CDC-PUB-3: `downstream_publication` column visible in `pg_stat_stream_tables`
- [x] CDC-PUB-4: Subscriber receives correct insert/update/delete events; E2E test passes
- [x] PAR-2: `max_parallel_workers = 0` (default) produces identical results to serial mode
- [x] PAR-1/PAR-3: `max_parallel_workers ≥ 1` dispatches independent same-level STs concurrently
- [x] PAR-4: Worker crash marks ST `ERROR`; coordinator respawns worker slot
- [x] PAR-5: Diamond DAG concurrent correctness test passes; throughput improvement benchmarked
- [x] PRED-1: Fitted coefficients visible in `df_threshold_advice`
- [x] PRED-2: Pre-emptive FULL switch fires under synthetic spike; `refresh_reason = 'predicted_cost_exceeds_full'` logged
- [x] PRED-3: Cold-start fallback active when fewer than `prediction_min_samples` history rows exist
- [x] SLA-1: `create_stream_table(..., sla => '30 seconds')` stores `freshness_deadline_ms`
- [x] SLA-2: Initial tier assignment matches SLA requirement on creation
- [x] SLA-3: Tier auto-adjusts within 3 cycles when queue depth breaches SLA
- [x] Extension upgrade path tested (`0.21.0 → 0.22.0`)
- [x] `just check-version-sync` passes

---
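To make the CDC-PUB-1/CDC-PUB-2 lifecycle concrete, here is a minimal sketch of the DDL the helper could issue. The `pgt_pub_` name prefix and the exact statement shapes are illustrative assumptions, not the extension's documented behaviour (a real implementation would also quote identifiers and record the name in `downstream_publication_name`):

```rust
/// Derive a deterministic publication name for a stream table.
/// ASSUMPTION: the "pgt_pub_" prefix is hypothetical, for illustration only.
fn publication_name(stream_table: &str) -> String {
    format!("pgt_pub_{stream_table}")
}

/// DDL run on create: one publication scoped to the single stream table,
/// so subscribers see exactly the rows applied by the MERGE step.
fn create_publication_sql(schema: &str, stream_table: &str) -> String {
    format!(
        "CREATE PUBLICATION {} FOR TABLE {}.{};",
        publication_name(stream_table),
        schema,
        stream_table
    )
}

/// DDL run by drop_stream_table_publication() and by the auto-drop hook
/// in drop_stream_table(); IF EXISTS keeps the drop path idempotent.
fn drop_publication_sql(stream_table: &str) -> String {
    format!(
        "DROP PUBLICATION IF EXISTS {};",
        publication_name(stream_table)
    )
}
```

A deterministic name means the schema-change rebuild can recreate the publication without consulting extra state, though the plan's `downstream_publication_name` catalog column remains the source of truth.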
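The PAR-1 coordinator/worker split can be sketched with OS threads and a `Mutex` standing in for dynamic bgworkers and the `PgLwLock`-protected shared queue — a simulation of the dispatch shape only, not the extension's process model:

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Coordinator loads one DAG level of ready stream tables into a shared
/// queue; `workers` threads pop entries and "refresh" them concurrently.
fn run_level(ready: Vec<String>, workers: usize) -> Vec<String> {
    let queue = Arc::new(Mutex::new(ready.into_iter().collect::<VecDeque<_>>()));
    let done = Arc::new(Mutex::new(Vec::new()));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let (queue, done) = (Arc::clone(&queue), Arc::clone(&done));
            thread::spawn(move || loop {
                // Pop under the lock, then do the work outside it, the way
                // a worker BGW would run its refresh transaction unlocked.
                let st = match queue.lock().unwrap().pop_front() {
                    Some(st) => st,
                    None => break, // queue drained: worker goes idle
                };
                let refreshed = format!("{st}:refreshed");
                done.lock().unwrap().push(refreshed);
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let mut out = done.lock().unwrap().clone();
    out.sort(); // completion order is nondeterministic; sort for comparison
    out
}
```

With `max_parallel_workers = 0` the coordinator would simply skip the pool and run entries inline, which is how the default preserves serial behaviour.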
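PAR-3's level extraction is standard Kahn-style layering: every ST's level is one greater than the deepest of its dependencies, so all STs in a level are mutually independent and safe to refresh concurrently. This sketch mirrors the role of `topological_levels()` in `dag.rs`; the real signature is not shown in this plan, so the shape here is assumed:

```rust
use std::collections::HashMap;

/// Group DAG nodes into topological levels. Edge (a, b) means b depends
/// on a; nodes sharing a level have no intra-level edges.
fn topological_levels(nodes: &[&str], edges: &[(&str, &str)]) -> Vec<Vec<String>> {
    let mut indeg: HashMap<&str, usize> = nodes.iter().map(|&n| (n, 0)).collect();
    let mut adj: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(a, b) in edges {
        *indeg.get_mut(b).unwrap() += 1;
        adj.entry(a).or_default().push(b);
    }
    // The first frontier is every node with no dependencies.
    let mut frontier: Vec<&str> = nodes.iter().copied().filter(|n| indeg[n] == 0).collect();
    let mut levels = Vec::new();
    while !frontier.is_empty() {
        frontier.sort(); // deterministic ordering within a level
        levels.push(frontier.iter().map(|s| s.to_string()).collect());
        let mut next = Vec::new();
        for n in frontier {
            for &m in adj.get(n).into_iter().flatten() {
                let d = indeg.get_mut(m).unwrap();
                *d -= 1;
                if *d == 0 {
                    next.push(m); // all of m's dependencies are scheduled
                }
            }
        }
        frontier = next;
    }
    levels
}
```

For the diamond DAG used in PAR-5 (`a → b`, `a → c`, `b → d`, `c → d`), `b` and `c` land in the same level and may refresh in parallel, while `d` waits for both.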
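The PRED-1/PRED-2/PRED-3 pipeline is an ordinary least-squares fit of `duration_ms ~ delta_rows` plus a threshold comparison. A minimal sketch of that logic, outside the pgrx runtime (function and field names here are illustrative, not the extension's API):

```rust
/// One pgt_refresh_history sample: delta rows processed and wall-clock ms.
struct Sample {
    delta_rows: f64,
    duration_ms: f64,
}

/// Ordinary least-squares fit of duration_ms ~ delta_rows.
/// Returns (slope, intercept), or None when fewer than min_samples rows
/// exist — the PRED-3 cold-start fallback to fixed-threshold logic.
fn fit(history: &[Sample], min_samples: usize) -> Option<(f64, f64)> {
    if history.len() < min_samples {
        return None;
    }
    let n = history.len() as f64;
    let sx: f64 = history.iter().map(|s| s.delta_rows).sum();
    let sy: f64 = history.iter().map(|s| s.duration_ms).sum();
    let sxx: f64 = history.iter().map(|s| s.delta_rows * s.delta_rows).sum();
    let sxy: f64 = history.iter().map(|s| s.delta_rows * s.duration_ms).sum();
    let denom = n * sxx - sx * sx;
    if denom == 0.0 {
        return None; // all deltas identical: slope is undefined
    }
    let slope = (n * sxy - sx * sy) / denom;
    let intercept = (sy - slope * sx) / n;
    Some((slope, intercept))
}

/// PRED-2: pre-empt to FULL when the forecast differential cost exceeds
/// last_full_ms × ratio (pg_trickle.prediction_ratio, default 1.5).
fn should_switch_to_full(
    history: &[Sample],
    pending_delta: f64,
    last_full_ms: f64,
    ratio: f64,
    min_samples: usize,
) -> bool {
    match fit(history, min_samples) {
        Some((slope, intercept)) => slope * pending_delta + intercept > last_full_ms * ratio,
        None => false, // cold start: defer to the existing fixed threshold
    }
}
```

The fitted `(slope, intercept)` pair is exactly what PRED-1 proposes exposing as columns in `df_threshold_advice`, so operators can sanity-check the forecast.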
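SLA-2/SLA-3 reduce to two small decisions: pick the cheapest tier whose effective dispatch gap still meets the SLA, then nudge the tier after each tick when the measured gap misses or badly under-uses it. The tier gaps and the queue-depth penalty below are illustrative assumptions; the plan does not specify either:

```rust
/// Scheduler tiers and their dispatch gaps in ms, fastest first.
/// ASSUMPTION: these values are illustrative, not the extension's tiers.
const TIER_GAP_MS: [u64; 4] = [1_000, 5_000, 30_000, 120_000];

/// SLA-2: choose the slowest (cheapest) tier whose dispatch gap, inflated
/// by current queue depth, still meets the SLA. Falls back to tier 0
/// (fastest) when no tier can meet it.
fn assign_tier(sla_ms: u64, queue_depth: &[u64; 4]) -> usize {
    // Naive queueing model: each queued refresh adds one dispatch gap.
    let effective = |t: usize| TIER_GAP_MS[t].saturating_mul(1 + queue_depth[t]);
    (0..TIER_GAP_MS.len())
        .rev()
        .find(|&t| effective(t) <= sla_ms)
        .unwrap_or(0)
}

/// SLA-3: after a tick, bump one tier toward fast (index 0) when the
/// measured gap breaches the SLA, or one tier toward slow when the SLA
/// is under-utilised by more than 2×.
fn adjust_tier(current: usize, measured_gap_ms: u64, sla_ms: u64) -> usize {
    if measured_gap_ms > sla_ms {
        current.saturating_sub(1) // promote
    } else if measured_gap_ms.saturating_mul(2) < sla_ms && current + 1 < TIER_GAP_MS.len() {
        current + 1 // demote
    } else {
        current
    }
}
```

Moving one tier per tick, as sketched here, is what makes the SLA-4 exit criterion ("promotion within 3 cycles") a bounded number of adjustments rather than a jump.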