> **Plain-language companion:** [v0.24.0.md](v0.24.0.md)

## v0.24.0 — Join Correctness & Durability Hardening

**Status: Released (2026-04-20).** Sourced from
[PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §3, §4, §6.

> **Release Theme**
> This release closes the remaining **critical correctness bugs** and
> **data-durability gaps** identified in the v0.23.0 deep assessment.
> The EC-01 join phantom-row bug — deferred since v0.21.0 — is finally
> resolved, restoring full DIFFERENTIAL correctness for multi-table
> LEFT/RIGHT/FULL JOINs under mixed DML. Change-buffer durability
> becomes configurable, and a two-phase frontier commit eliminates the
> crash-replay window. Supporting work includes TOAST-aware CDC hashing,
> partitioned-source publication health checks, history retention, and
> a unit-test campaign for the v0.21–v0.23 surface area.

### EC-01 Join Correctness Fix

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| EC01-1 | **Row-id hash convergence for Part 1b.** Modify `src/dvm/operators/join.rs` so the Part 1b arm (Δ⋈R₀) hashes only the left-side PK, ensuring both Part 1a and 1b emit the same `__pgt_row_id` for a given logical row. | 3d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §3 #1 |
| EC01-2 | **PH-D1 cross-cycle phantom cleanup.** Extend the PH-D1 delete path in `src/refresh/phd1.rs` to reconcile orphaned row ids from prior cycles, not just the current delta. | 2d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §3 #1 |
| EC01-3 | **Remove Q15 from IMMEDIATE_SKIP_ALLOWLIST.** Re-enable TPC-H Q15 in IMMEDIATE mode correctness tests after EC01-1/2 land. | 0.5d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §3 #1 |
| EC01-4 | **Proptest harness for join cross-cycle convergence.** 5,000-iteration property test asserting INSERT/UPDATE/DELETE sequences on multi-table JOINs converge to the same result as a full refresh. | 2d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §3 #1 |

### Durability & Frontier Atomicity

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| DUR-1 | **Two-phase frontier commit.** Write a tentative frontier to a side column before TRUNCATE; finalise after MERGE commits; reconcile on startup. Unifies the manual-refresh and scheduler code paths. | 5d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §3 #2 |
| DUR-2 | **`pg_trickle.change_buffer_durability` GUC.** New GUC with values `unlogged` (default, current behaviour), `logged` (WAL-logged change buffers), `sync` (logged + synchronous commit). | 3d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §3 #2 |
| DUR-3 | **Crash-recovery E2E test for frontier consistency.** Kill bgworker between TRUNCATE and frontier-store; assert no phantom replays or lost rows on restart. | 2d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §3 #2 |

### CDC Hardening

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| CDC-1 | **Eliminate two `unwrap()` sites in `src/cdc.rs`.** Convert `build_changed_cols_bitmask_expr().unwrap()` to `?` with a new `PgTrickleError::ChangedColsBitmaskFailed` variant. | 0.5d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §3 #3 |
| CDC-2 | **Partitioned-source publication rebuild.** On scheduler tick, compare `pg_publication_tables.pubviaroot` against source `relkind = 'p'`; rebuild publication with `publish_via_partition_root = true` if mismatched. Emit `refresh_reason = 'publication_rebuild'`. | 3d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §3 #4 |
| CDC-3 | **TOAST-aware CDC hashing.** Include `pg_column_size()` for TOASTable columns (`attstorage IN ('e', 'x')`) in the row-id hash to detect in-place TOAST rewrites. | 3d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §3 #5 |
| CDC-4 | **TOAST workload E2E tests.** Add jsonb-update and bytea-update scenarios to `tests/e2e_cdc_edge_case_tests.rs`. | 1d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §6 |

### Operational Improvements

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| OPS-1 | **`pg_trickle.refresh_history_retention_days` GUC.** Default 7 days. Bgworker prunes stale rows in 1k-row batches during idle ticks. | 2d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §4 |
| OPS-2 | **Frozen-stream-table detector.** New self-monitoring view `df_frozen_stream_tables` that flags any ST whose `last_refresh_at < now() - 5 × refresh_interval` with recent CDC activity. Alert via `pgtrickle_alert` NOTIFY. | 2d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §4 |
| OPS-3 | **Missing internal catalog indexes.** Add composite indexes on `pgt_stream_tables(status, scc_id)`, `pgt_refresh_history(pgt_id, action, data_timestamp)`, `pgt_change_tracking(source_relid)`, and a partial index on `changes_(__pgt_action)`. | 1d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §5 |

### Test Coverage (TEST-6/7/8)

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| TEST-6 | **Unit tests for `src/api/publication.rs`.** Cover `fit_linear_regression`, `predict_diff_duration_ms`, `should_preempt_to_full`, `assign_tier_for_sla`, `maybe_adjust_tier_for_sla`, boundary cases (0, negative, NaN). 25+ tests. | 3d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §6 |
| TEST-7 | **Unit tests for `src/api/diagnostics.rs`.** Cover `explain_query_rewrite`, `diagnose_errors`, `validate_query`, 5 `gather_*` helpers. 20+ tests. | 2d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §6 |
| TEST-8 | **Unit tests for `src/metrics_server.rs`.** Cover port-conflict handling, timeout behaviour, malformed HTTP request, OpenMetrics format conformance. 10+ tests. | 1d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §6 |

### Implementation Phases

| Phase | Description | Duration |
|-------|-------------|----------|
| Phase 1 | EC-01 fix: row-id hash convergence + PH-D1 cleanup + proptest | Days 1–8 |
| Phase 2 | Durability: two-phase frontier + change_buffer_durability GUC + crash test | Days 8–18 |
| Phase 3 | CDC hardening: unwrap removal, publication rebuild, TOAST hashing + tests | Days 18–26 |
| Phase 4 | Operational: history retention, frozen-ST detector, catalog indexes | Days 26–31 |
| Phase 5 | Test campaign: TEST-6/7/8 unit tests for publication, diagnostics, metrics | Days 31–37 |
| Phase 6 | Integration testing, documentation, upgrade script | Days 37–42 |

> **v0.24.0 total: ~8–9 weeks** (~42 person-days solo)

**Exit criteria:**

- [x] EC01-1: Part 1b arm hashes left-side PK only; TPC-H Q07 passes multi-cycle correctness
- [x] EC01-2: PH-D1 cleans up prior-cycle phantoms; no residual rows after 10 cycles
- [x] EC01-3: Q15 removed from IMMEDIATE_SKIP_ALLOWLIST; TPC-H Q15 passes IMMEDIATE mode
- [x] EC01-4: 5,000-iteration proptest passes for JOIN convergence
- [x] DUR-1: Two-phase frontier commit implemented; manual and scheduler paths unified
- [x] DUR-2: `change_buffer_durability = 'logged'` creates WAL-logged change buffers; `'unlogged'` preserves current behaviour
- [x] DUR-3: Crash-recovery E2E: kill bgworker mid-refresh → restart → zero lost/duplicated rows
- [x] CDC-1: Zero `unwrap()` calls in `src/cdc.rs` production paths
- [x] CDC-2: Converting a source table to partitioned triggers automatic publication rebuild
- [x] CDC-3: TOAST-only column update detected and propagated in DIFFERENTIAL mode
- [x] CDC-4: jsonb + bytea TOAST E2E tests pass
- [x] OPS-1: History older than retention_days is pruned automatically; GUC documented
- [x] OPS-2: Frozen-ST detector fires alert when ST stalls with active CDC source
- [x] OPS-3: Internal catalog indexes exist; scheduler tick time reduced at 100+ STs
- [x] TEST-6: 25+ publication.rs unit tests pass (predictive model boundary cases)
- [x] TEST-7: 20+ diagnostics.rs unit tests pass
- [x] TEST-8: 10+ metrics_server.rs unit tests pass (port conflict, timeout, format)
- [x] Extension upgrade path tested (`0.23.0 → 0.24.0`)
- [x] `just check-version-sync` passes

---
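
**Illustration (OPS-2 rule).** The frozen-stream-table condition in OPS-2 above, `last_refresh_at < now() - 5 × refresh_interval` with recent CDC activity, can be sketched as a plain Rust predicate. This is a minimal sketch only: the shipped detector is a SQL view (`df_frozen_stream_tables`), and the `StreamTableState` struct, its field names, `FROZEN_MULTIPLIER`, and `is_frozen` are hypothetical names introduced here for illustration, not part of the pg_trickle API.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical snapshot of one stream table; field names are illustrative only.
struct StreamTableState {
    last_refresh_at: SystemTime,
    refresh_interval: Duration,
    has_recent_cdc_activity: bool,
}

/// Multiplier from the OPS-2 rule: `last_refresh_at < now() - 5 × refresh_interval`.
const FROZEN_MULTIPLIER: u32 = 5;

/// Returns true when the table has not refreshed for more than five refresh
/// intervals while its source still shows recent CDC activity.
fn is_frozen(st: &StreamTableState, now: SystemTime) -> bool {
    // A last refresh that lies in the future (clock skew) is treated as age zero.
    let age = now
        .duration_since(st.last_refresh_at)
        .unwrap_or(Duration::ZERO);
    // Saturate instead of panicking on very large intervals.
    let threshold = st.refresh_interval.saturating_mul(FROZEN_MULTIPLIER);
    st.has_recent_cdc_activity && age > threshold
}
```

With a 60-second interval the threshold is 300 seconds, so a table last refreshed 301 seconds ago with active CDC is flagged, while one refreshed exactly 300 seconds ago is not, since the comparison is strict. A table with no recent CDC activity is never flagged under this sketch, however old its last refresh.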