# v0.72.0 Full Details: Frontier Durability & Catalog Correctness

> **Summary:** [v0.72.0.md](v0.72.0.md)
>
> **Assessment source:** [plans/PLAN_OVERALL_ASSESSMENT_14.md](../plans/PLAN_OVERALL_ASSESSMENT_14.md)

---

## Motivation

Assessment 14 identified a cluster of correctness findings where the code *appeared* to implement a safety guarantee but the guarantee was either wired incorrectly, had no call site, or relied on invalid SQL. None of these were latent design flaws requiring a redesign; they were wiring gaps. v0.72.0 closes all four.

The theme of this release is: **if a path exists in the code it must actually work, and if it must work it must be tested**.

---

## Detailed Implementation

### COR-001 / REL-001 / ARCH-001: DUR-1 Tentative-Frontier Removal

**Problem:** The catalog layer contained a "two-phase tentative-frontier" design (DUR-1):

1. `prepare_frontier(pgt_id, frontier)`: write proposed frontier to `tentative_frontier` column before MERGE.
2. `finalize_frontier_and_complete_refresh(pgt_id, rows)` (DUR-1): promote `tentative_frontier → frontier` after MERGE commits.
3. `reconcile_tentative_frontiers(change_schema)`: startup recovery scan for stale tentative frontiers.

All three functions had **zero production call sites**. The recovery query in `reconcile_tentative_frontiers` contained a string-concatenation fragment (`FROM {schema}.changes_ || s.pgt_relid::text`) that would always fail at runtime; it is not valid relation-name syntax.

The scheduler hot-path used the single-phase `store_frontier` but silenced errors:

```rust
// ❌ BEFORE (6 locations in scheduler/mod.rs)
if let Err(e) = StreamTableMeta::store_frontier(id, &frontier) {
    log!("Could not store frontier: {}", e);
}
```

This meant: even if DUR-1 had been wired, the actual call site provided no durability guarantee: a frontier-store failure would log a warning and commit the data change with no frontier advancement, causing double-processing on the next tick.
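
The double-processing failure mode can be reproduced in a small in-memory model. The sketch below is illustrative only: `Txn`, `store_frontier`, `refresh_log_only`, and `refresh_propagating` are hypothetical stand-ins and not pg_trickle APIs. It contrasts the log-only pattern with `?` propagation, assuming that returning an error from the refresh function corresponds to rolling back the outer transaction.

```rust
// Toy model only: every name here is a hypothetical stand-in, not a
// pg_trickle API. No database is involved.

/// In-memory stand-in for the state one scheduler transaction would commit.
#[derive(Debug, Default, Clone, PartialEq)]
struct Txn {
    applied_rows: u64, // rows merged into the stream table so far
    frontier: u64,     // last change position recorded as processed
}

/// Simulated frontier store that can be forced to fail.
fn store_frontier(txn: &mut Txn, new_frontier: u64, fail: bool) -> Result<(), String> {
    if fail {
        return Err("simulated catalog write failure".to_string());
    }
    txn.frontier = new_frontier;
    Ok(())
}

/// Log-only pattern: the error is swallowed, so the data change is committed
/// even though the frontier did not advance.
fn refresh_log_only(committed: &Txn, rows: u64, new_frontier: u64, fail: bool) -> Txn {
    let mut txn = committed.clone();
    txn.applied_rows += rows;
    if let Err(e) = store_frontier(&mut txn, new_frontier, fail) {
        eprintln!("Could not store frontier: {}", e);
    }
    txn
}

/// Propagating pattern: `?` returns early and the working copy is discarded,
/// which models the rollback of the outer transaction.
fn refresh_propagating(
    committed: &Txn,
    rows: u64,
    new_frontier: u64,
    fail: bool,
) -> Result<Txn, String> {
    let mut txn = committed.clone();
    txn.applied_rows += rows;
    store_frontier(&mut txn, new_frontier, fail)?;
    Ok(txn)
}
```

In this model, a failed `refresh_log_only` leaves `applied_rows` advanced while `frontier` stays put, so the retry applies the same rows again. A failed `refresh_propagating` returns `Err` and the previously committed state is unchanged.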

**Fix:**

- Removed `prepare_frontier`, `finalize_frontier_and_complete_refresh` (DUR-1 variant), and `reconcile_tentative_frontiers` from `src/catalog.rs`.
- All six scheduler refresh paths in `src/scheduler/mod.rs` converted to:

  ```rust
  // ✅ AFTER
  StreamTableMeta::store_frontier(pgt_id, &frontier)?;
  ```

  A frontier-store failure now aborts the outer transaction, rolling back the data change. The stream table is retried on the next scheduler tick.
- The `tentative_frontier` column is retained in the schema (removing a catalog column requires a new migration) but is never written.
- See [ADR-004](../plans/adrs/ADR-004.md) for the architectural rationale.

**Files changed:**

- `src/catalog.rs`: removed 3 DUR-1 functions; added CODE-001 comment block
- `src/scheduler/mod.rs`: 6 log-only patterns → `?` propagation

**Tests:** TEST-003 unit tests added to `src/catalog.rs`: compile-time guard for absence of DUR-1 functions; frontier JSON roundtrip; `is_empty()` checks.

---

### COR-002 / API-001: Outbox `stream_table_oid` OID Correctness

**Problem:** `pgtrickle.pgt_outbox_config.stream_table_oid` was populated with `pgt_id as u32` cast to `pg_sys::Oid`. `pgt_id` is a sequential integer primary key of `pgt_stream_tables`, not an OID present in `pg_class`. Any consumer that resolved the OID via `pg_class` would either find the wrong table or find nothing at all.

The read paths (`is_outbox_enabled`, `get_outbox_table_name`, `get_embedding_vector_column`) used `WHERE stream_table_oid = $1` with `pgt_id` as the parameter; the queries worked by accident only because the write also stored `pgt_id` in that column.
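
The mismatch is easy to see in a toy model of `pg_class`. The sketch below is an illustration under stated assumptions: the `StreamTable` struct, the sample OIDs, and the table names are all made up, and a `HashMap` stands in for the `pg_class` catalog.

```rust
use std::collections::HashMap;

// Toy model only: the struct, OIDs, and table names are hypothetical.

/// The two identifiers a stream table row carries.
struct StreamTable {
    pgt_id: i64,    // sequential primary key of pgt_stream_tables
    pgt_relid: u32, // OID of the backing relation in pg_class
}

/// Resolve an OID the way an external consumer would: by looking it up
/// in (a stand-in for) pg_class.
fn resolve(pg_class: &HashMap<u32, &'static str>, oid: u32) -> Option<&'static str> {
    pg_class.get(&oid).copied()
}

/// The old behaviour: reinterpret the primary key as an OID.
fn oid_from_pgt_id(st: &StreamTable) -> u32 {
    st.pgt_id as u32
}

/// The corrected behaviour: use the relation OID.
fn oid_from_relid(st: &StreamTable) -> u32 {
    st.pgt_relid
}

/// A two-entry pg_class with made-up OIDs.
fn sample_pg_class() -> HashMap<u32, &'static str> {
    let mut m = HashMap::new();
    m.insert(16384, "orders_stream");
    m.insert(16390, "payments_stream");
    m
}
```

With `StreamTable { pgt_id: 1, pgt_relid: 16384 }`, resolving `oid_from_pgt_id` against `sample_pg_class()` finds nothing, while `oid_from_relid` finds `orders_stream`.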

**Fix:** All affected write/read paths in `src/api/outbox.rs`:

| Function | Before | After |
|----------|--------|-------|
| `attach_outbox_impl` INSERT | `pgt_id as u32` | `meta.pgt_relid.into()` |
| `detach_outbox_impl` DELETE | `pgt_id as u32` | `meta.pgt_relid.into()` |
| `attach_embedding_outbox_impl` UPDATE | `WHERE pgt_schema=$2 AND pgt_name=$3` | sub-SELECT on `pgt_relid` |
| `is_outbox_enabled` | `WHERE stream_table_oid = pgt_id` | JOIN through `pgt_stream_tables` |
| `get_outbox_table_name` | `WHERE stream_table_oid = pgt_id` | JOIN through `pgt_stream_tables` |
| `get_embedding_vector_column` | `WHERE stream_table_oid = pgt_id` | JOIN through `pgt_stream_tables` |

The migration SQL corrects existing rows in production databases.

**Files changed:**

- `src/api/outbox.rs`: all locations listed in the table above

**Tests:** TEST-001 E2E tests added to `tests/e2e_outbox_tests.rs`:

- `test_outbox_stream_table_oid_equals_pgt_relid`: JOIN invariant
- `test_outbox_stream_table_oid_exists_in_pg_class`: `pg_class.oid` existence

---

### COR-003: WAL Transition Handoff Gate

**Problem:** `complete_wal_transition` in `src/wal_decoder.rs` executed:

1. `cdc::drop_change_trigger(source_oid)`: removes the row-level trigger
2. `StDependency::update_cdc_mode_for_source(source_oid, CdcMode::Wal)`

In the window between step 1 and step 2, the CDC trigger was gone but the catalog still indicated TRIGGER mode. Any writes to the source table in this window would be captured neither by the trigger nor by the WAL reader (which checks the catalog mode before processing). Those writes could be permanently lost.

**Fix:** Steps are reversed and wrapped in `pg_advisory_lock(-(oid_u32 as i64))`:

1. Acquire advisory lock on the stream table OID (negative key avoids collision with positive-key advisory locks used elsewhere).
2. Update catalog mode to `WAL`.
3. Drop the CDC trigger.
4. Release the advisory lock.

If step 2 or 3 fails, the lock is released and the error is propagated.
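
The ordering and the release-on-error behaviour can be sketched with a drop guard. This is a minimal model, not the actual implementation: `Db`, `LockGuard`, and `complete_transition` are hypothetical names, no PostgreSQL calls are made, and using an RAII guard to release the lock is an assumption about one reasonable way to meet the stated behaviour.

```rust
use std::cell::RefCell;

// Toy model only: all types and functions here are hypothetical stand-ins.

/// Records simulated operations in the order they happen.
#[derive(Default)]
struct Db {
    events: RefCell<Vec<String>>,
}

impl Db {
    fn record(&self, event: &str) {
        self.events.borrow_mut().push(event.to_string());
    }
}

/// Releases the simulated advisory lock when dropped, including on an
/// early `return Err(..)`.
struct LockGuard<'a> {
    db: &'a Db,
    key: i64,
}

impl<'a> LockGuard<'a> {
    fn acquire(db: &'a Db, key: i64) -> LockGuard<'a> {
        db.record(&format!("lock {}", key));
        LockGuard { db, key }
    }
}

impl<'a> Drop for LockGuard<'a> {
    fn drop(&mut self) {
        self.db.record(&format!("unlock {}", self.key));
    }
}

/// Negative lock key, mirroring `-(oid_u32 as i64)`. Widening to i64 before
/// negating means no u32 value can overflow.
fn lock_key(oid: u32) -> i64 {
    -(oid as i64)
}

/// Catalog update first, trigger drop second, all under the lock.
fn complete_transition(db: &Db, oid: u32, fail_catalog_update: bool) -> Result<(), String> {
    let _guard = LockGuard::acquire(db, lock_key(oid));
    if fail_catalog_update {
        return Err("simulated catalog update failure".to_string());
    }
    db.record("set mode WAL");
    db.record("drop trigger");
    Ok(())
}
```

On success the recorded order is lock, catalog update, trigger drop, unlock. On a simulated catalog failure the trigger is never dropped and the lock is still released.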

The atomicity gap is eliminated: when the trigger is gone, the catalog has already been updated to WAL mode.

**Files changed:**

- `src/wal_decoder.rs`: `complete_wal_transition` function

---

### COR-004: Pristine-Transaction Guard for Replication Slot Creation

**Problem:** PostgreSQL forbids creating a replication slot in a transaction that has already been assigned an XID (e.g. after any catalog write). `create_replication_slot_pristine` had no guard for this, and would fail with PostgreSQL error code `55006` (object_in_use) or similar in transactions that had performed DDL.

**Fix:** Added a check using `pg_sys::GetCurrentTransactionIdIfAny()` before calling the slot-creation primitive:

```rust
// SAFETY: GetCurrentTransactionIdIfAny reads a thread-local; no side effects.
unsafe {
    if pg_sys::GetCurrentTransactionIdIfAny() != pg_sys::InvalidTransactionId {
        return Err(PgTrickleError::ReplicationSlotError(
            "replication slot must be created in a transaction with no prior \
             writes; call create_replication_slot_pristine in a fresh \
             transaction".to_string(),
        ));
    }
}
```

This returns an actionable error message to the caller instead of propagating a PostgreSQL internal error with no context.

**Files changed:**

- `src/wal_decoder.rs`: `create_replication_slot_pristine` function

---

## Migration Notes

`sql/pg_trickle--0.71.0--0.72.0.sql` corrects any existing `pgt_outbox_config.stream_table_oid` values that were stored as the internal `pgt_id` integer rather than the actual `pg_class` OID:

```sql
UPDATE pgtrickle.pgt_outbox_config oc
SET stream_table_oid = st.pgt_relid
FROM pgtrickle.pgt_stream_tables st
WHERE oc.stream_table_oid = st.pgt_id::oid
  AND oc.stream_table_oid != st.pgt_relid;
```

No other schema changes are made. The `tentative_frontier` column is retained but will always be NULL after this version.

---

## Test Coverage

| ID | Type | File | Description |
|----|------|------|-------------|
| TEST-001a | E2E | `tests/e2e_outbox_tests.rs` | `stream_table_oid = pgt_relid` JOIN invariant |
| TEST-001b | E2E | `tests/e2e_outbox_tests.rs` | `stream_table_oid` exists in `pg_class.oid` |
| TEST-003a | Unit | `src/catalog.rs` | Frontier JSON serialization roundtrip |
| TEST-003b | Unit | `src/catalog.rs` | Default frontier is empty |
| TEST-003c | Unit | `src/catalog.rs` | Frontier with entry is not empty |
| TEST-003d | Unit | `src/catalog.rs` | Compile-time guard: DUR-1 functions absent |

---

## ADR References

- [ADR-004: Frontier Durability Model](../plans/adrs/ADR-004.md)