> **Plain-language companion:** [v0.41.0.md](v0.41.0.md)

## v0.41.0 — DVM Correctness: Structural Cache Keys, Placeholder Safety & WAL Transition Guards

**Status: Released.** Derived from [plans/PLAN_OVERALL_ASSESSMENT_9.md](../plans/PLAN_OVERALL_ASSESSMENT_9.md) §Dimension 1 (Correctness), §Dimension 6 (Test Coverage).

> **Release Theme**
> Fix the three P0 correctness risks identified in the overall assessment:
> DVM snapshot CTE cache-key collisions, unresolved placeholder pass-through,
> and WAL transition TOCTOU window. Add the tests needed to prove each fix.

---

### Features

| ID | Title | Effort | Priority | Assessment ref |
|----|-------|--------|----------|----------------|
| A41-1 | Structural snapshot CTE cache key fingerprint | L | P0 | COR-01 |
| A41-2 | Placeholder resolution full-validation assertion | M | P0 | COR-02 |
| A41-3 | WAL transition eligibility recheck at commit point | M | P0 | COR-03 |
| A41-4 | Pool worker `pg_trickle.enabled` check before job claim | S | P1 | COR-06 |
| A41-5 | Document isolation invariants for job execution modes | S | P2 | COR-09 |

**A41-1 — Structural snapshot CTE cache key fingerprint.** Replace the current `snapshot_cache_key` in `src/dvm/diff.rs` with a structural fingerprint of the `OpTree`: operator type, join type, predicates, projected columns, filters, grouping, and child fingerprints recursively. The key must be collision-resistant for any two structurally different subtrees, even when they share the same leaf aliases. Add unit tests with two subtrees using identical leaf aliases but different join predicates/shapes and verify they produce distinct cache keys.

**A41-2 — Placeholder resolution full-validation assertion.** After every placeholder substitution pass in `resolve_delta_template` (`src/dvm/mod.rs`) and `resolve_lsn_placeholders` (`src/refresh/codegen.rs`), run a strict regex check for remaining `__PGS_[A-Z0-9_]+__` and `__PGT_[A-Z0-9_]+__` tokens.
On match, return a typed `PgTrickleError` with the stream table name and unresolved token. This turns late SQL execution failures into deterministic, early, actionable errors.

**A41-3 — WAL transition eligibility recheck at commit point.** In `src/wal_decoder.rs`, hold a per-source advisory lock across transition phases or re-check eligibility (table relkind, existence, primary key/replica identity, replica identity FULL) immediately before the `Transitioning`/WAL catalog state update is committed. If the recheck fails, abort the transition and fall back to trigger mode with a logged warning and a catalog status update visible through diagnostics.

**A41-4 — Pool worker `pg_trickle.enabled` check.** In `src/scheduler/pool.rs` (`pg_trickle_pool_worker_main`), check `config::pg_trickle_enabled()` before claiming each job from the queue. When disabled, defer or cancel queued jobs and sleep until re-enabled.

**A41-5 — Document isolation invariants.** Add code-level comments in `src/scheduler/mod.rs` describing the snapshot isolation contract for singleton, atomic group, repeatable-read group, cyclic SCC, immediate closure, and fused-chain execution modes.

### Test Coverage

| ID | Title | Effort | Priority | Assessment ref |
|----|-------|--------|----------|----------------|
| T-A41-1 | Snapshot cache key structural collision unit tests | M | P0 | COR-01 |
| T-A41-2 | Placeholder resolution unit tests (unknown, mixed, repeated) | M | P0 | TEST-04 |
| T-A41-3 | WAL transition concurrent DDL E2E tests | L | P0 | TEST-03 |
| T-A41-4 | Pool worker disabled-mode E2E test | S | P1 | COR-06 |

**T-A41-1.** Add pure unit tests for `snapshot_cache_key` / `get_or_register_snapshot_cte` with two subtrees using identical leaf aliases but different join predicates, join types, or subtree shapes. Assert distinct keys. Also test that structurally identical subtrees produce identical keys.

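
The structural key that A41-1 and T-A41-1 describe can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `OpTree` and `JoinType` enums are simplified stand-ins (predicates are plain strings, and projection and grouping are omitted), and `structural_key`, `write_key`, `push_field`, `scan`, and `join` are hypothetical names, not the functions in `src/dvm/diff.rs`. The sketch encodes each node as an operator tag, its length-prefixed fields, and its children's encodings, so that two trees can only share a key if they have the same structure.

```rust
/// Hypothetical, simplified stand-in for the real join type enum.
#[derive(Debug, Clone, Copy, PartialEq)]
enum JoinType {
    Inner,
    Left,
}

/// Hypothetical, simplified stand-in for the real `OpTree`.
/// Predicates are plain strings here purely for illustration.
#[allow(dead_code)]
#[derive(Debug, Clone)]
enum OpTree {
    Scan { alias: String },
    Filter { predicate: String, child: Box<OpTree> },
    Join { join_type: JoinType, predicate: String, left: Box<OpTree>, right: Box<OpTree> },
}

/// Append a length-prefixed field so that field boundaries stay unambiguous
/// even when the field text itself contains delimiter characters.
fn push_field(out: &mut String, s: &str) {
    out.push_str(&s.len().to_string());
    out.push(':');
    out.push_str(s);
    out.push(';');
}

/// Recursively write operator tag, fields, and child encodings.
fn write_key(node: &OpTree, out: &mut String) {
    match node {
        OpTree::Scan { alias } => {
            out.push_str("scan(");
            push_field(out, alias);
            out.push(')');
        }
        OpTree::Filter { predicate, child } => {
            out.push_str("filter(");
            push_field(out, predicate);
            write_key(child, out);
            out.push(')');
        }
        OpTree::Join { join_type, predicate, left, right } => {
            out.push_str(match join_type {
                JoinType::Inner => "join_inner(",
                JoinType::Left => "join_left(",
            });
            push_field(out, predicate);
            write_key(left, out);
            write_key(right, out);
            out.push(')');
        }
    }
}

/// Structural key for a subtree: identical structure yields identical keys.
fn structural_key(node: &OpTree) -> String {
    let mut out = String::new();
    write_key(node, &mut out);
    out
}

/// Small constructors to keep the usage examples short.
fn scan(alias: &str) -> OpTree {
    OpTree::Scan { alias: alias.to_string() }
}

fn join(join_type: JoinType, predicate: &str, left: OpTree, right: OpTree) -> OpTree {
    OpTree::Join {
        join_type,
        predicate: predicate.to_string(),
        left: Box::new(left),
        right: Box::new(right),
    }
}
```

With this encoding, `join(JoinType::Inner, "o.id = c.id", scan("o"), scan("c"))` and the same join with predicate `"o.cust = c.id"` get different keys although their leaf aliases match. A production key would likely hash this encoding down to a fixed width; a fixed-width hash can only make collisions improbable, not impossible, so the choice of hash matters for the "collision-resistant" requirement.
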

**T-A41-2.** Add pure unit tests for both resolver functions: known placeholder families, unknown placeholder families, missing OIDs, repeated placeholders, pgt-prefixed placeholders, mixed sets, and zero-change pruning. Assert typed error on any unresolved token.

**T-A41-3.** Add E2E tests that start a WAL transition, then concurrently change replica identity, drop a primary key, or drop the source table. Assert that the transition fails safely, the stream table falls back to trigger mode with no data loss, and the catalog status reflects the failure reason.

**T-A41-4.** Add an E2E test with `worker_pool_size > 0`, queue work, then set `pg_trickle.enabled = off`. Assert that no queued jobs are executed.

### Conflicts & Risks

- **A41-1** changes the cache key, so all cached templates become stale on upgrade. This is safe because templates are re-generated on cache miss, but the first refresh after upgrade may be slower.
- **A41-3** may require advisory lock coordination with the scheduler. Test for deadlock scenarios with concurrent transition and scheduled refresh.

### Exit Criteria

- [x] A41-1: No two structurally different subtrees share a snapshot CTE key (unit test proof)
- [x] A41-2: Unresolved placeholder in any template raises a typed error (unit test proof)
- [x] A41-3: Concurrent DDL during WAL transition causes safe fallback (E2E proof)
- [x] A41-4: Pool workers respect `pg_trickle.enabled = off` (E2E proof)
- [x] Extension upgrade path tested (`0.40.0 → 0.41.0`)
- [x] `just lint` passes with zero warnings
- [x] `just test-all` passes
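
As an illustration of the post-substitution check described in A41-2 and exercised by T-A41-2, the following is a minimal sketch, not the project's implementation. To stay dependency-free it uses a hand-written scanner in place of the regex check the feature calls for; the scanner is meant to accept the same tokens as `__PGS_[A-Z0-9_]+__` and `__PGT_[A-Z0-9_]+__`. The names `find_unresolved_placeholder`, `validate_resolved_sql`, and `PlaceholderError` are hypothetical; the last stands in for the typed `PgTrickleError` that the real code returns.

```rust
/// Hypothetical stand-in for the typed error described in A41-2.
#[derive(Debug, PartialEq)]
struct PlaceholderError {
    stream_table: String,
    token: String,
}

/// Return the leftmost remaining `__PGS_...__` or `__PGT_...__` token, if any.
/// The token body must be one or more characters from `[A-Z0-9_]`.
fn find_unresolved_placeholder(sql: &str) -> Option<&str> {
    for (pos, _) in sql.match_indices("__PG") {
        let rest = &sql[pos + 4..];
        if !(rest.starts_with("S_") || rest.starts_with("T_")) {
            continue;
        }
        // Text after the six-byte prefix `__PGS_` or `__PGT_`.
        let body = &rest[2..];
        // Longest run of allowed body characters (all ASCII, so byte
        // offsets are valid slice boundaries).
        let run_len = body
            .bytes()
            .take_while(|b| b.is_ascii_uppercase() || b.is_ascii_digit() || *b == b'_')
            .count();
        // The closing `__` must sit inside the run with at least one body
        // character before it; taking the last one mirrors greedy matching.
        if let Some(i) = body[..run_len].rfind("__") {
            if i >= 1 {
                return Some(&sql[pos..pos + 6 + i + 2]);
            }
        }
    }
    None
}

/// Fail early, with the stream table name and the token, if anything is left.
fn validate_resolved_sql(stream_table: &str, sql: &str) -> Result<(), PlaceholderError> {
    match find_unresolved_placeholder(sql) {
        Some(token) => Err(PlaceholderError {
            stream_table: stream_table.to_string(),
            token: token.to_string(),
        }),
        None => Ok(()),
    }
}
```

Calling `validate_resolved_sql` on the output of each resolver pass means a leftover token such as `__PGT_X__` surfaces as an error naming the stream table before the SQL reaches the executor, instead of as a syntax error at execution time.
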