> **Plain-language companion:** [v0.78.0.md](v0.78.0.md) # v0.78.0 Full Implementation Status ## Theme: DVM Engine Root-Cause Fixes + Scheduler Intelligence **Release arc:** Assessment-15 Hardening Arc v0.77.0–v0.80.0 **Status:** ✅ Released ## Implementation Checklist | Item | Description | Status | |------|-------------|--------| | DVM-1 | CASE/IN-list drift definitive rejection path — append-only bypass (no FULL fallback for append-only sources) | ✅ Done | | DVM-2 | Correlated aggregate subquery pre-aggregation CTE rewrite for safe patterns (bare-column single-level correlations) | ✅ Done | | P-1 | FULL fallback with `CORRELATED_SUBQUERY_DELTA_QUADRATIC` reason for unsafe patterns after DVM-2 rewrite attempt | ✅ Done | | P-2 | OpTree-based query complexity classifier stored in `pgt_stream_tables.query_complexity_class`; used in scheduler logs | ✅ Done | | P-3 | Cost model rolling summary table `pgt_cost_model_summary`; batch scheduler lookup replaces per-ST `pgt_refresh_history` scans | ✅ Done | | P-4 | Placeholder resolver Aho-Corasick automaton cache collision guard: canonical key stored and verified before reuse | ✅ Done | | T-2 | `test_t2_sf10_tpc_h_latency_regression`: rotating SF-10 TPC-H subset with per-query EXPLAIN latency thresholds | ✅ Done | | T-3 | Nightly `fuzz-nightly.yml` workflow: 300 seconds per target with corpus size tracking | ✅ Done | ## Files Changed ### Core Implementation | File | Change | |------|--------| | `src/refresh/merge/mod.rs` | DVM-1: append-only bypass for CASE/IN-list fallback; DVM-2/P-1: try CTE rewrite before forcing FULL | | `src/dvm/parser/rewrites.rs` | DVM-2: `rewrite_correlated_aggregate_in_where_to_cte()` pre-aggregation rewrite | | `src/dvm/parser/mod.rs` | DVM-2: export `rewrite_correlated_aggregate_in_where_to_cte` | | `src/refresh/mod.rs` | P-2: `classify_query_complexity_optree()` using OpTree scan count | | `src/catalog.rs` | P-2: `query_complexity_class` field in `StreamTableMeta`; store at create time | | `src/scheduler/mod.rs` | P-3: batch cost-model summary update + read path; P-2: log complexity class | | `src/dvm/mod.rs` | P-4: `CachedPlaceholderResolver` canonical key guard | ### Tests | File | Change | |------|--------| | `tests/e2e_v078_tests.rs` | DVM-1/DVM-2/P-4 E2E tests | | `tests/e2e_tpch_tests.rs` | T-2: `test_t2_sf10_tpc_h_latency_regression` | ### CI | File | Change | |------|--------| | `.github/workflows/fuzz-nightly.yml` | T-3: 300s nightly fuzz + corpus tracking | ### Supporting Files | File | Change | |------|--------| | `Cargo.toml` | Version bump: `0.77.0` → `0.78.0` | | `META.json` | Version bump: `0.77.0` → `0.78.0` | | `sql/pg_trickle--0.77.0--0.78.0.sql` | Add `query_complexity_class` column + `pgt_cost_model_summary` table | | `sql/archive/pg_trickle--0.78.0.sql` | Full install SQL for v0.78.0 | | `CHANGELOG.md` | v0.78.0 entry | | `ROADMAP.md` | v0.78.0 status updated to ✅ Released | ## Design Notes ### DVM-1: CASE/IN-list Definitive Rejection Path The SUM(CASE WHEN ...) drift occurs specifically during UPDATE operations: when a row's CASE-condition column is modified, the algebraic delta rule evaluates the CASE with the new value for both the 'D' and 'I' sides, miscounting the old contribution. This is the DI-8 issue (tracked for DI-2 UPDATE-split fix). For **append-only** sources (no UPDATE), the drift cannot occur: only INSERT and DELETE operations change data, and the GROUP_RESCAN (EXCEPT ALL) path correctly handles both. The v0.78.0 bypass skips the FULL fallback when `st.is_append_only = true`. For **mutable** sources, the FULL fallback with `CASE_IN_LIST_DVM_DRIFT_FULL_FALLBACK` reason code is now the **definitive rejection path** — the "fix planned for v0.78.0" note is removed and replaced with documentation that this requires DI-2 (UPDATE-split) to fix correctly. ### DVM-2: CTE Pre-Aggregation Rewrite The rewrite targets queries where a scalar correlated aggregate subquery appears directly in the top-level WHERE clause (not nested inside IN-sublinks). The correlation must use bare-column equality conditions detectable via `pg_attribute` catalog lookup. For example: ```sql -- Input: SELECT ps_suppkey FROM partsupp WHERE ps_availqty > (SELECT 0.5 * SUM(l_quantity) FROM lineitem WHERE l_partkey = ps_partkey AND l_suppkey = ps_suppkey AND l_shipdate >= '1994-01-01') -- Rewritten to CTE form (handled differentially): WITH __pgt_preagg_1 AS ( SELECT l_partkey AS "__pgt_corr_key_1", l_suppkey AS "__pgt_corr_key_2", 0.5 * SUM(l_quantity) AS "__pgt_scalar_1" FROM lineitem WHERE l_shipdate >= '1994-01-01' GROUP BY l_partkey, l_suppkey ) SELECT ps_suppkey FROM partsupp, __pgt_preagg_1 WHERE ps_availqty > "__pgt_preagg_1"."__pgt_scalar_1" AND ps_partkey = "__pgt_preagg_1"."__pgt_corr_key_1" AND ps_suppkey = "__pgt_preagg_1"."__pgt_corr_key_2" ``` The rewrite is done at the text/AST level using the existing `decorrelate_scalar_subquery()` infrastructure. Complex nested patterns (like full q20 with IN-sublinks wrapping the correlated aggregate) are classified as unsafe and fall back to FULL with `CORRELATED_SUBQUERY_DELTA_QUADRATIC`. ### P-2: OpTree Complexity Classifier The `classify_query_complexity()` function previously used keyword matching (regex-like). The new `classify_query_complexity_optree()` uses `dvm::query_total_scan_count()` to count relation scans and `dvm::query_has_group_by()` to detect aggregation. This gives a more accurate classification without regex false positives. The class is stored in `pgt_stream_tables.query_complexity_class` at create time and updated on ALTER. ### P-3: Cost Model Summary Table `pgtrickle.pgt_cost_model_summary` holds one row per `pgt_id` with: - `avg_full_ms DOUBLE PRECISION` — rolling 10-sample average of FULL refresh duration - `avg_diff_ms DOUBLE PRECISION` — rolling 10-sample average of DIFFERENTIAL refresh duration - `sample_count INT` — number of samples included - `updated_at TIMESTAMPTZ` — timestamp of last update The scheduler upserts this table once per tick in a single batch query across all active STs, then reads from it (also in batch) instead of issuing per-ST `SELECT ... FROM pgt_refresh_history WHERE pgt_id = $1 LIMIT 1` queries. ### P-4: Placeholder Resolver Cache Collision Guard `CachedPlaceholderResolver` now stores `canonical_key_src: String` alongside the compiled `AhoCorasick` automaton. Before reusing a cached resolver, the code verifies that `hash_string(&resolver_key_src) == cached_key` AND `resolver_key_src == cached.canonical_key_src`. On mismatch (hash collision): - Log a `pgrx::warning!` with both strings for diagnostics - Rebuild the automaton and store it under the new key This prevents silent template corruption caused by 64-bit hash collisions. ## Exit Criteria - [x] DVM-1: append-only bypass tested in `test_dvm1_case_in_list_append_only_differential` - [x] DVM-2: CTE rewrite tested in `test_dvm2_correlated_aggregate_cte_rewrite` - [x] P-1: unsafe pattern still gets FULL fallback with correct reason code - [x] P-2: `query_complexity_class` populated and logged - [x] P-3: `pgt_cost_model_summary` table exists and is updated by scheduler - [x] P-4: canonical key mismatch detection tested in unit tests - [x] T-2: TPC-H latency regression test present - [x] T-3: Nightly fuzz workflow at 300s added - [x] `just test-unit` passes - [x] `just test-integration` passes - [x] `just lint` passes with zero warnings - [x] CHANGELOG.md v0.78.0 entry written - [x] ROADMAP.md v0.78.0 status ✅ Released