> **Plain-language companion:** [v0.78.0.md](v0.78.0.md)

# v0.78.0 Full Implementation Status

## Theme: DVM Engine Root-Cause Fixes + Scheduler Intelligence

**Release arc:** Assessment-15 Hardening Arc v0.77.0–v0.80.0
**Status:** ✅ Released

## Implementation Checklist

| Item | Description | Status |
|------|-------------|--------|
| DVM-1 | CASE/IN-list drift definitive rejection path — append-only bypass (no FULL fallback for append-only sources) | ✅ Done |
| DVM-2 | Correlated aggregate subquery pre-aggregation CTE rewrite for safe patterns (bare-column single-level correlations) | ✅ Done |
| P-1 | FULL fallback with `CORRELATED_SUBQUERY_DELTA_QUADRATIC` reason for unsafe patterns after DVM-2 rewrite attempt | ✅ Done |
| P-2 | OpTree-based query complexity classifier stored in `pgt_stream_tables.query_complexity_class`; used in scheduler logs | ✅ Done |
| P-3 | Cost model rolling summary table `pgt_cost_model_summary`; batch scheduler lookup replaces per-ST `pgt_refresh_history` scans | ✅ Done |
| P-4 | Placeholder resolver Aho-Corasick automaton cache collision guard: canonical key stored and verified before reuse | ✅ Done |
| T-2 | `test_t2_sf10_tpc_h_latency_regression`: rotating SF-10 TPC-H subset with per-query EXPLAIN latency thresholds | ✅ Done |
| T-3 | Nightly `fuzz-nightly.yml` workflow: 300 seconds per target with corpus size tracking | ✅ Done |

## Files Changed

### Core Implementation

| File | Change |
|------|--------|
| `src/refresh/merge/mod.rs` | DVM-1: append-only bypass for CASE/IN-list fallback; DVM-2/P-1: try CTE rewrite before forcing FULL |
| `src/dvm/parser/rewrites.rs` | DVM-2: `rewrite_correlated_aggregate_in_where_to_cte()` pre-aggregation rewrite |
| `src/dvm/parser/mod.rs` | DVM-2: export `rewrite_correlated_aggregate_in_where_to_cte` |
| `src/refresh/mod.rs` | P-2: `classify_query_complexity_optree()` using OpTree scan count |
| `src/catalog.rs` | P-2: `query_complexity_class` field in `StreamTableMeta`; store at create time |
| `src/scheduler/mod.rs` | P-3: batch cost-model summary update + read path; P-2: log complexity class |
| `src/dvm/mod.rs` | P-4: `CachedPlaceholderResolver` canonical key guard |

### Tests

| File | Change |
|------|--------|
| `tests/e2e_v078_tests.rs` | DVM-1/DVM-2/P-4 E2E tests |
| `tests/e2e_tpch_tests.rs` | T-2: `test_t2_sf10_tpc_h_latency_regression` |

### CI

| File | Change |
|------|--------|
| `.github/workflows/fuzz-nightly.yml` | T-3: 300s nightly fuzz + corpus tracking |

### Supporting Files

| File | Change |
|------|--------|
| `Cargo.toml` | Version bump: `0.77.0` → `0.78.0` |
| `META.json` | Version bump: `0.77.0` → `0.78.0` |
| `sql/pg_trickle--0.77.0--0.78.0.sql` | Add `query_complexity_class` column + `pgt_cost_model_summary` table |
| `sql/archive/pg_trickle--0.78.0.sql` | Full install SQL for v0.78.0 |
| `CHANGELOG.md` | v0.78.0 entry |
| `ROADMAP.md` | v0.78.0 status updated to ✅ Released |

## Design Notes

### DVM-1: CASE/IN-list Definitive Rejection Path

The SUM(CASE WHEN ...) drift occurs specifically during UPDATE operations: when
a row's CASE-condition column is modified, the algebraic delta rule evaluates
the CASE with the new value for both the 'D' and 'I' sides, miscounting the
old contribution. This is the DI-8 issue (tracked for DI-2 UPDATE-split fix).

For **append-only** sources (no UPDATE), the drift cannot occur: only INSERT
and DELETE operations change data, and the GROUP_RESCAN (EXCEPT ALL) path
correctly handles both. The v0.78.0 bypass skips the FULL fallback when
`st.is_append_only = true`.

For **mutable** sources, the FULL fallback with `CASE_IN_LIST_DVM_DRIFT_FULL_FALLBACK`
reason code is now the **definitive rejection path** — the "fix planned for
v0.78.0" note is removed and replaced with documentation that this requires
DI-2 (UPDATE-split) to fix correctly.

### DVM-2: CTE Pre-Aggregation Rewrite

The rewrite targets queries where a scalar correlated aggregate subquery appears
directly in the top-level WHERE clause (not nested inside IN-sublinks). The
correlation must use bare-column equality conditions detectable via `pg_attribute`
catalog lookup.

For example:
```sql
-- Input:
SELECT ps_suppkey FROM partsupp
WHERE ps_availqty > (SELECT 0.5 * SUM(l_quantity) FROM lineitem
                     WHERE l_partkey = ps_partkey AND l_suppkey = ps_suppkey
                     AND l_shipdate >= '1994-01-01')
-- Rewritten to CTE form (handled differentially):
WITH __pgt_preagg_1 AS (
  SELECT l_partkey AS "__pgt_corr_key_1", l_suppkey AS "__pgt_corr_key_2",
         0.5 * SUM(l_quantity) AS "__pgt_scalar_1"
  FROM lineitem WHERE l_shipdate >= '1994-01-01'
  GROUP BY l_partkey, l_suppkey
)
SELECT ps_suppkey FROM partsupp, __pgt_preagg_1
WHERE ps_availqty > "__pgt_preagg_1"."__pgt_scalar_1"
  AND ps_partkey = "__pgt_preagg_1"."__pgt_corr_key_1"
  AND ps_suppkey = "__pgt_preagg_1"."__pgt_corr_key_2"
```

The rewrite is done at the text/AST level using the existing
`decorrelate_scalar_subquery()` infrastructure. Complex nested patterns (like
full q20 with IN-sublinks wrapping the correlated aggregate) are classified as
unsafe and fall back to FULL with `CORRELATED_SUBQUERY_DELTA_QUADRATIC`.

### P-2: OpTree Complexity Classifier

The `classify_query_complexity()` function previously used keyword matching
(regex-like). The new `classify_query_complexity_optree()` uses
`dvm::query_total_scan_count()` to count relation scans and
`dvm::query_has_group_by()` to detect aggregation. This gives a more accurate
classification without regex false positives. The class is stored in
`pgt_stream_tables.query_complexity_class` at create time and updated on ALTER.

### P-3: Cost Model Summary Table

`pgtrickle.pgt_cost_model_summary` holds one row per `pgt_id` with:
- `avg_full_ms DOUBLE PRECISION` — rolling 10-sample average of FULL refresh duration
- `avg_diff_ms DOUBLE PRECISION` — rolling 10-sample average of DIFFERENTIAL refresh duration
- `sample_count INT` — number of samples included
- `updated_at TIMESTAMPTZ` — timestamp of last update

The scheduler upserts this table once per tick in a single batch query across
all active STs, then reads from it (also in batch) instead of issuing per-ST
`SELECT ... FROM pgt_refresh_history WHERE pgt_id = $1 LIMIT 1` queries.

### P-4: Placeholder Resolver Cache Collision Guard

`CachedPlaceholderResolver` now stores `canonical_key_src: String` alongside
the compiled `AhoCorasick` automaton. Before reusing a cached resolver, the
code verifies that `hash_string(&resolver_key_src) == cached_key` AND
`resolver_key_src == cached.canonical_key_src`. On mismatch (hash collision):
- Log a `pgrx::warning!` with both strings for diagnostics
- Rebuild the automaton and store it under the new key
This prevents silent template corruption caused by 64-bit hash collisions.

## Exit Criteria

- [x] DVM-1: append-only bypass tested in `test_dvm1_case_in_list_append_only_differential`
- [x] DVM-2: CTE rewrite tested in `test_dvm2_correlated_aggregate_cte_rewrite`
- [x] P-1: unsafe pattern still gets FULL fallback with correct reason code
- [x] P-2: `query_complexity_class` populated and logged
- [x] P-3: `pgt_cost_model_summary` table exists and is updated by scheduler
- [x] P-4: canonical key mismatch detection tested in unit tests
- [x] T-2: TPC-H latency regression test present
- [x] T-3: Nightly fuzz workflow at 300s added
- [x] `just test-unit` passes
- [x] `just test-integration` passes
- [x] `just lint` passes with zero warnings
- [x] CHANGELOG.md v0.78.0 entry written
- [x] ROADMAP.md v0.78.0 status ✅ Released