> **Plain-language companion:** [v0.16.0.md](v0.16.0.md)

## v0.16.0 — Performance & Refresh Optimization

**Status: Released (2026-04-06).**

Faster refreshes across the board: sub-1% deltas use DELETE+INSERT instead of
MERGE, insert-only stream tables auto-detect and skip the MERGE join, algebraic
aggregates apply pinpoint updates, and a cross-backend template cache eliminates
cold-start latency. Automated benchmark regression gating prevents future
performance degradation.

**Goal:** Attack the MERGE bottleneck from multiple angles — alternative merge
strategies, algebraic aggregate shortcuts, append-only bypass, delta filtering,
change buffer compaction, shared-memory template caching — close critical test
coverage gaps to validate these new paths.

### MERGE Alternatives & Planner Control (Phase D)

> **In plain terms:** MERGE dominates 70–97% of refresh time. This explores
> whether replacing MERGE with DELETE+INSERT (or INSERT ON CONFLICT + DELETE)
> is faster for specific patterns — particularly for small deltas against
> large stream tables where the MERGE join is the bottleneck.

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| ~~PH-D1~~ | ~~**DELETE+INSERT strategy.** For stream tables where delta is <1% of target, replace MERGE with `DELETE WHERE __pgt_row_id IN (delta_deletes)` + `INSERT ... SELECT FROM delta_inserts`. Benchmark against MERGE for 1K/10K/100K deltas against 1M/10M targets. Gate behind `pg_trickle.merge_strategy = 'auto'\|'merge'\|'delete_insert'` GUC.~~ | ~~1–2 wk~~ | ~~[PLAN_PERFORMANCE_PART_9.md §Phase D](plans/performance/PLAN_PERFORMANCE_PART_9.md)~~ |

> **MERGE alternatives subtotal: ~1–2 weeks**

### Algebraic Aggregate UPDATE Fast-Path (B-1)

> **In plain terms:** The current aggregate delta rule recomputes entire
> groups where the GROUP BY key appears in the delta. For a group with 100K
> rows where 1 row changed, the aggregate re-scans all 100K rows in that
> group. For decomposable aggregates (`SUM`/`COUNT`/`AVG`), a direct
> `UPDATE target SET col = col + Δ` replaces the full MERGE join — dropping
> aggregate refresh from O(group_size) to O(1) per group.

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| B-1 | **Algebraic aggregate UPDATE fast-path.** For `GROUP BY` queries where all aggregates are algebraically invertible (`SUM`/`COUNT`/`AVG`), replace the MERGE with a direct `UPDATE target SET col = col + Δ WHERE group_key = ?` for existing groups, plus `INSERT` for newly-appearing groups and `DELETE` for groups whose count reaches zero. Eliminates the MERGE join overhead — the dominant cost for aggregate refresh when group cardinality is high. Requires adding `__pgt_aux_count` / `__pgt_aux_sum` auxiliary columns to the stream table. Fallback to existing MERGE path for non-algebraic aggregates (`MIN`, `MAX`, `STRING_AGG`, etc.). Gate behind `pg_trickle.aggregate_fast_path` GUC (default `true`). Expected impact: **5–20× apply-time reduction** for high-cardinality GROUP BY (10K+ distinct groups); aggregate scenarios at 100K/1% projected to drop from ~50ms to sub-1ms apply time. | 4–6 wk | [plans/performance/PLAN_NEW_STUFF.md §B-1](plans/performance/PLAN_NEW_STUFF.md) · [plans/sql/PLAN_TRANSACTIONAL_IVM.md §Phase 4](plans/sql/PLAN_TRANSACTIONAL_IVM.md) |

> **B-1 subtotal: ~4–6 weeks**

### Append-Only Stream Tables — MERGE Bypass (A-3-AO)

> **In plain terms:** When a stream table's sources are insert-only (e.g.
> event logs, append-only tables where CDC never sees DELETE/UPDATE), the
> MERGE is pure overhead — every delta row is an INSERT, never a match.
> Bypassing MERGE entirely with a plain `INSERT INTO st SELECT ... FROM delta`
> removes the join against the target table, takes only `RowExclusiveLock`,
> and is the single highest-payoff optimization for event-sourced architectures.

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| ~~A-3-AO~~ | ~~**Append-only stream table fast path.** Expose an explicit `CREATE STREAM TABLE … APPEND ONLY` declaration. When set, refresh uses `INSERT INTO st SELECT ... FROM delta` instead of MERGE — no target-table join, `RowExclusiveLock` only. CDC-observed heuristic fallback: if no DELETE/UPDATE has been seen, use the fast path; fall back to MERGE on first non-insert. Benchmark against MERGE for 1K/10K/100K append deltas.~~ | ~~1–2 wk~~ | ~~[plans/performance/PLAN_NEW_STUFF.md §A-3](plans/performance/PLAN_NEW_STUFF.md)~~ |

> **A-3-AO subtotal: ~1–2 weeks**

### Delta Predicate Pushdown (B-2)

> **In plain terms:** For a query like `SELECT ... FROM orders WHERE status =
> 'shipped'`, if a CDC change row has `status = 'pending'`, the delta
> processes it through scan → filter → discard. All the scan and join work
> is wasted. Pushing the WHERE predicate down into the change buffer scan
> eliminates irrelevant rows before any join processing begins — a 5–10×
> reduction in delta row volume for selective queries.

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| ~~B-2~~ | ~~**Delta predicate pushdown.** During OpTree construction, identify `Filter` nodes whose predicates reference only columns from a single source table. Inject these predicates into the `delta_scan` CTE as additional WHERE clauses (including `OR old_col = 'value'` for DELETE correctness). Expected impact: **5–10× delta row reduction** for queries with < 10% selectivity.~~ | ~~2–3 wk~~ | ~~[plans/performance/PLAN_NEW_STUFF.md §B-2](plans/performance/PLAN_NEW_STUFF.md)~~ |

> **B-2 subtotal: ~2–3 weeks**

### Shared-Memory Template Caching (G14-SHC)

> **In plain terms:** Every new database connection that triggers a refresh
> pays a 15–50ms cold-start cost to regenerate the MERGE SQL template. With
> PgBouncer in transaction mode, this happens on every single refresh cycle.
> Shared-memory caching stores compiled templates in PostgreSQL DSM so they
> survive across connections — eliminating the cold-start entirely for
> steady-state workloads.

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| G14-SHC | **Shared-memory template caching (implementation).** Full implementation of DSM + lwlock-based MERGE SQL template caching, building on the G14-SHC-SPIKE RFC from v0.15.0. | ~2–3wk | [plans/performance/REPORT_OVERALL_STATUS.md §14](plans/performance/REPORT_OVERALL_STATUS.md) |

> **G14-SHC subtotal: ~2–3 weeks**

### ~~PostgreSQL 19 Forward-Compatibility (A3)~~ — Moved to v1.0.0

> PG 19 beta not available in time. Items A3-1 through A3-4 deferred
> to v1.0.0 milestone.

### Change Buffer Compaction (C-4)

> **In plain terms:** A high-churn source table can accumulate thousands of
> changes to the same row between refresh cycles — an INSERT followed by 10
> UPDATEs followed by a DELETE is really just "nothing happened." Compaction
> merges multiple changes to the same row ID into a single net change before
> the delta query runs, reducing change buffer size by 50–90% for high-churn
> tables. This directly reduces work for every downstream path (MERGE,
> DELETE+INSERT, append-only INSERT, predicate pushdown).

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| ~~C-4~~ | ~~**Change buffer compaction.** Before delta-query execution, merge multiple changes to the same `__pgt_row_id` into a single net change: INSERT+DELETE cancel out; consecutive UPDATEs collapse to one. Trigger on buffer exceeding `pg_trickle.compact_threshold` rows (default: 100K). Expected impact: **50–90% reduction in change buffer size** for high-churn tables.~~ | ~~2–3 wk~~ | ~~[plans/performance/PLAN_NEW_STUFF.md §C-4](plans/performance/PLAN_NEW_STUFF.md)~~ |

> **C-4 subtotal: ~2–3 weeks**

### Test Coverage Hardening (TG2)

> **In plain terms:** The performance optimizations in this release change
> core refresh paths (MERGE alternatives, aggregate fast-path, append-only
> bypass, predicate pushdown). Before and alongside these changes, critical
> test coverage gaps need closing — particularly around operators and
> scenarios where bugs could hide silently. These gaps were identified in
> the TESTING_GAPS_2 audit.

#### High-Priority Gaps

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| ~~TG2-WIN~~ | ~~**Window function DVM execution tests.** ~5 unit tests exist but 0 DVM execution tests. Add execution-level tests for ROW_NUMBER, RANK, DENSE_RANK, LAG/LEAD delta behavior across INSERT/UPDATE/DELETE cycles.~~ | ~~3–5d~~ | ~~[TESTING_GAPS_2.md](plans/testing/TESTING_GAPS_2.md)~~ |
| ~~TG2-JOIN~~ | ~~**Join multi-cycle UPDATE/DELETE correctness.** E2E join tests are INSERT-only; no UPDATE/DELETE differential cycles. Add systematic multi-cycle coverage for INNER/LEFT/FULL JOIN with UPDATE and DELETE propagation. Risk: silent data corruption in production workloads.~~ | ~~3–5d~~ | ~~[TESTING_GAPS_2.md](plans/testing/TESTING_GAPS_2.md)~~ |
| ~~TG2-EQUIV~~ | ~~**Differential ≡ Full equivalence validation.** Only CTEs validated; joins and aggregates lack equivalence proof. Add a test harness that runs every defining query in both DIFFERENTIAL and FULL mode and asserts identical results. Critical for trusting the new optimization paths.~~ | ~~3–5d~~ | ~~[TESTING_GAPS_2.md](plans/testing/TESTING_GAPS_2.md)~~ |

#### Medium-Priority Gaps

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| TG2-MERGE | **refresh.rs MERGE template unit tests.** Only helpers/enums tested; the core MERGE SQL template generation is untested at the unit level. | 2–3d | [TESTING_GAPS_2.md](plans/testing/TESTING_GAPS_2.md) |
| TG2-CANCEL | **Timeout/cancellation during refresh.** Zero tests for `statement_timeout`, `pg_cancel_backend()` during active refresh. Risk: silent failures or resource leaks under production load. | 1–2d | [TESTING_GAPS_2.md](plans/testing/TESTING_GAPS_2.md) |
| TG2-SCHEMA | **Source table schema evolution.** Partial DDL tests exist; type changes and column renames are thin. Risk: silent data corruption on schema change. | 2–3d | [TESTING_GAPS_2.md](plans/testing/TESTING_GAPS_2.md) |

> **TG2 subtotal: ~2–4 weeks (high-priority) + ~1–2 weeks (medium-priority)**

### Performance Regression CI (BENCH-CI)

> **In plain terms:** v0.16.0 changes core refresh paths (MERGE alternatives,
> aggregate fast-path, append-only bypass, predicate pushdown, buffer
> compaction). Without automated benchmarks in CI, performance regressions
> will slip through silently. This adds a benchmark suite that runs on every
> PR and compares against a committed baseline — any statistically significant
> regression blocks the merge.

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| BENCH-CI-1 | **Benchmark harness in CI.** Run `just bench` (Criterion-based) on a fixed hardware profile (GitHub Actions large runner or self-hosted). Capture results as JSON artifacts. Compare against committed baseline using Criterion's `--save-baseline` / `--baseline`. | 2–3d | [plans/performance/PLAN_PERFORMANCE_PART_9.md §I](plans/performance/PLAN_PERFORMANCE_PART_9.md) |
| BENCH-CI-2 | **Regression gate.** Parse Criterion JSON output; fail CI if any benchmark regresses by more than 10% (configurable threshold). Report regressions as PR comment with before/after numbers. | 1–2d | [plans/performance/PLAN_PERFORMANCE_PART_9.md §I](plans/performance/PLAN_PERFORMANCE_PART_9.md) |
| BENCH-CI-3 | **Scenario coverage.** Ensure benchmark suite covers: scan, filter, aggregate (algebraic + non-algebraic), join (2-table, 3-table), window function, CTE, TopK, append-only, and mixed workloads. At minimum 1K/10K/100K row scales. | 2–3d | [plans/performance/PLAN_PERFORMANCE_PART_9.md §I](plans/performance/PLAN_PERFORMANCE_PART_9.md) |

> **BENCH-CI subtotal: ~1–2 weeks**

### Auto-Indexing on Stream Table Creation (AUTO-IDX)

> **In plain terms:** pg_ivm automatically creates indexes on GROUP BY columns
> and primary key columns when creating an incrementally maintained view.
> pg_trickle currently requires manual index creation, which is a friction
> point for new users. Auto-indexing creates appropriate indexes at stream
> table creation time — GROUP BY keys, DISTINCT columns, and the
> `__pgt_row_id` covering index for MERGE performance.

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| ~~AUTO-IDX-1~~ | ~~**Auto-create indexes on GROUP BY / DISTINCT columns.**~~ ✅ GROUP BY composite index (existing) and DISTINCT composite index (new) auto-created at `create_stream_table()` time. Gated behind `pg_trickle.auto_index` GUC. | — | [src/api.rs](src/api.rs) |
| ~~AUTO-IDX-2~~ | ~~**Covering index on `__pgt_row_id`.**~~ ✅ Already implemented (A-4). Now gated behind `pg_trickle.auto_index` GUC (default `true`). | — | [src/api.rs](src/api.rs) |

> **AUTO-IDX: ✅ Done**

### Quick Wins

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| ~~C2-BUG~~ | ~~**Implement missing `resume_stream_table()`.**~~ ✅ Already existed since v0.2.0 — verified operational. | — | |
| ~~ERR-REF~~ | ~~**Error reference documentation.**~~ ✅ Published as `docs/ERRORS.md` with all 20 variants documented. Cross-linked from FAQ. | — | [docs/ERRORS.md](docs/ERRORS.md) |
| ~~GUC-DEFAULTS~~ | ~~**Review dangerous GUC defaults.**~~ ✅ Defaults kept at `true` (correct for most workloads). Added detailed tuning guidance for memory-constrained and PgBouncer environments in CONFIGURATION.md. | — | [docs/CONFIGURATION.md](docs/CONFIGURATION.md) |
| ~~BUF-LIMIT~~ | ~~**Change buffer hard growth limit.**~~ ✅ `pg_trickle.max_buffer_rows` GUC added (default: 1M). Forces FULL refresh + truncation when exceeded. | — | [src/config.rs](src/config.rs) · [src/refresh.rs](src/refresh.rs) |

> **Quick wins: ✅ Done**

> **v0.16.0 total: ~1–2 weeks (MERGE alts) + ~4–6 weeks (aggregate fast-path) + ~1–2 weeks (append-only) + ~2–3 weeks (predicate pushdown) + ~2–3 weeks (template cache) + ~2–3 weeks (buffer compaction) + ~3–6 weeks (test coverage) + ~1–2 weeks (bench CI) + ~2–3 days (auto-indexing) + ~2–4 hours (quick wins)**
> *Note: PG 19 compatibility (A3, ~18–36h) moved to v1.0.0.*

**Exit criteria:**
- [x] PH-D1: DELETE+INSERT strategy implemented and gated behind `merge_strategy` GUC; correctness verified for INSERT/UPDATE/DELETE deltas
- [x] B-1: Algebraic aggregate fast-path replaces MERGE for `SUM`/`COUNT`/`AVG` GROUP BY queries; `aggregate_fast_path` GUC respected; explicit DML path (DELETE+UPDATE+INSERT) used instead of MERGE for all-algebraic aggregates; `explain_st()` exposes `aggregate_path`; existing tests pass — ✅ Done in v0.16.0 Phase 8
- [x] A-3-AO: `CREATE STREAM TABLE … APPEND ONLY` accepted; refresh uses INSERT path; heuristic auto-promotion on insert-only buffers; falls back to MERGE on first non-insert CDC event
- [x] B-2: Delta predicate pushdown implemented for single-source Filter nodes (P2-7); DELETE correctness verified (OR old_col predicate); selective-query benchmarks show delta row reduction
- [x] G14-SHC: Cross-backend template cache eliminates cold-start; catalog-backed L2 cache with `template_cache` GUC; invalidation on DDL; `explain_st()` exposes stats
- ~~A3: PG 19 builds and passes full E2E suite~~ — moved to v1.0.0
- [x] C-4: Change buffer compaction reduces buffer size by ≥50% for high-churn workloads; `compact_threshold` GUC respected; no correctness regressions
- [x] TG2-WIN: Window function DVM execution tests cover ROW_NUMBER, RANK, DENSE_RANK, LAG/LEAD across INSERT/UPDATE/DELETE
- [x] TG2-JOIN: Join multi-cycle tests cover INNER/LEFT/FULL JOIN with UPDATE and DELETE propagation; no silent data loss
- [x] TG2-EQUIV: Differential ≡ Full equivalence validated for joins, aggregates, and window functions
- [x] TG2-MERGE: refresh.rs MERGE template generation has unit test coverage (completed in v0.17.0)
- [x] TG2-CANCEL: Timeout and cancellation during refresh tested; no resource leaks (completed in v0.17.0)
- [x] TG2-SCHEMA: Source table type changes and column renames tested end-to-end
- [x] BENCH-CI: Performance regression CI runs on every PR; 10% regression threshold blocks merge; scenario coverage includes scan/filter/aggregate/join/window/CTE/TopK/SemiJoin/AntiJoin
- [x] AUTO-IDX: Stream tables auto-create indexes on GROUP BY / DISTINCT columns; `__pgt_row_id` covering index for ≤ 8-column tables; `auto_index` GUC respected
- [x] C2-BUG: `resume_stream_table()` verified operational (present since v0.2.0)
- [x] ERR-REF: Error reference doc published with all 20 PgTrickleError variants, common causes, and suggested fixes
- [x] GUC-DEFAULTS: `planner_aggressive` and `cleanup_use_truncate` defaults reviewed; trade-offs documented in CONFIGURATION.md
- [x] BUF-LIMIT: `max_buffer_rows` GUC prevents unbounded change buffer growth; triggers FULL + truncation when exceeded
- [x] Extension upgrade path tested (`0.15.0 → 0.16.0`)
- [x] `just check-version-sync` passes


---