> **Plain-language companion:** [v0.16.0.md](v0.16.0.md) ## v0.16.0 — Performance & Refresh Optimization **Status: Released (2026-04-06).** Faster refreshes across the board: sub-1% deltas use DELETE+INSERT instead of MERGE, insert-only stream tables auto-detect and skip the MERGE join, algebraic aggregates apply pinpoint updates, and a cross-backend template cache eliminates cold-start latency. Automated benchmark regression gating prevents future performance degradation. **Goal:** Attack the MERGE bottleneck from multiple angles — alternative merge strategies, algebraic aggregate shortcuts, append-only bypass, delta filtering, change buffer compaction, shared-memory template caching — close critical test coverage gaps to validate these new paths. ### MERGE Alternatives & Planner Control (Phase D) > **In plain terms:** MERGE dominates 70–97% of refresh time. This explores > whether replacing MERGE with DELETE+INSERT (or INSERT ON CONFLICT + DELETE) > is faster for specific patterns — particularly for small deltas against > large stream tables where the MERGE join is the bottleneck. | Item | Description | Effort | Ref | |------|-------------|--------|-----| | ~~PH-D1~~ | ~~**DELETE+INSERT strategy.** For stream tables where delta is <1% of target, replace MERGE with `DELETE WHERE __pgt_row_id IN (delta_deletes)` + `INSERT ... SELECT FROM delta_inserts`. Benchmark against MERGE for 1K/10K/100K deltas against 1M/10M targets. Gate behind `pg_trickle.merge_strategy = 'auto'\|'merge'\|'delete_insert'` GUC.~~ | ~~1–2 wk~~ | ~~[PLAN_PERFORMANCE_PART_9.md §Phase D](plans/performance/PLAN_PERFORMANCE_PART_9.md)~~ | > **MERGE alternatives subtotal: ~1–2 weeks** ### Algebraic Aggregate UPDATE Fast-Path (B-1) > **In plain terms:** The current aggregate delta rule recomputes entire > groups where the GROUP BY key appears in the delta. For a group with 100K > rows where 1 row changed, the aggregate re-scans all 100K rows in that > group. For decomposable aggregates (`SUM`/`COUNT`/`AVG`), a direct > `UPDATE target SET col = col + Δ` replaces the full MERGE join — dropping > aggregate refresh from O(group_size) to O(1) per group. | Item | Description | Effort | Ref | |------|-------------|--------|-----| | B-1 | **Algebraic aggregate UPDATE fast-path.** For `GROUP BY` queries where all aggregates are algebraically invertible (`SUM`/`COUNT`/`AVG`), replace the MERGE with a direct `UPDATE target SET col = col + Δ WHERE group_key = ?` for existing groups, plus `INSERT` for newly-appearing groups and `DELETE` for groups whose count reaches zero. Eliminates the MERGE join overhead — the dominant cost for aggregate refresh when group cardinality is high. Requires adding `__pgt_aux_count` / `__pgt_aux_sum` auxiliary columns to the stream table. Fallback to existing MERGE path for non-algebraic aggregates (`MIN`, `MAX`, `STRING_AGG`, etc.). Gate behind `pg_trickle.aggregate_fast_path` GUC (default `true`). Expected impact: **5–20× apply-time reduction** for high-cardinality GROUP BY (10K+ distinct groups); aggregate scenarios at 100K/1% projected to drop from ~50ms to sub-1ms apply time. | 4–6 wk | [plans/performance/PLAN_NEW_STUFF.md §B-1](plans/performance/PLAN_NEW_STUFF.md) · [plans/sql/PLAN_TRANSACTIONAL_IVM.md §Phase 4](plans/sql/PLAN_TRANSACTIONAL_IVM.md) | > **B-1 subtotal: ~4–6 weeks** ### Append-Only Stream Tables — MERGE Bypass (A-3-AO) > **In plain terms:** When a stream table's sources are insert-only (e.g. > event logs, append-only tables where CDC never sees DELETE/UPDATE), the > MERGE is pure overhead — every delta row is an INSERT, never a match. > Bypassing MERGE entirely with a plain `INSERT INTO st SELECT ... FROM delta` > removes the join against the target table, takes only `RowExclusiveLock`, > and is the single highest-payoff optimization for event-sourced architectures. | Item | Description | Effort | Ref | |------|-------------|--------|-----| | ~~A-3-AO~~ | ~~**Append-only stream table fast path.** Expose an explicit `CREATE STREAM TABLE … APPEND ONLY` declaration. When set, refresh uses `INSERT INTO st SELECT ... FROM delta` instead of MERGE — no target-table join, `RowExclusiveLock` only. CDC-observed heuristic fallback: if no DELETE/UPDATE has been seen, use the fast path; fall back to MERGE on first non-insert. Benchmark against MERGE for 1K/10K/100K append deltas.~~ | ~~1–2 wk~~ | ~~[plans/performance/PLAN_NEW_STUFF.md §A-3](plans/performance/PLAN_NEW_STUFF.md)~~ | > **A-3-AO subtotal: ~1–2 weeks** ### Delta Predicate Pushdown (B-2) > **In plain terms:** For a query like `SELECT ... FROM orders WHERE status = > 'shipped'`, if a CDC change row has `status = 'pending'`, the delta > processes it through scan → filter → discard. All the scan and join work > is wasted. Pushing the WHERE predicate down into the change buffer scan > eliminates irrelevant rows before any join processing begins — a 5–10× > reduction in delta row volume for selective queries. | Item | Description | Effort | Ref | |------|-------------|--------|-----| | ~~B-2~~ | ~~**Delta predicate pushdown.** During OpTree construction, identify `Filter` nodes whose predicates reference only columns from a single source table. Inject these predicates into the `delta_scan` CTE as additional WHERE clauses (including `OR old_col = 'value'` for DELETE correctness). Expected impact: **5–10× delta row reduction** for queries with < 10% selectivity.~~ | ~~2–3 wk~~ | ~~[plans/performance/PLAN_NEW_STUFF.md §B-2](plans/performance/PLAN_NEW_STUFF.md)~~ | > **B-2 subtotal: ~2–3 weeks** ### Shared-Memory Template Caching (G14-SHC) > **In plain terms:** Every new database connection that triggers a refresh > pays a 15–50ms cold-start cost to regenerate the MERGE SQL template. With > PgBouncer in transaction mode, this happens on every single refresh cycle. > Shared-memory caching stores compiled templates in PostgreSQL DSM so they > survive across connections — eliminating the cold-start entirely for > steady-state workloads. | Item | Description | Effort | Ref | |------|-------------|--------|-----| | G14-SHC | **Shared-memory template caching (implementation).** Full implementation of DSM + lwlock-based MERGE SQL template caching, building on the G14-SHC-SPIKE RFC from v0.15.0. | ~2–3wk | [plans/performance/REPORT_OVERALL_STATUS.md §14](plans/performance/REPORT_OVERALL_STATUS.md) | > **G14-SHC subtotal: ~2–3 weeks** ### ~~PostgreSQL 19 Forward-Compatibility (A3)~~ — Moved to v1.0.0 > PG 19 beta not available in time. Items A3-1 through A3-4 deferred > to v1.0.0 milestone. ### Change Buffer Compaction (C-4) > **In plain terms:** A high-churn source table can accumulate thousands of > changes to the same row between refresh cycles — an INSERT followed by 10 > UPDATEs followed by a DELETE is really just "nothing happened." Compaction > merges multiple changes to the same row ID into a single net change before > the delta query runs, reducing change buffer size by 50–90% for high-churn > tables. This directly reduces work for every downstream path (MERGE, > DELETE+INSERT, append-only INSERT, predicate pushdown). | Item | Description | Effort | Ref | |------|-------------|--------|-----| | ~~C-4~~ | ~~**Change buffer compaction.** Before delta-query execution, merge multiple changes to the same `__pgt_row_id` into a single net change: INSERT+DELETE cancel out; consecutive UPDATEs collapse to one. Trigger on buffer exceeding `pg_trickle.compact_threshold` rows (default: 100K). Expected impact: **50–90% reduction in change buffer size** for high-churn tables.~~ | ~~2–3 wk~~ | ~~[plans/performance/PLAN_NEW_STUFF.md §C-4](plans/performance/PLAN_NEW_STUFF.md)~~ | > **C-4 subtotal: ~2–3 weeks** ### Test Coverage Hardening (TG2) > **In plain terms:** The performance optimizations in this release change > core refresh paths (MERGE alternatives, aggregate fast-path, append-only > bypass, predicate pushdown). Before and alongside these changes, critical > test coverage gaps need closing — particularly around operators and > scenarios where bugs could hide silently. These gaps were identified in > the TESTING_GAPS_2 audit. #### High-Priority Gaps | Item | Description | Effort | Ref | |------|-------------|--------|-----| | ~~TG2-WIN~~ | ~~**Window function DVM execution tests.** ~5 unit tests exist but 0 DVM execution tests. Add execution-level tests for ROW_NUMBER, RANK, DENSE_RANK, LAG/LEAD delta behavior across INSERT/UPDATE/DELETE cycles.~~ | ~~3–5d~~ | ~~[TESTING_GAPS_2.md](plans/testing/TESTING_GAPS_2.md)~~ | | ~~TG2-JOIN~~ | ~~**Join multi-cycle UPDATE/DELETE correctness.** E2E join tests are INSERT-only; no UPDATE/DELETE differential cycles. Add systematic multi-cycle coverage for INNER/LEFT/FULL JOIN with UPDATE and DELETE propagation. Risk: silent data corruption in production workloads.~~ | ~~3–5d~~ | ~~[TESTING_GAPS_2.md](plans/testing/TESTING_GAPS_2.md)~~ | | ~~TG2-EQUIV~~ | ~~**Differential ≡ Full equivalence validation.** Only CTEs validated; joins and aggregates lack equivalence proof. Add a test harness that runs every defining query in both DIFFERENTIAL and FULL mode and asserts identical results. Critical for trusting the new optimization paths.~~ | ~~3–5d~~ | ~~[TESTING_GAPS_2.md](plans/testing/TESTING_GAPS_2.md)~~ | #### Medium-Priority Gaps | Item | Description | Effort | Ref | |------|-------------|--------|-----| | TG2-MERGE | **refresh.rs MERGE template unit tests.** Only helpers/enums tested; the core MERGE SQL template generation is untested at the unit level. | 2–3d | [TESTING_GAPS_2.md](plans/testing/TESTING_GAPS_2.md) | | TG2-CANCEL | **Timeout/cancellation during refresh.** Zero tests for `statement_timeout`, `pg_cancel_backend()` during active refresh. Risk: silent failures or resource leaks under production load. | 1–2d | [TESTING_GAPS_2.md](plans/testing/TESTING_GAPS_2.md) | | TG2-SCHEMA | **Source table schema evolution.** Partial DDL tests exist; type changes and column renames are thin. Risk: silent data corruption on schema change. | 2–3d | [TESTING_GAPS_2.md](plans/testing/TESTING_GAPS_2.md) | > **TG2 subtotal: ~2–4 weeks (high-priority) + ~1–2 weeks (medium-priority)** ### Performance Regression CI (BENCH-CI) > **In plain terms:** v0.16.0 changes core refresh paths (MERGE alternatives, > aggregate fast-path, append-only bypass, predicate pushdown, buffer > compaction). Without automated benchmarks in CI, performance regressions > will slip through silently. This adds a benchmark suite that runs on every > PR and compares against a committed baseline — any statistically significant > regression blocks the merge. | Item | Description | Effort | Ref | |------|-------------|--------|-----| | BENCH-CI-1 | **Benchmark harness in CI.** Run `just bench` (Criterion-based) on a fixed hardware profile (GitHub Actions large runner or self-hosted). Capture results as JSON artifacts. Compare against committed baseline using Criterion's `--save-baseline` / `--baseline`. | 2–3d | [plans/performance/PLAN_PERFORMANCE_PART_9.md §I](plans/performance/PLAN_PERFORMANCE_PART_9.md) | | BENCH-CI-2 | **Regression gate.** Parse Criterion JSON output; fail CI if any benchmark regresses by more than 10% (configurable threshold). Report regressions as PR comment with before/after numbers. | 1–2d | [plans/performance/PLAN_PERFORMANCE_PART_9.md §I](plans/performance/PLAN_PERFORMANCE_PART_9.md) | | BENCH-CI-3 | **Scenario coverage.** Ensure benchmark suite covers: scan, filter, aggregate (algebraic + non-algebraic), join (2-table, 3-table), window function, CTE, TopK, append-only, and mixed workloads. At minimum 1K/10K/100K row scales. | 2–3d | [plans/performance/PLAN_PERFORMANCE_PART_9.md §I](plans/performance/PLAN_PERFORMANCE_PART_9.md) | > **BENCH-CI subtotal: ~1–2 weeks** ### Auto-Indexing on Stream Table Creation (AUTO-IDX) > **In plain terms:** pg_ivm automatically creates indexes on GROUP BY columns > and primary key columns when creating an incrementally maintained view. > pg_trickle currently requires manual index creation, which is a friction > point for new users. Auto-indexing creates appropriate indexes at stream > table creation time — GROUP BY keys, DISTINCT columns, and the > `__pgt_row_id` covering index for MERGE performance. | Item | Description | Effort | Ref | |------|-------------|--------|-----| | ~~AUTO-IDX-1~~ | ~~**Auto-create indexes on GROUP BY / DISTINCT columns.**~~ ✅ GROUP BY composite index (existing) and DISTINCT composite index (new) auto-created at `create_stream_table()` time. Gated behind `pg_trickle.auto_index` GUC. | — | [src/api.rs](src/api.rs) | | ~~AUTO-IDX-2~~ | ~~**Covering index on `__pgt_row_id`.**~~ ✅ Already implemented (A-4). Now gated behind `pg_trickle.auto_index` GUC (default `true`). | — | [src/api.rs](src/api.rs) | > **AUTO-IDX: ✅ Done** ### Quick Wins | Item | Description | Effort | Ref | |------|-------------|--------|-----| | ~~C2-BUG~~ | ~~**Implement missing `resume_stream_table()`.**~~ ✅ Already existed since v0.2.0 — verified operational. | — | | | ~~ERR-REF~~ | ~~**Error reference documentation.**~~ ✅ Published as `docs/ERRORS.md` with all 20 variants documented. Cross-linked from FAQ. | — | [docs/ERRORS.md](docs/ERRORS.md) | | ~~GUC-DEFAULTS~~ | ~~**Review dangerous GUC defaults.**~~ ✅ Defaults kept at `true` (correct for most workloads). Added detailed tuning guidance for memory-constrained and PgBouncer environments in CONFIGURATION.md. | — | [docs/CONFIGURATION.md](docs/CONFIGURATION.md) | | ~~BUF-LIMIT~~ | ~~**Change buffer hard growth limit.**~~ ✅ `pg_trickle.max_buffer_rows` GUC added (default: 1M). Forces FULL refresh + truncation when exceeded. | — | [src/config.rs](src/config.rs) · [src/refresh.rs](src/refresh.rs) | > **Quick wins: ✅ Done** > **v0.16.0 total: ~1–2 weeks (MERGE alts) + ~4–6 weeks (aggregate fast-path) + ~1–2 weeks (append-only) + ~2–3 weeks (predicate pushdown) + ~2–3 weeks (template cache) + ~2–3 weeks (buffer compaction) + ~3–6 weeks (test coverage) + ~1–2 weeks (bench CI) + ~2–3 days (auto-indexing) + ~2–4 hours (quick wins)** > *Note: PG 19 compatibility (A3, ~18–36h) moved to v1.0.0.* **Exit criteria:** - [x] PH-D1: DELETE+INSERT strategy implemented and gated behind `merge_strategy` GUC; correctness verified for INSERT/UPDATE/DELETE deltas - [x] B-1: Algebraic aggregate fast-path replaces MERGE for `SUM`/`COUNT`/`AVG` GROUP BY queries; `aggregate_fast_path` GUC respected; explicit DML path (DELETE+UPDATE+INSERT) used instead of MERGE for all-algebraic aggregates; `explain_st()` exposes `aggregate_path`; existing tests pass — ✅ Done in v0.16.0 Phase 8 - [x] A-3-AO: `CREATE STREAM TABLE … APPEND ONLY` accepted; refresh uses INSERT path; heuristic auto-promotion on insert-only buffers; falls back to MERGE on first non-insert CDC event - [x] B-2: Delta predicate pushdown implemented for single-source Filter nodes (P2-7); DELETE correctness verified (OR old_col predicate); selective-query benchmarks show delta row reduction - [x] G14-SHC: Cross-backend template cache eliminates cold-start; catalog-backed L2 cache with `template_cache` GUC; invalidation on DDL; `explain_st()` exposes stats - ~~A3: PG 19 builds and passes full E2E suite~~ — moved to v1.0.0 - [x] C-4: Change buffer compaction reduces buffer size by ≥50% for high-churn workloads; `compact_threshold` GUC respected; no correctness regressions - [x] TG2-WIN: Window function DVM execution tests cover ROW_NUMBER, RANK, DENSE_RANK, LAG/LEAD across INSERT/UPDATE/DELETE - [x] TG2-JOIN: Join multi-cycle tests cover INNER/LEFT/FULL JOIN with UPDATE and DELETE propagation; no silent data loss - [x] TG2-EQUIV: Differential ≡ Full equivalence validated for joins, aggregates, and window functions - [x] TG2-MERGE: refresh.rs MERGE template generation has unit test coverage (completed in v0.17.0) - [x] TG2-CANCEL: Timeout and cancellation during refresh tested; no resource leaks (completed in v0.17.0) - [x] TG2-SCHEMA: Source table type changes and column renames tested end-to-end - [x] BENCH-CI: Performance regression CI runs on every PR; 10% regression threshold blocks merge; scenario coverage includes scan/filter/aggregate/join/window/CTE/TopK/SemiJoin/AntiJoin - [x] AUTO-IDX: Stream tables auto-create indexes on GROUP BY / DISTINCT columns; `__pgt_row_id` covering index for ≤ 8-column tables; `auto_index` GUC respected - [x] C2-BUG: `resume_stream_table()` verified operational (present since v0.2.0) - [x] ERR-REF: Error reference doc published with all 20 PgTrickleError variants, common causes, and suggested fixes - [x] GUC-DEFAULTS: `planner_aggressive` and `cleanup_use_truncate` defaults reviewed; trade-offs documented in CONFIGURATION.md - [x] BUF-LIMIT: `max_buffer_rows` GUC prevents unbounded change buffer growth; triggers FULL + truncation when exceeded - [x] Extension upgrade path tested (`0.15.0 → 0.16.0`) - [x] `just check-version-sync` passes ---