> **Plain-language companion:** [v0.17.0.md](v0.17.0.md) ## v0.17.0 — Query Intelligence & Stability **Status: Released (2026-04-08).** **Goal:** Make the refresh engine smarter, prove correctness through automated fuzzing, harden for scale, and prepare for adoption. Cost-based strategy selection replaces the fixed DIFF/FULL threshold, columnar change tracking skips irrelevant columns in wide-table UPDATEs, SQLancer integration provides automated semantic proving, incremental DAG rebuild supports 1000+ stream table deployments, and unsafe block reduction continues the safety hardening toward 1.0. On the adoption side: `api.rs` modularization improves code maintainability, a pg_ivm migration guide targets the largest potential adopter audience, a failure mode runbook equips production teams, and a Docker Compose playground provides a 60-second tryout experience. ### Cost-Based Refresh Strategy Selection (B-4) > **In plain terms:** The current adaptive FULL/DIFFERENTIAL threshold is a > fixed ratio (`differential_max_change_ratio` default 0.5). A join-heavy > query may be better off with FULL at 5% change rate, while a scan-only > query benefits from DIFFERENTIAL up to 80%. This replaces the fixed > threshold with a cost model trained on each stream table's own refresh > history — selecting the cheapest strategy per cycle automatically. | Item | Description | Effort | Ref | |------|-------------|--------|-----| | B-4 | **Cost-based refresh strategy selection.** Collect per-ST statistics (`delta_row_count`, `merge_duration_ms`, `full_refresh_duration_ms`, `query_complexity_class`) from `pgt_refresh_history`. Fit a simple linear cost model. Before each refresh, compare `estimated_diff_cost(Δ)` vs `estimated_full_cost × safety_margin` and select the cheaper path. Cold-start heuristic (< 10 refreshes) falls back to existing fixed threshold. Gate behind `pg_trickle.refresh_strategy = 'auto'\|'differential'\|'full'` GUC. | 2–3 wk | [plans/performance/PLAN_NEW_STUFF.md §B-4](plans/performance/PLAN_NEW_STUFF.md) | > **B-4 subtotal: ~2–3 weeks** ### Columnar Change Tracking (A-2-COL) > **In plain terms:** When a source table UPDATE changes only 1 of 50 columns, > the current CDC captures the entire row (old + new) and the delta query > processes all columns. If the changed column is not referenced by the stream > table's defining query, the entire refresh is wasted work. Columnar change > tracking adds a per-column bitmask to CDC events so the delta query can skip > irrelevant rows at scan time — a 50–90% reduction in delta volume for > wide-table OLTP workloads. | Item | Description | Effort | Ref | |------|-------------|--------|-----| | A-2-COL-1 | **CDC trigger bitmask.** Compute `changed_columns` bitmask (`old.col IS DISTINCT FROM new.col`) in the CDC trigger; store as `int8` or `bit(n)` alongside the change row. | 1–2 wk | [plans/performance/PLAN_NEW_STUFF.md §A-2](plans/performance/PLAN_NEW_STUFF.md) | | A-2-COL-2 | **Delta-scan column filtering.** At delta-query build time, consult the bitmask: skip rows where no referenced column changed; use lightweight UPDATE-only path when only projected columns changed (no join keys, no filter predicates, no aggregate keys). | 1–2 wk | [plans/performance/PLAN_NEW_STUFF.md §A-2](plans/performance/PLAN_NEW_STUFF.md) | | A-2-COL-3 | **Aggregate correction optimization.** For aggregates where only the aggregated value column changed (not GROUP BY key), emit a single correction row instead of delete-old + insert-new. | 3–5d | [plans/performance/PLAN_NEW_STUFF.md §A-2](plans/performance/PLAN_NEW_STUFF.md) | > **A-2-COL subtotal: ~3–4 weeks** ### Transactional IVM Phase 4 Remaining (A2) > **In plain terms:** IMMEDIATE mode (same-transaction refresh) shipped in > v0.2.0 using SQL-level statement triggers. Phase 4 completes the transition > to lower-overhead C-level triggers and ENR-based transition tables — sharing > the transition tuplestore directly between the trigger and the refresh engine > instead of copying through a temp table. Also adds prepared statement reuse > to eliminate repeated parse/plan overhead for the delta query. | Item | Description | Effort | Ref | |------|-------------|--------|-----| | ~~A2-ENR~~ | ~~**ENR-based transition tables.**~~ 🚫 **Deferred post-1.0** — requires raw `pg_sys` ENR tuplestore FFI not surfaced by pgrx; carries memory-corruption and `pg_upgrade` compatibility risk. Revisit after 1.0 stabilisation. | ~~12–18h~~ | [PLAN_TRANSACTIONAL_IVM.md](plans/sql/PLAN_TRANSACTIONAL_IVM.md) §Phase 4 | | ~~A2-CTR~~ | ~~**C-level triggers.**~~ 🚫 **Deferred post-1.0** — requires raw `CreateTrigger()` FFI not surfaced by pgrx; carries memory-corruption and `pg_upgrade` compatibility risk. Revisit after 1.0 stabilisation. | ~~12–18h~~ | [PLAN_TRANSACTIONAL_IVM.md](plans/sql/PLAN_TRANSACTIONAL_IVM.md) §Phase 4 | | ~~A2-PS~~ | ~~**Prepared statement reuse.**~~ ✅ **Already shipped** — `pg_trickle.use_prepared_statements` GUC (default `true`) implemented and wired in `refresh.rs`; parse/plan overhead eliminated on steady-state workloads. | ~~8–12h~~ | [PLAN_TRANSACTIONAL_IVM.md](plans/sql/PLAN_TRANSACTIONAL_IVM.md) §Phase 4 | > **A2 subtotal: 0h remaining** (A2-PS shipped; A2-ENR + A2-CTR deferred post-1.0) ### `ROWS FROM()` Support (A8) > **In plain terms:** `ROWS FROM()` with multiple set-returning functions > is a rarely-used SQL feature, but supporting it closes a coverage gap > in the parser and DVM pipeline. | Item | Description | Effort | Ref | |------|-------------|--------|-----| | A8 | **`ROWS FROM()` with multiple SRF functions.** Parser + DVM support for `ROWS FROM(generate_series(...), unnest(...))` in defining queries. Very low demand. | ~1–2d | [PLAN_TRANSACTIONAL_IVM_PART_2.md](plans/sql/PLAN_TRANSACTIONAL_IVM_PART_2.md) Task 2.3 | > **A8 subtotal: ~1–2 days** ### SQLancer Fuzzing Integration (SQLANCER) > **In plain terms:** pg_trickle's tests were written by the pg_trickle team, > which means they share the same assumptions as the code. SQLancer is an > automated database testing tool that generates random SQL queries and checks > whether the results are correct — it has found hundreds of bugs in > PostgreSQL, SQLite, CockroachDB, and TiDB. Integrating SQLancer gives > pg_trickle a crash-test oracle (does the parser panic on fuzzed input?), > an equivalence oracle (does DIFFERENTIAL mode produce the same answer as > FULL?), and stateful DML fuzzing (do random INSERT/UPDATE/DELETE sequences > corrupt stream table data?). This is the single highest-value testing > investment for finding unknown correctness bugs. | Item | Description | Effort | Ref | |------|-------------|--------|-----| | ~~SQLANCER-1~~ | ~~**Fuzzing environment.**~~ ✅ **Done** — Docker-based harness (`just sqlancer`), Rust LCG query generator, `SQLANCER_CASES`/`SQLANCER_SEED` controls, `weekly-sqlancer` CI job. | ~~2–3d~~ | [PLAN_SQLANCER.md](plans/testing/PLAN_SQLANCER.md) §1 | | ~~SQLANCER-2~~ | ~~**Crash-test oracle.**~~ ✅ **Done** — `test_sqlancer_crash_oracle` / `run_crash_oracle()` verifies zero backend crashes over 200–2000 fuzzed queries. | ~~3–5d~~ | [PLAN_SQLANCER.md](plans/testing/PLAN_SQLANCER.md) §2 | | ~~SQLANCER-3~~ | ~~**Equivalence oracle.**~~ ✅ **Done** — `test_sqlancer_diff_vs_full_oracle` / `run_diff_vs_full_oracle()` creates DIFFERENTIAL + FULL stream tables, applies 4 DML mutations, and asserts count parity. Integrated into `test_sqlancer_ci_combined`. | ~~3–5d~~ | [PLAN_SQLANCER.md](plans/testing/PLAN_SQLANCER.md) §3 | | ~~SQLANCER-4~~ | ~~**Stateful DML fuzzing.**~~ ✅ **Done** — `test_sqlancer_stateful_dml` / `run_stateful_dml_fuzzing()` runs `SQLANCER_MUTATIONS` (default 100, nightly 10 000) random INSERT/UPDATE/DELETE mutations with checkpoints every 50. CI: `weekly-sqlancer-stateful` job (`SQLANCER_MUTATIONS=10000`). | ~~3–5d~~ | [PLAN_SQLANCER.md](plans/testing/PLAN_SQLANCER.md) §4 | > **SQLANCER subtotal: 0 remaining** (all four items shipped in v0.17.0) ### Incremental DAG Rebuild (C-2) > **In plain terms:** When any DDL change occurs (e.g. `ALTER STREAM TABLE`, > `DROP STREAM TABLE`), the entire dependency graph is rebuilt from scratch > by querying `pgt_dependencies`. For 1000+ stream tables this becomes > expensive — O(V+E) SPI queries. Incremental DAG maintenance records which > specific stream table was affected and only re-sorts the affected subgraph, > reducing the scheduler latency spike from ~50ms to ~1ms at scale. | Item | Description | Effort | Ref | |------|-------------|--------|-----| | C-2-1 | **Delta-based rebuild.** Record affected `pgt_id` in a bounded ring buffer in shared memory alongside `DAG_REBUILD_SIGNAL`. On overflow, fall back to full rebuild. | 1 wk | [plans/performance/PLAN_NEW_STUFF.md §C-2](plans/performance/PLAN_NEW_STUFF.md) | | C-2-2 | **Incremental topological sort.** Add/remove only affected edges and vertices; re-run topological sort on the affected subgraph only. Cache the sorted schedule in shared memory. | 1–2 wk | [plans/performance/PLAN_NEW_STUFF.md §C-2](plans/performance/PLAN_NEW_STUFF.md) | > **C-2 subtotal: ~2–3 weeks** ### Unsafe Block Reduction — Phase 6 (UNSAFE-R1/R2) > **In plain terms:** pg_trickle achieved a 51% reduction in `unsafe` blocks > (from ~1,300 to 641) in earlier releases. The remaining blocks are > concentrated in well-documented field-accessor macros and standalone > `is_a` type checks. Converting these to safe wrappers removes another > 150–250 unsafe blocks with minimal risk — a meaningful safety improvement > before 1.0. | Item | Description | Effort | Ref | |------|-------------|--------|-----| | UNSAFE-R1 | **Safe field-accessor macros.** Replace `unsafe { (*node).field }` patterns with safe accessor functions. Estimated reduction: ~100–150 unsafe blocks. | 2–4h | [PLAN_REDUCED_UNSAFE.md §R1](plans/safety/PLAN_REDUCED_UNSAFE.md) | | UNSAFE-R2 | **Safe `is_a` checks.** Convert standalone `unsafe { is_a(node, T_Foo) }` calls to safe wrapper functions. Estimated reduction: ~50–99 unsafe blocks. | 2–4h | [PLAN_REDUCED_UNSAFE.md §R2](plans/safety/PLAN_REDUCED_UNSAFE.md) | > **UNSAFE-R1/R2 subtotal: ~4–8 hours** ### `api.rs` Modularization (API-MOD) > **In plain terms:** `api.rs` is 9,413 lines — the largest file in the > codebase. It contains stream table CRUD, ALTER QUERY, CDC management, > bulk operations, diagnostics, and monitoring functions all in one file. > The same treatment that `parser.rs` received in v0.15.0 (split from 21K > lines into 5 sub-modules) is needed here. Zero behavior change — purely > structural. | Item | Description | Effort | Ref | |------|-------------|--------|-----| | API-MOD | **Split `src/api.rs` into sub-modules.** Proposed split: `api/create.rs` (create/drop/alter), `api/refresh.rs` (refresh entry points), `api/cdc.rs` (CDC management), `api/diagnostics.rs` (explain_st, health_check), `api/bulk.rs` (bulk_create), `api/mod.rs` (re-exports). Zero behavior change. | 1–2 wk | — | > **API-MOD subtotal: ~1–2 weeks** ### pg_ivm Migration Guide (MIG-IVM) > **In plain terms:** pg_ivm is the incumbent IVM extension with 1,400+ > GitHub stars and 4 years of production use. Many potential pg_trickle > adopters are currently using pg_ivm. A step-by-step migration guide — > mapping pg_ivm concepts to pg_trickle equivalents, with concrete SQL > examples — removes the biggest adoption friction for this audience. | Item | Description | Effort | Ref | |------|-------------|--------|-----| | MIG-IVM | **pg_ivm → pg_trickle migration guide.** Map: `create_immv()` → `create_stream_table()`; `refresh_immv()` → `refresh_stream_table()`; IMMEDIATE mode equivalence; aggregate coverage differences (5 vs 60+); GUC mapping; worked example migrating a real pg_ivm deployment. Publish as `docs/tutorials/MIGRATING_FROM_PG_IVM.md`. | 2–3d | [docs/research/PG_IVM_COMPARISON.md](docs/research/PG_IVM_COMPARISON.md) | > **MIG-IVM subtotal: ~2–3 days** ### Failure Mode Runbook (RUNBOOK) > **In plain terms:** Production teams need to know what happens when things > go wrong — and what to do about it. This documents every failure mode > pg_trickle can encounter (scheduler crash, WAL slot lag, OOM during > refresh, disk full, replication slot conflict, stuck watermarks, circular > convergence failure) with symptoms, diagnosis steps, and resolution > procedures. Essential for on-call engineers. | Item | Description | Effort | Ref | |------|-------------|--------|-----| | RUNBOOK | **Failure mode runbook.** Document: scheduler crash recovery, WAL decoder failures, OOM during refresh, disk-full behavior, replication slot conflicts, stuck watermarks, circular convergence timeout, CDC trigger failures, SUSPENDED state recovery, lock contention diagnosis. Include `health_check()` output interpretation and `explain_st()` troubleshooting. Publish as `docs/TROUBLESHOOTING.md`. | 3–5d | [docs/PRE_DEPLOYMENT.md](docs/PRE_DEPLOYMENT.md) | > **RUNBOOK subtotal: ~3–5 days** ### Docker Quickstart Playground (PLAYGROUND) > **In plain terms:** The fastest way to evaluate any database extension is > to run it locally in 60 seconds. A `docker-compose.yml` with PostgreSQL + > pg_trickle pre-installed, sample data (e.g. the org-chart from > GETTING_STARTED.md), and a Jupyter notebook or pgAdmin web UI gives > potential users a zero-friction tryout experience. This is the single > most impactful thing for driving initial adoption. | Item | Description | Effort | Ref | |------|-------------|--------|-----| | PLAYGROUND | **Docker Compose quickstart.** `docker-compose.yml` with: PG 18 + pg_trickle, seed SQL script (org-chart example from GETTING_STARTED.md + TPC-H SF=0.01), pgAdmin web UI (optional). Single `docker compose up` command. README with guided walkthrough. | 2–3d | [docs/GETTING_STARTED.md](docs/GETTING_STARTED.md) | > **PLAYGROUND subtotal: ~2–3 days** ### Documentation Polish (DOC-POLISH) > **In plain terms:** The existing documentation is comprehensive and > technically excellent, but it's optimized for users already familiar with > IVM and PostgreSQL internals. These items restructure the docs for a > better "first hour" experience — simpler getting-started examples, a > refresh mode decision guide, a condensed new-user FAQ, and a setup > verification checklist. The goal is to reduce cognitive overload for new > users without losing the depth that experienced users need. | Item | Description | Effort | Ref | |------|-------------|--------|-----| | DOC-HELLO | **Simplified "Hello Stream Table" in GETTING_STARTED.** Add a Chapter 0 with a single-table, single-aggregate stream table (e.g. `SELECT department, count(*) FROM employees GROUP BY department`). Create it, insert a row, verify the refresh. Build confidence before the multi-table org-chart example. | 2–4h | [docs/GETTING_STARTED.md](docs/GETTING_STARTED.md) | | DOC-DECIDE | **Refresh mode decision guide.** Flowchart: "Need transactional consistency? → IMMEDIATE. Volatile functions? → FULL. Otherwise → AUTO (DIFFERENTIAL with FULL fallback)." Include when-to-use guidance for each mode with concrete examples. Publish as a section in GETTING_STARTED or as a standalone tutorial. | 2–4h | [docs/tutorials/tuning-refresh-mode.md](docs/tutorials/tuning-refresh-mode.md) | | DOC-FAQ-NEW | **New User FAQ (top 15 questions).** Extract the 15 most common new-user questions from the 3,000-line FAQ into a prominent "New User FAQ" section at the top. Keyword-rich headings for searchability. Link to deep FAQ for details. | 2–3h | [docs/FAQ.md](docs/FAQ.md) | | DOC-VERIFY | **Post-install verification checklist.** SQL script that verifies: extension loaded, shared_preload_libraries configured, GUCs set, CDC triggers installable, first stream table creates and refreshes successfully. Runnable as `psql -f verify_install.sql`. | 2–4h | [docs/GETTING_STARTED.md](docs/GETTING_STARTED.md) | | DOC-STUBS | **Fill or remove research stubs.** `PG_IVM_COMPARISON.md` (60 bytes) and `CUSTOM_SQL_SYNTAX.md` (57 bytes) are empty stubs. Either flesh them out (PG_IVM_COMPARISON can draw from the existing comparison data) or remove from SUMMARY.md. | 2–4h | [docs/research/](docs/research/) | > **DOC-POLISH subtotal: ~2–3 days** > **v0.17.0 total: ~2–3 weeks (cost-based strategy) + ~3–4 weeks (columnar tracking) + ~32–48 hours (TIVM Phase 4) + ~1–2 days (ROWS FROM) + ~2–3 weeks (SQLancer) + ~2–3 weeks (incremental DAG) + ~4–8 hours (unsafe reduction) + ~1–2 weeks (api.rs modularization) + ~2–3 days (pg_ivm migration) + ~3–5 days (failure runbook) + ~2–3 days (Docker playground) + ~2–3 days (doc polish)** **Exit criteria:** - [x] B-4: Cost-based strategy selector trained on per-ST history; cold-start fallback to fixed threshold; `QueryComplexityClass` cost model (scan/filter/aggregate/join/join_agg); `refresh_strategy` + `cost_model_safety_margin` GUCs; pre-refresh predictive comparison; 10 unit tests - [x] A-2-COL: CDC trigger emits `changed_cols` VARBIT bitmask (COL-1); delta-scan filters irrelevant rows via `changed_cols & mask` (COL-2); aggregate value-only correction 'V' path halves row volume (COL-3) - ~~[ ] A2-ENR~~: 🚫 Deferred post-1.0 — requires raw `pg_sys` ENR tuplestore FFI (memory-corruption risk); revisit after 1.0 stabilisation - ~~[ ] A2-CTR~~: 🚫 Deferred post-1.0 — requires raw `CreateTrigger()` C FFI (memory-corruption risk); revisit after 1.0 stabilisation - [x] A2-PS: ✅ Already shipped — `pg_trickle.use_prepared_statements` GUC (default `true`) wired in `refresh.rs`; parse/plan overhead eliminated on steady-state workloads - [x] A8: `ROWS FROM()` with multiple SRFs accepted in defining queries; E2E tests cover INSERT/UPDATE/DELETE propagation - [x] SQLANCER: ✅ SQLANCER-1/2 crash + equivalence oracles shipped in v0.12.0; SQLANCER-3 diff-vs-full oracle and SQLANCER-4 stateful DML soak (10K mutations) added in v0.17.0; `weekly-sqlancer-stateful` CI job wired - [x] C-2: Incremental DAG rebuild reduces DDL-triggered latency spike to < 5ms at 100+ STs; ring buffer overflow falls back to full rebuild; no correctness regressions - [x] UNSAFE-R1/R2: Unsafe block count reduced by 249 (690→441 in parser); `is_node_type!` and `pg_deref!` macros; all 1,700 unit tests pass - [x] API-MOD: `api.rs` split into 3 sub-modules (mod.rs 5,624 + diagnostics.rs 1,377 + helpers.rs 2,461); zero behavior change; all 1,700 unit tests pass - [x] MIG-IVM: `docs/tutorials/MIGRATING_FROM_PG_IVM.md` published with step-by-step migration, API mapping, behavioral differences, SQL upgrade examples, and verification checklist - [x] RUNBOOK: `docs/TROUBLESHOOTING.md` covers 13 failure scenarios (scheduler, SUSPENDED, CDC triggers, WAL slots, INITIALIZING, buffer growth, lock contention, OOM, disk full, circular convergence, schema changes, worker pool, fuse) with symptoms, diagnosis, and resolution - [x] PLAYGROUND: `playground/` with docker-compose.yml, seed.sql (3 base tables, 5 stream tables), and README walkthrough - [x] DOC-HELLO: Chapter 1 "Hello World" in GETTING_STARTED already provides the single-table aggregate example (products/category_summary) - [x] DOC-DECIDE: Refresh mode decision guide already published as `tutorials/tuning-refresh-mode.md` with `recommend_refresh_mode()` and signal breakdown - [x] DOC-FAQ-NEW: New User FAQ section with 15 keyword-rich entries added at top of FAQ.md - [x] DOC-VERIFY: `scripts/verify_install.sql` checks shared_preload_libraries, extension, scheduler, GUCs, and runs end-to-end stream table cycle - [x] DOC-STUBS: Research stubs already use `{{#include}}` directives pointing to substantial content (923 + 1232 lines) - [x] Extension upgrade path tested (`0.16.0 → 0.17.0`) ---