> **Plain-language companion:** [v0.17.0.md](v0.17.0.md)

## v0.17.0 — Query Intelligence & Stability

**Status: Released (2026-04-08).**

**Goal:** Make the refresh engine smarter, prove correctness through automated
fuzzing, harden for scale, and prepare for adoption. Cost-based strategy
selection replaces the fixed DIFF/FULL threshold, columnar change tracking
skips irrelevant columns in wide-table UPDATEs, SQLancer integration provides
automated semantic proving, incremental DAG rebuild supports 1000+ stream table
deployments, and unsafe block reduction continues the safety hardening toward
1.0. On the adoption side: `api.rs` modularization improves code maintainability,
a pg_ivm migration guide targets the largest potential adopter audience, a
failure mode runbook equips production teams, and a Docker Compose playground
provides a 60-second tryout experience.

### Cost-Based Refresh Strategy Selection (B-4)

> **In plain terms:** The current adaptive FULL/DIFFERENTIAL threshold is a
> fixed ratio (`differential_max_change_ratio` default 0.5). A join-heavy
> query may be better off with FULL at 5% change rate, while a scan-only
> query benefits from DIFFERENTIAL up to 80%. This replaces the fixed
> threshold with a cost model trained on each stream table's own refresh
> history — selecting the cheapest strategy per cycle automatically.

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| B-4 | **Cost-based refresh strategy selection.** Collect per-ST statistics (`delta_row_count`, `merge_duration_ms`, `full_refresh_duration_ms`, `query_complexity_class`) from `pgt_refresh_history`. Fit a simple linear cost model. Before each refresh, compare `estimated_diff_cost(Δ)` vs `estimated_full_cost × safety_margin` and select the cheaper path. Cold-start heuristic (< 10 refreshes) falls back to existing fixed threshold. Gate behind `pg_trickle.refresh_strategy = 'auto'\|'differential'\|'full'` GUC. | 2–3 wk | [plans/performance/PLAN_NEW_STUFF.md §B-4](plans/performance/PLAN_NEW_STUFF.md) |

> **B-4 subtotal: ~2–3 weeks**

### Columnar Change Tracking (A-2-COL)

> **In plain terms:** When a source table UPDATE changes only 1 of 50 columns,
> the current CDC captures the entire row (old + new) and the delta query
> processes all columns. If the changed column is not referenced by the stream
> table's defining query, the entire refresh is wasted work. Columnar change
> tracking adds a per-column bitmask to CDC events so the delta query can skip
> irrelevant rows at scan time — a 50–90% reduction in delta volume for
> wide-table OLTP workloads.

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| A-2-COL-1 | **CDC trigger bitmask.** Compute `changed_columns` bitmask (`old.col IS DISTINCT FROM new.col`) in the CDC trigger; store as `int8` or `bit(n)` alongside the change row. | 1–2 wk | [plans/performance/PLAN_NEW_STUFF.md §A-2](plans/performance/PLAN_NEW_STUFF.md) |
| A-2-COL-2 | **Delta-scan column filtering.** At delta-query build time, consult the bitmask: skip rows where no referenced column changed; use lightweight UPDATE-only path when only projected columns changed (no join keys, no filter predicates, no aggregate keys). | 1–2 wk | [plans/performance/PLAN_NEW_STUFF.md §A-2](plans/performance/PLAN_NEW_STUFF.md) |
| A-2-COL-3 | **Aggregate correction optimization.** For aggregates where only the aggregated value column changed (not GROUP BY key), emit a single correction row instead of delete-old + insert-new. | 3–5d | [plans/performance/PLAN_NEW_STUFF.md §A-2](plans/performance/PLAN_NEW_STUFF.md) |

> **A-2-COL subtotal: ~3–4 weeks**

### Transactional IVM Phase 4 Remaining (A2)

> **In plain terms:** IMMEDIATE mode (same-transaction refresh) shipped in
> v0.2.0 using SQL-level statement triggers. Phase 4 completes the transition
> to lower-overhead C-level triggers and ENR-based transition tables — sharing
> the transition tuplestore directly between the trigger and the refresh engine
> instead of copying through a temp table. Also adds prepared statement reuse
> to eliminate repeated parse/plan overhead for the delta query.

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| ~~A2-ENR~~ | ~~**ENR-based transition tables.**~~ 🚫 **Deferred post-1.0** — requires raw `pg_sys` ENR tuplestore FFI not surfaced by pgrx; carries memory-corruption and `pg_upgrade` compatibility risk. Revisit after 1.0 stabilisation. | ~~12–18h~~ | [PLAN_TRANSACTIONAL_IVM.md](plans/sql/PLAN_TRANSACTIONAL_IVM.md) §Phase 4 |
| ~~A2-CTR~~ | ~~**C-level triggers.**~~ 🚫 **Deferred post-1.0** — requires raw `CreateTrigger()` FFI not surfaced by pgrx; carries memory-corruption and `pg_upgrade` compatibility risk. Revisit after 1.0 stabilisation. | ~~12–18h~~ | [PLAN_TRANSACTIONAL_IVM.md](plans/sql/PLAN_TRANSACTIONAL_IVM.md) §Phase 4 |
| ~~A2-PS~~ | ~~**Prepared statement reuse.**~~ ✅ **Already shipped** — `pg_trickle.use_prepared_statements` GUC (default `true`) implemented and wired in `refresh.rs`; parse/plan overhead eliminated on steady-state workloads. | ~~8–12h~~ | [PLAN_TRANSACTIONAL_IVM.md](plans/sql/PLAN_TRANSACTIONAL_IVM.md) §Phase 4 |

> **A2 subtotal: 0h remaining** (A2-PS shipped; A2-ENR + A2-CTR deferred post-1.0)

### `ROWS FROM()` Support (A8)

> **In plain terms:** `ROWS FROM()` with multiple set-returning functions
> is a rarely-used SQL feature, but supporting it closes a coverage gap
> in the parser and DVM pipeline.

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| A8 | **`ROWS FROM()` with multiple SRF functions.** Parser + DVM support for `ROWS FROM(generate_series(...), unnest(...))` in defining queries. Very low demand. | ~1–2d | [PLAN_TRANSACTIONAL_IVM_PART_2.md](plans/sql/PLAN_TRANSACTIONAL_IVM_PART_2.md) Task 2.3 |

> **A8 subtotal: ~1–2 days**

### SQLancer Fuzzing Integration (SQLANCER)

> **In plain terms:** pg_trickle's tests were written by the pg_trickle team,
> which means they share the same assumptions as the code. SQLancer is an
> automated database testing tool that generates random SQL queries and checks
> whether the results are correct — it has found hundreds of bugs in
> PostgreSQL, SQLite, CockroachDB, and TiDB. Integrating SQLancer gives
> pg_trickle a crash-test oracle (does the parser panic on fuzzed input?),
> an equivalence oracle (does DIFFERENTIAL mode produce the same answer as
> FULL?), and stateful DML fuzzing (do random INSERT/UPDATE/DELETE sequences
> corrupt stream table data?). This is the single highest-value testing
> investment for finding unknown correctness bugs.

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| ~~SQLANCER-1~~ | ~~**Fuzzing environment.**~~ ✅ **Done** — Docker-based harness (`just sqlancer`), Rust LCG query generator, `SQLANCER_CASES`/`SQLANCER_SEED` controls, `weekly-sqlancer` CI job. | ~~2–3d~~ | [PLAN_SQLANCER.md](plans/testing/PLAN_SQLANCER.md) §1 |
| ~~SQLANCER-2~~ | ~~**Crash-test oracle.**~~ ✅ **Done** — `test_sqlancer_crash_oracle` / `run_crash_oracle()` verifies zero backend crashes over 200–2000 fuzzed queries. | ~~3–5d~~ | [PLAN_SQLANCER.md](plans/testing/PLAN_SQLANCER.md) §2 |
| ~~SQLANCER-3~~ | ~~**Equivalence oracle.**~~ ✅ **Done** — `test_sqlancer_diff_vs_full_oracle` / `run_diff_vs_full_oracle()` creates DIFFERENTIAL + FULL stream tables, applies 4 DML mutations, and asserts count parity. Integrated into `test_sqlancer_ci_combined`. | ~~3–5d~~ | [PLAN_SQLANCER.md](plans/testing/PLAN_SQLANCER.md) §3 |
| ~~SQLANCER-4~~ | ~~**Stateful DML fuzzing.**~~ ✅ **Done** — `test_sqlancer_stateful_dml` / `run_stateful_dml_fuzzing()` runs `SQLANCER_MUTATIONS` (default 100, nightly 10 000) random INSERT/UPDATE/DELETE mutations with checkpoints every 50. CI: `weekly-sqlancer-stateful` job (`SQLANCER_MUTATIONS=10000`). | ~~3–5d~~ | [PLAN_SQLANCER.md](plans/testing/PLAN_SQLANCER.md) §4 |

> **SQLANCER subtotal: 0 remaining** (all four items shipped in v0.17.0)

### Incremental DAG Rebuild (C-2)

> **In plain terms:** When any DDL change occurs (e.g. `ALTER STREAM TABLE`,
> `DROP STREAM TABLE`), the entire dependency graph is rebuilt from scratch
> by querying `pgt_dependencies`. For 1000+ stream tables this becomes
> expensive — O(V+E) SPI queries. Incremental DAG maintenance records which
> specific stream table was affected and only re-sorts the affected subgraph,
> reducing the scheduler latency spike from ~50ms to ~1ms at scale.

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| C-2-1 | **Delta-based rebuild.** Record affected `pgt_id` in a bounded ring buffer in shared memory alongside `DAG_REBUILD_SIGNAL`. On overflow, fall back to full rebuild. | 1 wk | [plans/performance/PLAN_NEW_STUFF.md §C-2](plans/performance/PLAN_NEW_STUFF.md) |
| C-2-2 | **Incremental topological sort.** Add/remove only affected edges and vertices; re-run topological sort on the affected subgraph only. Cache the sorted schedule in shared memory. | 1–2 wk | [plans/performance/PLAN_NEW_STUFF.md §C-2](plans/performance/PLAN_NEW_STUFF.md) |

> **C-2 subtotal: ~2–3 weeks**

### Unsafe Block Reduction — Phase 6 (UNSAFE-R1/R2)

> **In plain terms:** pg_trickle achieved a 51% reduction in `unsafe` blocks
> (from ~1,300 to 641) in earlier releases. The remaining blocks are
> concentrated in well-documented field-accessor macros and standalone
> `is_a` type checks. Converting these to safe wrappers removes another
> 150–250 unsafe blocks with minimal risk — a meaningful safety improvement
> before 1.0.

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| UNSAFE-R1 | **Safe field-accessor macros.** Replace `unsafe { (*node).field }` patterns with safe accessor functions. Estimated reduction: ~100–150 unsafe blocks. | 2–4h | [PLAN_REDUCED_UNSAFE.md §R1](plans/safety/PLAN_REDUCED_UNSAFE.md) |
| UNSAFE-R2 | **Safe `is_a` checks.** Convert standalone `unsafe { is_a(node, T_Foo) }` calls to safe wrapper functions. Estimated reduction: ~50–99 unsafe blocks. | 2–4h | [PLAN_REDUCED_UNSAFE.md §R2](plans/safety/PLAN_REDUCED_UNSAFE.md) |

> **UNSAFE-R1/R2 subtotal: ~4–8 hours**

### `api.rs` Modularization (API-MOD)

> **In plain terms:** `api.rs` is 9,413 lines — the largest file in the
> codebase. It contains stream table CRUD, ALTER QUERY, CDC management,
> bulk operations, diagnostics, and monitoring functions all in one file.
> The same treatment that `parser.rs` received in v0.15.0 (split from 21K
> lines into 5 sub-modules) is needed here. Zero behavior change — purely
> structural.

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| API-MOD | **Split `src/api.rs` into sub-modules.** Proposed split: `api/create.rs` (create/drop/alter), `api/refresh.rs` (refresh entry points), `api/cdc.rs` (CDC management), `api/diagnostics.rs` (explain_st, health_check), `api/bulk.rs` (bulk_create), `api/mod.rs` (re-exports). Zero behavior change. | 1–2 wk | — |

> **API-MOD subtotal: ~1–2 weeks**

### pg_ivm Migration Guide (MIG-IVM)

> **In plain terms:** pg_ivm is the incumbent IVM extension with 1,400+
> GitHub stars and 4 years of production use. Many potential pg_trickle
> adopters are currently using pg_ivm. A step-by-step migration guide —
> mapping pg_ivm concepts to pg_trickle equivalents, with concrete SQL
> examples — removes the biggest adoption friction for this audience.

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| MIG-IVM | **pg_ivm → pg_trickle migration guide.** Map: `create_immv()` → `create_stream_table()`; `refresh_immv()` → `refresh_stream_table()`; IMMEDIATE mode equivalence; aggregate coverage differences (5 vs 60+); GUC mapping; worked example migrating a real pg_ivm deployment. Publish as `docs/tutorials/MIGRATING_FROM_PG_IVM.md`. | 2–3d | [docs/research/PG_IVM_COMPARISON.md](docs/research/PG_IVM_COMPARISON.md) |

> **MIG-IVM subtotal: ~2–3 days**

### Failure Mode Runbook (RUNBOOK)

> **In plain terms:** Production teams need to know what happens when things
> go wrong — and what to do about it. This documents every failure mode
> pg_trickle can encounter (scheduler crash, WAL slot lag, OOM during
> refresh, disk full, replication slot conflict, stuck watermarks, circular
> convergence failure) with symptoms, diagnosis steps, and resolution
> procedures. Essential for on-call engineers.

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| RUNBOOK | **Failure mode runbook.** Document: scheduler crash recovery, WAL decoder failures, OOM during refresh, disk-full behavior, replication slot conflicts, stuck watermarks, circular convergence timeout, CDC trigger failures, SUSPENDED state recovery, lock contention diagnosis. Include `health_check()` output interpretation and `explain_st()` troubleshooting. Publish as `docs/TROUBLESHOOTING.md`. | 3–5d | [docs/PRE_DEPLOYMENT.md](docs/PRE_DEPLOYMENT.md) |

> **RUNBOOK subtotal: ~3–5 days**

### Docker Quickstart Playground (PLAYGROUND)

> **In plain terms:** The fastest way to evaluate any database extension is
> to run it locally in 60 seconds. A `docker-compose.yml` with PostgreSQL +
> pg_trickle pre-installed, sample data (e.g. the org-chart from
> GETTING_STARTED.md), and a Jupyter notebook or pgAdmin web UI gives
> potential users a zero-friction tryout experience. This is the single
> most impactful thing for driving initial adoption.

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| PLAYGROUND | **Docker Compose quickstart.** `docker-compose.yml` with: PG 18 + pg_trickle, seed SQL script (org-chart example from GETTING_STARTED.md + TPC-H SF=0.01), pgAdmin web UI (optional). Single `docker compose up` command. README with guided walkthrough. | 2–3d | [docs/GETTING_STARTED.md](docs/GETTING_STARTED.md) |

> **PLAYGROUND subtotal: ~2–3 days**

### Documentation Polish (DOC-POLISH)

> **In plain terms:** The existing documentation is comprehensive and
> technically excellent, but it's optimized for users already familiar with
> IVM and PostgreSQL internals. These items restructure the docs for a
> better "first hour" experience — simpler getting-started examples, a
> refresh mode decision guide, a condensed new-user FAQ, and a setup
> verification checklist. The goal is to reduce cognitive overload for new
> users without losing the depth that experienced users need.

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| DOC-HELLO | **Simplified "Hello Stream Table" in GETTING_STARTED.** Add a Chapter 0 with a single-table, single-aggregate stream table (e.g. `SELECT department, count(*) FROM employees GROUP BY department`). Create it, insert a row, verify the refresh. Build confidence before the multi-table org-chart example. | 2–4h | [docs/GETTING_STARTED.md](docs/GETTING_STARTED.md) |
| DOC-DECIDE | **Refresh mode decision guide.** Flowchart: "Need transactional consistency? → IMMEDIATE. Volatile functions? → FULL. Otherwise → AUTO (DIFFERENTIAL with FULL fallback)." Include when-to-use guidance for each mode with concrete examples. Publish as a section in GETTING_STARTED or as a standalone tutorial. | 2–4h | [docs/tutorials/tuning-refresh-mode.md](docs/tutorials/tuning-refresh-mode.md) |
| DOC-FAQ-NEW | **New User FAQ (top 15 questions).** Extract the 15 most common new-user questions from the 3,000-line FAQ into a prominent "New User FAQ" section at the top. Keyword-rich headings for searchability. Link to deep FAQ for details. | 2–3h | [docs/FAQ.md](docs/FAQ.md) |
| DOC-VERIFY | **Post-install verification checklist.** SQL script that verifies: extension loaded, shared_preload_libraries configured, GUCs set, CDC triggers installable, first stream table creates and refreshes successfully. Runnable as `psql -f verify_install.sql`. | 2–4h | [docs/GETTING_STARTED.md](docs/GETTING_STARTED.md) |
| DOC-STUBS | **Fill or remove research stubs.** `PG_IVM_COMPARISON.md` (60 bytes) and `CUSTOM_SQL_SYNTAX.md` (57 bytes) are empty stubs. Either flesh them out (PG_IVM_COMPARISON can draw from the existing comparison data) or remove from SUMMARY.md. | 2–4h | [docs/research/](docs/research/) |

> **DOC-POLISH subtotal: ~2–3 days**

> **v0.17.0 total: ~2–3 weeks (cost-based strategy) + ~3–4 weeks (columnar tracking) + ~32–48 hours (TIVM Phase 4) + ~1–2 days (ROWS FROM) + ~2–3 weeks (SQLancer) + ~2–3 weeks (incremental DAG) + ~4–8 hours (unsafe reduction) + ~1–2 weeks (api.rs modularization) + ~2–3 days (pg_ivm migration) + ~3–5 days (failure runbook) + ~2–3 days (Docker playground) + ~2–3 days (doc polish)**

**Exit criteria:**
- [x] B-4: Cost-based strategy selector trained on per-ST history; cold-start fallback to fixed threshold; `QueryComplexityClass` cost model (scan/filter/aggregate/join/join_agg); `refresh_strategy` + `cost_model_safety_margin` GUCs; pre-refresh predictive comparison; 10 unit tests
- [x] A-2-COL: CDC trigger emits `changed_cols` VARBIT bitmask (COL-1); delta-scan filters irrelevant rows via `changed_cols & mask` (COL-2); aggregate value-only correction 'V' path halves row volume (COL-3)
- ~~[ ] A2-ENR~~: 🚫 Deferred post-1.0 — requires raw `pg_sys` ENR tuplestore FFI (memory-corruption risk); revisit after 1.0 stabilisation
- ~~[ ] A2-CTR~~: 🚫 Deferred post-1.0 — requires raw `CreateTrigger()` C FFI (memory-corruption risk); revisit after 1.0 stabilisation
- [x] A2-PS: ✅ Already shipped — `pg_trickle.use_prepared_statements` GUC (default `true`) wired in `refresh.rs`; parse/plan overhead eliminated on steady-state workloads
- [x] A8: `ROWS FROM()` with multiple SRFs accepted in defining queries; E2E tests cover INSERT/UPDATE/DELETE propagation
- [x] SQLANCER: ✅ SQLANCER-1/2 crash + equivalence oracles shipped in v0.12.0; SQLANCER-3 diff-vs-full oracle and SQLANCER-4 stateful DML soak (10K mutations) added in v0.17.0; `weekly-sqlancer-stateful` CI job wired
- [x] C-2: Incremental DAG rebuild reduces DDL-triggered latency spike to < 5ms at 100+ STs; ring buffer overflow falls back to full rebuild; no correctness regressions
- [x] UNSAFE-R1/R2: Unsafe block count reduced by 249 (690→441 in parser); `is_node_type!` and `pg_deref!` macros; all 1,700 unit tests pass
- [x] API-MOD: `api.rs` split into 3 sub-modules (mod.rs 5,624 + diagnostics.rs 1,377 + helpers.rs 2,461); zero behavior change; all 1,700 unit tests pass
- [x] MIG-IVM: `docs/tutorials/MIGRATING_FROM_PG_IVM.md` published with step-by-step migration, API mapping, behavioral differences, SQL upgrade examples, and verification checklist
- [x] RUNBOOK: `docs/TROUBLESHOOTING.md` covers 13 failure scenarios (scheduler, SUSPENDED, CDC triggers, WAL slots, INITIALIZING, buffer growth, lock contention, OOM, disk full, circular convergence, schema changes, worker pool, fuse) with symptoms, diagnosis, and resolution
- [x] PLAYGROUND: `playground/` with docker-compose.yml, seed.sql (3 base tables, 5 stream tables), and README walkthrough
- [x] DOC-HELLO: Chapter 1 "Hello World" in GETTING_STARTED already provides the single-table aggregate example (products/category_summary)
- [x] DOC-DECIDE: Refresh mode decision guide already published as `tutorials/tuning-refresh-mode.md` with `recommend_refresh_mode()` and signal breakdown
- [x] DOC-FAQ-NEW: New User FAQ section with 15 keyword-rich entries added at top of FAQ.md
- [x] DOC-VERIFY: `scripts/verify_install.sql` checks shared_preload_libraries, extension, scheduler, GUCs, and runs end-to-end stream table cycle
- [x] DOC-STUBS: Research stubs already use `{{#include}}` directives pointing to substantial content (923 + 1232 lines)
- [x] Extension upgrade path tested (`0.16.0 → 0.17.0`)


---