> **Plain-language companion:** [v0.11.0.md](v0.11.0.md)

## v0.11.0 — Partitioned Stream Tables, Prometheus & Grafana Observability, Safety Hardening & Correctness

**Status: Released 2026-03-26.** See [CHANGELOG.md §0.11.0](CHANGELOG.md#0110--2026-03-26) for the full feature list.

**Highlights:** 34× lower latency via event-driven scheduler wake · incremental ST-to-ST refresh chains · declaratively partitioned stream tables (100× I/O reduction) · ready-to-use Prometheus + Grafana monitoring stack · FUSE circuit breaker · VARBIT changed-column bitmask (no more 63-column cap) · per-database worker quotas · DAG scheduling performance improvements (fused chains, adaptive polling, amplification detection) · TPC-H correctness gate in CI · safer production defaults.

### Partitioned Stream Tables — Storage (A-1)

> **In plain terms:** A 10M-row stream table partitioned into 100 ranges means only
> the 2–3 partitions that actually received changes are touched by MERGE — reducing
> the MERGE scan from 10M rows to ~100K. The partition key must be a user-visible
> column and the refresh path must inject a verified range predicate.
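The range-predicate idea above can be sketched as a small pure function: given the min/max of the partition key observed in the delta, build the extra predicate that scopes the MERGE to the touched partitions. This is a hedged illustration of the `AND st."key" BETWEEN 'min' AND 'max'` rewrite described in this plan — the function name and quoting strategy here are illustrative, not the extension's actual API.

```rust
// Illustrative sketch (not the extension's real helper): build the range
// predicate that A1-3 splices into the MERGE ON clause so PostgreSQL can
// prune untouched partitions.
fn partition_range_predicate(key: &str, min: &str, max: &str) -> String {
    // Double any embedded single quotes so the bounds are safe to splice
    // into generated SQL; real code would use proper quoting helpers.
    format!(
        "AND st.\"{}\" BETWEEN '{}' AND '{}'",
        key,
        min.replace('\'', "''"),
        max.replace('\'', "''")
    )
}

fn main() {
    // A delta touching only keys 1000–1999 prunes the MERGE scan to the
    // partitions covering that range.
    let pred = partition_range_predicate("order_id", "1000", "1999");
    assert_eq!(pred, "AND st.\"order_id\" BETWEEN '1000' AND '1999'");
    println!("{pred}");
}
```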
| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| A1-1 | DDL: `CREATE STREAM TABLE … PARTITION BY` declaration; catalog column for partition key | 1–2 wk | [PLAN_NEW_STUFF.md §A-1](plans/performance/PLAN_NEW_STUFF.md) |
| A1-2 | Delta inspection: extract min/max of partition key from delta CTE per scheduler tick | 1 wk | [PLAN_NEW_STUFF.md §A-1](plans/performance/PLAN_NEW_STUFF.md) |
| A1-3 | MERGE rewrite: inject validated partition-key range predicate or issue per-partition MERGEs via Rust loop | 2–3 wk | [PLAN_NEW_STUFF.md §A-1](plans/performance/PLAN_NEW_STUFF.md) |
| A1-4 | E2E benchmarks: 10M-row partitioned ST, 0.1% change rate concentrated in 2–3 partitions | 1 wk | [PLAN_NEW_STUFF.md §A-1](plans/performance/PLAN_NEW_STUFF.md) |

> ⚠️ MERGE joins on `__pgt_row_id` (a content hash unrelated to the partition key) —
> partition pruning will **not** activate automatically. A predicate injection step
> is mandatory. See PLAN_NEW_STUFF.md §A-1 risk analysis before starting.

> **Retraction consideration (A-1):** The 5–7 week effort estimate is optimistic. The
> core assumption — that partition pruning can be activated via a `WHERE partition_key
> BETWEEN ? AND ?` predicate — requires the partition key to be a tracked catalog column
> (not currently the case) and a verified range derivation from the delta. The alternative
> (per-partition MERGE loop in Rust) is architecturally sound but requires significant
> catalog and refresh-path changes. A **design spike** (2–4 days) producing a written
> implementation plan must be completed before A1-1 is started. The milestone is at P3 /
> Very High risk and should not block the 1.0 release if the design spike reveals
> additional complexity.
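The A1-2 delta-inspection step above can be modeled as follows: derive the partition-key range from a batch of delta rows, and return nothing for an empty delta so the caller can skip the MERGE entirely. The real implementation runs `SELECT MIN/MAX(key)::text` over the delta SQL; this sketch applies the same logic to an in-memory slice, and the function name is illustrative.

```rust
// Illustrative model of A1-2: find the partition-key range touched by a
// delta batch. None means an empty delta, i.e. the MERGE can be skipped
// (the empty-delta fast path).
fn delta_partition_range(keys: &[i64]) -> Option<(i64, i64)> {
    let min = *keys.iter().min()?;
    let max = *keys.iter().max()?;
    Some((min, max))
}

fn main() {
    // Non-empty delta: the MERGE can be scoped to BETWEEN 1000 AND 1999.
    assert_eq!(delta_partition_range(&[1500, 1000, 1999]), Some((1000, 1999)));
    // Empty delta: skip the MERGE entirely.
    assert_eq!(delta_partition_range(&[]), None);
    println!("ok");
}
```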
> **Partitioned stream tables subtotal: ~5–7 weeks**

### Multi-Database Scheduler Isolation (C-3)

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| ~~C3-1~~ | ~~Per-database worker quotas (`pg_trickle.per_database_worker_quota`); priority ordering (IMMEDIATE > Hot > Warm > Cold); burst capacity up to 150% when other DBs are under budget~~ ✅ Done in v0.11.0 Phase 11 — `compute_per_db_quota()` helper with burst threshold at 80% cluster utilisation; `sort_ready_queue_by_priority()` dispatches ImmediateClosure first; 7 unit tests. | — | [src/scheduler.rs](src/scheduler.rs) |

> **Multi-DB isolation subtotal: ✅ Complete**

### Prometheus & Grafana Observability

> **In plain terms:** Most teams already run Prometheus and Grafana to monitor
> their databases. This ships ready-to-use configuration files — no custom
> code, no extension changes — that plug into the standard `postgres_exporter`
> and light up a Grafana dashboard showing refresh latency, staleness, error
> rates, CDC lag, and per-stream-table detail. It also includes Prometheus
> alerting rules so you get paged when a stream table goes stale or starts
> error-looping. A Docker Compose file lets you try the full observability
> stack with a single `docker compose up`.

Zero-code monitoring integration. All config files live in a new `monitoring/` directory in the main repo (or a separate `pgtrickle-monitoring` repo). Queries use existing views (`pg_stat_stream_tables`, `check_cdc_health()`, `quick_health`).

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| ~~OBS-1~~ | ~~**Prometheus metrics out of the box.**~~ ✅ Done in v0.11.0 Phase 3 — `monitoring/prometheus/pg_trickle_queries.yml` exports 14 metrics (per-table refresh stats, health summary, CDC buffer sizes, status counts, recent error rate) via postgres_exporter. | — | [monitoring/prometheus/pg_trickle_queries.yml](monitoring/prometheus/pg_trickle_queries.yml) |
| ~~OBS-2~~ | ~~**Get paged when things go wrong.**~~ ✅ Done in v0.11.0 Phase 3 — `monitoring/prometheus/alerts.yml` has 8 alerting rules: staleness > 5 min, ≥3 consecutive failures, table SUSPENDED, CDC buffer > 1 GB, scheduler down, high refresh duration, cluster WARNING/CRITICAL. | — | [monitoring/prometheus/alerts.yml](monitoring/prometheus/alerts.yml) |
| ~~OBS-3~~ | ~~**See everything at a glance.**~~ ✅ Done in v0.11.0 Phase 3 — `monitoring/grafana/dashboards/pg_trickle_overview.json` has 6 sections: cluster overview stat panels, refresh performance time-series, staleness heatmap, CDC health graphs, per-table drill-down table with schema/table variable filters. | — | [monitoring/grafana/dashboards/pg_trickle_overview.json](monitoring/grafana/dashboards/pg_trickle_overview.json) |
| ~~OBS-4~~ | ~~**Try it all in one command.**~~ ✅ Done in v0.11.0 Phase 3 — `monitoring/docker-compose.yml` spins up PostgreSQL + pg_trickle + postgres_exporter + Prometheus + Grafana with pre-wired config and demo seed data (`monitoring/init/01_demo.sql`). `docker compose up` → Grafana at :3000. | — | [monitoring/docker-compose.yml](monitoring/docker-compose.yml) |

> **Observability subtotal: ~12 hours** ✅

### Default Tuning & Safety Defaults (from REPORT_OVERALL_STATUS.md)

These changes flip conservative defaults to the behavior that is safe and correct in production. All underlying features are implemented and tested; only the default values change. Each keeps the original GUC so operators can revert if needed.

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| ~~DEF-1~~ | ~~**Flip `parallel_refresh_mode` default to `'on'`.**~~ ✅ Done in v0.11.0 Phase 1 — default flipped; `normalize_parallel_refresh_mode` maps `None`/unknown → `On`; unit test renamed to `defaults_to_on`. | — | [REPORT_OVERALL_STATUS.md §R1](plans/performance/REPORT_OVERALL_STATUS.md) |
| ~~DEF-2~~ | ~~**Flip `auto_backoff` default to `true`.**~~ ✅ Done in v0.10.0 — default flipped to `true`; trigger threshold raised to 95%, cap reduced to 8×, log level raised to WARNING. CONFIGURATION.md updated. | — | [REPORT_OVERALL_STATUS.md §R10](plans/performance/REPORT_OVERALL_STATUS.md) |
| ~~DEF-3~~ | ~~**SemiJoin delta-key pre-filter (O-1).**~~ ✅ Verified already implemented in v0.11.0 Phase 2 — `left_snapshot_filtered` pre-filter with `WHERE left_key IN (SELECT DISTINCT right_key FROM delta)` was already present in `semi_join.rs`. | — | [src/dvm/operators/semi_join.rs](src/dvm/operators/semi_join.rs) |
| ~~DEF-4~~ | ~~**Increase invalidation ring capacity from 32 to 128 slots.**~~ ✅ Done in v0.11.0 Phase 1 — `INVALIDATION_RING_CAPACITY` raised to 128 in `shmem.rs`. | — | [REPORT_OVERALL_STATUS.md §R9](plans/performance/REPORT_OVERALL_STATUS.md) |
| ~~DEF-5~~ | ~~**Flip `block_source_ddl` default to `true`.**~~ ✅ Done in v0.11.0 Phase 1 — default flipped to `true`; both error messages in `hooks.rs` include a step-by-step escape-hatch procedure. | — | [REPORT_OVERALL_STATUS.md §R12](plans/performance/REPORT_OVERALL_STATUS.md) |

> **Default tuning subtotal: ~14–21 hours**

### Safety & Resilience Hardening (Must-Ship)

> **In plain terms:** The background worker should never silently hang or leave a
> stream table in an undefined state when an internal operation fails. These items
> replace `panic!`/`unwrap()` in code paths reachable from the background worker
> with structured errors and graceful recovery.

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| ~~SAF-1~~ | ~~**Replace worker-path panics with structured errors.**~~ ✅ Done in v0.11.0 Phase 1 — full audit of `scheduler.rs`, `refresh.rs`, `hooks.rs`: no `panic!`/`unwrap()` outside `#[cfg(test)]`. `check_skip_needed` now logs `WARNING` on SPI error with table name and error details. Audit finding documented in a comment. | — | [src/scheduler.rs](src/scheduler.rs) |
| ~~SAF-2~~ | ~~**Failure-injection E2E test.**~~ ✅ Done in v0.11.0 Phase 2 — two E2E tests in `tests/e2e_safety_tests.rs`: (1) column drop triggers UpstreamSchemaChanged, verifies the scheduler stays alive and other STs continue; (2) source table drop, same verification. | — | [tests/e2e_safety_tests.rs](tests/e2e_safety_tests.rs) |

> **Safety hardening subtotal: ~7–12 hours**

### Correctness & Code Quality Quick Wins (from REPORT_OVERALL_STATUS.md §12–§15)

> **In plain terms:** Six self-contained improvements identified in the deep gap
> analysis. Each takes under a day and substantially reduces silent failure
> modes, operator confusion, and diagnostic friction.

#### Quick Fixes (< 1 hour each)

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| ~~QF-1~~ | ~~**Fix unguarded debug `println!`.**~~ ✅ Done in v0.11.0 Phase 1 — `println!` replaced with `pgrx::log!()` guarded by the new `pg_trickle.log_merge_sql` GUC (default `off`). | — | [src/refresh.rs](src/refresh.rs) |
| ~~QF-2~~ | ~~**Upgrade AUTO mode downgrade log level.**~~ ✅ Done in v0.11.0 Phase 1 — four AUTO→FULL downgrade paths in `api.rs` raised from `pgrx::info!()` to `pgrx::warning!()`. | — | [plans/performance/REPORT_OVERALL_STATUS.md §12](plans/performance/REPORT_OVERALL_STATUS.md) |
| ~~QF-3~~ | ~~**Warn when `append_only` auto-reverts.**~~ ✅ Verified already implemented — `pgrx::warning!()` + `emit_alert(AppendOnlyReverted)` already present in `refresh.rs`. | — | [plans/performance/REPORT_OVERALL_STATUS.md §15](plans/performance/REPORT_OVERALL_STATUS.md) |
| ~~QF-4~~ | ~~**Document parser `unwrap()` invariants.**~~ ✅ Done in v0.11.0 Phase 1 — `// INVARIANT:` comments added at four `unwrap()` sites in `dvm/parser.rs` (after `is_empty()` guard, `len()==1` guards, and non-empty `Err` return). | — | [src/dvm/parser.rs](src/dvm/parser.rs) |

> **Quick-fix subtotal: ~3–4 hours**

#### Effective Refresh Mode Tracking (G12-ERM)

> **In plain terms:** When a stream table is configured as `AUTO`, operators
> currently have no way to discover which mode is *actually* being used at
> runtime without reading warning logs. Storing the resolved mode in the catalog
> and exposing a diagnostic function closes this observability gap.

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| ~~G12-ERM-1~~ | ~~Add `effective_refresh_mode` column to `pgt_stream_tables`.~~ ✅ Done in v0.11.0 Phase 2 — column added; the scheduler writes the actual mode (FULL/DIFFERENTIAL/APPEND_ONLY/TOP_K/NO_DATA) via thread-local tracking; upgrade SQL `pg_trickle--0.10.0--0.11.0.sql` created. | — | [src/catalog.rs](src/catalog.rs) |
| ~~G12-ERM-2~~ | ~~Add `explain_refresh_mode(name TEXT)` SQL function.~~ ✅ Done in v0.11.0 Phase 2 — `pgtrickle.explain_refresh_mode()` returns the configured mode, effective mode, and downgrade reason. | — | [src/api.rs](src/api.rs) |

> **Effective refresh mode subtotal: ~4–7 hours**

#### Correctness Guards (G12-2, G12-AGG)

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| ~~G12-2~~ | ~~**TopK runtime validation.**~~ ✅ Done in v0.11.0 Phase 4 — `validate_topk_metadata()` re-parses the reconstructed full query on each TopK refresh; `validate_topk_metadata_fields()` validates stored fields (pure logic, unit-testable). Falls back to FULL + `WARNING` on mismatch. 7 unit tests. | — | [src/refresh.rs](src/refresh.rs) |
| ~~G12-AGG~~ | ~~**Group-rescan aggregate warning.**~~ ✅ Done in v0.11.0 Phase 4 — `classify_agg_strategy()` classifies each aggregate as ALGEBRAIC_INVERTIBLE / ALGEBRAIC_VIA_AUX / SEMI_ALGEBRAIC / GROUP_RESCAN. Warning emitted at `create_stream_table` time for DIFFERENTIAL + group-rescan aggs. Strategy exposed in `explain_st()` as `aggregate_strategies` JSON. 18 unit tests. | — | [src/dvm/parser.rs](src/dvm/parser.rs) |

> **Correctness guards subtotal: ✅ Complete**

#### Parameter & Error Hardening (G15-PV, G13-EH)

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| ~~G15-PV~~ | ~~**Validate incompatible parameter combinations.**~~ ✅ Done in v0.11.0 Phase 2 — (a) `cdc_mode='wal'` + `refresh_mode='IMMEDIATE'` rejection was already present; (b) `diamond_schedule_policy='slowest'` + `diamond_consistency='none'` now rejected in `create_stream_table_impl` and `alter_stream_table_impl` with a structured error. | — | [src/api.rs](src/api.rs) |
| ~~G13-EH~~ | ~~**Structured error HINT/DETAIL fields.**~~ ✅ Done in v0.11.0 Phase 2 — `raise_error_with_context()` helper in `api.rs` uses `ErrorReport::new().set_detail().set_hint()` for `UnsupportedOperator`, `CycleDetected`, `UpstreamSchemaChanged`, and `QueryParseError`; all 8 API-boundary error sites updated. | — | [src/api.rs](src/api.rs) |

> **Parameter & error hardening subtotal: ~6–12 hours**

#### Testing: EC-01 Boundary Regression (G17-EC01B-NEG)

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| ~~G17-EC01B-NEG~~ | ~~Add a negative regression test asserting that ≥3-scan join right subtrees currently fall back to FULL refresh.~~ ✅ Done in v0.11.0 Phase 4 — 4 unit tests in `join_common.rs` covering 3-way join, 4-way join, right-subtree ≥3 scans, and the 2-scan boundary. Annotated `// TODO: Remove when EC01B-1/EC01B-2 fixed in v0.12.0`. | — | [src/dvm/operators/join_common.rs](src/dvm/operators/join_common.rs) |

> **EC-01 boundary regression subtotal: ✅ Complete**

#### Documentation Quick Wins (G16-GS, G16-SM, G16-MQR, G15-GUC)

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| ~~G16-GS~~ | **Restructure `GETTING_STARTED.md` with progressive complexity.** Five chapters: (1) Hello World — single-table ST with no join; (2) Multi-table join; (3) Scheduling & backpressure; (4) Monitoring — 5 key functions; (5) Advanced — FUSE, wide bitmask, partitions. Remove the current flat wall-of-SQL structure. ✅ Done in v0.11.0 Phase 11 — 5-chapter structure implemented; Chapter 1 Hello World example added; Chapter 5 Advanced Topics adds inline FUSE, partitioning, IMMEDIATE, and multi-tenant quota examples. | — | [docs/GETTING_STARTED.md](docs/GETTING_STARTED.md) |
| ~~G16-SM~~ | ~~**SQL/mode operator support matrix.**~~ ✅ Done — 60+ row operator support matrix added to `docs/DVM_OPERATORS.md` covering all operators × FULL/DIFFERENTIAL/IMMEDIATE modes with caveat footnotes. | — | [docs/DVM_OPERATORS.md](docs/DVM_OPERATORS.md) |
| ~~G16-MQR~~ | ~~**Monitoring quick reference.**~~ ✅ Done — Monitoring Quick Reference section added to `docs/GETTING_STARTED.md` with `pgt_status()`, `health_check()`, `change_buffer_sizes()`, `dependency_tree()`, `fuse_status()`, the Prometheus/Grafana stack, a key-metrics table, and an alert summary. | — | [docs/GETTING_STARTED.md](docs/GETTING_STARTED.md) |
| ~~G15-GUC~~ | ~~**GUC interaction matrix.**~~ ✅ Done — GUC Interaction Matrix (14 interaction pairs) and three named Tuning Profiles (Low-Latency, High-Throughput, Resource-Constrained) added to `docs/CONFIGURATION.md`. | — | [docs/CONFIGURATION.md](docs/CONFIGURATION.md) |

> **Documentation subtotal: ~2–3 days**

> **Correctness quick-wins & documentation subtotal: ~1–2 days code + ~2–3 days docs**

### Should-Ship Additions

#### Wider Changed-Column Bitmask (>63 columns)

> **In plain terms:** Stream tables built on source tables with more than 63 columns
> silently fall back to tracking every column on every UPDATE, losing all CDC selectivity.
> Extending the `changed_cols` field from a `BIGINT` to a `BYTEA` vector removes this
> cliff without breaking existing deployments.

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| WB-1 | Extend the CDC trigger `changed_cols` column from `BIGINT` to `BYTEA`; update bitmask encoding/decoding in `cdc.rs`; add a schema migration for existing change buffer tables (tables with <64 columns are unaffected at the data level). | 1–2 wk | [REPORT_OVERALL_STATUS.md §R13](plans/performance/REPORT_OVERALL_STATUS.md) |
| WB-2 | E2E test: wide (>63 column) source table; verify only referenced columns trigger delta propagation; benchmark UPDATE selectivity before/after. | 2–4h | `tests/e2e_cdc_tests.rs` |

> **Wider bitmask subtotal: ~1–2 weeks + ~4h testing**

#### Fuse — Anomalous Change Detection

> **In plain terms:** A circuit breaker that stops a stream table from processing
> an unexpectedly large batch of changes (runaway script, mass delete, data migration)
> without operator review. A blown fuse halts refresh and emits a `pgtrickle_alert`
> NOTIFY; `reset_fuse()` resumes with a chosen recovery action (`apply`,
> `reinitialize`, or `skip_changes`).
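The fuse pre-check described above — count pending change-buffer rows, compare against the configured ceiling, and halt the refresh if exceeded — can be sketched as a pure function. This is a simplified illustration under stated assumptions: the real scheduler also factors in `fuse_sensitivity` and emits a `pgtrickle_alert` NOTIFY when the fuse blows, and the type and field names here are illustrative rather than the extension's exact schema.

```rust
// Illustrative sketch of the fuse pre-check: a blown fuse carries the row
// count that tripped it, so the operator can inspect the spike before
// choosing a reset_fuse() recovery action (apply / reinitialize / skip_changes).
#[derive(Debug, PartialEq)]
enum FuseState {
    Intact,
    Blown { pending_rows: u64 },
}

fn evaluate_fuse(pending_rows: u64, fuse_ceiling: u64) -> FuseState {
    if pending_rows > fuse_ceiling {
        // Refresh halts here; the real code also NOTIFYs pgtrickle_alert.
        FuseState::Blown { pending_rows }
    } else {
        FuseState::Intact
    }
}

fn main() {
    // Normal trickle of changes: refresh proceeds.
    assert_eq!(evaluate_fuse(500, 10_000), FuseState::Intact);
    // A runaway script dumps 2M rows into the buffer: fuse blows.
    assert_eq!(
        evaluate_fuse(2_000_000, 10_000),
        FuseState::Blown { pending_rows: 2_000_000 }
    );
    println!("ok");
}
```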
| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| ~~FUSE-1~~ ✅ | ~~Catalog: fuse state columns on `pgt_stream_tables` (`fuse_mode`, `fuse_state`, `fuse_ceiling`, `fuse_sensitivity`, `blown_at`, `blow_reason`)~~ | 1–2h | [PLAN_FUSE.md](plans/sql/PLAN_FUSE.md) |
| ~~FUSE-2~~ ✅ | ~~`alter_stream_table()` new params: `fuse`, `fuse_ceiling`, `fuse_sensitivity`~~ | 1h | [PLAN_FUSE.md](plans/sql/PLAN_FUSE.md) |
| ~~FUSE-3~~ ✅ | ~~`reset_fuse(name, action => 'apply'\|'reinitialize'\|'skip_changes')` SQL function~~ | 1h | [PLAN_FUSE.md](plans/sql/PLAN_FUSE.md) |
| ~~FUSE-4~~ ✅ | ~~`fuse_status()` introspection function~~ | 1h | [PLAN_FUSE.md](plans/sql/PLAN_FUSE.md) |
| ~~FUSE-5~~ ✅ | ~~Scheduler pre-check: count change buffer rows; evaluate threshold; blow fuse + NOTIFY if exceeded~~ | 2–3h | [PLAN_FUSE.md](plans/sql/PLAN_FUSE.md) |
| ~~FUSE-6~~ ✅ | ~~E2E tests: normal baseline, spike → blow, reset (`apply`/`reinitialize`/`skip_changes`), diamond/DAG interaction~~ | 4–6h | [PLAN_FUSE.md](plans/sql/PLAN_FUSE.md) |

> **Fuse subtotal: ~10–14 hours — ✅ Complete**

#### External Correctness Gate (TS1 or TS2)

> **In plain terms:** Run an independent public query corpus through pg_trickle's
> DIFFERENTIAL mode and assert the results match a vanilla PostgreSQL execution.
> This catches blind spots that the extension's own test suite cannot, and
> provides an objective correctness baseline before v1.0.

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| TS1 | **sqllogictest suite.** Run the PostgreSQL sqllogictest suite through pg_trickle DIFFERENTIAL mode; gate CI on zero correctness mismatches. *Preferred choice: broadest query coverage.* | 2–3d | [PLAN_TESTING_GAPS.md §J](plans/testing/PLAN_TESTING_GAPS.md) |
| TS2 | **JOB (Join Order Benchmark).** Correctness baseline and refresh latency profiling on realistic multi-join analytical queries. *Alternative if sqllogictest setup is too costly.* | 1–2d | [PLAN_TESTING_GAPS.md §J](plans/testing/PLAN_TESTING_GAPS.md) |

Deliver **one** of TS1 or TS2; whichever is completed first meets the exit criterion.

> **External correctness gate subtotal: ~1–3 days**

#### Differential ST-to-ST Refresh (✅ Done)

> **In plain terms:** When stream table B's defining query reads from stream
> table A, pg_trickle previously forced a FULL refresh of B every time A
> updated — re-executing B's entire query even when only a handful of rows
> changed. This feature gives ST-to-ST dependencies the same CDC change
> buffer that base tables already have, so B refreshes differentially (applying
> only the delta). Crucially, even when A itself does a FULL refresh, a
> pre/post snapshot diff is captured so B still receives a small I/D delta
> rather than cascading FULL through the chain.

| Item | Description | Status | Ref |
|------|-------------|--------|-----|
| ST-ST-1 | **Change buffer infrastructure.** `create_st_change_buffer_table()` / `drop_st_change_buffer_table()` in `cdc.rs`; lifecycle hooks in `api.rs`; idempotent `ensure_st_change_buffer()` | ✅ Done | [PLAN_ST_TO_ST.md §Phase 1](plans/sql/PLAN_ST_TO_ST.md) |
| ST-ST-2 | **Delta capture — DIFFERENTIAL path.** Force explicit DML when an ST has downstream consumers; capture the delta from `__pgt_delta_{id}` to `changes_pgt_{id}` | ✅ Done | [PLAN_ST_TO_ST.md §Phase 2](plans/sql/PLAN_ST_TO_ST.md) |
| ST-ST-3 | **Delta capture — FULL path.** Pre/post snapshot diff writes I/D pairs to `changes_pgt_{id}`; eliminates cascading FULL | ✅ Done | [PLAN_ST_TO_ST.md §7](plans/sql/PLAN_ST_TO_ST.md) |
| ST-ST-4 | **DVM scan operator for ST sources.** Read from `changes_pgt_{id}`; `pgt_`-prefixed LSN tokens; extended frontier and placeholder resolver | ✅ Done | [PLAN_ST_TO_ST.md §Phase 3](plans/sql/PLAN_ST_TO_ST.md) |
| ST-ST-5 | **Scheduler integration.** Buffer-based change detection in `has_stream_table_source_changes()`; removed the FULL override; frontier augmented with ST source positions | ✅ Done | [PLAN_ST_TO_ST.md §Phase 4](plans/sql/PLAN_ST_TO_ST.md) |
| ST-ST-6 | **Cleanup & lifecycle.** `cleanup_st_change_buffers_by_frontier()` for ST buffers; removed the prewarm skip for ST sources; ST buffer cleanup in both differential and full refresh paths | ✅ Done | [PLAN_ST_TO_ST.md §Phase 5–6](plans/sql/PLAN_ST_TO_ST.md) |

> **ST-to-ST differential subtotal: ~4.5–6.5 weeks**

### Adaptive/Event-Driven Scheduler Wake (Must-Ship)

> **In plain terms:** The scheduler previously woke on a fixed 1-second timer
> even when nothing had changed. This adds event-driven wake: CDC triggers
> notify the scheduler immediately when changes arrive. Median end-to-end
> latency drops from ~515 ms to ~15 ms for low-volume workloads — a 34×
> improvement. This is a must-ship item because **low latency is a primary
> project goal**.

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| ~~WAKE-1~~ | ~~**Event-driven scheduler wake.**~~ ✅ Done in v0.11.0 Phase 7 — CDC triggers emit `pg_notify('pgtrickle_wake', '')` after each change buffer INSERT; the scheduler issues `LISTEN pgtrickle_wake` at startup; a 10 ms debounce coalesces rapid notifications; the poll fallback is preserved. New GUCs: `event_driven_wake` (default `true`), `wake_debounce_ms` (default `10`). E2E tests in `tests/e2e_wake_tests.rs`. | — | [REPORT_OVERALL_STATUS.md §R16](plans/performance/REPORT_OVERALL_STATUS.md) |

> **Event-driven wake subtotal: ✅ Complete**

### Stretch Goals (if capacity allows after Must-Ship)

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| ~~STRETCH-1~~ | ~~**Partitioned stream tables — design spike only.**~~ ✅ Done in v0.11.0 Partitioning Spike — RFC written ([PLAN_PARTITIONING_SPIKE.md](plans/PLAN_PARTITIONING_SPIKE.md)), go/no-go decision: **Go**. A1-1 implemented (catalog column, API parameter, validation). | 2–4d | [PLAN_PARTITIONING_SPIKE.md](plans/PLAN_PARTITIONING_SPIKE.md) |
| ~~A1-1~~ | ~~**DDL: `CREATE STREAM TABLE … PARTITION BY`; `st_partition_key` catalog column.**~~ ✅ Done — `partition_by` parameter added to all three `create_stream_table*` functions; `st_partition_key TEXT` column in catalog; `validate_partition_key()` validates that the column exists in the output; `build_create_table_sql` emits `PARTITION BY RANGE (key)`; `setup_storage_table` creates a default catch-all partition and a non-unique `__pgt_row_id` index. | 1–2 wk | [PLAN_PARTITIONING_SPIKE.md](plans/PLAN_PARTITIONING_SPIKE.md) |
| ~~A1-2~~ | ~~**Delta min/max inspection.**~~ ✅ Done — `extract_partition_range()` in `refresh.rs` runs `SELECT MIN/MAX(key)::text` on the resolved delta SQL; returns `None` on an empty delta (MERGE skipped). | 1 wk | [PLAN_PARTITIONING_SPIKE.md §8](plans/PLAN_PARTITIONING_SPIKE.md) |
| ~~A1-3~~ | ~~**MERGE rewrite.**~~ ✅ Done — `inject_partition_predicate()` replaces the `__PGT_PART_PRED__` placeholder in the MERGE ON clause with `AND st."key" BETWEEN 'min' AND 'max'`; `CachedMergeTemplate` stores `delta_sql_template`; D-2 prepared statements disabled for partitioned STs. | 2–3 wk | [PLAN_PARTITIONING_SPIKE.md §8](plans/PLAN_PARTITIONING_SPIKE.md) |
| ~~A1-4~~ | ~~**E2E benchmarks: 10M-row partitioned ST, 0.1%/0.2%/100% change rate scenarios; `EXPLAIN (ANALYZE, BUFFERS)` partition-scan verification.**~~ ✅ Done — 7 E2E tests added to `tests/e2e_partition_tests.rs` covering initial populate, differential inserts, updates/deletes, the empty-delta fast path, EXPLAIN plan verification, and invalid partition key rejection; added to the light-E2E allowlist. | 1 wk | [PLAN_PARTITIONING_SPIKE.md §9](plans/PLAN_PARTITIONING_SPIKE.md) |

> **Stretch subtotal: STRETCH-1 + A1-1 + A1-2 + A1-3 + A1-4 ✅ All complete**

### DAG Refresh Performance Improvements (from PLAN_DAG_PERFORMANCE.md §8)

> **In plain terms:** Now that ST-to-ST differential refresh eliminates the
> "every hop is FULL" bottleneck, the next performance frontier is reducing
> per-hop overhead and exploiting DAG structure more aggressively. These items
> target the scheduling and dispatch layer — not the DVM engine — and
> collectively can reduce end-to-end propagation latency by 30–50% for
> heterogeneous DAGs.

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| ~~DAG-1~~ | ~~**Intra-tick pipelining.** Within a single scheduler tick, begin processing a downstream ST as soon as all its specific upstream dependencies have completed — not when the entire topological level finishes. Requires per-ST completion tracking in the parallel dispatch loop and immediate enqueuing of newly-ready STs. Expected 30–50% latency reduction for DAGs with mixed-cost levels.~~ ✅ Done — Already achieved by Phase 4's parallel dispatch architecture: per-dependency `remaining_upstreams` tracking with immediate downstream readiness propagation. No level barrier exists. 3 validation tests. | 2–3 wk | [PLAN_DAG_PERFORMANCE.md §8.1](plans/performance/PLAN_DAG_PERFORMANCE.md) |
| ~~DAG-2~~ | ~~**Adaptive poll interval.** Replace the fixed 200 ms parallel dispatch poll with exponential backoff (20 ms → 200 ms), resetting on worker completion. Makes parallel mode competitive with CALCULATED for cheap refreshes ($T_r \approx 10\text{ms}$). Alternative: `WaitLatch` with shared-memory completion flags.~~ ✅ Done — `compute_adaptive_poll_ms()` pure-logic helper with exponential backoff (20 ms → 200 ms); `ParallelDispatchState` tracks `adaptive_poll_ms` + `completions_this_tick`; resets to 20 ms on worker completion; 8 unit tests. | 1–2 wk | [PLAN_DAG_PERFORMANCE.md §8.2](plans/performance/PLAN_DAG_PERFORMANCE.md) |
| ~~DAG-3~~ | ~~**Delta amplification detection.** Track the input→output delta ratio per hop via `pgt_refresh_history`. When a join ST amplifies delta beyond a configurable threshold (e.g., output > 100× input), emit a performance WARNING and optionally fall back to FULL for that hop. Expose amplification metrics in `explain_st()`.~~ ✅ Done — `pg_trickle.delta_amplification_threshold` GUC (default 100×); `compute_amplification_ratio` + `should_warn_amplification` pure-logic helpers; WARNING emitted after MERGE with ratio, counts, and tuning hint; `explain_st()` exposes `amplification_stats` JSON from the last 20 DIFFERENTIAL refreshes; 15 unit tests. | 3–5d | [PLAN_DAG_PERFORMANCE.md §8.4](plans/performance/PLAN_DAG_PERFORMANCE.md) |
| ~~DAG-4~~ | ~~**ST buffer bypass for single-consumer CALCULATED chains.** For ST dependencies with exactly one downstream consumer refreshing in the same tick, pass the delta in-memory instead of writing/reading the `changes_pgt_` buffer table. Eliminates 2× SPI DML per hop (~20 ms savings per hop for 10K-row deltas).~~ ✅ Done — `FusedChain` execution unit kind; `find_fusable_chains()` pure-logic detection; `capture_delta_to_bypass_table()` writes to a temp table; `DiffContext.st_bypass_tables` threads the bypass through the DVM scan; the delta SQL cache is bypassed when active; 11+4 unit tests. | 3–4 wk | [PLAN_DAG_PERFORMANCE.md §8.3](plans/performance/PLAN_DAG_PERFORMANCE.md) |
| ~~DAG-5~~ | ~~**ST buffer batch coalescing.** Apply net-effect computation to ST change buffers before downstream reads — cancel INSERT/DELETE pairs for the same `__pgt_row_id` that accumulate between reads during rapid-fire upstream refreshes. Adapts existing `compute_net_effect()` logic to the ST buffer schema.~~ ✅ Done — `compact_st_change_buffer()` with `build_st_compact_sql()` pure-logic helper; advisory lock namespace 0x5047_5500; integrated in `execute_differential_refresh()` after C-4 base-table compaction; 9 unit tests. | 1–2 wk | [PLAN_DAG_PERFORMANCE.md §8.5](plans/performance/PLAN_DAG_PERFORMANCE.md) |

> **DAG refresh performance subtotal: ~8–12 weeks**

> **v0.11.0 total: ~7–10 weeks (partitioning + isolation) + ~12h observability + ~14–21h default tuning + ~7–12h safety hardening + ~2–4 weeks should-ship (bitmask + fuse + external corpus) + ~4.5–6.5 weeks ST-to-ST differential + ~2–3 weeks event-driven wake + ~1–2 days correctness quick-wins + ~2–3 days documentation + ~8–12 weeks DAG performance**

**Exit criteria: ✅ All met. Released 2026-03-26.**

- [x] Declaratively partitioned stream tables accepted; partition key tracked in catalog — ✅ Done in v0.11.0 Partitioning Spike (STRETCH-1 RFC + A1-1)
- [x] Partitioned storage table created with `PARTITION BY RANGE` + default catch-all partition — ✅ Done (A1-1 physical DDL)
- [x] Partition-key range predicate injected into MERGE ON clause; empty-delta fast-path skips MERGE — ✅ Done (A1-2 + A1-3)
- [x] Partition-scoped MERGE benchmark: 10M-row ST, 0.1% change rate (expect ~100× I/O reduction) — ✅ Done (A1-4 E2E tests)
- [x] Per-database worker quotas enforced; burst reclaimed within 1 scheduler cycle — ✅ Done in v0.11.0 Phase 11 (`pg_trickle.per_database_worker_quota` GUC; burst to 150% at < 80% cluster load)
- [x] Prometheus queries + alerting rules + Grafana dashboard shipped — ✅ Done in v0.11.0 Phase 3 (`monitoring/` directory)
- [x] DEF-1: `parallel_refresh_mode` default is `'on'`; unit test updated — ✅ Done in v0.11.0 Phase 1
- [x] DEF-2: `auto_backoff` default is `true`; CONFIGURATION.md updated — ✅ Done in v0.10.0
- [x] DEF-3: SemiJoin delta-key pre-filter verified already implemented — ✅ Done in v0.11.0 Phase 2 (pre-existing in `semi_join.rs`)
- [x] DEF-4: Invalidation ring capacity is 128 slots — ✅ Done in v0.11.0 Phase 1
- [x] DEF-5: `block_source_ddl` default is `true`; error message includes escape-hatch instructions — ✅ Done in v0.11.0 Phase 1
- [x] SAF-1: No `panic!`/`unwrap()` in background worker hot paths; `check_skip_needed` logs SPI errors — ✅ Done in v0.11.0 Phase 1
- [x] SAF-2: Failure-injection E2E tests in `tests/e2e_safety_tests.rs` — ✅ Done in v0.11.0 Phase 2
- [x] WB-1+2: Changed-column bitmask supports >63 columns (VARBIT); wide-table CDC selectivity E2E passes; schema migration tested — ✅ Done in v0.11.0 Phase 5
- [x] FUSE-1–6: Fuse blows on configurable change-count threshold; `reset_fuse()` recovers in all three action modes; diamond/DAG interaction tested — ✅ Done in v0.11.0 Phase 6
- [x] TS2: TPC-H-derived 5-query DIFFERENTIAL correctness gate passes with zero mismatches; gated in CI — ✅ Done in v0.11.0 Phase 9
- [x] QF-1–4: `println!` replaced with guarded `pgrx::log!()`; AUTO downgrades emit `WARNING`; `append_only` reversion verified already warns; parser invariant sites annotated — ✅ Done in v0.11.0 Phase 1
- [x] G12-ERM: `effective_refresh_mode` column present in `pgt_stream_tables`; `explain_refresh_mode()` returns configured mode, effective mode, downgrade reason — ✅ Done in v0.11.0 Phase 2
- [x] G12-2: TopK path validates assumptions at refresh time; triggers FULL fallback with `WARNING` on violation — ✅ Done in v0.11.0 Phase 4
- [x] G12-AGG: Group-rescan aggregate warning fires at `create_stream_table` for DIFFERENTIAL mode; strategy visible in `explain_st()` — ✅ Done in v0.11.0 Phase 4
- [x] G15-PV: Incompatible `cdc_mode`/`refresh_mode` and `diamond_schedule_policy` combinations rejected at creation time with structured `HINT` — ✅ Done in v0.11.0 Phase 2
- [x] G13-EH: `UnsupportedOperator`, `CycleDetected`, `UpstreamSchemaChanged`, `QueryParseError` include `DETAIL` and `HINT` fields — ✅ Done in v0.11.0 Phase 2
- [x] G17-EC01B-NEG: Negative regression test documents ≥3-scan fall-back behavior; linked to v0.12.0 EC01B fix — ✅ Done in v0.11.0 Phase 4
- [x] G16-GS/SM/MQR/GUC: GETTING_STARTED restructured (5 chapters + Hello World + Advanced Topics); DVM_OPERATORS support matrix; monitoring quick reference; CONFIGURATION.md GUC matrix — ✅ Done in v0.11.0 Phase 11
- [x] ST-ST-1–6: All ST-to-ST dependencies refresh differentially when upstream has a change buffer; FULL refreshes on upstream produce pre/post I/D diff; no cascading FULL — ✅ Done in v0.11.0 Phase 8
- [x] WAKE-1: Event-driven scheduler wake; median latency ~15 ms (34× improvement); 10 ms debounce; poll fallback — ✅ Done in v0.11.0 Phase 7
- [x] DAG-1: Intra-tick pipelining confirmed in Phase 4 architecture — ✅ Done
- [x] DAG-2: Adaptive poll interval (20 ms → 200 ms exponential backoff) — ✅ Done in v0.11.0 Phase 10
- [x] DAG-3: Delta amplification detection with `pg_trickle.delta_amplification_threshold` GUC — ✅ Done in v0.11.0 Phase 10
- [x] DAG-4: ST buffer bypass (`FusedChain`) for single-consumer CALCULATED chains — ✅ Done in v0.11.0 Phase 10
- [x] DAG-5: ST buffer batch coalescing cancels redundant I/D pairs — ✅ Done in v0.11.0 Phase 10
- [x] Extension upgrade path tested (`0.10.0 → 0.11.0`) — ✅ upgrade SQL in `sql/pg_trickle--0.10.0--0.11.0.sql`

---
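The DAG-2 adaptive poll interval described in this section can be sketched as a tiny state machine: back off exponentially from a 20 ms floor to a 200 ms cap while workers are busy, and snap back to the floor on any completion. The real helper is `compute_adaptive_poll_ms()` in the scheduler; the function name and the exact doubling policy below are illustrative assumptions.

```rust
// Illustrative sketch (not the extension's real helper) of the DAG-2
// adaptive poll interval: exponential backoff 20 ms → 200 ms, reset on
// worker completion so newly-ready downstream STs are picked up quickly.
const POLL_FLOOR_MS: u64 = 20;
const POLL_CAP_MS: u64 = 200;

fn next_poll_ms(current_ms: u64, worker_completed: bool) -> u64 {
    if worker_completed {
        POLL_FLOOR_MS // a completion means more work may be ready soon
    } else {
        (current_ms * 2).min(POLL_CAP_MS) // back off while idle
    }
}

fn main() {
    // Idle ticks back off: 20 → 40 → 80 → 160 → 200 (capped).
    let mut interval = POLL_FLOOR_MS;
    let mut seen = vec![interval];
    for _ in 0..4 {
        interval = next_poll_ms(interval, false);
        seen.push(interval);
    }
    assert_eq!(seen, vec![20, 40, 80, 160, 200]);
    // A worker completion snaps back to the 20 ms floor.
    assert_eq!(next_poll_ms(200, true), 20);
    println!("{seen:?}");
}
```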