> **Plain-language companion:** [v0.81.0.md](v0.81.0.md)

## v0.81.0: Observability, Self-Tuning & Quick Wins

**Status: In progress.** Derived from [Assessment 16](../plans/PLAN_OVERALL_ASSESSMENT_16.md) (QW-1 through QW-10).

> **Release Theme**
> Immediate-value improvements that require minimal architectural change while
> delivering significant observability, ergonomic, and performance gains to the
> current single-node engine. Every item is backward-compatible and benefits all
> deployment modes.

---

## Implementation Status

| ID | Title | Status | Effort |
|----|-------|--------|--------|
| QW-1 | Commit-to-Visible Latency Metric | ✅ Done | Small |
| QW-2 | Configuration Advisor Function | ✅ Done | Medium |
| QW-3 | Preview / Dry-Run Mode | ✅ Done | Small |
| QW-4 | OpenTelemetry Trace Spans | ✅ Done | Medium |
| QW-5 | Bounded L0/L1 Template Cache | ✅ Done | Small |
| QW-6 | DeltaOperator Trait | ✅ Done | Medium |
| QW-7 | Split config.rs by Category | ✅ Done | Small |
| QW-8 | Self-Healing Circuit Breaker | ✅ Done | Medium |
| QW-9 | Chunked MERGE for Large Deltas | ✅ Done | Medium |
| QW-10 | Stream Table Presets | ✅ Done | Small |

---

### Correctness

No correctness changes in this release. All differential refresh semantics are unchanged.

---

### Stability

#### QW-8: Self-Healing Circuit Breaker

Extended the `max_consecutive_errors` suspension mechanism with auto-remediation:

- **OOM detected** (SPI error containing "out of memory"): reduce `merge_work_mem_mb` by 25% for the affected ST and retry.
- **Lock timeout detected**: increase that ST's effective scheduler interval by 2× (exponential backoff) until 3 consecutive successes.
- **Sustained lag** (>5× schedule interval): temporarily add +1 refresh worker if below `max_worker_processes` capacity.

All remediations are logged to `pgt_refresh_history` with reason codes.
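
The three QW-8 remediation rules can be sketched as a small state machine. This is an illustrative sketch only, not the extension's code: the type names (`Signal`, `StState`, `Remediation`), the function names, the 1 MB floor, and the reset to the base interval after three successes are all assumptions. Matching lock timeouts on PostgreSQL's "canceling statement due to lock timeout" message is likewise an assumption about how detection might work.

```rust
/// Outcome of one refresh attempt for a stream table (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Signal {
    OutOfMemory,
    LockTimeout,
    Success,
}

/// Hypothetical per-ST tuning state; field names are illustrative.
#[derive(Debug, Clone, PartialEq)]
pub struct StState {
    pub merge_work_mem_mb: u32,
    pub base_interval_ms: u64,
    pub effective_interval_ms: u64,
    pub consecutive_successes: u32,
}

/// Hypothetical reason codes that would be written to the history table.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Remediation {
    ReducedWorkMem,
    BackedOffInterval,
    RestoredInterval,
}

/// Map an SPI error message to a signal, if it is one we can remediate.
pub fn classify(spi_error: &str) -> Option<Signal> {
    let msg = spi_error.to_lowercase();
    if msg.contains("out of memory") {
        Some(Signal::OutOfMemory)
    } else if msg.contains("lock timeout") {
        // Assumption: detection by message text, as for OOM.
        Some(Signal::LockTimeout)
    } else {
        None
    }
}

/// Apply one remediation step and report what was done, if anything.
pub fn remediate(state: &mut StState, signal: Signal) -> Option<Remediation> {
    match signal {
        Signal::OutOfMemory => {
            // Reduce by 25%; the 1 MB floor is an assumption.
            state.merge_work_mem_mb = (state.merge_work_mem_mb * 3 / 4).max(1);
            state.consecutive_successes = 0;
            Some(Remediation::ReducedWorkMem)
        }
        Signal::LockTimeout => {
            // Exponential backoff: double the effective interval.
            state.effective_interval_ms = state.effective_interval_ms.saturating_mul(2);
            state.consecutive_successes = 0;
            Some(Remediation::BackedOffInterval)
        }
        Signal::Success => {
            if state.effective_interval_ms == state.base_interval_ms {
                return None; // nothing to undo
            }
            state.consecutive_successes += 1;
            if state.consecutive_successes >= 3 {
                // Assumption: three successes restore the base interval.
                state.effective_interval_ms = state.base_interval_ms;
                state.consecutive_successes = 0;
                Some(Remediation::RestoredInterval)
            } else {
                None
            }
        }
    }
}

/// Sustained-lag rule: lag above 5x the schedule interval, with worker headroom.
pub fn should_add_worker(
    lag_ms: u64,
    schedule_interval_ms: u64,
    active_workers: u32,
    max_worker_processes: u32,
) -> bool {
    lag_ms > schedule_interval_ms.saturating_mul(5) && active_workers < max_worker_processes
}
```

Keeping the rules as pure functions over a small state struct makes each remediation easy to unit-test without a running database; the caller would be responsible for retrying the refresh and recording the returned reason code.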

---

### Performance

#### QW-5: Bounded L0/L1 Template Cache

Added LRU eviction to the thread-local `DELTA_TEMPLATE_CACHE` and `PLACEHOLDER_RESOLVER_CACHE`. New GUC `pg_trickle.l1_cache_max_entries` (default 256) caps per-session memory usage. In 10K-ST deployments, this prevents unbounded growth in long-lived backend sessions.

Note: `pg_trickle.template_cache_max_entries` already controlled the L2 catalog cache; this release adds L0/L1 (thread-local) bounding.

#### QW-9: Chunked MERGE for Large Deltas

When delta row count exceeds `pg_trickle.merge_batch_size` (default 50,000), split the MERGE into batched statements:

1. Materialize delta into a temp table
2. Execute MERGE in chunks of `merge_batch_size` rows using row-number windows
3. Drop temp table

This reduces peak memory usage and lock hold time for large delta sets.

---

### Scalability

No scalability changes beyond QW-8 worker auto-scaling.

---

### Ease of Use

#### QW-2: Configuration Advisor Function

`SELECT * FROM pgtrickle.tune_recommendations()` returns a table of `(guc_name, current_value, recommended_value, reason)` based on observed workload patterns: refresh latency percentiles, memory usage, worker utilization, CDC lag trends.

#### QW-3: Preview / Dry-Run Mode

`SELECT * FROM pgtrickle.preview_stream_table(query text)` returns:

- Detected source tables and their CDC mode
- Planned refresh strategy (FULL/DIFFERENTIAL/AUTO)
- OpTree complexity class
- Estimated delta SQL template size
- Any DVM support warnings

No side effects: the function does not create the stream table.

#### QW-10: Stream Table Presets

Named configuration profiles via `preset` parameter on `create_stream_table`:

```sql
SELECT pgtrickle.create_stream_table('my_view', 'SELECT ...', preset => 'real-time');
```

| Preset | Schedule | Mode | Workers | Memory |
|--------|----------|------|---------|--------|
| `real-time` | 1s | DIFFERENTIAL | max | 256MB |
| `batch` | 5m | AUTO | 1 | 64MB |
| `cost-optimized` | 15m | AUTO | 1 | 32MB |

---

### Test Coverage

#### QW-1: Commit-to-Visible Latency Metric

`pg_trickle_commit_to_visible_ms` Prometheus histogram with per-ST labels. Requires `track_commit_timestamp = on`. Exposed via `pgtrickle.commit_latency_stats()`.

#### QW-4: OpenTelemetry Trace Spans

New span names added to `src/otel.rs`:

- `pgtrickle.scheduler_tick`: outer span per scheduler wake cycle
- `pgtrickle.refresh_cycle`: per-ST refresh (includes mode decision)
- `pgtrickle.delta_execute`: delta SQL execution time
- `pgtrickle.frontier_advance`: frontier update
- `pgtrickle.cleanup`: change buffer cleanup

---

### Code Quality

#### QW-6: DeltaOperator Trait

Defined `DeltaOperator` trait in `src/dvm/operators/mod.rs`:

```rust
pub trait DeltaOperator {
    fn generate_delta(
        &self,
        ctx: &mut DiffContext,
        children: &[DiffResult],
    ) -> Result;

    fn supports_immediate_mode(&self) -> bool { false }
    fn is_monotone(&self) -> bool { false }
}
```

All 22 operator implementations migrated to this trait.

#### QW-7: Split config.rs by Category

Moved GUC declarations from the monolithic `src/config.rs` into focused sub-modules:

- `src/config/scheduler.rs`: scheduler interval, workers, backoff
- `src/config/cdc.rs`: CDC mode, WAL transition, buffer thresholds
- `src/config/dvm.rs`: parse depth, CTE cap, template cache, algebraic drift
- `src/config/monitoring.rs`: alert thresholds, history retention, metrics

Re-exported from `src/config/mod.rs` for backward-compatible imports.
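
The re-export pattern that keeps old import paths working can be sketched in a single file with inline modules standing in for the real sub-module files. The constant names and values below are hypothetical placeholders, not actual pg_trickle settings; only the module layout mirrors the list above.

```rust
// Inline stand-in for src/config/mod.rs and two of its sub-modules.
pub mod config {
    pub mod scheduler {
        /// Hypothetical setting, for illustration only.
        pub const DEFAULT_SCHEDULER_INTERVAL_MS: u64 = 1_000;
    }

    pub mod cdc {
        /// Hypothetical setting, for illustration only.
        pub const DEFAULT_BUFFER_THRESHOLD: usize = 10_000;
    }

    // Glob re-exports: everything stays reachable at `config::NAME`,
    // so call sites written against the old flat module still compile.
    pub use self::cdc::*;
    pub use self::scheduler::*;
}

/// Old-style access through the flat `config` path.
pub fn interval_via_flat_path() -> u64 {
    config::DEFAULT_SCHEDULER_INTERVAL_MS
}

/// New-style access through the category sub-module.
pub fn interval_via_submodule() -> u64 {
    config::scheduler::DEFAULT_SCHEDULER_INTERVAL_MS
}
```

Both paths resolve to the same item, so existing `use crate::config::...` imports need no changes while new code can opt into the narrower sub-module paths.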

---

## Exit Criteria

- [ ] All 10 QW items implemented
- [ ] `just fmt && just lint` passes with zero warnings
- [ ] `just test-unit` passes
- [ ] `just test-integration` passes
- [ ] `just test-light-e2e` passes
- [ ] `pgtrickle.tune_recommendations()` returns actionable rows
- [ ] `pgtrickle.preview_stream_table(query)` returns results without side effects
- [ ] `pgtrickle.create_stream_table(..., preset => 'real-time')` creates a stream table with correct defaults
- [ ] Commit-to-visible histogram shows in Prometheus output when `track_commit_timestamp = on`
- [ ] Chunked MERGE triggers when delta exceeds `merge_batch_size`