# v0.50.0 — Performance, Security & Operational Hardening

> **Theme:** Remove the last known performance inefficiencies in the
> differential refresh hot path, close the security defense-in-depth gaps, and
> make the operational stack world-class with CNPG graceful drain,
> digest-pinned images, and complete Prometheus observability.

## Why This Release

With the test infrastructure solid after v0.49.0, this release addresses three orthogonal but equally concrete improvement areas identified in the v10 assessment: measurable performance overhead in the refresh hot path, a defense-in-depth security gap in Citus dblink escaping, and a cluster of operational polish items (CNPG lifecycle, Docker reproducibility, Prometheus coverage) that are individually small but together complete the operational story for production users.

## Deliverables

### PERF-10-01 — SPI Batching in Differential Refresh Entry Point

`execute_differential_refresh()` in `src/refresh/merge/mod.rs` currently makes three to four separate SPI round-trips before executing the merge:

1. Per source: a `to_regclass()` existence check for each change buffer table
2. Per source: a `COUNT(*)` from each change buffer (delta-fraction check)
3. Per source: `reltuples` from `pg_class` (row estimate for the cost model)

Consolidate these into a single CTE query that returns all three values per source OID in one round-trip:

```sql
WITH buf AS (
    SELECT o.oid,
           to_regclass(format('pgtrickle_changes.changes_%s', o.oid)) IS NOT NULL AS buf_exists,
           -- A FROM clause cannot take a computed relation name, so the real
           -- statement interpolates one count subquery per source when the
           -- query text is generated; changes_<oid> is a placeholder.
           COALESCE((SELECT count(*) FROM pgtrickle_changes.changes_<oid>), 0) AS change_count,
           COALESCE(c.reltuples, 0) AS est_rows
    FROM unnest($1::oid[]) AS o(oid)
    LEFT JOIN pg_class c ON c.oid = o.oid
)
SELECT * FROM buf;
```

**Expected impact**: saves 10–15ms per refresh cycle for stream tables with multiple source tables. At 10 refreshes/second, that is 100–150ms of SPI overhead eliminated per second.

### PERF-10-02 — CDC Trigger String Building Optimization

`build_stmt_trigger_fn_sql()` in `src/cdc.rs` uses per-column `format!()` calls collected into a `Vec` and then joined; for a 50-column table this allocates 50 intermediate strings. Replace them with a single `String::with_capacity()` buffer and direct `push_str()` appends. Benchmark before and after with a 100-column table trigger rebuild.

### PERF-10-03 — Watermark Computation Query Consolidation

The scheduler-tick watermark computation issues three independent queries: `pg_current_wal_lsn()`, a `pg_stat_activity` xmin probe, and a `pg_prepared_xacts` check. Combine these into a single compound SELECT that returns all values in one round-trip, reducing per-tick overhead by ~2ms at the 100ms minimum scheduler interval.

### SEC-10-01 — Replace Manual dblink Escaping

`src/citus.rs` lines 357–364 and 650–682 escape SQL strings for dblink calls with a manual `connstr.replace('\'', "''")`. While the inputs are system-controlled (values sourced from `pg_dist_node`), this violates defense-in-depth and is flagged by semgrep rules. Replace it with `pg_escape_literal()` via SPI for connection-string values and `pg_escape_identifier()` for identifiers, and add a comment documenting why parameterized queries are not available for dblink calls.
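One way to shape the fix is sketched below, routing escaping through the server's `quote_literal()` over SPI. `escape_literal_spi` is a hypothetical helper name, not an existing pg_trickle function, and the `Spi::get_one_with_args` call follows the pgrx 0.11.x signature; adjust for the pgrx version the crate pins.

```rust
use pgrx::prelude::*;

// Hypothetical helper: escape a dblink connection-string value by delegating
// to the server's quote_literal() instead of hand-rolling
// connstr.replace('\'', "''"). Identifiers would go through quote_ident()
// the same way.
fn escape_literal_spi(raw: &str) -> Result<String, pgrx::spi::Error> {
    Spi::get_one_with_args::<String>(
        "SELECT quote_literal($1)",
        vec![(PgBuiltInOids::TEXTOID.oid(), raw.into_datum())],
    )
    // quote_literal(text) is strict, so a non-NULL argument never yields NULL.
    .map(|quoted| quoted.expect("quote_literal is strict"))
}
```

Centralizing on the server's own quoting keeps the escaping rules (embedded quotes, backslash handling under `standard_conforming_strings`) in one audited place, which is the property the semgrep rule checks for.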
### OPS-10-01 — CNPG preStop Lifecycle Hook

`cnpg/cluster-production.yaml` has no preStop hook, so when a CNPG-managed pod is terminated (rolling upgrade, scale-down, eviction), in-flight refresh workers are killed mid-execution. The stream tables themselves are safe (they reinitialize on the next startup), but the resulting full refresh adds latency after the upgrade. Add a preStop lifecycle hook:

```yaml
lifecycle:
  preStop:
    exec:
      command:
        - /bin/sh
        - -c
        - psql -U postgres -c "SELECT pgtrickle.drain(timeout_s => 120)" || true
```

The `|| true` ensures the pod terminates even if the database is unavailable. Document this in `docs/RUNBOOK_DRAIN.md` under a new "Kubernetes rolling upgrade" section.

### OPS-10-02 — Prometheus Secondary Metrics

Add three metrics that are currently absent from `monitoring/prometheus/pg_trickle_queries.yml`:

1. **`pg_trickle_invalidation_ring_overflows_total`** — Counter incremented in `push_invalidation()` when the ring is full, surfaced from the existing `INVALIDATION_RING_OVERFLOWS` atomic. Alert threshold: > 0 in a 5-minute window (indicates a deployment with more than 1,024 simultaneously-DDL'd STs).
2. **`pg_trickle_dag_cycles_detected_total`** — Counter incremented when `check_for_cycles()` returns `Err(CycleDetected(...))`. This event should never occur in steady state; alerting at any non-zero value is appropriate.
3. **`pg_trickle_template_cache_stale_evictions_total`** — Counter for L0 template cache entries evicted because `defining_query_hash` mismatched. A spike here indicates rapid upstream schema change and may predict GROUP_RESCAN fallback storms.

### OPS-10-03 — Docker Base Image Digest Pinning

Pin all Dockerfile base images to exact SHA256 digests:

- `Dockerfile.demo`: `postgres:18.3-bookworm@sha256:`
- `Dockerfile.ghcr`: same
- `tests/Dockerfile.e2e`: same

Add a `scripts/update_base_image_digests.sh` script that resolves current digests via `docker manifest inspect` and patches the Dockerfiles. Run this script quarterly or when a PostgreSQL patch release is needed, and document the process in `CONTRIBUTING.md`.

### SCAL-10-01 — Invalidation Ring Limit Documentation and Observability

The shared-memory invalidation ring has a hard ceiling of 1,024 entries (`INVALIDATION_RING_MAX_CAPACITY`). When more than 1,024 stream tables are invalidated simultaneously (e.g. during bulk DDL), the overflow flag triggers a full DAG rebuild rather than incremental invalidation. Add to `docs/CONFIGURATION.md`:

- A note under `pg_trickle.invalidation_ring_capacity` documenting the 1,024-entry ceiling and the overflow behaviour.
- Guidance for deployments with 1,000+ stream tables: prefer staged DDL operations or increase `invalidation_ring_capacity`.

Wire the new `pg_trickle_invalidation_ring_overflows_total` counter (see OPS-10-02) into the monitoring README with an example alert rule.

### COR-10-01 — Deep Join Chain Threshold Documentation

The Part 3 correction term in `src/dvm/operators/join.rs` (lines 436–447) is skipped when the left-child scan count exceeds a GUC threshold (default 5). For join chains deeper than 6 tables, this may cause small delta drift over many concurrent-change cycles. Add to `docs/CONFIGURATION.md` under a new "Deep Join Chain Tuning" section:

- A description of the Part 3 threshold trade-off (SQL complexity vs. delta correctness at depth).
- The GUC name and default value.
- A recommendation: leave the default for ≤6-table joins; raise it for deep chains with sustained concurrent updates; verify with G17-SOAK.

## Testing

- `just test-unit` — passes
- `just test-integration` — passes
- `just test-light-e2e` — passes; differential refresh latency benchmarks show improvement for multi-source STs
- `just bench` — Criterion reports no regression; the merge codegen bench shows improvement for wide tables
- CNPG drain hook: verified by extending `tests/e2e_drain_mode_tests.rs` with a simulated SIGTERM scenario (see the sketch below)
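A minimal sketch of that SIGTERM scenario, assuming the e2e cluster runs in a Docker container reachable with the `postgres` crate; the connection string and the `pg-trickle-e2e` container name are placeholders for whatever fixtures `tests/e2e_drain_mode_tests.rs` already provides:

```rust
use std::process::Command;

use postgres::{Client, NoTls};

#[test]
fn drain_then_sigterm_leaves_no_inflight_refresh() {
    // Placeholder endpoint; the real test should reuse the suite's fixture.
    let mut client = Client::connect("host=localhost user=postgres", NoTls)
        .expect("e2e cluster reachable");

    // Mirror the preStop hook, with a short timeout so the test stays fast.
    client
        .simple_query("SELECT pgtrickle.drain(timeout_s => 5)")
        .expect("drain completes once in-flight refreshes finish");

    // Deliver SIGTERM the way Kubernetes does after preStop returns.
    let status = Command::new("docker")
        .args(["kill", "--signal=SIGTERM", "pg-trickle-e2e"])
        .status()
        .expect("docker CLI available");
    assert!(status.success());

    // On restart, stream tables reinitialize; the existing drain-mode
    // assertions can verify from here that no refresh died mid-merge.
}
```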