> **Plain-language companion:** [v0.25.0.md](v0.25.0.md)

## v0.25.0 — Scheduler Scalability & Pooler Performance

**Status: ✅ Released.** Sourced from [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §4, §5, §7.

> **Release Theme**
> This release pushes the comfortable operating point from "hundreds" to
> **thousands** of stream tables on commodity hardware. The scheduler stops
> reloading the full catalog on every tick, the template cache becomes
> shared across all backends via shmem, change detection is batched, and
> the DAG rebuild path uses copy-on-write to avoid blocking dispatch.
> Connection-pooler deployments (PgBouncer, RDS Proxy, Supabase) see the
> biggest win: the shared L0 cache eliminates the 30–45 ms cold-start tax
> per backend. The predictive cost model gets robustness guards, and
> downstream publications gain subscriber-lag tracking.

### Catalog & Scheduler Scalability

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| SCAL-1 | **Shmem catalog snapshot cache.** Cache `pgt_stream_tables` rows in shared memory, keyed by DAG generation counter. Invalidated on DDL via `DAG_REBUILD_SIGNAL`. Eliminates per-tick SPI reload (20–200 ms win at 100–1000 STs). | 4d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §4, §5 |
| SCAL-2 | **Batched change detection.** Combine per-source `SELECT EXISTS(...)` queries into a single `UNION ALL` CTE per refresh group. ~80% reduction in per-tick change-detection cost. | 2d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §5 |
| SCAL-3 | **Split PGS_STATE lock.** Replace the single `PgLwLock` in `src/shmem.rs` with per-concern locks (`dag_lock`, `metrics_lock`, `worker_pool_lock`). Use `share()` for read-only `dag_version` reads. | 3d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §5 |
| SCAL-4 | **Copy-on-write DAG rebuild.** Compute the new topological order out-of-line (no exclusive lock), then atomically swap the pointer. Defers full rebuild to idle ticks when possible. | 4d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §4 |
| SCAL-5 | **Persistent worker pool option.** New `pg_trickle.worker_pool_size` GUC (default 0 = current spawn-per-task). Workers loop on a shmem queue instead of being registered and deregistered each tick (~2 ms/worker saved). | 3d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §5 |

### Template Cache & Pooler Latency

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| CACHE-1 | **Shared shmem L0 template cache.** `dshash`-based cache in shared memory keyed by `(pgt_id, cache_generation)`. All backends in the same database share one compiled template set. Eliminates 30–45 ms cold-start tax in pooled-connection workloads. | 5d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §4, §7 |
| CACHE-2 | **L1 LRU eviction.** Bound the per-backend thread-local cache with `pg_trickle.template_cache_max_entries` GUC (default 256). Evict least-recently-used entries. | 2d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §4 |
| CACHE-3 | **`pgtrickle.clear_caches()` SQL function.** Manual cache flush for all levels (L0 shmem + L1 thread-local + L2 catalog). Useful during debugging and emergency migration. | 0.5d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §4 |

### Hot-Path Allocation Reduction

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| PERF-1 | **xxh3 streaming hash.** Replace `pg_trickle_hash_multi` string-concat + scalar xxhash with `xxh3` streaming API (`update`/`finalize`). Eliminates per-row `String` allocation on the CDC hot path. | 3d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §5 |
| PERF-2 | **Pre-sized SQL buffer in project operator.** Replace per-column `format!` calls in `src/dvm/operators/project.rs` with a single pre-sized `String` and `write!` macro. | 1d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §5 |
| PERF-3 | **Shmem adaptive cost-model state.** Cache `last_full_ms`/`last_diff_ms` per ST in shared memory with atomic updates. Prevents parallel workers from reading stale timing data via SPI. | 2d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §5 |

### Predictive Model & Publication Durability

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| PRED-1 | **Robustness guards on predictive cost model.** Clamp predictions to `[0.5×, 4×] last_full_ms`; use median+MAD instead of mean+SD; require non-degenerate variance; ignore predictions during first 60 s after CREATE. | 3d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §4 |
| PUB-1 | **Subscriber-LSN tracking for downstream publications.** Track subscriber LSN per publication; refuse to TRUNCATE change buffer until all subscribers have acknowledged past the buffer's max LSN; emit WARNING when a subscriber lags more than `pg_trickle.publication_lag_warn_lsn`. | 4d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §4 |
| PUB-2 | **Multi-DB worker fairness.** Add `pgtrickle.worker_allocation_status()` monitoring view (per-DB used/quota/queued). Document recommended quota allocation in `docs/SCALING.md`. | 2d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §4 |

### Implementation Phases

| Phase | Description | Duration |
|-------|-------------|----------|
| Phase 1 | Catalog & scheduler scalability: shmem cache, batched detection, lock split | Days 1–13 |
| Phase 2 | Template cache: L0 dshash, L1 LRU, clear_caches() | Days 13–21 |
| Phase 3 | Hot-path: xxh3 hash, project buffer, shmem cost-model | Days 21–27 |
| Phase 4 | Predictive model guards + publication durability + worker fairness | Days 27–36 |
| Phase 5 | Benchmarks, documentation, upgrade script, integration testing | Days 36–42 |

> **v0.25.0 total: ~8–9 weeks** (~42 person-days solo)

**Exit criteria:**

- [x] SCAL-1: Scheduler tick at 1000 STs completes in < 20 ms (down from ~200 ms)
- [x] SCAL-2: Change detection for 10-source ST issues 1 query instead of 10
- [x] SCAL-3: PGS_STATE replaced by 3 per-concern locks; read-only paths use `share()`
- [x] SCAL-4: DAG rebuild does not hold exclusive lock during computation; swap is atomic
- [x] SCAL-5: `worker_pool_size = 4` starts persistent workers; spawn cost eliminated
- [x] CACHE-1: Second backend connecting to same DB hits L0 cache; no parse/differentiate cost
- [x] CACHE-2: L1 cache respects `template_cache_max_entries`; evicts LRU on overflow
- [x] CACHE-3: `pgtrickle.clear_caches()` flushes all three levels; next refresh re-populates
- [x] PERF-1: `pg_trickle_hash_multi` allocates zero intermediate Strings per row
- [x] PERF-2: Project operator uses single pre-sized buffer; 50-column ST shows measurable improvement
- [x] PERF-3: Parallel workers read cost-model state from shmem, not SPI
- [x] PRED-1: Sawtooth workload test: model recovers within 5 samples after outlier spike
- [x] PUB-1: Publication with lagged subscriber emits WARNING; change buffer not truncated until ack
- [x] PUB-2: `worker_allocation_status()` returns per-DB used/quota/queued
- [x] Benchmark regression gate passes (no regressions vs v0.24.0 baseline)
- [x] Extension upgrade path tested (`0.24.0 → 0.25.0`)
- [x] `just check-version-sync` passes

---
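As an illustration of the SCAL-2 batching, the sketch below folds N per-source existence probes into one `UNION ALL` statement. This is a minimal sketch only: the `buf_*` table names, the `lsn` column, and the function shape are assumptions for illustration, not the actual pg_trickle change-buffer schema.

```rust
// Hypothetical sketch of SCAL-2: instead of issuing one
// `SELECT EXISTS(...)` round-trip per source, build a single statement
// that probes every change buffer in a refresh group at once.
fn batched_change_sql(sources: &[(&str, i64)]) -> String {
    let mut parts = Vec::with_capacity(sources.len());
    for (buf, last_lsn) in sources {
        // Illustrative schema: one change-buffer table per source with
        // an `lsn` column; `last_lsn` is the last position processed.
        parts.push(format!(
            "SELECT '{}' AS src, EXISTS(SELECT 1 FROM {} WHERE lsn > {}) AS changed",
            buf, buf, last_lsn
        ));
    }
    parts.join(" UNION ALL ")
}

fn main() {
    let sql = batched_change_sql(&[("buf_a", 100), ("buf_b", 200)]);
    assert!(sql.contains("UNION ALL"));
    // Two sources -> two branches, each with an outer and an inner SELECT.
    assert_eq!(sql.matches("SELECT").count(), 4);
    println!("{}", sql);
}
```

The one-round-trip shape is what produces the ~80% per-tick saving: the planner sees a single statement, and the scheduler parses one result set instead of ten.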
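The SCAL-4 copy-on-write rebuild can be sketched in std-only Rust: dispatch clones an `Arc` under a brief read lock, the rebuild computes a fresh topological order (Kahn's algorithm here) with no lock held, and the only write-locked step is the pointer swap. `DagHandle` and the `u32` node ids are hypothetical stand-ins for the real DAG types, not the extension's actual code.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Hypothetical shared handle for the dispatch order.
struct DagHandle {
    order: RwLock<Arc<Vec<u32>>>,
}

impl DagHandle {
    fn snapshot(&self) -> Arc<Vec<u32>> {
        // Dispatch path: O(1) pointer clone, never blocked by a rebuild.
        Arc::clone(&self.order.read().unwrap())
    }

    fn rebuild(&self, nodes: &[u32], edges: &[(u32, u32)]) {
        // Kahn's algorithm, computed entirely out-of-line (no lock held).
        let mut indeg: HashMap<u32, usize> = nodes.iter().map(|&n| (n, 0)).collect();
        let mut adj: HashMap<u32, Vec<u32>> = HashMap::new();
        for &(from, to) in edges {
            adj.entry(from).or_default().push(to);
            *indeg.get_mut(&to).unwrap() += 1;
        }
        let mut ready: Vec<u32> = nodes.iter().copied().filter(|n| indeg[n] == 0).collect();
        let mut topo = Vec::with_capacity(nodes.len());
        while let Some(n) = ready.pop() {
            topo.push(n);
            for &m in adj.get(&n).into_iter().flatten() {
                let d = indeg.get_mut(&m).unwrap();
                *d -= 1;
                if *d == 0 {
                    ready.push(m);
                }
            }
        }
        // Atomic swap: the write lock is held only for this pointer store.
        *self.order.write().unwrap() = Arc::new(topo);
    }
}

fn main() {
    let dag = DagHandle { order: RwLock::new(Arc::new(Vec::new())) };
    dag.rebuild(&[1, 2, 3], &[(1, 2), (2, 3)]);
    assert_eq!(*dag.snapshot(), vec![1, 2, 3]);
}
```

Readers holding an old snapshot keep a valid (if stale) order until they drop their `Arc`, which is exactly the property that lets the swap happen without blocking dispatch.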
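A minimal sketch of the CACHE-2 eviction policy, assuming a logical-clock LRU over a plain `HashMap`; the key and value types stand in for the real L1 cache's compiled-template entries, and a production version would likely use an intrusive list rather than a linear scan on eviction.

```rust
use std::collections::HashMap;

// Hypothetical bounded L1 cache: key -> (compiled template, last-use tick).
struct LruCache {
    max: usize,
    tick: u64,
    map: HashMap<u64, (String, u64)>,
}

impl LruCache {
    fn new(max: usize) -> Self {
        Self { max, tick: 0, map: HashMap::new() }
    }

    fn get(&mut self, key: u64) -> Option<&String> {
        self.tick += 1;
        let tick = self.tick;
        // A hit refreshes the entry's last-use tick.
        self.map.get_mut(&key).map(|(v, t)| {
            *t = tick;
            &*v
        })
    }

    fn put(&mut self, key: u64, val: String) {
        self.tick += 1;
        if !self.map.contains_key(&key) && self.map.len() >= self.max {
            // Evict the entry with the smallest (oldest) last-use tick.
            let lru = *self.map.iter().min_by_key(|e| (e.1).1).unwrap().0;
            self.map.remove(&lru);
        }
        self.map.insert(key, (val, self.tick));
    }
}

fn main() {
    let mut cache = LruCache::new(2);
    cache.put(1, "t1".into());
    cache.put(2, "t2".into());
    cache.get(1); // touch 1, so 2 becomes least recently used
    cache.put(3, "t3".into()); // overflow: evicts 2
    assert!(cache.get(2).is_none());
    assert!(cache.get(1).is_some());
    assert!(cache.get(3).is_some());
}
```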
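The PERF-2 change can be illustrated as follows: reserve the output `String` once, then append with `write!` so no per-column temporary is allocated. The function name and SQL shape below are illustrative, not the actual project-operator code.

```rust
use std::fmt::Write;

// Sketch of a pre-sized SQL buffer: one allocation up front instead of
// one `format!` temporary per column.
fn project_sql(table: &str, cols: &[&str]) -> String {
    // Rough capacity estimate: quoted column name + ", " separator per
    // column, plus the SELECT/FROM scaffolding.
    let est: usize = cols.iter().map(|c| c.len() + 4).sum::<usize>() + table.len() + 16;
    let mut sql = String::with_capacity(est);
    sql.push_str("SELECT ");
    for (i, col) in cols.iter().enumerate() {
        if i > 0 {
            sql.push_str(", ");
        }
        // `write!` into an existing String appends in place; writing to
        // a String is infallible, so the unwrap never fires.
        write!(sql, "\"{}\"", col).unwrap();
    }
    write!(sql, " FROM {}", table).unwrap();
    sql
}

fn main() {
    let sql = project_sql("t", &["a", "b"]);
    assert_eq!(sql, "SELECT \"a\", \"b\" FROM t");
    println!("{}", sql);
}
```

At 50 columns this removes 50 short-lived heap allocations per statement build, which is where the exit criterion's "measurable improvement" comes from.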
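Two of the PRED-1 guards can be sketched numerically, under the assumption that the model keeps a small window of recent refresh timings: median + MAD replace mean + SD so one outlier cannot skew the estimate, and the final prediction is clamped to `[0.5×, 4×] last_full_ms`. Function names are illustrative.

```rust
// Median of a sample window (sorts in place).
fn median(xs: &mut [f64]) -> f64 {
    xs.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let n = xs.len();
    if n % 2 == 1 {
        xs[n / 2]
    } else {
        (xs[n / 2 - 1] + xs[n / 2]) / 2.0
    }
}

// Robust location/scale: median plus median absolute deviation (MAD).
fn robust_estimate(samples: &[f64]) -> (f64, f64) {
    let mut s = samples.to_vec();
    let med = median(&mut s);
    let mut devs: Vec<f64> = samples.iter().map(|x| (x - med).abs()).collect();
    let mad = median(&mut devs);
    (med, mad)
}

// Clamp a raw prediction into [0.5x, 4x] of the last full-refresh time.
fn clamp_prediction(pred_ms: f64, last_full_ms: f64) -> f64 {
    pred_ms.clamp(0.5 * last_full_ms, 4.0 * last_full_ms)
}

fn main() {
    // One outlier spike (900 ms) barely moves the median/MAD estimate,
    // which is why the sawtooth test recovers within a few samples.
    let (med, mad) = robust_estimate(&[10.0, 11.0, 9.0, 10.0, 900.0]);
    assert_eq!(med, 10.0);
    assert_eq!(mad, 1.0);
    // A wild prediction is pulled back into the trusted band.
    assert_eq!(clamp_prediction(5000.0, 100.0), 400.0);
}
```

The remaining guards (non-degenerate variance, the 60 s warm-up after CREATE) are gating conditions on whether the prediction is used at all, so they need no arithmetic beyond a threshold check.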