> **Plain-language companion:** [v0.25.0.md](v0.25.0.md)

## v0.25.0 — Scheduler Scalability & Pooler Performance

**Status: ✅ Released.** Sourced from [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §4, §5, §7.

> **Release Theme**
> This release pushes the comfortable operating point from "hundreds" to
> **thousands** of stream tables on commodity hardware. The scheduler stops
> reloading the full catalog on every tick, the template cache becomes
> shared across all backends via shmem, change detection is batched, and
> the DAG rebuild path uses copy-on-write to avoid blocking dispatch.
> Connection-pooler deployments (PgBouncer, RDS Proxy, Supabase) see the
> biggest win: the shared L0 cache eliminates the 30–45 ms cold-start tax
> per backend. The predictive cost model gets robustness guards, and
> downstream publications gain subscriber-lag tracking.

### Catalog & Scheduler Scalability

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| SCAL-1 | **Shmem catalog snapshot cache.** Cache `pgt_stream_tables` rows in shared memory, keyed by DAG generation counter. Invalidated on DDL via `DAG_REBUILD_SIGNAL`. Eliminates per-tick SPI reload (20–200 ms win at 100–1000 STs). | 4d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §4, §5 |
| SCAL-2 | **Batched change detection.** Combine per-source `SELECT EXISTS(...)` queries into a single `UNION ALL` CTE per refresh group. ~80% reduction in per-tick change-detection cost. | 2d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §5 |
| SCAL-3 | **Split PGS_STATE lock.** Replace the single `PgLwLock` in `src/shmem.rs` with per-concern locks (`dag_lock`, `metrics_lock`, `worker_pool_lock`). Use `share()` for read-only `dag_version` reads. | 3d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §5 |
| SCAL-4 | **Copy-on-write DAG rebuild.** Compute the new topological order out-of-line (no exclusive lock), then atomically swap the pointer. Defers full rebuild to idle ticks when possible. | 4d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §4 |
| SCAL-5 | **Persistent worker pool option.** New `pg_trickle.worker_pool_size` GUC (default 0 = current spawn-per-task). Workers loop on a shmem queue instead of being registered and deregistered each tick (~2 ms/worker saved). | 3d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §5 |

### Template Cache & Pooler Latency

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| CACHE-1 | **Shared shmem L0 template cache.** `dshash`-based cache in shared memory keyed by `(pgt_id, cache_generation)`. All backends in the same database share one compiled template set. Eliminates 30–45 ms cold-start tax in pooled-connection workloads. | 5d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §4, §7 |
| CACHE-2 | **L1 LRU eviction.** Bound the per-backend thread-local cache with `pg_trickle.template_cache_max_entries` GUC (default 256). Evict least-recently-used entries. | 2d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §4 |
| CACHE-3 | **`pgtrickle.clear_caches()` SQL function.** Manual cache flush for all levels (L0 shmem + L1 thread-local + L2 catalog). Useful during debugging and emergency migration. | 0.5d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §4 |

### Hot-Path Allocation Reduction

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| PERF-1 | **xxh3 streaming hash.** Replace `pg_trickle_hash_multi` string-concat + scalar xxhash with `xxh3` streaming API (`update`/`finalize`). Eliminates per-row `String` allocation on the CDC hot path. | 3d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §5 |
| PERF-2 | **Pre-sized SQL buffer in project operator.** Replace per-column `format!` calls in `src/dvm/operators/project.rs` with a single pre-sized `String` and `write!` macro. | 1d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §5 |
| PERF-3 | **Shmem adaptive cost-model state.** Cache `last_full_ms`/`last_diff_ms` per ST in shared memory with atomic updates. Prevents parallel workers from reading stale timing data via SPI. | 2d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §5 |

### Predictive Model & Publication Durability

| Item | Description | Effort | Ref |
|------|-------------|--------|-----|
| PRED-1 | **Robustness guards on predictive cost model.** Clamp predictions to `[0.5×, 4×] last_full_ms`; use median+MAD instead of mean+SD; require non-degenerate variance; ignore predictions during first 60 s after CREATE. | 3d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §4 |
| PUB-1 | **Subscriber-LSN tracking for downstream publications.** Track subscriber LSN per publication; refuse to TRUNCATE change buffer until all subscribers have acknowledged past the buffer's max LSN; emit WARNING when a subscriber lags more than `pg_trickle.publication_lag_warn_lsn`. | 4d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §4 |
| PUB-2 | **Multi-DB worker fairness.** Add `pgtrickle.worker_allocation_status()` monitoring view (per-DB used/quota/queued). Document recommended quota allocation in `docs/SCALING.md`. | 2d | [PLAN_OVERALL_ASSESSMENT_2.md](plans/PLAN_OVERALL_ASSESSMENT_2.md) §4 |

### Implementation Phases

| Phase | Description | Duration |
|-------|-------------|----------|
| Phase 1 | Catalog & scheduler scalability: shmem cache, batched detection, lock split | Days 1–13 |
| Phase 2 | Template cache: L0 dshash, L1 LRU, clear_caches() | Days 13–21 |
| Phase 3 | Hot-path: xxh3 hash, project buffer, shmem cost-model | Days 21–27 |
| Phase 4 | Predictive model guards + publication durability + worker fairness | Days 27–36 |
| Phase 5 | Benchmarks, documentation, upgrade script, integration testing | Days 36–42 |

> **v0.25.0 total: ~8–9 weeks** (~42 person-days solo)

**Exit criteria:**

- [x] SCAL-1: Scheduler tick at 1000 STs completes in < 20 ms (down from ~200 ms)
- [x] SCAL-2: Change detection for 10-source ST issues 1 query instead of 10
- [x] SCAL-3: PGS_STATE replaced by 3 per-concern locks; read-only paths use `share()`
- [x] SCAL-4: DAG rebuild does not hold exclusive lock during computation; swap is atomic
- [x] SCAL-5: `worker_pool_size = 4` starts persistent workers; spawn cost eliminated
- [x] CACHE-1: Second backend connecting to same DB hits L0 cache; no parse/differentiate cost
- [x] CACHE-2: L1 cache respects `template_cache_max_entries`; evicts LRU on overflow
- [x] CACHE-3: `pgtrickle.clear_caches()` flushes all three levels; next refresh re-populates
- [x] PERF-1: `pg_trickle_hash_multi` allocates zero intermediate Strings per row
- [x] PERF-2: Project operator uses single pre-sized buffer; 50-column ST shows measurable improvement
- [x] PERF-3: Parallel workers read cost-model state from shmem, not SPI
- [x] PRED-1: Sawtooth workload test: model recovers within 5 samples after outlier spike
- [x] PUB-1: Publication with lagged subscriber emits WARNING; change buffer not truncated until ack
- [x] PUB-2: `worker_allocation_status()` returns per-DB used/quota/queued
- [x] Benchmark regression gate passes (no regressions vs v0.24.0 baseline)
- [x] Extension upgrade path tested (`0.24.0 → 0.25.0`)
- [x] `just check-version-sync` passes

---
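As an illustration of the SCAL-2 batching, the sketch below folds N per-source existence probes into one `UNION ALL` statement. This is a minimal sketch only: the `buf_*` table names, the `lsn` column, and the function shape are assumptions for illustration, not the actual pg_trickle change-buffer schema.

```rust
// Hypothetical sketch of SCAL-2: instead of issuing one
// `SELECT EXISTS(...)` round-trip per source, build a single statement
// that probes every change buffer in a refresh group at once.
fn batched_change_sql(sources: &[(&str, i64)]) -> String {
    let mut parts = Vec::with_capacity(sources.len());
    for (buf, last_lsn) in sources {
        // Illustrative schema: one change-buffer table per source with
        // an `lsn` column; `last_lsn` is the last position processed.
        parts.push(format!(
            "SELECT '{}' AS src, EXISTS(SELECT 1 FROM {} WHERE lsn > {}) AS changed",
            buf, buf, last_lsn
        ));
    }
    parts.join(" UNION ALL ")
}

fn main() {
    let sql = batched_change_sql(&[("buf_a", 100), ("buf_b", 200)]);
    assert!(sql.contains("UNION ALL"));
    // Two sources -> two branches, each with an outer and an inner SELECT.
    assert_eq!(sql.matches("SELECT").count(), 4);
    println!("{}", sql);
}
```

The one-round-trip shape is what produces the ~80% per-tick saving: the planner sees a single statement, and the scheduler parses one result set instead of ten.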
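The SCAL-4 copy-on-write rebuild can be sketched in std-only Rust: dispatch clones an `Arc` under a brief read lock, the rebuild computes a fresh topological order (Kahn's algorithm here) with no lock held, and the only write-locked step is the pointer swap. `DagHandle` and the `u32` node ids are hypothetical stand-ins for the real DAG types, not the extension's actual code.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Hypothetical shared handle for the dispatch order.
struct DagHandle {
    order: RwLock<Arc<Vec<u32>>>,
}

impl DagHandle {
    fn snapshot(&self) -> Arc<Vec<u32>> {
        // Dispatch path: O(1) pointer clone, never blocked by a rebuild.
        Arc::clone(&self.order.read().unwrap())
    }

    fn rebuild(&self, nodes: &[u32], edges: &[(u32, u32)]) {
        // Kahn's algorithm, computed entirely out-of-line (no lock held).
        let mut indeg: HashMap<u32, usize> = nodes.iter().map(|&n| (n, 0)).collect();
        let mut adj: HashMap<u32, Vec<u32>> = HashMap::new();
        for &(from, to) in edges {
            adj.entry(from).or_default().push(to);
            *indeg.get_mut(&to).unwrap() += 1;
        }
        let mut ready: Vec<u32> = nodes.iter().copied().filter(|n| indeg[n] == 0).collect();
        let mut topo = Vec::with_capacity(nodes.len());
        while let Some(n) = ready.pop() {
            topo.push(n);
            for &m in adj.get(&n).into_iter().flatten() {
                let d = indeg.get_mut(&m).unwrap();
                *d -= 1;
                if *d == 0 {
                    ready.push(m);
                }
            }
        }
        // Atomic swap: the write lock is held only for this pointer store.
        *self.order.write().unwrap() = Arc::new(topo);
    }
}

fn main() {
    let dag = DagHandle { order: RwLock::new(Arc::new(Vec::new())) };
    dag.rebuild(&[1, 2, 3], &[(1, 2), (2, 3)]);
    assert_eq!(*dag.snapshot(), vec![1, 2, 3]);
}
```

Readers holding an old snapshot keep a valid (if stale) order until they drop their `Arc`, which is exactly the property that lets the swap happen without blocking dispatch.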
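A minimal sketch of the CACHE-2 eviction policy, assuming a logical-clock LRU over a plain `HashMap`; the key and value types stand in for the real L1 cache's compiled-template entries, and a production version would likely use an intrusive list rather than a linear scan on eviction.

```rust
use std::collections::HashMap;

// Hypothetical bounded L1 cache: key -> (compiled template, last-use tick).
struct LruCache {
    max: usize,
    tick: u64,
    map: HashMap<u64, (String, u64)>,
}

impl LruCache {
    fn new(max: usize) -> Self {
        Self { max, tick: 0, map: HashMap::new() }
    }

    fn get(&mut self, key: u64) -> Option<&String> {
        self.tick += 1;
        let tick = self.tick;
        // A hit refreshes the entry's last-use tick.
        self.map.get_mut(&key).map(|(v, t)| {
            *t = tick;
            &*v
        })
    }

    fn put(&mut self, key: u64, val: String) {
        self.tick += 1;
        if !self.map.contains_key(&key) && self.map.len() >= self.max {
            // Evict the entry with the smallest (oldest) last-use tick.
            let lru = *self.map.iter().min_by_key(|e| (e.1).1).unwrap().0;
            self.map.remove(&lru);
        }
        self.map.insert(key, (val, self.tick));
    }
}

fn main() {
    let mut cache = LruCache::new(2);
    cache.put(1, "t1".into());
    cache.put(2, "t2".into());
    cache.get(1); // touch 1, so 2 becomes least recently used
    cache.put(3, "t3".into()); // overflow: evicts 2
    assert!(cache.get(2).is_none());
    assert!(cache.get(1).is_some());
    assert!(cache.get(3).is_some());
}
```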
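The PERF-2 change can be illustrated as follows: reserve the output `String` once, then append with `write!` so no per-column temporary is allocated. The function name and SQL shape below are illustrative, not the actual project-operator code.

```rust
use std::fmt::Write;

// Sketch of a pre-sized SQL buffer: one allocation up front instead of
// one `format!` temporary per column.
fn project_sql(table: &str, cols: &[&str]) -> String {
    // Rough capacity estimate: quoted column name + ", " separator per
    // column, plus the SELECT/FROM scaffolding.
    let est: usize = cols.iter().map(|c| c.len() + 4).sum::<usize>() + table.len() + 16;
    let mut sql = String::with_capacity(est);
    sql.push_str("SELECT ");
    for (i, col) in cols.iter().enumerate() {
        if i > 0 {
            sql.push_str(", ");
        }
        // `write!` into an existing String appends in place; writing to
        // a String is infallible, so the unwrap never fires.
        write!(sql, "\"{}\"", col).unwrap();
    }
    write!(sql, " FROM {}", table).unwrap();
    sql
}

fn main() {
    let sql = project_sql("t", &["a", "b"]);
    assert_eq!(sql, "SELECT \"a\", \"b\" FROM t");
    println!("{}", sql);
}
```

At 50 columns this removes 50 short-lived heap allocations per statement build, which is where the exit criterion's "measurable improvement" comes from.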
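Two of the PRED-1 guards can be sketched numerically, under the assumption that the model keeps a small window of recent refresh timings: median + MAD replace mean + SD so one outlier cannot skew the estimate, and the final prediction is clamped to `[0.5×, 4×] last_full_ms`. Function names are illustrative.

```rust
// Median of a sample window (sorts in place).
fn median(xs: &mut [f64]) -> f64 {
    xs.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let n = xs.len();
    if n % 2 == 1 {
        xs[n / 2]
    } else {
        (xs[n / 2 - 1] + xs[n / 2]) / 2.0
    }
}

// Robust location/scale: median plus median absolute deviation (MAD).
fn robust_estimate(samples: &[f64]) -> (f64, f64) {
    let mut s = samples.to_vec();
    let med = median(&mut s);
    let mut devs: Vec<f64> = samples.iter().map(|x| (x - med).abs()).collect();
    let mad = median(&mut devs);
    (med, mad)
}

// Clamp a raw prediction into [0.5x, 4x] of the last full-refresh time.
fn clamp_prediction(pred_ms: f64, last_full_ms: f64) -> f64 {
    pred_ms.clamp(0.5 * last_full_ms, 4.0 * last_full_ms)
}

fn main() {
    // One outlier spike (900 ms) barely moves the median/MAD estimate,
    // which is why the sawtooth test recovers within a few samples.
    let (med, mad) = robust_estimate(&[10.0, 11.0, 9.0, 10.0, 900.0]);
    assert_eq!(med, 10.0);
    assert_eq!(mad, 1.0);
    // A wild prediction is pulled back into the trusted band.
    assert_eq!(clamp_prediction(5000.0, 100.0), 400.0);
}
```

The remaining guards (non-degenerate variance, the 60 s warm-up after CREATE) are gating conditions on whether the prediction is used at all, so they need no arithmetic beyond a threshold check.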