# v0.84.0 — Vectorized Compute & Adaptive Engine

> **Status:** Planned
> **Scope:** Large
> **Driven by:** [Assessment 16](../plans/PLAN_OVERALL_ASSESSMENT_16.md) — MT-8, LT-7, LT-9, PERF-O4

## Theme

Introduce vectorized batch processing for high-throughput stream tables,
proper incremental algorithms for window functions (replacing partition-based
recomputation), and cost-based operator scheduling within delta queries. This
release targets a 5–10× throughput improvement for aggregate-heavy and
window-function-heavy workloads.

## Items

### MT-8: Vectorized Aggregate Path

For pure-aggregate STs (GROUP BY with no joins), process deltas using
Arrow-compatible columnar batches:

1. Read change buffer rows into columnar batches (1024-row pages)
2. Vectorized hash aggregation: group-by keys → aggregate accumulators
3. SIMD-accelerated SUM/COUNT/MIN/MAX operations on batch columns
4. Emit only changed groups as delta output
5. Apply via batched MERGE (INSERT/UPDATE per group)

Benefits:
- Up to 10× throughput for SUM/COUNT/AVG aggregates (target, via SIMD auto-vectorization)
- Cache-friendly memory access patterns (column-at-a-time processing)
- Reduced SPI overhead (bulk operations instead of per-row)

Implementation:
- New `VectorizedAggregateOperator` in `src/dvm/operators/vectorized_agg.rs`
- Uses `arrow-array` and `arrow-compute` crates for batch processing
- Activated when OpTree is a single Aggregate node over a Scan (no joins)
- Falls back to standard SQL-based path for complex trees

### LT-7: Incremental Window Function Computation

Replace the current partition-based recomputation strategy for window
functions with proper incremental algorithms:

| Function | Current Strategy | New Strategy |
|----------|-----------------|--------------|
| `ROW_NUMBER()` | Recompute partition | Maintain sorted B-tree index; insert/delete at position |
| `RANK()` / `DENSE_RANK()` | Recompute partition | Track rank counts per distinct value; adjust on delta |
| `LAG(col, N)` / `LEAD(col, N)` | Recompute partition | Ring buffer of N preceding/following values per partition |
| `SUM() OVER (...)` | Recompute partition | Running sum with prefix-sum tree for range frames |
| `COUNT() OVER (...)` | Recompute partition | Running count (algebraic) |
| `FIRST_VALUE()` / `LAST_VALUE()` | Recompute partition | Track min/max by sort key |
| `NTH_VALUE()` | Recompute partition | Indexed array per partition |

Fallback: Functions with `ROWS BETWEEN` or `RANGE BETWEEN` frames that are
not covered by the incremental algorithms fall back to partition
recomputation.

New metadata in `pgt_stream_tables`: `window_strategy JSONB`, storing for
each window function the chosen incremental strategy and the location of its
auxiliary state.

### LT-9: Cost-Based Operator Scheduling

Reorder operators within a delta query's OpTree based on estimated
cardinality to minimize intermediate result sizes:

1. **Selectivity estimation:** Use PostgreSQL's `pg_statistic` to estimate
   filter selectivity and join cardinality
2. **Operator reordering rules:**
   - Push high-selectivity filters before joins (existing P2-7, extended)
   - Order join inputs by ascending estimated cardinality (smaller relation
     as build side)
   - Place DISTINCT before expensive projections
3. **Cost model:** Estimate total intermediate tuples for each valid
   ordering; choose minimum-cost plan
4. **Safety:** Only reorder when semantically equivalent (respects NULL
   handling, outer join placement constraints)

New diagnostic: `EXPLAIN (FORMAT JSON) SELECT pgtrickle.explain_delta_plan(pgt_id)`
shows the estimated cost and chosen operator order.
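
The cost model and the ascending-cardinality rule from LT-9 can be illustrated with a minimal sketch. It assumes a deliberately simplified model that is not part of the plan itself: left-deep joins only, one uniform join selectivity, and row estimates supplied by the caller rather than read from `pg_statistic`. The function names (`intermediate_tuples`, `cheapest_order`, `ascending_order`) are hypothetical and exist only for this example.

```rust
/// Total intermediate tuples produced by a left-deep join that consumes the
/// inputs in `order`. ASSUMPTION: every join applies the same `selectivity`;
/// a real cost model would use per-predicate estimates.
pub fn intermediate_tuples(order: &[usize], est_rows: &[f64], selectivity: f64) -> f64 {
    let mut running = match order.first() {
        Some(&i) => est_rows[i],
        None => return 0.0,
    };
    let mut total = 0.0;
    for &i in &order[1..] {
        // Estimated output of joining the running result with input `i`.
        running *= est_rows[i] * selectivity;
        total += running;
    }
    total
}

/// Visit every permutation of `items[k..]` (simple recursive swap scheme).
fn permute(items: &mut [usize], k: usize, visit: &mut dyn FnMut(&[usize])) {
    if k == items.len() {
        visit(&*items);
        return;
    }
    for i in k..items.len() {
        items.swap(k, i);
        permute(items, k + 1, visit);
        items.swap(k, i);
    }
}

/// Exhaustive search: cost every ordering and keep the cheapest one.
/// Only practical for a handful of inputs (n! orderings).
pub fn cheapest_order(est_rows: &[f64], selectivity: f64) -> (Vec<usize>, f64) {
    let mut indices: Vec<usize> = (0..est_rows.len()).collect();
    let mut best: Option<(Vec<usize>, f64)> = None;
    permute(&mut indices, 0, &mut |order: &[usize]| {
        let cost = intermediate_tuples(order, est_rows, selectivity);
        if best.as_ref().map_or(true, |(_, c)| cost < *c) {
            best = Some((order.to_vec(), cost));
        }
    });
    best.unwrap_or((Vec::new(), 0.0))
}

/// Heuristic from the reordering rules: ascending estimated cardinality.
pub fn ascending_order(est_rows: &[f64]) -> Vec<usize> {
    let mut indices: Vec<usize> = (0..est_rows.len()).collect();
    indices.sort_by(|&a, &b| est_rows[a].total_cmp(&est_rows[b]));
    indices
}
```

Under this simplified model the ascending-cardinality heuristic reaches the same cost as the exhaustive search, because sorting minimizes every prefix product at once. With per-join selectivities or outer-join placement constraints the two can diverge, which is where enumerating only the semantically valid orderings would matter.
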
### PERF-O4: Parallel Delta Computation Fan-Out

For joins with multiple independent source tables, compute each source's
delta contribution in parallel:

```
           Delta Query
          /     |     \
   Source A  Source B  Source C
    delta     delta     delta
          \     |     /
           \    |    /
           Join Merge
                |
           MERGE Apply
```

Implementation:
- Identify independent scan branches in the OpTree (no cross-dependencies)
- Spawn parallel SPI connections (one per branch, up to `parallel_fan_out_max`)
- Each branch computes its partial delta independently
- Results are joined/merged in the coordinator thread
- Activated when: (a) >1 source table changed, (b) ST is not in a diamond
  consistency group, (c) `pg_trickle.parallel_delta_fan_out = true`

New GUC: `pg_trickle.parallel_delta_fan_out = false` (opt-in, requires testing)
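
The fan-out and merge shape described above can be sketched in a few lines. This is an illustration under stated assumptions, not the planned implementation: scoped OS threads stand in for the parallel SPI connections, each branch is a plain function, and a partial delta is modeled as `(key, signed weight)` rows that the coordinator sums per key. The names `DeltaRow`, `BranchFn`, and `fan_out` are hypothetical.

```rust
use std::collections::BTreeMap;
use std::thread;

/// ASSUMPTION: one partial-delta row is (group key, signed weight), where a
/// positive weight is an insert and a negative weight is a delete.
pub type DeltaRow = (i64, i64);

/// Hypothetical stand-in for "compute one source's delta contribution".
pub type BranchFn = fn() -> Vec<DeltaRow>;

/// Run independent branches in parallel, at most `fan_out_max` at a time,
/// then merge their partial deltas on the calling (coordinator) thread.
pub fn fan_out(branches: &[BranchFn], fan_out_max: usize) -> BTreeMap<i64, i64> {
    let mut merged: BTreeMap<i64, i64> = BTreeMap::new();

    // Process the branches in waves so the cap on parallelism is respected.
    for wave in branches.chunks(fan_out_max.max(1)) {
        let partials: Vec<Vec<DeltaRow>> = thread::scope(|s| {
            // Spawn every branch in the wave before joining any of them.
            let handles: Vec<_> = wave.iter().map(|b| s.spawn(move || b())).collect();
            handles
                .into_iter()
                .map(|h| h.join().expect("branch panicked"))
                .collect()
        });

        // Coordinator-side merge: sum the weights per key.
        for (key, weight) in partials.into_iter().flatten() {
            *merged.entry(key).or_insert(0) += weight;
        }
    }

    // Keys whose contributions cancel out produce no net change.
    merged.retain(|_, weight| *weight != 0);
    merged
}
```

Because the merge is a per-key sum, the result does not depend on how many branches run at once; a cap of 1 yields the same output as a larger `fan_out_max`, which makes the sequential path a convenient baseline when testing the opt-in flag.
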