# v0.63.0 — Fused Multi-Node Refresh (Full Details) > **Summary:** [v0.63.0.md](v0.63.0.md) --- ## Motivation After v0.62.0's change-buffer fan-out eliminates redundant buffer scans, the next bottleneck is the per-node statement boundary. For a DAG where nodes B and C both depend on source A, a typical refresh tick today looks like: ```sql -- statement 1 (separate round-trip + planner invocation) WITH a_delta AS (SELECT ... FROM pgtrickle_changes.changes_12345 ...) MERGE INTO public.order_totals USING a_delta ...; -- statement 2 (separate round-trip + planner invocation) WITH a_delta AS (SELECT ... FROM pgtrickle_changes.changes_12345 ...), b_delta AS (SELECT ... FROM public.order_totals ...) MERGE INTO public.customer_summary USING b_delta ...; ``` Even with fan-out, these are two separate statements: two planner invocations, two executor setups, two transaction commits (or two sub-transaction boundaries). Fused refresh collapses them: ```sql -- one statement, one planner invocation, one commit WITH a_delta AS (SELECT ... FROM pgtrickle_changes.changes_12345 ...), order_totals_delta AS (/* full delta CTE for order_totals, reading a_delta */), customer_summary_delta AS (/* full delta CTE for customer_summary */), _apply_order_totals AS ( MERGE INTO public.order_totals USING order_totals_delta ... RETURNING 1 ) MERGE INTO public.customer_summary USING customer_summary_delta ...; ``` The Postgres query planner now has visibility across all nodes in the batch and can share sub-expressions, re-order joins, and avoid redundant sorts. --- ## Implementation ### Fusion Eligibility Not all nodes can be fused in a given tick. A node is fusion-eligible if: 1. Its refresh mode is `DIFFERENTIAL` (FULL-refresh nodes are always executed as separate statements — they do not benefit from CTE fusion and the large result sets make a single mega-statement undesirable). 2. Its direct upstream nodes are either base tables (sources) or other nodes that are also fusion-eligible in the same tick. 3. Its estimated delta size (from the cost model) is below `pg_trickle.fused_refresh_max_delta_rows` (default: 500 000). Very large deltas produce very large CTEs; the planner may choose a worse plan for the composed statement than for two independent statements. 4. The node is not currently paused (via `pgtrickle.pause_scheduler()`). The scheduler computes the maximal fusion-eligible subgraph for each tick using a topological scan of the ready set. Nodes that fall outside the eligible subgraph are refreshed in the standard sequential mode. ### SQL Generation The existing `differentiate()` function in `src/refresh/diff.rs` produces a `DiffOutput { cte_sql: String, apply_sql: String }` for a single node. v0.63.0 adds a new `fuse_diff_batch(nodes: &[NodeSpec], ctx: &DiffContext) -> FusedOutput` function in `src/refresh/fused.rs` that: 1. Calls `differentiate()` for each node in topological order. 2. Merges the CTE lists, deduplicating shared source-delta CTEs by name. 3. Composes the apply statements into a single `WITH … MERGE; MERGE; …` chain using wrapping CTEs (`_apply_`) for all but the final node. 4. Returns a `FusedOutput { sql: String, node_count: usize }`. The merged CTE list uses the existing `cte_counter` namespace (`__pgt_cte_scan_1`, `__pgt_cte_scan_2`, …) with a per-batch offset to prevent name collisions across nodes. ### Correctness The correctness guarantee is unchanged: each node's delta is computed from the same snapshot as the sequential case. The CTE `MATERIALIZED` hint is applied to shared delta CTEs to prevent the planner from inlining them multiple times (which would re-evaluate non-deterministic expressions). **Property test extension:** `tests/property/fused_vs_sequential.rs` runs 50 random Δ batches against a 22-node TPC-H DAG, applies them first with fused refresh and then with sequential refresh against a fresh replica, and asserts that every stream table's row multiset is identical. This test runs as part of `just test-integration` (Testcontainers, no custom image). ### Benchmark Gate ``` cargo bench --bench refresh_bench -- tpch_differential_22 ``` Gate: ≥ 20 % wall-time reduction vs. v0.62.0 at Δ = 10 K rows. The benchmark is added to the CI regression check (`scripts/criterion_regression_check.py`) with a 15 % allowance. --- ## New GUCs | GUC | Type | Default | Description | |-----|------|---------|-------------| | `pg_trickle.enable_fused_refresh` | bool | `true` | Enable CTE-fused multi-node refresh | | `pg_trickle.fused_refresh_max_delta_rows` | int | `500000` | Max estimated delta rows for a node to be fusion-eligible | --- ## Test Coverage | Test | Location | What it covers | |------|----------|----------------| | `test_fused_refresh_tpch_22` | `tests/e2e_refresh_tests.rs` | End-to-end fused refresh of all 22 TPC-H nodes | | `test_fused_vs_sequential_parity` | `tests/property/fused_vs_sequential.rs` | Byte-identical output for 50 random Δ batches | | `test_fused_refresh_guc_disable` | `tests/e2e_refresh_tests.rs` | `enable_fused_refresh=false` falls back to sequential | | `test_fused_refresh_full_node_excluded` | `tests/e2e_refresh_tests.rs` | FULL-mode nodes are never fused | | `test_fused_refresh_large_delta_excluded` | `tests/e2e_refresh_tests.rs` | Nodes above `fused_refresh_max_delta_rows` are excluded | | `fuse_diff_batch_cte_dedup` | `src/refresh/fused.rs` (#[cfg(test)]) | Shared source-delta CTEs are deduplicated | | `fuse_diff_batch_name_collision_free` | `src/refresh/fused.rs` (#[cfg(test)]) | CTE names are unique across all nodes in a batch | --- ## Upgrade Notes No breaking changes. Fused refresh is enabled by default. Set `pg_trickle.enable_fused_refresh = false` in `postgresql.conf` or via `ALTER SYSTEM SET` to revert to sequential mode if a specific DAG shape causes unexpected planner behaviour. Please file an issue with the problematic query if you need to disable fusion — we want to understand and fix planner interaction problems rather than leave them as permanent opt-outs.