# PLAN: Edge Cases — Catalogue, Workarounds & Prioritised Remediation **Status:** Proposed **Target milestone:** v0.3.0+ **Last updated:** 2026-06-05 --- ## Motivation pg_trickle's documentation (FAQ, SQL Reference, Architecture) describes 36 discrete edge cases and limitations. Some are fundamental trade-offs, others are fixable engineering gaps. This document: 1. Catalogues every known edge case in one place. 2. Proposes concrete workarounds or fixes for each. 3. Assigns a priority tier so the team can focus effort where it matters most. --- ## Priority Tiers | Tier | Meaning | Guidance | |------|---------|----------| | **P0** | Data correctness risk — users can silently get wrong results | Fix before v1.0 | | **P1** | Operational surprise — confusing failure, silent non-behaviour | Fix before v1.0 | | **P2** | Usability gap — documented but painful; strong workaround exists | Schedule for a minor release | | **P3** | Accepted trade-off — inherent to the design; document clearly | Keep documentation up to date | --- ## Edge Case Catalogue ### EC-01 — JOIN key change + simultaneous right-side DELETE | Field | Value | |-------|-------| | **Area** | DIFFERENTIAL — JOIN delta | | **Tier** | **P0** | | **Impact** | Stale row remains in stream table until next full refresh | | **Current mitigation** | Adaptive FULL fallback (`pg_trickle.adaptive_full_threshold`) | | **Documented in** | FAQ § "Known edge cases"; SQL_REFERENCE § "Known Delta Computation Limitations" | | **Status** | ✅ **IMPLEMENTED** — Part 1 split (R₀ via EXCEPT ALL) in `diff_inner_join`, `diff_left_join`, `diff_full_join` | **Root cause:** The delta query reads `current_right` after all changes are applied. When the old join partner is deleted before the delta runs, the DELETE half of the join finds no partner and is silently dropped. **Proposed fix (chosen approach: R₀ via EXCEPT ALL — medium-term):** The short-term threshold adjustment is **not viable**: it trades query correctness for a heuristic that can still silently miss deletes when co-occurring changes fall below the threshold. **Primary fix — R₀ via EXCEPT ALL (4–6 days):** The delta formula in `src/dvm/operators/join.rs` is: ``` ΔJ = (ΔQ ⋈ R₁) + (L₀ ⋈ ΔR) [current code] ``` Part 2 already computes `L₀` (pre-change left state) via EXCEPT ALL for Scan children (`left_part2_source` in `diff_inner_join`). The fix mirrors this symmetrically for Part 1's right side — splitting Part 1 into: ``` Part 1a: ΔQ_inserts ⋈ R₁ (post-change right — unchanged) Part 1b: ΔQ_deletes ⋈ R₀ (pre-change right via EXCEPT ALL) ``` `R₀` is computed identically to the existing `L₀`: ```sql R_current EXCEPT ALL ΔR_inserts UNION ALL ΔR_deletes ``` `build_snapshot_sql()` already exists in `join_common.rs` and is reused here. The correction term (Part 3) was added for the `L₁` case in Part 2 and is unaffected — Part 2 stays unchanged. **Implementation plan:** - Day 1–2: Add `right_part1_source` decision + EXCEPT ALL CTE analogous to `use_l0` / `left_part2_source`. Mirror `use_r0` logic for nested right children (no SemiJoin, ≤ 2 scan nodes → R₀; otherwise R₁). - Day 2–3: Split Part 1 SQL into two UNION ALL arms; update row ID hashing for Part 1b arm. - Day 4–5: Integration tests — co-delete scenario, UPDATE-then-delete, multi-cycle correctness. - Day 5–6: TPC-H regression suite (Q07 is the critical path). **Deferred — pre-image capture from change buffer:** The long-term approach (using `old_col` values already stored in the change buffer trigger to reconstruct R₀ from ΔR directly, skipping the live table scan) is deferred. The `old_col` values **are** already captured by the CDC trigger (`OLD."col"` → `"old_col"` for UPDATE/DELETE), but surfacing them through `DiffContext`/`DeltaSource` requires structural changes to the delta CTE column representation that carry higher risk. Estimated additional effort: 7–10 days. Schedule for v1.x when the EXCEPT ALL approach is already correct and benchmarked. --- ### EC-02 — CUBE/ROLLUP expansion limit (> 64 branches) | Field | Value | |-------|-------| | **Area** | DIFFERENTIAL — grouping sets | | **Tier** | **P3** | | **Impact** | Creation error | | **Current mitigation** | Use explicit `GROUPING SETS(…)` | | **Documented in** | SQL_REFERENCE § "CUBE/ROLLUP Expansion Limit" | | **Status** | ✅ **Done** — `pg_trickle.max_grouping_set_branches` GUC (default 64, range 1–65536) | **Proposed fix:** ~~The 64-branch limit is a sensible guard. Keep it but make the limit configurable via a GUC~~ **Implemented:** `PGS_MAX_GROUPING_SET_BRANCHES` in `src/config.rs`; parser.rs reads GUC dynamically instead of const. Documented in CONFIGURATION.md. --- ### EC-03 — Window functions inside expressions | Field | Value | |-------|-------| | **Area** | DIFFERENTIAL — window functions | | **Tier** | **P2** | | **Impact** | Creation error | | **Current mitigation** | Keep window functions as top-level SELECT columns | | **Documented in** | FAQ § "Window Functions" | | **Status** | ✅ **IMPLEMENTED** — `rewrite_nested_window_exprs()` subquery-lift in `src/dvm/parser.rs`; helper `collect_all_window_func_nodes()` for recursive window func harvesting; `deparse_select_window_clause()` for WINDOW clause passthrough; exported from `src/dvm/mod.rs`; wired in `src/api.rs` before DISTINCT ON rewrite | **Implementation: nested-subquery lift (see PLAN_TRANSACTIONAL_IVM_PART_2.md Task 1.3):** Window functions nested inside expressions are lifted into a synthetic column in an inner subquery, then the outer SELECT applies the original expression with the window function replaced by the synthetic column alias: ```sql -- Before (rejected): SELECT id, ABS(ROW_NUMBER() OVER (ORDER BY score) - 5) AS adjusted_rank FROM players -- After rewrite (supported): SELECT id, ABS("__pgt_wf_inner"."__pgt_wf_1" - 5) AS "adjusted_rank" FROM ( SELECT *, ROW_NUMBER() OVER (ORDER BY score) AS "__pgt_wf_1" FROM players ) "__pgt_wf_inner" ``` The subquery approach is preferred over the earlier CTE approach because a `WITH` clause produces a `WithQuery` node in the OpTree, requiring the DVM engine to handle CTEs wrapping window functions — an untested path. The subquery produces a nested `Scan → Window → Project` chain that the existing DVM path already handles correctly. **Test coverage:** 9 E2E tests (`e2e_window_tests.rs`): - 6 creation-acceptance tests: `test_window_in_case_expression_rejected`, `test_window_in_coalesce_rejected`, `test_window_in_arithmetic_rejected`, `test_window_in_cast_rejected`, `test_window_deeply_nested_rejected`, `test_top_level_window_still_works` (regression). - 3 data-correctness tests: `test_ec03_case_window_data_correctness` — CASE + ROW_NUMBER() produces correct tier labels; `test_ec03_arithmetic_window_data_correctness` — ROW_NUMBER() * 10 verified with a FULL refresh after INSERT; `test_ec03_coalesce_window_data_correctness` — COALESCE(SUM() OVER, 0) produces correct partition sums. **Known limitation:** ~~Window-in-expression queries are accepted in DIFFERENTIAL mode (creation succeeds), but differential *refresh* fails with a `column st.* does not exist` error.~~ **Fixed (EC-03):** AUTO mode now correctly falls back to FULL refresh with an informational message. Explicit DIFFERENTIAL mode emits a `WARNING` at creation time alerting the user that the pattern will fall back to full recomputation at refresh time. The DVM delta engine cannot currently resolve the `st.*` alias inside the inner subquery introduced by the rewrite. FULL refresh works correctly. Differential support for the rewritten form is deferred. --- ### EC-04 — Non-monotone recursive CTEs rejected | Field | Value | |-------|-------| | **Area** | DIFFERENTIAL — recursive CTEs | | **Tier** | **P3** | | **Impact** | Creation error | | **Current mitigation** | Restructure as non-recursive CTE; or use FULL mode | | **Documented in** | FAQ § "Recursive CTEs" | **Proposed fix:** This is a fundamental limitation of semi-naive evaluation — non-monotone operators in the recursive term can "un-derive" rows, breaking the incremental fixpoint. **No code fix proposed.** Keep the clear error message and document the reasoning. Users who need non-monotone recursion should use FULL mode. --- ### EC-05 — Foreign tables in DIFFERENTIAL mode | Field | Value | |-------|-------| | **Area** | DIFFERENTIAL — foreign data wrappers | | **Tier** | **P2** | | **Impact** | Creation error | | **Current mitigation** | Use FULL mode | | **Documented in** | SQL_REFERENCE § "Source Tables" | | **Status** | ✅ **IMPLEMENTED** — error message improved + polling CDC implemented | **Proposed fix:** 1. ✅ **Short term (Done):** Improved the error message to suggest `FULL` mode explicitly and mention the `postgres_fdw` + `IMPORT FOREIGN SCHEMA` pattern. 2. ✅ **Medium term (Done):** Polling-based CDC for foreign tables. Enabled via `pg_trickle.foreign_table_polling = on` GUC. Creates a snapshot table per foreign table source; before each differential refresh, computes EXCEPT ALL deltas and populates the standard change buffer. Snapshot is refreshed each cycle. Cleanup path drops snapshot tables when STs are removed. --- ### EC-06 — Keyless tables with exact duplicate rows | Field | Value | |-------|-------| | **Area** | DIFFERENTIAL — row identity | | **Tier** | **P0** | | **Impact** | Phantom or missed deletes in stream table | | **Current mitigation** | Add a primary key; add a synthetic unique column; or use FULL mode | | **Documented in** | FAQ § "Keyless Tables"; SQL_REFERENCE § "Row Identity" | | **Status** | ✅ **IMPLEMENTED** — Full EC-06 fix: catalog `has_keyless_source` flag, non-unique index for keyless sources, net-counting delta SQL in scan.rs (decompose changes into atomic +1/-1 per content hash, expand via `generate_series`), counted DELETE (ROW_NUMBER matching) in refresh.rs and ivm.rs, plain INSERT (no ON CONFLICT) for keyless sources. WARNING at creation time also implemented. | **Implementation (completed):** 1. ~~**Short term (P0):** Emit a `WARNING` at `create_stream_table()` time~~ ✅ Done 2. **Full fix (P0):** ✅ Implemented via net-counting approach: - **Catalog:** Added `has_keyless_source BOOLEAN NOT NULL DEFAULT FALSE` to `pgtrickle.pgt_stream_tables` (set at creation time). - **Index:** Non-unique `CREATE INDEX` for keyless sources (instead of `CREATE UNIQUE INDEX`), allowing duplicate `__pgt_row_id` values. - **Delta SQL (scan.rs):** For keyless tables, decomposes all change events into atomic `+1` (INSERT) / `-1` (DELETE) operations per content hash. UPDATEs are split into their DELETE-old + INSERT-new components. Net counts are computed via `GROUP BY content_hash` + `SUM(delta_sign)`, then expanded to the correct number of D/I rows using `generate_series`. - **Apply (refresh.rs):** For keyless sources, forces explicit DML path (cannot use MERGE with non-unique row_ids). Counted DELETE uses `ROW_NUMBER` matching to pair delta deletes 1:1 with stream table rows. Plain INSERT without NOT EXISTS / ON CONFLICT. - **Apply (ivm.rs):** Same counted DELETE + plain INSERT for IMMEDIATE mode. - **Upgrade SQL:** `ALTER TABLE ... ADD COLUMN IF NOT EXISTS has_keyless_source` **Bug fixes (post-implementation):** - **Guard trigger DELETE cancel:** The DML guard trigger (EC-26) used `RETURN NEW` which is NULL for DELETE operations, silently cancelling managed-refresh DELETEs on the stream table. Fixed to use `IF TG_OP = 'DELETE' THEN RETURN OLD; ELSE RETURN NEW; END IF;`. - **Keyless aggregate UPDATE no-op:** The keyless UPDATE template was a no-op (`SELECT 1 WHERE false`), which is correct for keyless scans but wrong for aggregate queries on keyless sources (where `'I'`-action rows update existing groups). Fixed to use the normal UPDATE template, which naturally matches 0 rows for scan-level keyless deltas. **Test coverage:** 7 E2E tests (`e2e_keyless_duplicate_tests.rs`) — all passing. Covers: duplicate rows basic, delete-one-of-duplicates, update-one-of-duplicates, unique content multi-cycle DML, delete-all- then-reinsert, mixed DML stress, aggregate with duplicates. --- ### EC-07 — IMMEDIATE mode write serialization | Field | Value | |-------|-------| | **Area** | IMMEDIATE mode — concurrency | | **Tier** | **P3** | | **Impact** | Reduced write throughput under high concurrency | | **Current mitigation** | Use DIFFERENTIAL mode for high-throughput OLTP | | **Documented in** | FAQ § "IMMEDIATE Mode Trade-offs" | **Proposed fix:** This is an inherent trade-off of synchronous IVM — you cannot maintain consistency without serialization. The `IvmLockMode` already uses lighter `pg_try_advisory_xact_lock` for simple scan chains. **No code fix proposed.** Document the throughput characteristics in the performance guide and add a benchmark that quantifies the write-side overhead so users can make informed mode choices. --- ### EC-08 — IMMEDIATE mode write amplification | Field | Value | |-------|-------| | **Area** | IMMEDIATE mode — latency | | **Tier** | **P3** | | **Impact** | Increased per-DML latency | | **Current mitigation** | Use DIFFERENTIAL mode | | **Documented in** | FAQ § "IMMEDIATE Mode Trade-offs" | **Proposed fix:** Inherent trade-off. Potential optimisation: batch multiple trigger invocations within a single statement into one delta application (the current implementation already uses statement-level triggers with transition tables, which provides this batching naturally). Add a benchmark to `benches/` that measures per-statement overhead for 1, 10, 100, 1000 affected rows. --- ### EC-09 — IMMEDIATE mode unsupported constructs | Field | Value | |-------|-------| | **Area** | IMMEDIATE mode — SQL support | | **Tier** | **P2** | | **Impact** | Creation / alter error | | **Current mitigation** | Use DIFFERENTIAL mode | | **Documented in** | FAQ § "What SQL features are NOT supported in IMMEDIATE mode?" | | **Status** | ✅ **Done** — recursive CTEs + TopK now supported in IMMEDIATE mode | **Proposed fix (recursive CTEs + TopK): ~~deferred to~~ implemented from PLAN_TRANSACTIONAL_IVM_PART_2.md Phase 5.** Both constructs are now implemented: - **Recursive CTEs (G10 / Task 5.1):** `check_immediate_support()` changed from rejection to warning about stack depth. Semi-naive evaluation proceeds with `DeltaSource::TransitionTable`. - **TopK (G11 / Task 5.2):** `apply_topk_micro_refresh()` in `ivm.rs` implements statement-level micro-refresh. Bounded by `pg_trickle.ivm_topk_max_limit` GUC (default 1000). (default 1000) to prevent inline recomputation latency spikes for large K. For materialised view sources: use a polling-change-detection wrapper (same approach as EC-05 for foreign tables). No code fix proposed for this case. --- ### EC-10 — IMMEDIATE mode: no throttling | Field | Value | |-------|-------| | **Area** | IMMEDIATE mode — back-pressure | | **Tier** | **P3** | | **Impact** | No rate limiting for high-frequency writes | | **Current mitigation** | Use DIFFERENTIAL mode with a schedule interval | | **Documented in** | FAQ § "IMMEDIATE Mode Trade-offs" | **Proposed fix:** By design — throttling IMMEDIATE would break the same-transaction consistency guarantee. **No code fix.** Ensure the FAQ clearly explains that DIFFERENTIAL with a short interval (e.g. `'1s'`) is the right answer when back-pressure is needed. --- ### EC-11 — Schedule interval shorter than refresh duration | Field | Value | |-------|-------| | **Area** | Scheduler — timing | | **Tier** | **P1** | | **Impact** | Perpetual latency spiral; cascading DAG chains fall behind | | **Current mitigation** | Widen the interval; use IMMEDIATE mode; or split the DAG | | **Documented in** | FAQ § "Performance and Tuning" | | **Status** | ✅ **IMPLEMENTED** — `scheduler_falling_behind` NOTIFY alert at 80% threshold | **Test coverage:** `SchedulerFallingBehind` variant covered in `test_alert_event_as_str` and `test_alert_event_all_variants_unique` (monitor.rs unit tests). **Chosen fix: `scheduler_falling_behind` NOTIFY alert (short term):** When the refresh duration exceeds 80% of the schedule interval for 3 consecutive cycles, emit a NOTIFY on the channel `pgtrickle_alerts` with a JSON payload containing the stream table name, average duration, and configured interval. Operators can LISTEN on the channel and wire it into their alerting stack. **Deferred:** - `auto_backoff` GUC (double effective interval when falling behind) — medium term, schedule for a later sprint. - `explain_st(name)` recommendation from `pgt_refresh_history` — long term. --- ### EC-12 — No read-your-writes (DIFFERENTIAL/FULL) | Field | Value | |-------|-------| | **Area** | Scheduler — consistency model | | **Tier** | **P3** | | **Impact** | Stale reads immediately after write | | **Current mitigation** | Use IMMEDIATE mode or call `refresh_stream_table()` manually | | **Documented in** | FAQ § "Consistency" | **Proposed fix:** Inherent to the scheduled refresh model. **No code fix.** Consider a convenience function `pgtrickle.write_and_refresh(dml_sql, st_name)` that executes the DML and triggers a manual refresh in a single call. Low priority — manual `refresh_stream_table()` already covers this. --- ### EC-13 — Diamond dependency split-version reads | Field | Value | |-------|-------| | **Area** | Diamond dependencies — consistency | | **Tier** | **P1** | | **Impact** | Fan-in node reads inconsistent upstream versions | | **Current mitigation** | `diamond_consistency = 'atomic'` | | **Documented in** | FAQ § "Diamond Dependencies"; SQL_REFERENCE | | **Status** | ✅ **FULLY IMPLEMENTED** — default `'atomic'` + scheduler sub-transaction fix | **Proposed fix:** The `atomic` mode already fixes this. Remaining work: 1. **Short term (P1):** Default `diamond_consistency` to `'atomic'` for new stream tables (currently `'none'`). This is a safer default — users who don't need atomicity can opt out. ✅ Done. 2. **Medium term:** Add a `health_check()` warning when a diamond graph exists but `diamond_consistency = 'none'`. **Scheduler bug fixed (follow-on):** When EC-13 changed the default to `'atomic'`, a latent bug surfaced: the scheduler's diamond group path used `Spi::run("SAVEPOINT …")` which PostgreSQL rejects in background workers (`SPI_ERROR_TRANSACTION`). This silently skipped every diamond group refresh. Fixed by replacing the SPI SAVEPOINT calls with `pg_sys::BeginInternalSubTransaction` / `ReleaseCurrentSubTransaction` / `RollbackAndReleaseCurrentSubTransaction`, which bypass SPI. See `src/scheduler.rs`. --- ### EC-14 — Diamond upstream failure skips downstream | Field | Value | |-------|-------| | **Area** | Diamond dependencies — failure handling | | **Tier** | **P3** | | **Impact** | Downstream stays stale for one cycle | | **Current mitigation** | Automatic retry on next cycle | | **Documented in** | FAQ § "Diamond Dependencies" | **Proposed fix:** Current behaviour is correct. Add a NOTIFY `diamond_group_partial_failure` when a group member fails so operators can investigate. **No urgency** — the automatic retry handles recovery. --- ### EC-15 — SELECT * on source table + DDL | Field | Value | |-------|-------| | **Area** | DDL — schema evolution | | **Tier** | **P1** | | **Impact** | Refresh fails; stream table marked `needs_reinit` | | **Current mitigation** | Use explicit column lists in defining queries | | **Documented in** | FAQ § "Schema Changes" | | **Status** | ✅ **IMPLEMENTED** — WARNING at creation time for `SELECT *` | **Test coverage:** 10 unit tests for `detect_select_star()` pure function (api.rs): bare `*`, table-qualified `t.*`, explicit columns, `count(*)`, `sum(*)`, mixed agg+bare star, no asterisk, subquery star, case insensitivity, no-FROM clause. **Proposed fix:** 1. **Short term (P1):** ✅ Done — Emits `WARNING` at `create_stream_table()` time when the defining query contains `SELECT *`. Detection logic extracted to pure `detect_select_star()` function for unit testability. 2. **Medium term:** When `needs_reinit` is triggered by a column-count mismatch, attempt an automatic reinitialisation: re-parse the query, verify the new column set is compatible, and refresh. If compatible, skip the `needs_reinit` flag and refresh directly. --- ### EC-16 — ALTER FUNCTION body change not detected | Field | Value | |-------|-------| | **Area** | DDL — function dependency tracking | | **Tier** | **P1** | | **Impact** | Stream table silently uses new function logic without a full rebase | | **Current mitigation** | Manual full refresh after function changes | | **Documented in** | FAQ § "Schema Changes" | | **Status** | ✅ **IMPLEMENTED** — `function_hashes TEXT` column on `pgt_stream_tables` (migration 0.2.1→0.2.2); `check_proc_hashes_changed()` in `refresh.rs` queries `md5(prosrc \|\| coalesce(probin::text,''))` from `pg_proc` on each diff refresh; on mismatch calls `mark_for_reinitialize` + NOTICE; on first call stores baseline silently | **Implemented fix:** Polling via `pg_proc` on each differential refresh: - On first call (stored hashes = NULL): compute and persist baseline. No reinit. - On following calls: recompute hashes. If any differ → `mark_for_reinitialize` + NOTICE. The next scheduler cycle performs a full refresh. - Uses `md5(string_agg(prosrc || coalesce(probin::text,''), ',' ORDER BY oid))` to handle polymorphic overloads robustly. **Test coverage:** 3 E2E tests (`e2e_multi_cycle_tests.rs`): - `test_ec16_function_body_change_marks_reinit` — verifies that `CREATE OR REPLACE FUNCTION` triggers `needs_reinit = true` on the next differential refresh. - `test_ec16_function_change_full_refresh_recovery` — verifies that after reinit, a full refresh produces correct results with the new function logic. - `test_ec16_no_functions_unaffected` — verifies that stream tables without user-defined functions are never affected by the hash polling mechanism. --- ### EC-17 — Schema change during active refresh | Field | Value | |-------|-------| | **Area** | DDL — concurrency | | **Tier** | **P2** | | **Impact** | Refresh errors; stream table suspended | | **Documented in** | FAQ § "Schema Changes" | | **Status** | ✅ **Done** — FAQ section documents lock interaction | **Proposed fix:** ~~No code fix needed~~ **Implemented:** FAQ section added explaining `ShareLock` on source tables during refresh blocks concurrent `ALTER TABLE` (`AccessExclusiveLock`). The window is only open if the DDL sneaks in between lock acquisition and first read. --- ### EC-18 — `auto` CDC mode silent non-upgrade | Field | Value | |-------|-------| | **Area** | CDC — mode transition | | **Tier** | **P1** | | **Impact** | User thinks WAL CDC is active; actually stuck on triggers forever | | **Current mitigation** | Default is `TRIGGER` mode; `auto` must be explicitly set | | **Documented in** | FAQ § "CDC Architecture" | | **Status** | ✅ **IMPLEMENTED** — rate-limited LOG every ~60 scheduler ticks | **Proposed fix:** 1. **Short term (P1):** When `cdc_mode = 'auto'`, emit a `LOG`-level message on every scheduler cycle that explains *why* the transition hasn't happened (e.g. "wal_level is 'replica', need 'logical'"). Rate- limit to once per 5 minutes. 2. **Medium term:** Add a `check_cdc_health()` finding for "auto mode stuck in TRIGGER phase for > 1 hour". **Test coverage:** 2 E2E tests (`e2e_wal_cdc_tests.rs`) + 1 existing test: - `test_ec18_check_cdc_health_shows_trigger_for_stuck_auto` — verifies `check_cdc_health()` reports TRIGGER mode for a keyless auto-CDC source and no alert fires for healthy TRIGGER-mode sources. - `test_ec18_health_check_ok_with_trigger_auto_sources` — verifies `health_check()` does not flag TRIGGER-mode auto-CDC sources as errors. - `test_wal_keyless_table_stays_on_triggers` (pre-existing) — verifies keyless tables remain on TRIGGER mode in auto-CDC configuration. --- ### EC-19 — WAL mode + keyless tables need REPLICA IDENTITY FULL | Field | Value | |-------|-------| | **Area** | CDC — WAL decoding | | **Tier** | **P0** | | **Impact** | Silent data loss for UPDATEs/DELETEs on keyless tables in WAL mode | | **Current mitigation** | Set `REPLICA IDENTITY FULL` before switching to WAL CDC | | **Documented in** | FAQ § "CDC Architecture" | | **Status** | ✅ **IMPLEMENTED** — `setup_cdc_for_source()` rejects at creation time | **Proposed fix:** 1. **Short term (P0):** At `create_stream_table()` / `alter_stream_table()` time, when `cdc_mode` involves WAL and any source table is keyless, check `pg_class.relreplident` and **reject** with a clear error if not `'f'` (full). This prevents the silent data loss entirely. 2. **Medium term:** Automatically issue `ALTER TABLE … REPLICA IDENTITY FULL` for keyless tables when entering WAL mode (with a NOTICE). **Test coverage:** 2 E2E tests (`e2e_wal_cdc_tests.rs`): - `test_ec19_wal_keyless_without_replica_identity_full_rejected` — verifies that `create_stream_table(cdc_mode => 'wal')` on a keyless table without REPLICA IDENTITY FULL is rejected with a clear error. - `test_ec19_wal_keyless_with_replica_identity_full_accepted` — verifies that the same combination succeeds once REPLICA IDENTITY FULL is set. --- ### EC-20 — CDC TRANSITIONING phase complexity | Field | Value | |-------|-------| | **Area** | CDC — mode transition | | **Tier** | **P2** | | **Impact** | Potential duplicate capture during transition; rollback to triggers on failure | | **Current mitigation** | LSN-based deduplication; automatic rollback | | **Documented in** | FAQ § "CDC Architecture" | | **Status** | ✅ **Done** — `check_cdc_transition_health()` in scheduler.rs | **Proposed fix:** ~~Add a `trigger_inventory()` + `check_cdc_health()` combined check~~ **Implemented:** `check_cdc_transition_health()` runs after `recover_from_crash()` in the scheduler. Loads all stream table deps, filters TRANSITIONING entries, checks `pg_replication_slots` for slot existence, rolls back to TRIGGER mode if slot is missing, logs status. --- ### EC-21 — Stream tables not propagated to logical replication subscriber | Field | Value | |-------|-------| | **Area** | Replication — architecture | | **Tier** | **P3** | | **Impact** | Subscriber gets static snapshot without refresh capability | | **Current mitigation** | Run pg_trickle only on primary | | **Documented in** | FAQ § "Replication" | | **Status** | ✅ **Done** — FAQ documentation sweep completed | **Proposed fix:** Inherent to the architecture — CDC triggers and the scheduler only run on the primary. **No code fix.** Consider a future "subscriber mode" that detects replicated source tables and runs its own CDC + refresh cycle, but this is a major undertaking (v2.0+). --- ### EC-22 — Change buffers not published to subscribers | Field | Value | |-------|-------| | **Area** | Replication — change buffers | | **Tier** | **P3** | | **Impact** | Subscriber cannot drive its own refresh | | **Documented in** | FAQ § "Replication" | | **Status** | ✅ **Done** — FAQ documentation sweep completed | **Proposed fix:** By design. Same as EC-21 — keep documentation clear. --- ### EC-23 — Scheduler does not run on standby / replica | Field | Value | |-------|-------| | **Area** | Replication — standby | | **Tier** | **P3** | | **Impact** | No IVM on read replicas | | **Current mitigation** | Query the primary; or promote the replica | | **Documented in** | FAQ § "HA / Replication" | | **Status** | ✅ **Done** — FAQ documentation sweep completed | **Proposed fix:** By design — standbys are read-only. **No code fix.** A future enhancement could add a read-only health endpoint on standby that reports the last refresh timestamp from the replicated catalog, so monitoring can detect staleness. --- ### EC-24 — TRUNCATE on source table bypasses CDC triggers | Field | Value | |-------|-------| | **Area** | CDC — TRUNCATE handling | | **Tier** | **P3** | | **Impact** | Stream table not updated (until reinit on next cycle) | | **Current mitigation** | TRUNCATE event trigger marks `needs_reinit`; next cycle does full recompute | | **Documented in** | SQL_REFERENCE § "TRUNCATE" | **Proposed fix:** Already handled by the TRUNCATE event trigger. The only gap is latency — between the TRUNCATE and the next cycle, the stream table is stale. For IMMEDIATE mode, TRUNCATE triggers a same-transaction full refresh (already implemented). For DIFFERENTIAL/FULL, the current behaviour is acceptable. **No further code fix needed.** --- ### EC-25 — TRUNCATE on a stream table itself | Field | Value | |-------|-------| | **Area** | Stream table — direct manipulation | | **Tier** | **P1** | | **Impact** | Frontier / buffer desync; future refreshes produce incorrect deltas | | **Current mitigation** | Use `refresh_stream_table()` to reset | | **Documented in** | SQL_REFERENCE § "What Is NOT Allowed" | | **Status** | ✅ **IMPLEMENTED** — BEFORE TRUNCATE guard trigger blocks direct TRUNCATE | **Test coverage:** `test_guard_trigger_blocks_truncate` E2E test (`e2e_guard_trigger_tests.rs`). **Proposed fix:** 1. **Short term (P1):** ✅ Done — BEFORE TRUNCATE trigger installed on storage tables at creation time. Checks `pg_trickle.internal_refresh` session variable to allow managed refreshes. The `internal_refresh` flag is set in all refresh code paths: `execute_manual_refresh`, `execute_manual_full_refresh`, `execute_full_refresh`, `execute_topk_refresh`, and `pgt_ivm_handle_truncate`. 2. **Medium term:** Instead of blocking, intercept TRUNCATE and automatically perform a full refresh (same approach as IMMEDIATE mode TRUNCATE handling). This is friendlier but may surprise users who expect TRUNCATE to be instant. --- ### EC-26 — Direct DML on stream table | Field | Value | |-------|-------| | **Area** | Stream table — direct manipulation | | **Tier** | **P1** | | **Impact** | Data overwritten or duplicated on next refresh | | **Current mitigation** | Documentation; never write directly | | **Documented in** | SQL_REFERENCE § "What Is NOT Allowed" | | **Status** | ✅ **IMPLEMENTED** — BEFORE INSERT/UPDATE/DELETE guard trigger on storage table | **Test coverage:** 4 E2E tests (`e2e_guard_trigger_tests.rs`) — `test_guard_trigger_blocks_direct_insert`, `_update`, `_delete` + `test_guard_trigger_allows_managed_refresh`. **Proposed fix:** 1. **Short term (P1):** ✅ Done — Row-level BEFORE trigger installed on storage tables at creation time. Uses `pg_trickle.internal_refresh` session variable (checked via `current_setting(..., true)`) to allow managed refreshes. Returns OLD for DELETE operations (returning NEW would be NULL, silently cancelling the DELETE). The `internal_refresh` flag is set across all refresh code paths. 2. **Medium term:** Use a PostgreSQL row-level security policy or a `pg_trickle.allow_direct_dml` session variable as an escape hatch for power users. --- ### EC-27 — Foreign keys on stream tables | Field | Value | |-------|-------| | **Area** | Stream table — constraints | | **Tier** | **P3** | | **Impact** | FK violations during bulk MERGE | | **Current mitigation** | Do not define FKs | | **Documented in** | SQL_REFERENCE § "What Is NOT Allowed" | **Proposed fix:** Reject `ALTER TABLE … ADD CONSTRAINT … FOREIGN KEY` on stream tables via event trigger (same pattern as EC-25/EC-26). **P3** — the current documentation is sufficient, and FK creation on stream tables is rare. --- ### EC-28 — PgBouncer transaction pooling | Field | Value | |-------|-------| | **Area** | Infrastructure — connection pooling | | **Tier** | **P2** | | **Impact** | Prepared statement errors, lock escapes, GUC not applied | | **Current mitigation** | Use session-mode pooling; use `SET LOCAL` | | **Documented in** | FAQ § "PgBouncer" | | **Status** | ✅ **Done** — documentation already exists | **Proposed fix:** pg_trickle's scheduler uses a single dedicated backend connection (not pooled), so the scheduler itself is unaffected. The edge case applies to application queries that use `SET pg_trickle.*` GUCs via a pooled connection. **No code fix** — this is a PgBouncer limitation. Improve documentation with a recommended PgBouncer configuration snippet. --- ### EC-29 — TopK + set operations / GROUPING SETS | Field | Value | |-------|-------| | **Area** | TopK — SQL restrictions | | **Tier** | **P3** | | **Impact** | Creation error | | **Documented in** | SQL_REFERENCE § "TopK" | **Proposed fix:** TopK uses scoped recomputation via MERGE, which fundamentally requires a single `ORDER BY … LIMIT` at the top level. Set operations and multi-level grouping sets break this invariant. **No code fix.** Keep the clear rejection error and document the workaround (split into separate stream tables). --- ### EC-30 — Circular dependencies | Field | Value | |-------|-------| | **Area** | DAG — cycle detection | | **Tier** | **P3** | | **Impact** | Creation error with cycle path in message | | **Documented in** | FAQ § "Dependencies"; SQL_REFERENCE | **Proposed fix:** Already handled — rejected at creation time with a clear error listing the cycle path. See `plans/sql/PLAN_CIRCULAR_REFERENCES.md` for future exploration of safe fixed-point evaluation. **No further work.** --- ### EC-31 — FOR UPDATE / FOR SHARE in defining query | Field | Value | |-------|-------| | **Area** | SQL — row locking | | **Tier** | **P3** | | **Impact** | Creation error | | **Documented in** | SQL_REFERENCE § "Rejected Constructs" | **Proposed fix:** By design — row locks serve no purpose in stream table defining queries. **No code fix.** The error message is clear. --- ### EC-32 — ALL (subquery) not supported | Field | Value | |-------|-------| | **Area** | DIFFERENTIAL — subquery operators | | **Tier** | **P2** | | **Impact** | Creation error | | **Documented in** | FAQ § "Why Are These SQL Features Not Supported?" | | **Status** | ✅ **IMPLEMENTED** — `parse_all_sublink()` in `src/dvm/parser.rs` updated with NULL-safe anti-join condition `(col IS NULL OR NOT (x op col))`; full AntiJoin pipeline was already wired (`parse_sublink_to_wrapper` → `parse_all_sublink` → `AntiJoin SublinkWrapper` → `diff_anti_join`) | **Fix applied:** NULL-safe anti-join condition. `ALL (subquery)` is mathematically a conjunction of per-row comparisons. The full pipeline through `parse_all_sublink()` → `AntiJoin` OpTree → `diff_anti_join` was already wired. The outstanding issue was NULL-safety: the previous condition `NOT (x op col)` evaluated to NULL (not TRUE) when `col IS NULL`, so NULL rows in the subquery were silently ignored, giving wrong results. Fixed by changing the condition to `(col IS NULL OR NOT (x op col))`, which correctly excludes the outer row whenever any subquery row is NULL or fails the comparison. **Test coverage:** 8 E2E tests (`e2e_all_subquery_tests.rs`): - `test_all_subquery_less_than_differential` — basic `price < ALL` filter matching the SQL Reference worked example. - `test_all_subquery_differential_inner_insert` — differential refresh after inserting a cheaper competitor price that disqualifies a product. - `test_all_subquery_differential_outer_change` — outer INSERT + inner DELETE with recomputation. - `test_all_subquery_null_in_inner` — NULL-safety: ALL returns false when subquery contains NULL; removing NULL re-qualifies rows. - `test_all_subquery_empty_inner` — empty subquery: ALL against empty set yields true (SQL standard); inserting reduces result. - `test_all_subquery_full_refresh` — `>= ALL` with FULL refresh mode. - `test_all_subquery_equals_operator` — `= ALL` (value equals every row). - `test_all_subquery_not_equals_operator` — `<> ALL` (equivalent to NOT IN). --- ### EC-33 — Statistical aggregates (CORR, COVAR_*, REGR_*) | Field | Value | |-------|-------| | **Area** | DIFFERENTIAL — aggregates | | **Tier** | **P3** | | **Impact** | Creation error in DIFFERENTIAL mode | | **Current mitigation** | Use FULL mode; or pre-aggregate in DIFFERENTIAL and compute in a view | | **Documented in** | FAQ § "Unsupported Aggregates" | **Chosen fix: group-rescan (see PLAN_TRANSACTIONAL_IVM_PART_2.md Task 4.1):** Implement `CORR`/`COVAR_POP`/`COVAR_SAMP`/`REGR_*` using the proven group-rescan strategy already used for `BOOL_AND`, `STRING_AGG`, etc.: 1. Detect affected groups from the delta (groups where any row was inserted, updated, or deleted). 2. For each affected group, re-aggregate from source: `SELECT group_key, CORR(y, x) FROM source GROUP BY group_key` 3. Apply the result as an UPDATE to the stream table. This requires no stream table schema changes and no new OpTree variants. **Deferred — Welford auxiliary accumulator columns:** The earlier proposal of maintaining `sum_x`, `sum_y`, `sum_xy`, `sum_x2`, `count` accumulator columns in the stream table using Welford's online algorithm adapted to SQL EXCEPT/UNION was explored but is deferred. Rationale: it requires a breaking schema change to existing stream tables, the SQL delta-algebra correctness of the Welford adaptation is non-trivial to verify, and group-rescan is already correct and benchmarked for analogue aggregates. Revisit for v2.0 if group-rescan performance is insufficient for very large groups. --- ### EC-34 — Backup / restore loses WAL replication slots | Field | Value | |-------|-------| | **Area** | Operations — disaster recovery | | **Tier** | **P1** | | **Impact** | CDC stuck; slot-missing error after restore | | **Current mitigation** | Set `cdc_mode = 'trigger'` after restore; let auto recreate slots | | **Documented in** | FAQ § "Operations" | | **Status** | ✅ **IMPLEMENTED** — auto-detect missing slot in health check; fall back to TRIGGER | **Proposed fix:** 1. **Short term (P1):** Add a `health_check()` finding that detects "cdc_mode includes WAL but replication slot does not exist" and recommends the trigger fallback. 2. **Medium term:** On scheduler startup, when a source's `cdc_phase` is `WAL` but the slot is missing, automatically fall back to TRIGGER mode with a WARNING log and a NOTIFY. **Test coverage:** 2 E2E tests (`e2e_wal_cdc_tests.rs`): - `test_wal_fallback_on_missing_slot` (pre-existing) — full lifecycle test: creates WAL-mode ST, drops replication slot externally, waits for automatic fallback to TRIGGER, verifies data integrity after fallback. - `test_ec34_check_cdc_health_detects_missing_slot` — verifies `check_cdc_health()` surfaces `replication_slot_missing` alert immediately after slot drop, then confirms fallback to TRIGGER and post-fallback data integrity. --- ### EC-35 — Managed cloud PostgreSQL restrictions | Field | Value | |-------|-------| | **Area** | Infrastructure — cloud compatibility | | **Tier** | **P3** | | **Impact** | Extension cannot be installed | | **Documented in** | FAQ § "Deployment" | **Proposed fix:** pg_trickle cannot control cloud provider policies. Keep the FAQ entry updated with a compatibility matrix (RDS, Cloud SQL, AlloyDB, Azure Flexible, Supabase, Neon). Note that trigger-based CDC avoids the most common restriction (`wal_level = logical`). --- ### EC-36 — Managed cloud: shared_preload_libraries restriction | Field | Value | |-------|-------| | **Area** | Infrastructure — cloud compatibility | | **Tier** | **P3** | | **Impact** | Background worker + shared memory unavailable | | **Documented in** | FAQ § "Deployment" | **Proposed fix:** Investigate a "no-bgworker" mode where `refresh_stream_table()` is called by an external cron (e.g. `pg_cron` or an application timer). The scheduler would be optional when `shared_preload_libraries` is unavailable. See `plans/infra/REPORT_EXTERNAL_PROCESS.md` for the sidecar feasibility study. **Long term (v2.0+).** --- ## Prioritised Recommendations ### P0 — Must fix (data correctness) | # | Edge Case | Action | Estimated Effort | |---|-----------|--------|------------------| | 1 | ✅ EC-01: JOIN key change + right-side DELETE | R₀ via EXCEPT ALL — implemented in `diff_inner/left/full_join` | Done | | 2 | ✅ EC-06: Keyless table duplicate rows | WARNING at creation + `has_keyless_source` flag | Done | | 3 | EC-19: WAL + keyless without REPLICA IDENTITY FULL | Reject at creation time with clear error | 0.5 day | ### P1 — Should fix (operational surprise) | # | Edge Case | Action | Estimated Effort | |---|-----------|--------|------------------| | 4 | ✅ EC-11: Schedule < refresh duration | `scheduler_falling_behind` NOTIFY alert at 80% threshold | Done | | 5 | ✅ EC-13: Diamond split-version reads | Default `'atomic'` + `pg_sys::BeginInternalSubTransaction` in scheduler | Done | | 6 | ✅ EC-15: SELECT * + DDL breakage | WARNING at creation for `SELECT *` queries | Done | | 7 | EC-16: ALTER FUNCTION undetected | `pg_proc` hash polling on refresh cycle | 2 days | | 8 | EC-18: `auto` CDC stuck in TRIGGER | Rate-limited LOG explaining why; health_check finding | 1 day | | 9 | ✅ EC-25: TRUNCATE on stream table | Event trigger blocks TRUNCATE on pgtrickle schema | Done | | 10 | ✅ EC-26: Direct DML on stream table | Guard trigger detecting non-engine writes | Done | | 11 | EC-34: WAL slots lost after restore | Auto-detect missing slot; fall back to TRIGGER + WARNING | 1 day | ### P2 — Nice to have (usability gaps) | # | Edge Case | Action | Estimated Effort | |---|-----------|--------|------------------| | 12 | EC-03: Window functions in expressions | CTE extraction in parser | 3–5 days | | 13 | EC-05: Foreign tables | ✅ IMPLEMENTED — error msg + polling CDC | — | | 14 | ✅ EC-09: IMMEDIATE unsupported constructs | Done — recursive CTEs + TopK now supported | Done | | 15 | ✅ EC-17: DDL during active refresh | FAQ documentation added | Done | | 16 | ✅ EC-20: CDC TRANSITIONING complexity | `check_cdc_transition_health()` in scheduler.rs | Done | | 17 | ✅ EC-28: PgBouncer | Documentation already exists | Done | | 18 | EC-32: ALL (subquery) | Parser rewrite to NOT EXISTS (… EXCEPT …) | 2–3 days | ### P3 — Accepted trade-offs (document, no code change) ✅ EC-02 (GUC implemented), EC-04, EC-07, EC-08, EC-10, EC-12, EC-14, ✅ EC-21/22/23 (docs done), EC-24, EC-27, EC-29, EC-30, EC-31, EC-35, EC-36. These are either fundamental design trade-offs or have adequate existing mitigations. Keep documentation current and revisit if user demand surfaces. ### P3 — Committed long-term implementation | # | Edge Case | Action | Estimated Effort | |---|-----------|--------|------------------| | 19 | EC-33: CORR, COVAR_*, REGR_* | Auxiliary accumulator columns; Welford-based SQL delta templates | 5–8 days | --- ## Implementation Order **Recommended sprint sequence** (assuming 2-week sprints): **Sprint 1 — P0 corrections:** ✅ DONE - ✅ EC-19: WAL + keyless rejection (done: `has_keyless_source` flag + WARNING) - ✅ EC-06: Keyless duplicate warning + `has_keyless_source` catalog column - ✅ EC-01: R₀ via EXCEPT ALL (implemented in `diff_inner/left/full_join`) **Sprint 2 — P1 operational safety:** ✅ DONE - ✅ EC-25: Block TRUNCATE on stream tables (event trigger) - ✅ EC-26: Guard trigger for direct DML - ✅ EC-15: Warn on `SELECT *` (WARNING at creation) - ✅ EC-11: `scheduler_falling_behind` alert (NOTIFY at 80% threshold) - ✅ EC-13: Default `diamond_consistency = 'atomic'` + scheduler SAVEPOINT → `pg_sys::BeginInternalSubTransaction` fix **Sprint 3 — Remaining P1 + P2 starts:** ✅ DONE (v0.6.0) - ✅ EC-18: Rate-limited LOG for stuck auto mode (1 day) - ✅ EC-34: Auto-detect missing WAL slot (1 day) - ✅ EC-16: `pg_proc` hash polling for function changes (2 days) - ✅ EC-03: Window function CTE extraction (3–5 days) **Sprint 4+ — P2 + P3 long-term:** ✅ DONE (v0.6.0) - ✅ EC-05, EC-32: Foreign table change detection; `ALL (subquery)` → `NOT EXISTS (… EXCEPT …)` rewrite - EC-33: Statistical aggregates via auxiliary accumulator columns - Documentation sweep for all remaining P3 items --- ## Open Questions The following questions are unresolved design or validation unknowns that must be answered before the associated fixes can be fully specified or committed to. --- ### OQ-01 — EC-01: What is the correct new default for `adaptive_full_threshold`? ✅ RESOLVED **Resolution (2026-05-30):** The short-term threshold adjustment is dropped as not viable. It is a heuristic that can still silently miss deletes when co-occurring changes fall below the threshold, providing false safety. The primary fix is the R₀ via EXCEPT ALL approach (see EC-01). The `adaptive_full_threshold` GUC remains as-is for users who want it as an escape valve, but it is no longer positioned as an EC-01 mitigation path. --- ### OQ-02 — EC-01: Is pre-image capture viable without a breaking change buffer schema change? ✅ RESOLVED **Resolution (2026-05-30):** Schema changes are NOT needed. The CDC trigger (`src/cdc.rs`) already stores `OLD."col"` → `"old_col"` typed columns in the change buffer for every UPDATE and DELETE event. The old join key values are therefore already available. The question is now about *when* to use them. The R₀ via EXCEPT ALL approach (chosen for Sprint 1) does not need `old_col` column access from the delta CTE at all — it reconstructs the pre-change right state from the live table + ΔR change set. The pre-image (`old_col`) approach is still viable as a future optimisation (avoids the live-table EXCEPT ALL scan) but is deferred to v1.x because it requires structural changes to `DiffContext`/`DeltaSource` to surface `old_col` values in the delta CTE output. Effort: 7–10 days when the time comes. --- ### OQ-03 — EC-01: What is the overhead of the R₀ via EXCEPT ALL approach? **Updated analysis (2026-05-30):** Code archaeology shows the overhead is similar to what Part 2 already pays for `L₀`: - Part 1 already does a semi-join-filtered scan of `R₁` (full right table, filtered by delta-left join keys). Adding R₀ = `R₁ EXCEPT ALL ΔR_inserts UNION ALL ΔR_deletes` adds **one extra pass over the ΔR CTE** (which is small — only changed rows). The live table is not scanned an extra time because `R₁` is already in the query plan; the EXCEPT ALL just subtracts a small set from it. - The semi-join filter that limits R₁ also applies to R₀, so the set difference operates on already-filtered rows. - The correction term (Part 3) is unaffected by this change. **Expected overhead:** < 10% additional latency on the delta query for typical workloads where ΔR is small relative to R. The concern about "one extra temp-table scan per join source" from the original write-up was over-estimated — the EXCEPT ALL operates on the already-materialized ΔR CTE, not the full right table. **Recommendation:** Confirm with a micro-benchmark (1M right table, 0.1% change rate) before the Sprint 1 PR merges. Accept if overhead is under 30% of current DIFFERENTIAL latency; otherwise fall back to applying R₀ only when `ΔR` is non-empty (easy conditional). --- ### OQ-04 — EC-03: Where exactly do window-function-in-expression rewrites fail? The CTE extraction rewrite for EC-03 is described as covering "~80% of user demand", with complex cases (nested window calls inside arithmetic) remaining unsupported. **Open questions:** - Is `SUM(ROW_NUMBER() OVER (…))` rejectable at parse time, or does it require a full semantic analysis? - Can `NTILE(4) OVER (…) * 100` be rewritten to a CTE without changing query semantics? - What are the most common real-world patterns users hit? (Survey the GitHub issues / Discord.) **Recommendation:** Before implementing the rewrite, define a formal boundary: "any expression where a window function appears as a direct child of the SELECT list is rewriteable; anything deeper is rejected with a clear error." Enumerate the cases explicitly in a test file (`tests/e2e_window_expression_tests.rs`) before writing the rewrite code. This keeps the scope bounded. --- ### OQ-05 — EC-06: Is `ctid`-based row identity viable for trigger-mode CDC? The long-term fix for EC-06 proposes using `ctid` as a stable row identifier for keyless tables in trigger mode. `ctid` is the physical heap location and changes after `VACUUM FULL` / `CLUSTER`. **Open questions:** - Does pgrx expose the `ctid` of `OLD` in a row-level trigger? (Need to verify via `SPI_gettypeid` or `HeapTupleHeader`.) - What happens if `VACUUM FULL` runs between the CDC trigger fire and the next refresh cycle? Would the stale `ctid` silently match the wrong row? - Is `ctid` reset to the same value when the same physical slot is reused? (Yes in PostgreSQL — `ctid` is reused after `VACUUM`, not unique across time.) **Recommendation:** Do **not** pursue `ctid`-based identity. The reuse semantics make it unsafe as a durable row identifier. Instead, focus Sprint 1 effort on the count-based hash approach (OQ-05 is answered: ctid is off the table). If a unique row identity is required, recommend users add a `DEFAULT gen_random_uuid()` column. --- ### OQ-06 — EC-08: Do statement-level transition tables actually batch multiple row changes? The plan for EC-08 states that statement-level triggers with transition tables "provide batching naturally". This is true for a single `UPDATE` that affects N rows — the transition table contains all N old/new rows and the IVM trigger fires once. But it is **not** true for N separate single-row `UPDATE` statements in a loop — each fires the trigger once. **Open questions:** - What is the actual p50/p99 latency overhead per DML statement in IMMEDIATE mode on a stream table with a 3-way join? (No benchmark exists.) - Does `pg_advisory_xact_lock` inside the trigger add measurable contention under 100+ concurrent single-row updates? **Recommendation:** Add `benches/ivm_write_overhead.rs` as a Criterion benchmark that measures per-statement latency for IMMEDIATE vs DIFFERENTIAL mode at 1, 10, 100 concurrent writers. Run it as part of the weekly benchmark schedule. Make the results visible in `docs/BENCHMARK.md` so users can make informed mode choices. This answers the question empirically rather than by assumption. --- ### OQ-07 — EC-36: How much of the scheduler can be decoupled for cloud deployments? The long-term fix for EC-36 proposes a "no-bgworker" mode where `refresh_stream_table()` is driven by `pg_cron` or an application timer. `plans/infra/REPORT_EXTERNAL_PROCESS.md` covers the sidecar feasibility research but reaches no conclusion. **Open questions:** - The scheduler currently holds a `pg_advisory_lock` during refresh to prevent concurrent refreshes. Would a cron-driven caller need to acquire this lock explicitly? (Yes — this is straightforward.) - The scheduler tracks refresh history in `pgt_refresh_history`. Would a cron-driven caller need to insert rows there manually, or can this be encapsulated in `refresh_stream_table()`? (Likely yes — needs a small `refresh.rs` refactor.) - What happens to the background worker registration if `pg_trickle` is in `shared_preload_libraries` but `pg_trickle.enabled = off`? The worker starts but immediately sleeps — is this acceptable for cloud providers that bill per background worker slot? **Recommendation:** The prerequisite for EC-36 is verifying that `refresh_stream_table()` already records to `pgt_refresh_history` (check `src/refresh.rs`). If it does, the cron-mode is already 80% implemented — users just need to disable the bgworker and call `refresh_stream_table()` on a schedule. Document this pattern in `docs/FAQ.md` under "Deployment on managed PostgreSQL" as an interim workaround before a formal no-bgworker mode is built.