# PLAN: SQL Gaps — Phase 6 **Status:** Reference **Date:** 2026-02-25 **Branch:** `main` **Scope:** Comprehensive gap analysis — PostgreSQL 18 SQL coverage vs pg_trickle implementation. **Current state:** 896 unit tests, 22 E2E test suites (350 E2E tests), 61 integration tests, 36 AggFunc variants, 21 OpTree variants, 21 diff operators, 5 auto-rewrite passes, 17 GUC variables. --- ## Table of Contents 1. [Executive Summary](#1-executive-summary) 2. [Methodology](#2-methodology) 3. [Ground Truth: Current Implementation State](#3-ground-truth-current-implementation-state) 4. [Gap Category 1: Expression Node Types](#4-gap-category-1-expression-node-types) 5. [Gap Category 2: Source Object Types](#5-gap-category-2-source-object-types) 6. [Gap Category 3: Aggregate Functions](#6-gap-category-3-aggregate-functions) 7. [Gap Category 4: Type System & Data Types](#7-gap-category-4-type-system--data-types) 8. [Gap Category 5: PostgreSQL 16/17/18 New Features](#8-gap-category-5-postgresql-161718-new-features) 9. [Gap Category 6: CDC & Change Tracking Edge Cases](#9-gap-category-6-cdc--change-tracking-edge-cases) 10. [Gap Category 7: Correctness Blind Spots](#10-gap-category-7-correctness-blind-spots) 11. [Gap Category 8: Operational & Concurrency Gaps](#11-gap-category-8-operational--concurrency-gaps) 12. [Gap Category 9: Documentation Drift](#12-gap-category-9-documentation-drift) 13. [Gap Category 10: Intentional Rejections (Not Gaps)](#13-gap-category-10-intentional-rejections-not-gaps) 14. [Prioritized Implementation Roadmap](#14-prioritized-implementation-roadmap) 15. [Historical Progress Summary](#15-historical-progress-summary) --- ## 1. Executive Summary pg_trickle now covers the vast majority of PostgreSQL SELECT syntax. Phases 1–5 resolved 52+ original gaps, implemented 5 transparent auto-rewrite passes (DISTINCT ON, GROUPING SETS/CUBE/ROLLUP, scalar subquery in WHERE, SubLinks in OR, multi-PARTITION BY windows), added 21 diff operators, and expanded aggregate support from 5 to 36 functions. **Zero P0 (silent data corruption) or P1 (incorrect semantics) issues remain in the core SQL parsing and delta computation.** All unsupported constructs are rejected with clear, actionable error messages. However, deep analysis reveals a new class of gaps — not in SQL syntax coverage but in the **operational boundary conditions** where pg_trickle intersects with PostgreSQL's broader feature set: | Category | P0 | P1 | P2 | P3 | P4 | Total | |----------|----|----|----|----|----|----| | Expression node types | 0 | 0 | 3 | 0 | 2 | 5 | | Source object types | 1 | 1 | 1 | 0 | 0 | 3 | | Aggregate functions | 0 | 1 | 2 | 0 | 1 | 4 | | Type system & data types | 0 | 1 | 0 | 0 | 3 | 4 | | PostgreSQL 16/17/18 features | 0 | 0 | 3 | 0 | 2 | 5 | | CDC & change tracking | 1 | 1 | 1 | 0 | 1 | 4 | | Correctness blind spots | 1 | 1 | 1 | 0 | 0 | 3 | | Operational & concurrency | 0 | 0 | 2 | 0 | 2 | 4 | | Documentation drift | 0 | 0 | 0 | 4 | 2 | 6 | | **Total new gaps** | **3** | **5** | **13** | **4** | **13** | **38** | The 3 new P0 items are: 1. **Views as sources in DIFFERENTIAL mode** — CDC triggers cannot be placed on views; changes are silently missed 2. **Volatile expressions inside `Expr::Raw`** — volatility checker cannot see functions wrapped in CASE/COALESCE/BETWEEN/etc. 3. **Partitioned tables as CDC sources** — row-level triggers on parent may miss changes routed to child partitions --- ## 2. Methodology This analysis was performed by: 1. **Code audit** of all 11,024 lines in `src/dvm/parser.rs` — every `node_to_expr()` match arm, rejection function, and auto-rewrite pass 2. **Operator inventory** of all 21 diff operator modules in `src/dvm/operators/` 3. **Cross-reference** of PostgreSQL 18 parse tree node types (`pg_sys::NodeTag`) against handled types 4. **CDC system review** of `src/cdc.rs` trigger setup, `src/hooks.rs` DDL event handling, `src/refresh.rs` refresh engine 5. **Documentation comparison** across `SQL_REFERENCE.md`, `DVM_OPERATORS.md`, `ARCHITECTURE.md`, `CONFIGURATION.md` vs actual code 6. **Test coverage analysis** of 896 unit tests, 350 E2E tests, and 61 integration tests 7. **Review of previous gap analyses** (SQL_GAPS_1 through SQL_GAPS_5, GAP_SQL_OVERVIEW.md) --- ## 3. Ground Truth: Current Implementation State ### 3.1 Supported Features (DIFFERENTIAL mode) | Category | Features | Count | |----------|----------|-------| | **OpTree variants** | Scan, Project, Filter, InnerJoin, LeftJoin, FullJoin, Aggregate, Distinct, UnionAll, Intersect, Except, Subquery, CteScan, RecursiveCte, RecursiveSelfRef, Window, LateralFunction, LateralSubquery, SemiJoin, AntiJoin, ScalarSubquery | 21 | | **Diff operators** | scan, filter, project, join, outer_join, full_join, aggregate, distinct, union_all, intersect, except, subquery, cte_scan, recursive_cte, window, lateral_function, lateral_subquery, semi_join, anti_join, scalar_subquery, join_common | 21 | | **Aggregate functions** | COUNT/COUNT(*), SUM, AVG, MIN, MAX, BOOL_AND/EVERY, BOOL_OR, STRING_AGG, ARRAY_AGG, JSON_AGG, JSONB_AGG, JSON_OBJECT_AGG, JSONB_OBJECT_AGG, BIT_AND, BIT_OR, BIT_XOR, STDDEV_POP/STDDEV, STDDEV_SAMP, VAR_POP, VAR_SAMP/VARIANCE, MODE, PERCENTILE_CONT, PERCENTILE_DISC, CORR, COVAR_POP, COVAR_SAMP, REGR_AVGX/AVGY/COUNT/INTERCEPT/R2/SLOPE/SXX/SXY/SYY | 36 | | **Expression types** | ColumnRef, A_Const, BinaryOp (AEXPR_OP + unary), BoolExpr (AND/OR/NOT), FuncCall, TypeCast, NullTest, CaseExpr, CoalesceExpr, NullIfExpr, MinMaxExpr, SQLValueFunction (15 variants), BooleanTest (6 variants), SubLink (EXISTS/ANY/EXPR), ArrayExpr, RowExpr, A_Indirection, AEXPR_IN, AEXPR_BETWEEN (4 variants), AEXPR_DISTINCT/NOT_DISTINCT, AEXPR_SIMILAR, AEXPR_OP_ANY/ALL | 30+ | | **Auto-rewrites** | DISTINCT ON → ROW_NUMBER(), GROUPING SETS/CUBE/ROLLUP → UNION ALL, Scalar subquery in WHERE → CROSS JOIN, SubLinks in OR → UNION, Multi-PARTITION BY windows → nested subqueries | 5 | | **Join types** | INNER, LEFT, RIGHT (→LEFT swap), FULL, CROSS, NATURAL (catalog-resolved), LATERAL (INNER/LEFT) | 7 | | **Set operations** | UNION ALL, UNION (dedup), INTERSECT [ALL], EXCEPT [ALL], mixed UNION/UNION ALL | 5 | | **CTEs** | Non-recursive (single + multi-ref shared delta), WITH RECURSIVE (semi-naive/DRed/recomputation) | 2 | | **Window** | PARTITION BY, ORDER BY, frame clauses (ROWS/RANGE/GROUPS + EXCLUDE), named WINDOW | Full | | **Subqueries** | FROM subquery, EXISTS/NOT EXISTS in WHERE, IN/NOT IN (subquery), ALL (subquery), scalar in SELECT, scalar in WHERE (rewritten) | Full | ### 3.2 Auto-Rewrite Pipeline (Pre-Parse) These functions transparently rewrite the raw parse tree before the main DVM parser runs: | Order | Function | Trigger | Rewrite | |-------|----------|---------|---------| | 1 | `rewrite_distinct_on()` | `DISTINCT ON(expr)` in SELECT | → `ROW_NUMBER() OVER (PARTITION BY expr ORDER BY ...) = 1` subquery | | 2 | `rewrite_grouping_sets()` | `GROUPING SETS`/`CUBE`/`ROLLUP` in GROUP BY | → `UNION ALL` of separate `GROUP BY` queries with `GROUPING()` literals | | 3 | `rewrite_scalar_subquery_in_where()` | Scalar `SubLink` in WHERE | → `CROSS JOIN` with scalar subquery as inline view | | 4 | `rewrite_sublinks_in_or()` | `EXISTS`/`IN` SubLinks inside `OR` | → `UNION` of separate filtered queries | | 5 | `rewrite_multi_partition_windows()` | Multiple different `PARTITION BY` clauses | → Nested subqueries, one per distinct partitioning | ### 3.3 Infrastructure | Component | Status | |-----------|--------| | **CDC triggers** | Row-level AFTER INSERT/UPDATE/DELETE + statement-level AFTER TRUNCATE | | **WAL transition** | Hybrid trigger→WAL migration with configurable timeout | | **DDL tracking** | Event trigger on `ddl_command_end`; classifies changes as Benign/ConstraintChange/ColumnChange | | **Schema snapshots** | `columns_used` per dependency; `schema_fingerprint` for fast equality | | **Block source DDL** | `pg_trickle.block_source_ddl` GUC (default: false) | | **Keyless tables** | All-column content hash for `__pgt_row_id` | | **Volatile detection** | `pg_proc.provolatile` lookup; rejects volatile in DIFF, warns for stable | | **MERGE-based refresh** | Cached MERGE SQL templates, prepared statements, planner hints | | **GUC count** | 17 settings in `pg_trickle.*` namespace | --- ## 4. Gap Category 1: Expression Node Types These are PostgreSQL raw parse tree node types not handled in `node_to_expr()`. The catch-all returns `PgTrickleError::UnsupportedOperator` with the node tag name, so none are silent — they all surface as clear errors. ### G1.1 — `T_CollateExpr` (COLLATE clause on expressions) | Field | Value | |-------|-------| | **PostgreSQL syntax** | `expr COLLATE "C"`, `column COLLATE "en_US"` | | **Parse tree** | `T_CollateExpr { arg, collname }` | | **Current behavior** | Rejected: `"Expression type T_CollateExpr is not supported"` | | **Severity** | **P2** — rejected with error | | **Impact** | Medium — affects internationalized text queries, case-insensitive comparisons | | **Effort** | 1–2 hours | | **Plan** | Deparse as `(expr) COLLATE "collation_name"` → `Expr::Raw` | ```sql -- Currently rejected: SELECT name COLLATE "C" AS sorted_name FROM users ORDER BY name COLLATE "C" SELECT * FROM t WHERE name COLLATE "en_US" = 'café' ``` **Implementation:** Extract `collname` from the `CollateExpr` node, recursively deparse the `arg` expression, emit `Expr::Raw("(arg_sql) COLLATE \"collname\"")`. --- ### G1.2 — `T_XmlExpr` (XML processing expressions) | Field | Value | |-------|-------| | **PostgreSQL syntax** | `XMLELEMENT`, `XMLFOREST`, `XMLPARSE`, `XMLPI`, `XMLROOT`, `XMLSERIALIZE`, `XMLCONCAT` | | **Current behavior** | Rejected: `"Expression type T_XmlExpr is not supported"` | | **Severity** | **P4** — very niche | | **Impact** | Very low — XML processing in PostgreSQL is rare | | **Effort** | 4–6 hours (many sub-variants) | | **Recommendation** | Defer indefinitely — XML features are extremely niche | --- ### G1.3 — `T_GroupingFunc` (GROUPING() outside auto-rewrite) | Field | Value | |-------|-------| | **PostgreSQL syntax** | `GROUPING(col1, col2)` in SELECT list | | **Current behavior** | Rejected when encountered outside the GROUPING SETS rewrite path | | **Severity** | **P4** — edge case only | | **Impact** | Very low — `GROUPING()` is only meaningful with GROUPING SETS, which is auto-rewritten | | **Effort** | 1 hour | | **Recommendation** | Defer — the auto-rewrite handles the standard case. Only trips if someone uses `GROUPING()` without GROUPING SETS, which is meaningless in PostgreSQL too. | --- ### G1.4 — `T_JsonIsPredicate` (PostgreSQL 16+) | Field | Value | |-------|-------| | **PostgreSQL syntax** | `expr IS JSON`, `expr IS JSON SCALAR`, `expr IS JSON ARRAY`, `expr IS JSON OBJECT`, `expr IS NOT JSON` | | **Current behavior** | Rejected: `"Expression type T_JsonIsPredicate is not supported"` | | **Severity** | **P2** — growing in usage | | **Impact** | Medium — JSON validation is increasingly common | | **Effort** | 2–3 hours | | **Plan** | Deparse as `Expr::Raw("(expr) IS [NOT] JSON [SCALAR|ARRAY|OBJECT] [WITH|WITHOUT UNIQUE KEYS]")` | ```sql -- Currently rejected: SELECT * FROM api_logs WHERE response IS JSON SELECT * FROM events WHERE payload IS JSON OBJECT ``` --- ### G1.5 — `T_JsonConstructorExpr` (PostgreSQL 16+ SQL/JSON constructors) | Field | Value | |-------|-------| | **PostgreSQL syntax** | `JSON()`, `JSON_ARRAY()`, `JSON_OBJECT()`, `JSON_ARRAYAGG()`, `JSON_OBJECTAGG()`, `JSON_SCALAR()`, `JSON_SERIALIZE()` | | **Current behavior** | Rejected as unrecognized expression type | | **Severity** | **P2** — SQL standard JSON constructors growing in adoption | | **Impact** | Medium — these are the SQL-standard way to construct JSON | | **Effort** | 4–6 hours (several sub-variants) | | **Plan** | Deparse each variant to its SQL representation via `Expr::Raw` | ```sql -- Currently rejected: SELECT JSON_OBJECT('name': name, 'age': age) FROM users SELECT JSON_ARRAY(1, 2, 3) SELECT JSON_ARRAYAGG(name ORDER BY name) FROM users GROUP BY dept ``` **Note:** `JSON_ARRAYAGG` and `JSON_OBJECTAGG` are aggregate functions that PostgreSQL 16+ implements via `T_JsonConstructorExpr` rather than `T_FuncCall`. They bypass the regular aggregate recognition path entirely. --- ## 5. Gap Category 2: Source Object Types ### G2.1 — Views as Sources in DIFFERENTIAL Mode ✅ RESOLVED > **Resolved by:** View inlining auto-rewrite (PLAN_VIEW_INLINING.md). Views are > transparently replaced with inline subqueries so CDC triggers land on base > tables. Nested views fully expanded. Original query preserved for reinit. | Field | Value | |-------|-------| | **Gap** | CDC triggers cannot be placed on views — changes to underlying tables are silently missed | | **Current behavior** | View is resolved via `pg_class` OID lookup; stream table creation succeeds. But `create_change_trigger()` creates row-level triggers on the view's OID, which PostgreSQL silently ignores (row-level triggers on views require INSTEAD OF triggers). DIFFERENTIAL mode produces no deltas — the stream table becomes permanently stale after initial population. | | **Severity** | **P0 — Silent data staleness** | | **Impact** | High — views are commonly used as abstraction layers | | **Effort** | 4–6 hours | | **Detection** | None — no error, no warning. The stream table appears healthy but never updates. | **Implementation options:** | Option | Approach | Effort | Tradeoffs | |--------|----------|--------|-----------| | **A. Reject views in DIFF** | Check `relkind` in `resolve_table_oid()` — reject 'v'/'m' in FROM when mode is DIFFERENTIAL | 2 hours | Safe, clear error; limits flexibility | | **B. Recursive source resolution** | Walk `pg_depend` + `pg_rewrite` to find base tables underlying the view, attach CDC triggers to those instead | 8–12 hours | Complex; multi-level view stacks; fragile to view redefinition | | **C. Warn + force FULL** | Detect view sources and auto-downgrade to FULL mode with a WARNING | 3 hours | Pragmatic; no silent staleness; performance cost for views | **Recommendation:** Option A (reject) for DIFFERENTIAL, with a clear error message suggesting FULL mode. Option C as a follow-up enhancement. ```sql -- A view used as a source: CREATE VIEW active_orders AS SELECT * FROM orders WHERE status = 'active'; -- This should either reject or warn: SELECT pgtrickle.create('order_summary', 'DIFFERENTIAL', 'SELECT customer_id, COUNT(*) FROM active_orders GROUP BY customer_id'); ``` --- ### G2.2 — Materialized Views as Sources ✅ RESOLVED > **Resolved by:** View inlining auto-rewrite (PLAN_VIEW_INLINING.md). > Materialized views are rejected in DIFFERENTIAL mode with a clear error. > Allowed in FULL mode. | Field | Value | |-------|-------| | **Gap** | Same as G2.1 — CDC triggers don't fire on `REFRESH MATERIALIZED VIEW` | | **Current behavior** | Materialized view resolved as a regular source; CDC triggers attached but never fire on REFRESH | | **Severity** | **P1 — Incorrect semantics** | | **Impact** | Low-Medium — materialized views as sources are less common | | **Effort** | Included in G2.1 fix (relkind 'm' check) | --- ### G2.3 — Foreign Tables as Sources | Field | Value | |-------|-------| | **Gap** | Row-level triggers cannot be created on foreign tables (PostgreSQL restriction) | | **Current behavior** | `create_change_trigger()` will fail with a PostgreSQL error, which surfaces to the user. This is not silent but the error message is confusing (a PG internal error about trigger creation). | | **Severity** | **P2 — Confusing error** | | **Impact** | Low — foreign tables as stream table sources are uncommon | | **Effort** | 1 hour — detect `relkind = 'f'` and reject with clear message | --- ## 6. Gap Category 3: Aggregate Functions ### G3.1 — User-Defined Aggregates | Field | Value | |-------|-------| | **Gap** | Custom aggregates created with `CREATE AGGREGATE` are not recognized | | **Current behavior** | Treated as regular function calls → placed into `Expr::FuncCall` instead of `AggExpr`. The defining query is accepted but the aggregate semantics are lost — the function sees only the current row instead of the group. | | **Severity** | **P1 — Incorrect semantics** | | **Impact** | Medium — popular extensions define custom aggregates (e.g., `tdigest`, `hll_union_agg`, `topn_agg`) | | **Effort** | 4–6 hours | | **Detection** | None — the query parses successfully. Results are wrong. | **Root cause:** `extract_aggregates()` checks function names against a hardcoded `is_known_aggregate()` list. Functions not in this list are assumed to be scalar functions, even when they appear in a GROUP BY context. **Implementation:** 1. Query `pg_proc` for `prokind = 'a'` (aggregate) during aggregate extraction 2. If a function call in the SELECT list is an aggregate per `pg_proc` but not in the `AggFunc` enum: reject with clear error suggesting FULL mode 3. This prevents silent misclassification while preserving clear error messages ```sql -- Example with a custom aggregate from an extension: CREATE AGGREGATE my_median(numeric) ( SFUNC = my_median_sfunc, STYPE = internal, FINALFUNC = my_median_final ); -- pg_trickle silently treats this as a scalar function: SELECT dept, my_median(salary) FROM employees GROUP BY dept; -- → Wrong: my_median sees each row individually, not the group ``` --- ### G3.2 — `ANY_VALUE` (PostgreSQL 16+) | Field | Value | |-------|-------| | **Gap** | `ANY_VALUE(expr)` aggregate not recognized | | **Current behavior** | Treated as scalar function → wrong results in GROUP BY context | | **Severity** | **P2** — recognized as an aggregate by PostgreSQL 16+ | | **Impact** | Low-Medium — growing in usage as a replacement for non-deterministic GROUP BY columns | | **Effort** | 1 hour — add `AnyValue` variant using group-rescan pattern | ```sql -- PostgreSQL 16+: SELECT department, ANY_VALUE(manager_name) FROM employees GROUP BY department; ``` --- ### G3.3 — `RANGE_AGG` / `RANGE_INTERSECT_AGG` (PostgreSQL 14+) | Field | Value | |-------|-------| | **Gap** | Range aggregation functions not recognized | | **Current behavior** | Treated as scalar functions | | **Severity** | **P4** — very niche | | **Impact** | Very low | | **Effort** | 1 hour each — group-rescan pattern | --- ### G3.4 — `JSON_ARRAYAGG` / `JSON_OBJECTAGG` (SQL/JSON standard, PG 16+) | Field | Value | |-------|-------| | **Gap** | SQL/JSON standard aggregate functions use `T_JsonConstructorExpr` instead of `T_FuncCall` | | **Current behavior** | Not recognized as aggregates — parser never sees a `FuncCall` node for these | | **Severity** | **P2** — SQL-standard JSON aggregates growing in adoption | | **Impact** | Medium — blocks standard-compliant JSON aggregation | | **Effort** | 4–6 hours — requires `T_JsonConstructorExpr` handling in addition to aggregate recognition | **Note:** The existing `json_agg`/`jsonb_agg`/`json_object_agg`/`jsonb_object_agg` PostgreSQL-specific aggregates work fine. This gap only affects the SQL-standard syntax introduced in PostgreSQL 16. --- ## 7. Gap Category 4: Type System & Data Types ### G4.1 — Keyless Table `::text` Cast Lossyness | Field | Value | |-------|-------| | **Gap** | Content hash for keyless tables uses `col::text` which is lossy for some types | | **Current behavior** | Two distinct rows that produce identical `::text` representations get the same `__pgt_row_id` → one row silently overwrites the other in the MERGE | | **Severity** | **P1 — Silent data loss** (only for keyless tables) | | **Impact** | Low — requires both keyless tables AND problematic types | | **Affected types** | `float8` (rounding: `0.1 + 0.2 → '0.30000000000000004'` but intermediate representations may collide), `bytea` (encoding-dependent), `bit varying` (truncation), composite types (may not have unique text output) | | **Effort** | 4–6 hours | **Mitigations considered:** - Use `col::bytea` instead of `col::text` — more faithful but slower - Use `pg_column_size()` as a tie-breaker — adds cost but eliminates collisions - Store `ctid` as a fallback identifier — but `ctid` changes on UPDATE/VACUUM **Note:** Tables with primary keys are not affected — PK columns provide unique identity directly. --- ### G4.2 — Enum Type ALTER Inconsistency | Field | Value | |-------|-------| | **Gap** | `ALTER TYPE ... ADD VALUE` on an enum used in a source table is not tracked | | **Current behavior** | `detect_schema_change_kind()` classifies based on `pg_attribute` changes. Enum value additions don't change column attributes → classified as `Benign` → no reinit | | **Severity** | **P4** — rarely causes actual problems | | **Impact** | Very low — new enum values don't invalidate existing data or delta SQL | | **Effort** | 2 hours — add `pg_enum` change detection in `handle_alter_type()` | --- ### G4.3 — Composite Type Column Access in Delta SQL | Field | Value | |-------|-------| | **Gap** | `(record).field` access in delta SQL may produce wrong results when the composite type definition changes | | **Current behavior** | Field access is deparsed as `Expr::Raw` — the field name is baked into the delta template. If `ALTER TYPE` changes the composite, the delta SQL silently uses the old structure. | | **Severity** | **P4** — extremely rare scenario | | **Impact** | Very low | | **Effort** | 2 hours — add `ALTER TYPE` tracking for composite types | --- ### G4.4 — NULL Handling in Keyless Table Content Hash | Field | Value | |-------|-------| | **Gap** | Rows differing only by NULL placement can collide in the content hash | | **Current behavior** | `pg_trickle_hash_multi(ARRAY[col1::text, col2::text, ...])` converts all values to text. `NULL::text` is `NULL`, and `NULL` in an array may be handled inconsistently depending on the hash implementation. Two rows `(1, NULL, 3)` and `(1, 3, NULL)` could potentially hash to the same value if NULL handling in the array is position-insensitive. | | **Severity** | **P4** — theoretical concern; depends on `pg_trickle_hash_multi` implementation | | **Impact** | Very low — requires specific NULL patterns in keyless tables | | **Effort** | 2 hours — add position-aware NULL sentinel values (e.g., `'__null_N__'` where N is the ordinal) | --- ## 8. Gap Category 5: PostgreSQL 16/17/18 New Features ### G5.1 — `JSON_TABLE()` (PostgreSQL 17) | Field | Value | |-------|-------| | **PostgreSQL syntax** | `SELECT * FROM JSON_TABLE(data, '$.items[*]' COLUMNS (id INT PATH '$.id', name TEXT PATH '$.name'))` | | **Parse tree** | `T_JsonTable` as a FROM item | | **Current behavior** | Rejected: `"Unsupported FROM item node type: T_JsonTable"` | | **Severity** | **P2** — new SQL-standard feature | | **Impact** | Medium — JSON document processing is increasingly important | | **Effort** | 8–12 hours | **Why this is complex:** `JSON_TABLE` is a row-generating construct similar to `LATERAL`, but its internal structure (columns, paths, nested paths) requires dedicated parsing. The delta strategy would be row-scoped recomputation similar to `LateralFunction`. ```sql -- Currently rejected: SELECT jt.* FROM api_responses r, JSON_TABLE(r.body, '$.results[*]' COLUMNS ( id INT PATH '$.id', name TEXT PATH '$.name', score NUMERIC PATH '$.score' )) AS jt; ``` --- ### G5.2 — SQL/JSON Path Expressions (`@?`, `@@`) | Field | Value | |-------|-------| | **Gap** | JSON path operators `@?` and `@@` work as `AEXPR_OP` — functional but opaque | | **Current behavior** | Works correctly as binary operators | | **Severity** | **P4** — actually works, just isn't treated intelligently | | **Impact** | None — operational | | **Note** | Not a real gap — listed for documentation completeness | --- ### G5.3 — `MERGE` Statement as Defining Query | Field | Value | |-------|-------| | **Gap** | `MERGE INTO ... USING ... WHEN MATCHED/NOT MATCHED` cannot be used as a defining query | | **Current behavior** | Only `SELECT` statements are accepted as defining queries | | **Severity** | **P4** — MERGE is a DML statement, not a query | | **Impact** | None — MERGE is used internally by pg_trickle for delta application | | **Note** | Not a real gap — defining queries are SELECT statements by design | --- ### G5.4 — Virtual Generated Columns (PostgreSQL 18) | Field | Value | |-------|-------| | **Gap** | Virtual generated columns have no storage but may appear in trigger `NEW` records | | **Current behavior** | Untested. The CDC trigger references `NEW."col_name"` for all columns from `pg_attribute`. Virtual columns (PG 18: `attgenerated = 'v'`) have no stored value — reading them in a BEFORE trigger returns `NULL`, but in an AFTER trigger returns the computed value. | | **Severity** | **P2** — could silently produce wrong CDC data | | **Impact** | Low — virtual generated columns are new in PG 18 | | **Effort** | 2–3 hours | | **Plan** | Filter out virtual generated columns (`attgenerated = 'v'`) from CDC trigger column lists. Instead, recompute from the expression in the stream table if needed. | --- ### G5.5 — `RETURNING` Clause Improvements (PostgreSQL 18) | Field | Value | |-------|-------| | **Gap** | `RETURNING` expressions on DML are not relevant | | **Current behavior** | N/A — defining queries are SELECT-only | | **Severity** | **P4** — not a gap | | **Note** | Listed for completeness. PG 18's enhanced RETURNING is used internally by MERGE but transparent. | --- ## 9. Gap Category 6: CDC & Change Tracking Edge Cases ### G6.1 — Partitioned Tables as Sources | Field | Value | |-------|-------| | **Gap** | Row-level triggers on a partitioned table may not fire for rows routed to child partitions | | **Current behavior** | The trigger is attached to the table OID from `pg_class`. For declaratively partitioned tables, DML on the parent is routed to partitions. Before PostgreSQL 13, AFTER triggers on the parent did not fire for partition-routed rows. PostgreSQL 13+ fires parent triggers for partition-routed rows, but the trigger's `TG_TABLE_NAME` and `TG_RELID` still reference the parent. | | **Severity** | **P0 — Silent data loss** (PostgreSQL 12 and earlier); **P4** in PostgreSQL 13+ | | **Impact** | High for PG <13: changes silently missed. Low for PG 13+: likely works correctly. | | **Effort** | 3–4 hours | | **Target PG version** | pg_trickle targets PG 18, so this is likely **not a real problem** in practice | **Verification needed:** Confirm that with PG 18 + pgrx 0.17, AFTER triggers on partitioned table parents correctly fire for all routed DML. If so, close this gap with a documentation note. **For PG 13+:** The trigger fires correctly, but the change buffer table uses the **parent's** OID for its name (`pgtrickle_changes.changes_`). This should be correct since the stream table's `source_oid` also references the parent. --- ### G6.2 — Logical Replication Targets | Field | Value | |-------|-------| | **Gap** | Tables receiving data via logical replication do not fire normal triggers | | **Current behavior** | No detection — if a source table is a logical replication target, changes arriving via replication are invisible to CDC triggers | | **Severity** | **P1 — Silent data staleness** | | **Impact** | Low-Medium — affects logical replication setups | | **Effort** | 2 hours for detection + warning | | **Plan** | At stream table creation, check if source tables have active subscriptions (`pg_subscription_rel`) and emit a WARNING if so | --- ### G6.3 — TOAST Column Bloat in Change Buffers | Field | Value | |-------|-------| | **Gap** | Large text/bytea columns are stored in full in the change buffer table | | **Current behavior** | CDC triggers copy `NEW."col"` values directly into the change buffer. For columns with large TOAST values (>8KB), this forces de-TOASTing and full storage, potentially causing significant buffer table bloat. | | **Severity** | **P2** — performance/storage issue, not correctness | | **Impact** | Medium — tables with large text/JSON columns will have oversized change buffers | | **Effort** | 4–6 hours | | **Mitigation** | Only store columns that are in `columns_used` for the dependency. Currently all columns are captured regardless. or use content hash of TOAST columns. | --- ### G6.4 — Row-Level Security on Change Buffer Tables | Field | Value | |-------|-------| | **Gap** | If RLS is enabled on the `pgtrickle_changes` schema, CDC trigger inserts could fail | | **Current behavior** | Change buffer tables are owned by the extension installer. RLS is not explicitly disabled on them. If a DBA enables RLS globally or on the `pgtrickle_changes` schema, trigger-based inserts may be blocked. | | **Severity** | **P4** — extremely unlikely scenario | | **Impact** | Very low | | **Effort** | 1 hour — add `ALTER TABLE ... DISABLE ROW LEVEL SECURITY` to buffer table creation | --- ## 10. Gap Category 7: Correctness Blind Spots ### G7.1 — Volatile Expressions Inside `Expr::Raw` | Field | Value | |-------|-------| | **Gap** | The volatility checker only walks `Expr::FuncCall` and `Expr::BinaryOp` nodes. Expressions that deparse to `Expr::Raw` (CASE, COALESCE, BETWEEN, ARRAY[], ROW(), etc.) are opaque to the checker. | | **Current behavior** | A volatile function wrapped in CASE, COALESCE, or other complex expressions is silently accepted in DIFFERENTIAL mode | | **Severity** | **P0 — Silent correctness gap** | | **Impact** | Medium — `CASE WHEN random() > 0.5 THEN ...` or `COALESCE(gen_random_uuid(), ...)` bypass volatile detection | | **Effort** | 6–8 hours | **Root cause:** `collect_volatilities()` at `parser.rs` line ~1381 walks `Expr` nodes recursively but has an implicit pass-through for `Expr::Raw` — it only descends into `FuncCall` args and `BinaryOp` operands. **Implementation options:** | Option | Approach | Effort | Accuracy | |--------|----------|--------|----------| | **A. Re-parse `Expr::Raw` strings** | Run `pg_parse_query()` on the SQL text inside `Expr::Raw` to get back a parse tree, then walk that for `FuncCall` nodes | 6–8 hours | High — catches all functions | | **B. Walk the original parse tree** | Move volatility checking to operate on the raw parse tree (before `Expr` conversion) | 8–10 hours | Perfect — operates on full information | | **C. Regex scan for function calls** | Heuristic: scan `Expr::Raw` text for `funcname(` patterns and look those up in `pg_proc` | 2–3 hours | Low — misses edge cases, false positives | **Recommendation:** Option A. Re-parsing `Expr::Raw` strings is reliable and reuses existing infrastructure. The pg_parse_query API is already available via pgrx's raw parse tree access. ```sql -- Currently NOT detected as volatile: SELECT CASE WHEN random() > 0.5 THEN 'heads' ELSE 'tails' END AS coin FROM t; SELECT COALESCE(gen_random_uuid()::text, 'default') AS id FROM t; SELECT ARRAY[random(), random()] AS arr FROM t; ``` --- ### G7.2 — Operator Volatility Not Checked | Field | Value | |-------|-------| | **Gap** | Custom operators have underlying `oprcode` functions with their own volatility, but these are never checked | | **Current behavior** | Only function-call volatility is checked. Operators like PostGIS's `&&` (bounding box overlap) or custom volatile operators are assumed safe. | | **Severity** | **P1 — Potential silent correctness gap** | | **Impact** | Low — custom volatile operators are extremely rare | | **Effort** | 3–4 hours | | **Plan** | For `Expr::BinaryOp`, look up operator in `pg_operator` → `oprcode` → `pg_proc.provolatile` | --- ### G7.3 — Schema-Qualified Column Reference Ambiguity | Field | Value | |-------|-------| | **Gap** | 3-field `ColumnRef` (`schema.table.column`) silently strips the schema qualifier | | **Current behavior** | `node_to_expr()` for 3-field ColumnRef emits `Expr::ColumnRef { table_alias: table, column_name }`, dropping the schema prefix. If a query references `schema1.t.x` and `schema2.t.x`, both become `t.x` in the generated delta SQL. | | **Severity** | **P2 — Potential ambiguity** | | **Impact** | Very low — self-joins between same-named tables in different schemas are extremely rare | | **Effort** | 2–3 hours | | **Plan** | When multiple FROM items share the same table name but different schemas, use schema-qualified names or synthetic aliases in delta SQL | --- ## 11. Gap Category 8: Operational & Concurrency Gaps ### G8.1 — Cross-Session MERGE Cache Staleness | Field | Value | |-------|-------| | **Gap** | MERGE template cache is thread-local (per PostgreSQL backend). If session A alters a stream table's defining query, session B's cached MERGE template is stale. | | **Current behavior** | Session B continues to use the old MERGE SQL until its cache is invalidated by a refresh error or session reconnection | | **Severity** | **P2** — affects multi-backend deployments | | **Impact** | Low — `ALTER STREAM TABLE` is infrequent; the stale cache is invalidated on the next DDL event in session B's scope | | **Effort** | 4–6 hours | | **Plan** | Use `pg_notify` to broadcast cache invalidation events, or add a catalog version counter that backends check before each refresh | --- ### G8.2 — Missing `ALTER FUNCTION` / `ALTER OPERATOR` Tracking | Field | Value | |-------|-------| | **Gap** | DDL hooks don't track changes to functions or operators referenced in defining queries | | **Current behavior** | If a user replaces a function (`CREATE OR REPLACE FUNCTION`) or drops a custom operator used in a defining query, the stream table silently uses the new function semantics or fails on next refresh | | **Severity** | **P2** — function replacement silently changes stream table semantics | | **Impact** | Low — most functions in defining queries are PostgreSQL built-ins | | **Effort** | 4–6 hours | | **Plan** | Track function OIDs referenced in defining queries. On `ddl_command_end` for `CREATE/ALTER/DROP FUNCTION`, check if any tracked function OID was affected and trigger reinit. | --- ### G8.3 — Deferred Cleanup After Session Crash | Field | Value | |-------|-------| | **Gap** | Change buffer cleanup is deferred to the next refresh cycle. If a session crashes between refresh and cleanup, the cleanup queue is lost. | | **Current behavior** | The change buffer grows until the next successful refresh in any session, which then cleans up using LSN range predicates. Not a correctness issue — stale rows are harmless — but causes WAL and storage bloat. | | **Severity** | **P4** — operational consideration | | **Impact** | Low — self-healing on next refresh | | **Effort** | 2–3 hours — add a background worker sweep or startup cleanup hook | --- ### G8.4 — Advisory Lock Contention | Field | Value | |-------|-------| | **Gap** | No explicit advisory lock to prevent duplicate concurrent refreshes of the same stream table | | **Current behavior** | The `shmem` coordination layer prevents most duplicate refreshes via shared memory flags, but there's a small race window between checking the flag and starting the refresh. | | **Severity** | **P4** — theoretical; shmem coordination handles >99% of cases | | **Impact** | Very low — double-refresh is idempotent (correct but wasteful) | | **Effort** | 1–2 hours — add `pg_advisory_xact_lock()` at refresh entry point | --- ## 12. Gap Category 9: Documentation Drift Since SQL_GAPS_5, several features have been implemented but documentation has not been updated to reflect the new capabilities. These are **not code bugs** but documentation inaccuracies that mislead users. ### D1. SQL_REFERENCE.md — Features Documented as Rejected That Are Now Supported | Feature | Doc Says | Actual State | Priority | |---------|----------|-------------|----------| | **NATURAL JOIN** | "Rejected with rewrite suggestion" | **Supported** — parser resolves common columns | **P3** | | **DISTINCT ON** | "Rejected with rewrite suggestion" | **Supported** — auto-rewritten to ROW_NUMBER() | **P3** | | **GROUPING SETS/CUBE/ROLLUP** | "Rejected with rewrite suggestion" | **Supported** — auto-rewritten to UNION ALL | **P3** | | **SubLinks in OR** | "Rejected" | **Supported** — auto-rewritten to UNION | **P3** | ### D2. All Docs — Missing 12 Regression/Correlation Aggregates | Doc | Issue | |-----|-------| | SQL_REFERENCE.md | Lists 24 aggregates; missing CORR, COVAR_POP/SAMP, REGR_* (12 functions) | | DVM_OPERATORS.md | Aggregate table lists 24; missing same 12 | ### D3. ARCHITECTURE.md — Stale Operator Table | Issue | Detail | |-------|--------| | Operator count | Lists 14 operators; actual count is 21 | | Missing operators | LateralSubquery, SemiJoin, AntiJoin, ScalarSubquery, FullJoin, RecursiveSelfRef, join_common | | Module map | Missing 6+ operator source files | | CDC description | Omits TRUNCATE trigger | ### D4. CONFIGURATION.md — Missing GUC Sections | GUC | Status | |-----|--------| | `pg_trickle.differential_max_change_ratio` | Exists in code; no dedicated section in docs | | `pg_trickle.cleanup_use_truncate` | Exists in code; no dedicated section in docs | | `pg_trickle.merge_planner_hints` | Exists in code; no dedicated section in docs | | `pg_trickle.merge_work_mem_mb` | Exists in code; no dedicated section in docs | | `pg_trickle.merge_strategy` | Exists in code; no dedicated section in docs | | `pg_trickle.use_prepared_statements` | Exists in code; no dedicated section in docs | **Actual GUC count:** 17 (not the 12 documented). ### D5. Missing Feature Documentation | Feature | Implemented | Not Documented | |---------|------------|----------------| | HAVING clause | Full support | Never mentioned | | Keyless table support | All-column content hash | Not documented in SQL_REFERENCE | | Volatile function detection | Rejects in DIFF, warns for stable | Not documented | | Schema snapshots + fingerprints | columns_used, schema_fingerprint | Not in ARCHITECTURE | | Auto-rewrite pipeline (5 passes) | Fully operational | Not documented as a system | | Scalar subquery in WHERE rewrite | Auto-rewritten to CROSS JOIN | Not documented | | Multi-PARTITION BY rewrite | Auto-rewritten to nested subqueries | Not documented | ### D6. DVM_OPERATORS.md — Cross-Document Contradiction DVM_OPERATORS.md correctly states "NATURAL JOIN is supported — common columns are resolved automatically" while SQL_REFERENCE.md says "Rejected." DVM_OPERATORS.md is correct. --- ## 13. Gap Category 10: Intentional Rejections (Not Gaps) These items are **permanently rejected by design** — they are not gaps to be fixed. Listed here for completeness and to prevent re-evaluation. | Construct | Reason | Error Message | |-----------|--------|---------------| | `LIMIT` without `ORDER BY` | Undefined ordering — rejected | "LIMIT is not supported in defining queries." | | `OFFSET` | Stream tables materialize full result sets — rejected | "OFFSET is not supported in defining queries." | | `ORDER BY` + `LIMIT` (TopK) | ✅ **Now supported** — scoped recomputation via MERGE | Accepted (not an error) | | `FETCH FIRST` / `FETCH NEXT` | Same as LIMIT — TopK if ORDER BY present, rejected otherwise | Same as LIMIT | | `FOR UPDATE` / `FOR SHARE` / `FOR NO KEY UPDATE` / `FOR KEY SHARE` | Row-level locking has no meaning on materialized stream tables | "FOR UPDATE/FOR SHARE is not supported." | | `TABLESAMPLE` | Non-deterministic; stream tables are complete result sets | "TABLESAMPLE is not supported." | | `ROWS FROM()` with multiple functions | Extremely niche | "ROWS FROM() with multiple functions is not supported." | | `LATERAL` with `RIGHT`/`FULL JOIN` | PostgreSQL itself restricts this | "Only INNER JOIN LATERAL and LEFT JOIN LATERAL are supported." | | `XMLAGG` | Extremely niche XML aggregation | Recognized but rejected for DIFFERENTIAL mode | | Hypothetical-set aggregates | Almost always used as window functions, not aggregates | Recognized but rejected for DIFFERENTIAL mode | | Window functions inside expressions | Architectural constraint; requires separate column | "Window functions nested inside expressions..." | | `ORDER BY` | Accepted but silently discarded — row order is undefined | No error (intentional no-op) | | Direct DML on stream tables | Refresh engine manages all writes | Triggers prevent manual DML | | Direct DDL on storage tables | Managed by extension | DDL hooks warn/block | | `SELECT` without `FROM` | Constant-only queries have no CDC source | "Defining query must have FROM clause" | --- ## 14. Prioritized Implementation Roadmap ### Tier 0 — Critical Correctness (Must Fix) | Step | Gap | Description | Effort | Impact | |------|-----|-------------|--------|--------| | ~~**F1**~~ | ~~G2.1~~ | ~~Views as sources: view inlining auto-rewrite~~ | ~~Done~~ | ~~✅ Implemented~~ | | ~~**F2**~~ | ~~G7.1~~ | ~~Volatile expressions in `Expr::Raw`: re-parse for volatility checking~~ | ~~Done~~ | ~~✅ Implemented~~ | | ~~**F3**~~ | ~~G3.1~~ | ~~User-defined aggregates: detect via `pg_proc.prokind` and reject~~ | ~~Done~~ | ~~✅ Implemented~~ | **Estimated effort:** 12–16 hours **Value:** Closes all remaining P0 and P1 silent correctness gaps. ### Tier 1 — High-Value Enhancements | Step | Gap | Description | Effort | Impact | |------|-----|-------------|--------|--------| | ~~**F4**~~ | ~~G1.1~~ | ~~COLLATE expression support~~ | ~~Done~~ | ~~✅ Implemented~~ | | **F5** | G1.4 | `IS JSON` predicate (PG 16+) | 2–3 hours | Enables JSON validation | | ~~**F6**~~ | ~~G3.2~~ | ~~`ANY_VALUE` aggregate (PG 16+)~~ | ~~Done~~ | ~~✅ Implemented~~ | | ~~**F7**~~ | ~~G5.4~~ | ~~Virtual generated columns (PG 18)~~ | ~~Done~~ | ~~✅ Implemented~~ | | ~~**F8**~~ | ~~G2.3~~ | ~~Foreign tables: detect and reject clearly~~ | ~~Done~~ | ~~✅ Implemented~~ | | ~~**F9**~~ | ~~G2.2~~ | ~~Materialized views: reject in DIFFERENTIAL~~ | ~~Incl. in F1~~ | ~~✅ Implemented~~ | **Estimated effort:** 7–12 hours **Value:** Covers PostgreSQL 16–18 new features and improves error clarity. ### Tier 2 — SQL/JSON Ecosystem | Step | Gap | Description | Effort | Impact | |------|-----|-------------|--------|--------| | **F10** | G1.5 | SQL/JSON constructors (JSON(), JSON_ARRAY(), etc.) | 4–6 hours | ~~✅ Implemented~~ | | **F11** | G3.4 | JSON_ARRAYAGG / JSON_OBJECTAGG (SQL standard) | 4–6 hours | ~~✅ Implemented~~ | | **F12** | G5.1 | JSON_TABLE() in FROM | 8–12 hours | ~~✅ Implemented~~ | **Estimated effort:** 16–24 hours **Value:** Full SQL/JSON standard compliance (PostgreSQL 16+). ### Tier 3 — Operational Hardening | Step | Gap | Description | Effort | Impact | |------|-----|-------------|--------|--------| | **F13** | G6.1 | Verify partitioned table CDC on PG 18 | 3–4 hours | ~~✅ Implemented~~ | | **F14** | G6.2 | Detect logical replication targets | 2 hours | ~~✅ Implemented~~ | | **F15** | G6.3 | Selective CDC column capture | 4–6 hours | Deferred (needs column lineage) | | **F16** | G7.2 | Operator volatility checking | 3–4 hours | ~~✅ Implemented~~ | | **F17** | G8.1 | Cross-session cache invalidation | 4–6 hours | ~~✅ Implemented~~ | | **F18** | G8.2 | Function/operator DDL tracking | 4–6 hours | ~~✅ Implemented~~ | **Estimated effort:** 20–28 hours **Value:** Production hardening for complex deployments. ### Tier 4 — Documentation Sync | Step | Gap | Description | Effort | |------|-----|-------------|--------| | **F19** | D1 | Update SQL_REFERENCE.md — move 4 features from rejected to supported | 1 hour | | **F20** | D2 | Document 12 regression/correlation aggregates across all docs | 1 hour | | **F21** | D3 | Update ARCHITECTURE.md operator table and module map | 1 hour | | **F22** | D4 | Add 7 missing GUC sections to CONFIGURATION.md | 2 hours | | **F23** | D5 | Document HAVING, keyless tables, volatile detection, auto-rewrites | 2 hours | | **F24** | D6 | Resolve DVM_OPERATORS.md / SQL_REFERENCE.md contradiction | 0.5 hours | **Estimated effort:** 7–8 hours **Value:** Accurate documentation for users and contributors. ### Summary | Tier | Steps | Effort | Cumulative | |------|-------|--------|------------| | 0 — Critical | F1–F3 | 12–16h | 12–16h | | 1 — High Value | F4–F9 | 7–12h | 19–28h | | 2 — SQL/JSON | F10–F12 | 16–24h | 35–52h | | 3 — Operational | F13–F18 | 20–28h | 55–80h | | 4 — Documentation | F19–F24 | 7–8h | 62–88h | | **Total** | **24 steps** | **62–88h** | — | ### Recommended Execution Order ``` Session 1: F1 (views) + F3 (user-defined aggs) + F8 (foreign tables) ✅ Done Session 2: F7 (virtual gen cols) + F4 (COLLATE) + F6 (ANY_VALUE) ✅ Done Session 3: F2 (volatile in Expr::Raw) ✅ Done Session 4: F19–F24 (all documentation) ✅ Done Session 5: F13 (partitioned tables) + F14 (replication targets) ✅ Done Session 6: F5 (IS JSON) + F10 (SQL/JSON constructors) ✅ Done Session 7: F11 (JSON agg standard) + F12 (JSON_TABLE) ✅ Done Session 8+: F15–F18 (operational hardening) ✅ Done (F15 implemented in v0.9.0) ``` **F15 (v0.9.0 implementation note):** F15 is now implemented end-to-end in v0.9.0. `source_columns_used()` in the parser already walks all ColumnRef/expression nodes and returns per-source column lists. `StDependency::union_referenced_columns_for_source()` in `src/catalog.rs` computes the union across all downstream STs. `cdc::resolve_referenced_column_defs()` filters the full column list to referenced ∪ PK, and this is wired into `setup_cdc_for_source` (creation) and both `rebuild_cdc_trigger*` functions (schema-change rebuild). Two items remain before F15 is fully closed: 1. **Integration test** — verify the change buffer table has only the expected columns. 2. **Monitoring view** — expose `is_selective_capture_active` via the monitoring SQL API. --- ## 15. Historical Progress Summary | Plan | Phase | Sessions | Items Resolved | Test Growth | |------|-------|----------|---------------|-------------| | SQL_GAPS_1 | Expression deparsing, CROSS JOIN, FOR UPDATE rejection | 1 | 6 | 745 → 757 | | SQL_GAPS_2 | GROUPING SETS P0, GROUP BY hardening, TABLESAMPLE | 1 | 6 | 745 → 750 | | SQL_GAPS_3 | Aggregates (5 new), subquery operators (Semi/Anti/Scalar Join) | ~5 | 10 | 750 → 809 | | SQL_GAPS_4 | Report accuracy, ordered-set aggregates (MODE, PERCENTILE) | 1 | 7 | 809 → 826 | | Hybrid CDC | Trigger→WAL transition, user triggers, pgt_ rename | ~3 | 12+ | 826 → 872 | | SQL_GAPS_5 | 15 steps: volatile detection, DISTINCT ON, GROUPING SETS, ALL subquery, regression aggs, mixed UNION, TRUNCATE, schema infra, NATURAL JOIN, keyless tables, scalar WHERE, SubLinks-OR, multi-PARTITION, recursive CTE DIFF | ~10 | 15 | 872 → 896 | | **SQL_GAPS_6** | **This plan: 38 new gaps identified, 24 steps proposed** | **8** | **23/24 done (F15 deferred)** | 896 → 1,138+ | ### Cumulative Scorecard | Metric | SQL_GAPS_1 | SQL_GAPS_3 | SQL_GAPS_5 End | SQL_GAPS_6 (Now) | |--------|-----------|-----------|----------------|-----------------| | AggFunc variants | 5 | 10 | 36 | 39 | | OpTree variants | 12 | 18 | 21 | 21 | | Diff operators | 10 | 16 | 21 | 21 | | Auto-rewrite passes | 0 | 0 | 5 | 5 | | Unit tests | 745 | 809 | 896 | ~1,138 | | E2E tests | ~100 | ~200 | 350 | 384 | | E2E test files | ~15 | ~18 | 22 | 22 | | P0 issues | 14 | 0 | 0 | 0 | | P1 issues | 5 | 0 | 0 | 0 | | Expression types | 7 | 15 | 30+ | 40+ | | GUCs | ~6 | ~8 | 17 | 17 | ### What Changed Between SQL_GAPS_5 and SQL_GAPS_6 SQL_GAPS_5 focused on closing syntax-level gaps — every SQL construct that pg_trickle could encounter was either handled or rejected with a clear error. SQL_GAPS_6 shifts focus to **operational boundaries**: - **Source object types**: Views, materialized views, foreign tables, partitioned tables - **PostgreSQL version-specific features**: PG 16 SQL/JSON, PG 17 JSON_TABLE, PG 18 virtual generated columns - **Type system edge cases**: Lossy `::text` casts, enum modifications, composite type changes - **Correctness blind spots**: `Expr::Raw` opacity to volatility checking, operator volatility, cross-session cache coherence - **CDC limitations**: TOAST bloat, replication targets, RLS interactions - **Documentation drift**: 6 categories of stale docs from rapid feature development This represents a maturation from "does the SQL parser handle this syntax?" to "does the complete system behave correctly in all production scenarios?"