# Plan: Last Differential Mode Gaps

Date: 2026-03-15
Status: Complete — G1 + G3 fully implemented (including De Morgan + multi-pass); G2 was done in a prior release

---

## 1. Problem Statement

pg_trickle's DIFFERENTIAL refresh mode covers a very wide range of SQL
constructs. However, three categories of queries still silently fall back to
FULL when `AUTO` mode is selected (or produce hard errors when explicit
`DIFFERENTIAL` is requested):

| # | Gap | Estimated Frequency | Effort |
|---|-----|---------------------|--------|
| G1 | User-defined aggregates (UDAs) | Medium (analytics, PostGIS, pgvector) | Medium |
| G2 | Window functions nested in expressions | Medium (ranking/pagination queries) | Medium |
| G3 | Sublinks in deeply nested OR | Low (complex filter trees) | Small–Medium |

All three gaps share a common trait: the DVM engine already has the
infrastructure to handle them (group-rescan for aggregates, partition
recomputation for windows, UNION rewrite for OR+sublinks), but the parser
rejects the patterns before that infrastructure can be applied.

### Out of Scope

The following FULL-only patterns are **fundamentally** incompatible with
differential semantics and are not addressed here:

- **Materialized views / foreign tables as sources** — no CDC triggers.
- **VOLATILE functions** (`random()`, `clock_timestamp()`) — non-deterministic.
- **TABLESAMPLE** — non-deterministic sampling.
- **FOR UPDATE / FOR SHARE** — row-level locking has no differential meaning.
- **LIMIT without ORDER BY** — non-deterministic row selection.
- **LATERAL with RIGHT/FULL JOIN** — rejected by PostgreSQL itself (lateral
  reference dependency conflicts with RIGHT/FULL semantics).

---

## 2. Current State

### How AUTO Fallback Works

`validate_and_parse_query()` in `src/api.rs` (L540) runs five sequential
checkpoints. If any checkpoint fails and mode is `AUTO`, it downgrades to
`FULL`:

1. `reject_unsupported_constructs()` — TABLESAMPLE, FOR UPDATE, nested OR
   sublinks
2. `reject_materialized_views()` — matviews, foreign tables
3. `validate_immediate_mode_support()` — IMMEDIATE-specific checks
4. `parse_defining_query_full()` + `check_ivm_support_with_registry()` — parse
   into OpTree; any failure → FULL
5. Volatility check — VOLATILE → hard error; STABLE → warning

Gaps G1, G2, and G3 all fail at checkpoint 4 — the OpTree parser rejects
them, and AUTO gracefully falls back to FULL.

### Key Code Locations

| Component | File | Entry Point |
|-----------|------|-------------|
| AUTO fallback orchestrator | `src/api.rs:540` | `validate_and_parse_query()` |
| Aggregate classification | `src/dvm/parser.rs:13558` | `is_known_aggregate()` |
| UDA detection | `src/dvm/parser.rs:13534` | `is_user_defined_aggregate()` |
| Aggregate extraction | `src/dvm/parser.rs:13772` | `extract_aggregates()` rejection |
| Window-in-expr detection | `src/dvm/parser.rs:12787` | `node_contains_window_func()` |
| Window-in-expr rejection | `src/dvm/parser.rs:13166` | `extract_window_exprs()` |
| Window diff operator | `src/dvm/operators/window.rs` | `diff_window()` |
| Aggregate diff operator | `src/dvm/operators/aggregate.rs` | `diff_aggregate()` |
| OR+sublink rewrite | `src/dvm/parser.rs:6242` | `rewrite_sublinks_in_or()` |
| AND+OR+sublink rewrite | `src/dvm/parser.rs:6395` | `rewrite_and_with_or_sublinks()` |
| OR+sublink rejection (stage 2) | `src/dvm/parser.rs:8527` | `extract_where_sublinks()` |
| AggFunc enum | `src/dvm/parser.rs:164` | `AggFunc` variants |
| Group-rescan flag | `src/dvm/parser.rs:342` | `AggFunc::is_group_rescan()` |

---

## 3. Gap Details & Implementation Plans

---

### G1: User-Defined Aggregates (UDAs)

#### Problem

Any aggregate function not in the hard-coded `is_known_aggregate()` list
(~45 built-in functions) is rejected from DIFFERENTIAL mode. This affects:

- **PostGIS**: `ST_Union()`, `ST_Collect()`, `ST_Extent()`, `ST_MakeLine()`
- **pgvector**: `avg()` on vector types (custom aggregate)
- **pg_trgm / hunspell**: custom text aggregates
- **User-created aggregates**: `CREATE AGGREGATE my_percentile(...)`

The current rejection is at `src/dvm/parser.rs` L13772:

```rust
} else if is_user_defined_aggregate(&name_lower) {
    return Err(PgTrickleError::UnsupportedOperator(format!(
        "User-defined aggregate function {func_name}() is not supported \
         in DIFFERENTIAL mode. ..."
    )));
```

#### Insight

The DVM's **group-rescan** strategy already handles many complex aggregates
(STRING_AGG, ARRAY_AGG, all statistical functions, etc.) by re-aggregating
entire groups from the base data when any row in the group changes. This exact
same strategy can safely support any UDA — it makes no assumptions about the
aggregate's internal semantics.

#### Design

**Approach**: Treat unknown aggregates as group-rescan aggregates. Rather than
rejecting them, classify them as a new `AggFunc::UserDefined { name, schema }`
variant that always uses the group-rescan strategy.

##### Step 1: Extend `AggFunc` Enum

File: `src/dvm/parser.rs`

```rust
pub enum AggFunc {
    // ... existing variants ...
    /// User-defined or unrecognized aggregate — always group-rescan.
    UserDefined {
        name: String,
        schema: Option<String>,
    },
}
```

Update `is_group_rescan()` to return `true` for `UserDefined`.

Update `to_sql_name()` (or equivalent) to emit `schema.name()` or just
`name()` depending on whether schema qualification is present.

##### Step 2: Modify Aggregate Extraction

File: `src/dvm/parser.rs`, function `extract_aggregates()` (~L13772)

Replace the UDA rejection with:

```rust
} else if is_user_defined_aggregate(&name_lower) {
    // Treat as group-rescan: re-aggregate the entire group on any change.
    AggFunc::UserDefined {
        name: name_lower.clone(),
        schema: extract_schema_from_funcname(funcname),
    }
}
```

##### Step 3: Update Aggregate Diff Operator

File: `src/dvm/operators/aggregate.rs`

The group-rescan code path in `diff_aggregate()` already handles group-rescan
aggregates generically — it emits:

```sql
SELECT group_cols, AGG_FUNC(args) FROM source WHERE group_key IN (changed_groups)
```

The `UserDefined` variant needs to:

1. Emit the function call with correct schema qualification.
2. Preserve any special syntax (ORDER BY within aggregate, FILTER clause,
   DISTINCT keyword).
3. Handle the case where the UDA uses ordered-set syntax
   (`WITHIN GROUP (ORDER BY ...)`).

Review `render_agg_call()` (or equivalent) to ensure it can emit arbitrary
function names rather than only hard-coded ones.

##### Step 4: Handle Edge Cases

- **Schema-qualified UDAs**: `myschema.my_agg(x)` — preserve schema in
  `AggFunc::UserDefined`.
- **Overloaded names**: A UDA with the same name as a built-in (unlikely but
  possible if schema-qualified) — prefer the built-in classification if
  unqualified, use `UserDefined` if schema-qualified to a non-pg_catalog
  schema.
- **Aggregate with FILTER**: `my_agg(x) FILTER (WHERE cond)` — the existing
  FILTER handling in the aggregate operator should work unchanged since
  group-rescan re-evaluates the full expression.
- **Aggregate with ORDER BY**: `string_agg(x, ',' ORDER BY y)` — ensure the
  ORDER BY clause is preserved in the rendered SQL. The existing group-rescan
  path already handles this for STRING_AGG; verify it generalizes.
- **Ordered-set aggregates**: `my_percentile(0.5) WITHIN GROUP (ORDER BY x)` —
  ensure `WITHIN GROUP` clause is preserved. The existing PERCENTILE_CONT
  handling is a reference implementation.

##### Step 5: Catalog Validation

Add a lightweight check at stream table creation time that verifies the UDA
exists in `pg_proc` with `prokind = 'a'`. This is already done by
`is_user_defined_aggregate()` — reuse it. If the UDA is dropped later, the
refresh will fail at execution time with a clear PostgreSQL error (function
does not exist), which is acceptable.

##### Step 6: Tests

| Test | Description |
|------|-------------|
| Unit: `test_uda_classified_as_group_rescan` | Verify `AggFunc::UserDefined` returns `is_group_rescan() == true` |
| Unit: `test_uda_sql_rendering` | Verify schema-qualified and unqualified UDA SQL emission |
| E2E: `test_uda_simple_differential` | CREATE AGGREGATE → use in stream table → verify DIFFERENTIAL mode |
| E2E: `test_uda_with_filter_clause` | UDA with FILTER (WHERE ...) |
| E2E: `test_uda_with_order_by` | UDA with ORDER BY inside aggregate |
| E2E: `test_uda_ordered_set` | UDA using WITHIN GROUP syntax |
| E2E: `test_uda_schema_qualified` | Schema-qualified UDA `myschema.my_agg()` |
| E2E: `test_uda_insert_delete_update` | Full CDC cycle with UDA in GROUP BY query |
| E2E: `test_uda_multiple_in_same_query` | Multiple UDAs in same SELECT |
| E2E: `test_uda_combined_with_builtin` | UDA + COUNT + SUM in same query |

##### Effort Estimate

- Parser changes: ~60 lines
- Aggregate operator changes: ~30 lines (mostly rendering)
- Tests: ~300 lines
- Risk: Low — group-rescan is a proven strategy; no new delta math required

---

### G2: Window Functions Nested in Expressions

#### Problem

Window functions that appear inside expressions are rejected:

```sql
-- All rejected:
SELECT RANK() OVER (...) + 1 AS adjusted_rank FROM t
SELECT CASE WHEN ROW_NUMBER() OVER (...) <= 10 THEN 'top' ELSE 'rest' END FROM t
SELECT COALESCE(LAG(value) OVER (...), 0) AS prev_value FROM t
SELECT CAST(ROW_NUMBER() OVER (...) AS text) FROM t
```

The rejection is at `src/dvm/parser.rs` L13166:

```rust
if unsafe { node_contains_window_func(rt.val) } {
    return Err(PgTrickleError::UnsupportedOperator(
        "Window functions nested inside expressions ..."
    ));
}
```

The DVM's window operator (`src/dvm/operators/window.rs`) already uses
**partition-based recomputation**: when any row in a partition changes, the
entire partition's window results are recomputed. This means the operator
doesn't need to understand the expression wrapping the window function — it
just needs to recompute the full SELECT expression for affected partitions.

#### Design

**Approach**: Auto-rewrite the defining query to extract nested window
functions into an inner subquery, then apply the outer expression in a wrapper.

**Strategy**: Two-layer CTE rewrite at parse time.

##### Step 1: Detect Window-in-Expression Patterns

File: `src/dvm/parser.rs`

The existing `node_contains_window_func()` already detects these patterns.
Add a companion function `extract_window_from_expression()` that:

1. Walks the expression tree.
2. For each window function found, replaces it with a synthetic column
   reference (`__pgt_win_N`).
3. Returns a list of `(synthetic_name, window_function_sql)` pairs plus the
   rewritten outer expression.

Example:

```sql
-- Input target:
RANK() OVER (PARTITION BY dept ORDER BY salary DESC) + 1 AS adjusted_rank

-- Extracted:
--   inner column: RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS __pgt_win_1
--   outer expression: __pgt_win_1 + 1 AS adjusted_rank
```

##### Step 2: Rewrite Query with Inner Subquery

File: `src/dvm/parser.rs`, new function `rewrite_nested_window_functions()`

Called from the query rewrite pipeline (`run_query_rewrite_pipeline()` in
`src/api.rs`), after existing rewrites but before DVM parsing.

Transform:

```sql
-- Input:
SELECT id,
       RANK() OVER (PARTITION BY dept ORDER BY salary DESC) + 1 AS adjusted_rank,
       COALESCE(LAG(salary) OVER (PARTITION BY dept ORDER BY hire_date), 0) AS prev_sal,
       name
FROM employees
WHERE active = true

-- Output:
SELECT "__pgt_winbase"."id",
       "__pgt_winbase"."__pgt_win_1" + 1 AS "adjusted_rank",
       COALESCE("__pgt_winbase"."__pgt_win_2", 0) AS "prev_sal",
       "__pgt_winbase"."name"
FROM (
    SELECT id, name,
           RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS "__pgt_win_1",
           LAG(salary) OVER (PARTITION BY dept ORDER BY hire_date) AS "__pgt_win_2"
    FROM employees
    WHERE active = true
) AS "__pgt_winbase"
```

The inner subquery becomes a standard window function query (already
supported), and the outer SELECT is a simple projection (also supported).

##### Step 3: Handle Edge Cases

- **Multiple window functions in one expression**:
  `RANK() OVER w1 - DENSE_RANK() OVER w2` → extract both as separate
  `__pgt_win_N` columns. The inner subquery can contain multiple window
  functions.

- **Window function in WHERE clause**: PostgreSQL itself rejects this
  (`ERROR: window functions are not allowed in WHERE`), so no handling needed.

- **Window function in GROUP BY / HAVING**: Also rejected by PostgreSQL.

- **Window function in ORDER BY**: Already handled by existing window support.

- **Nested expressions containing window + aggregate**:
  `SUM(x) + ROW_NUMBER() OVER (...)` — PostgreSQL allows this in SELECT.
  The rewrite must place the aggregate in the inner subquery alongside the
  window function, or more precisely, the inner subquery should preserve
  the original query's GROUP BY and aggregates while adding the window
  columns.

  However, PostgreSQL does not allow window functions and aggregates to
  coexist unless the window function operates on the aggregate result:
  `SUM(x) + ROW_NUMBER() OVER (ORDER BY SUM(x))` is valid. In this case,
  the inner subquery should include the aggregates and the window function,
  and the outer SELECT applies the wrapping expression.

- **DISTINCT + nested window**: The inner subquery preserves all rows (no
  DISTINCT); DISTINCT moves to the outer query if originally present.

- **ORDER BY + LIMIT (TopK)**: If the original query has ORDER BY + LIMIT,
  these stay on the outer query. The inner subquery produces all rows with
  window columns.

- **Aliased window definitions** (`WINDOW w AS (PARTITION BY ...)`): The
  inner subquery must carry the WINDOW clause.

##### Step 4: Parser Integration

The rewrite should happen in `run_query_rewrite_pipeline()` (in `src/api.rs`),
inserted after `rewrite_rows_from()` and before `parse_defining_query_full()`:

```
1. rewrite_scalar_subquery_in_where()
2. rewrite_correlated_scalar_in_select()
3. rewrite_sublinks_in_or()
4. rewrite_rows_from()
5. rewrite_grouping_sets()
6. rewrite_distinct_on()
7. rewrite_nested_window_functions()    ← NEW
8. parse_defining_query_full()
```

Remove the rejection in `extract_window_exprs()` — after the rewrite, window
functions will always appear at the top level of SELECT targets.

##### Step 5: Tests

| Test | Description |
|------|-------------|
| Unit: `test_extract_window_from_expression` | Pure logic: expression tree → synthetic columns + outer expr |
| Unit: `test_rewrite_nested_window_arithmetic` | `RANK() + 1` → inner/outer split |
| Unit: `test_rewrite_nested_window_case` | `CASE WHEN ROW_NUMBER() > 5 THEN ...` |
| Unit: `test_rewrite_nested_window_coalesce` | `COALESCE(LAG(x), 0)` |
| Unit: `test_rewrite_nested_window_cast` | `CAST(ROW_NUMBER() AS text)` |
| Unit: `test_rewrite_nested_window_multiple` | Two window funcs in one expression |
| Unit: `test_rewrite_preserves_plain_windows` | Non-nested windows pass through unchanged |
| E2E: `test_nested_window_rank_plus_one` | Full cycle: create, insert, refresh, verify |
| E2E: `test_nested_window_coalesce_lag` | COALESCE(LAG(...), default) |
| E2E: `test_nested_window_case_row_number` | CASE on ROW_NUMBER for top-N labeling |
| E2E: `test_nested_window_multiple_exprs` | Multiple nested windows in same query |
| E2E: `test_nested_window_with_group_by` | Nested window alongside aggregate |
| E2E: `test_nested_window_insert_delete` | CDC cycle with partition changes |

##### Effort Estimate

- Expression extractor: ~120 lines
- Query rewriter: ~200 lines
- Parser integration: ~20 lines (add call + remove rejection)
- Tests: ~400 lines
- Risk: Medium — expression tree walking and SQL reconstruction is complex; the
  inner/outer split must preserve all clauses correctly. Edge cases around
  aggregate + window interaction need careful handling.

---

### G3: Sublinks in Deeply Nested OR

#### Problem

The two-stage rewrite pipeline handles most OR + sublink patterns:

| Pattern | Status |
|---------|--------|
| `WHERE EXISTS(...) OR x > 5` | ✓ Stage 1 → UNION |
| `WHERE a = 1 AND (EXISTS(...) OR x > 5)` | ✓ Stage 1 → AND-prefix + UNION |
| `WHERE a AND (b AND (c OR EXISTS(...)))` | ✓ AND-flattening rescues this |
| `WHERE (a OR EXISTS(...)) AND (b OR NOT EXISTS(...))` | ✗ Only first OR rewritten |
| `WHERE NOT (a AND NOT EXISTS(...))` | ✗ De Morgan not applied |
| `WHERE a OR (b AND EXISTS(...))` | ✗ Mixed AND/OR under top-level OR |

The failing patterns reach `extract_where_sublinks()` (L8527) with residual
OR+sublink nodes and are rejected:

```rust
if inner_bool.boolop == pg_sys::BoolExprType::OR_EXPR
    && unsafe { node_tree_contains_sublink(arg_ptr) }
{
    return Err(PgTrickleError::UnsupportedOperator(
        "Subquery expressions (EXISTS, IN) inside OR conditions ..."
    ));
}
```

#### Design

**Approach**: Extend the OR+sublink rewrite pipeline to handle additional
nesting patterns through recursive normalization and multi-pass rewriting.

##### Step 1: Recursive OR+Sublink Normalization

File: `src/dvm/parser.rs`

Add a new function `normalize_or_sublinks()` that applies transformations
recursively until a fixed point (no more changes):

**Transform A — Multiple OR+sublink conjuncts**:

Currently `rewrite_and_with_or_sublinks()` only processes the **first**
OR+sublink conjunct it finds. Extend it to process all of them by iterating:

```
WHERE (a OR EXISTS(s1)) AND (b OR NOT EXISTS(s2))

Pass 1: Rewrite first OR → UNION branches, each still has second OR+sublink
Pass 2: Rewrite second OR inside each UNION branch
         (each branch is now a simple SELECT, so top-level OR rewrite applies)

Result:
  (SELECT ... WHERE a AND b) UNION
  (SELECT ... WHERE a AND NOT EXISTS(s2)) UNION
  (SELECT ... WHERE EXISTS(s1) AND b) UNION
  (SELECT ... WHERE EXISTS(s1) AND NOT EXISTS(s2))
```

Implementation: After the first rewrite pass produces a UNION, recursively
apply `rewrite_sublinks_in_or()` to each UNION branch. Cap recursion depth
at 3 to prevent pathological explosion.

**Transform B — Mixed AND/OR under top-level OR**:

```
WHERE a OR (b AND EXISTS(s1))

-- This is already a top-level OR. The issue is that one arm is an AND
-- containing a sublink. The current UNION rewrite handles this correctly
-- because each OR arm becomes a separate UNION branch:
  (SELECT ... WHERE a) UNION (SELECT ... WHERE b AND EXISTS(s1))

-- The second branch has EXISTS under AND, which extract_where_sublinks()
-- handles natively. This should already work.
```

Verify and add a test — this may already be supported but untested.

**Transform C — NOT wrapping OR+sublink (De Morgan)**:

```
WHERE NOT (a AND NOT EXISTS(s1))

-- Apply De Morgan's law:
WHERE NOT a OR EXISTS(s1)

-- Now it's a top-level OR with a sublink → UNION rewrite applies.
```

Add a De Morgan normalization pass that pushes NOT inward through AND/OR
before the UNION rewrite runs. Implementation:

```rust
fn push_not_inward(node: *mut pg_sys::Node) -> Result<*mut pg_sys::Node, PgTrickleError>
```

Rules:
- `NOT (a AND b)` → `(NOT a) OR (NOT b)`
- `NOT (a OR b)` → `(NOT a) AND (NOT b)`
- `NOT NOT a` → `a`
- `NOT EXISTS(s)` → `NOT EXISTS(s)` (leave as-is; already handled)

Only apply when the expression tree contains sublinks inside OR after NOT
push-down would expose them to the UNION rewrite.

##### Step 2: Multi-Pass Pipeline

File: `src/api.rs`, function `run_query_rewrite_pipeline()`

Modify the OR+sublink rewrite to run iteratively:

```rust
// Rewrite sublinks in OR (may need multiple passes for nested patterns)
let mut prev = query.clone();
for _ in 0..3 {
    query = rewrite_sublinks_in_or(&query)?;
    if query == prev {
        break;  // Fixed point reached
    }
    prev = query.clone();
}
```

The fixed-point loop ensures that UNION branches created in pass 1 get their
own OR+sublink patterns rewritten in pass 2.

##### Step 3: Guard Against Combinatorial Explosion

When an AND has N OR+sublink conjuncts, the UNION rewrite produces up to
$2^N$ branches. Add a configurable limit:

```rust
const MAX_UNION_BRANCHES: usize = 16;  // 4 OR+sublink conjuncts

if branch_count > MAX_UNION_BRANCHES {
    return Err(PgTrickleError::UnsupportedOperator(format!(
        "Query would produce {} UNION branches after OR+sublink rewriting \
         (limit: {}). Simplify the WHERE clause or use FULL refresh mode.",
        branch_count, MAX_UNION_BRANCHES,
    )));
}
```

This protects against queries like:
```sql
WHERE (a OR EXISTS(s1)) AND (b OR EXISTS(s2)) AND (c OR EXISTS(s3))
      AND (d OR EXISTS(s4)) AND (e OR EXISTS(s5))
-- Would produce 2^5 = 32 UNION branches
```

##### Step 4: Tests

| Test | Description |
|------|-------------|
| Unit: `test_normalize_multiple_or_sublink_conjuncts` | Two OR+sublink arms in AND |
| Unit: `test_normalize_not_demorgan` | NOT (a AND NOT EXISTS(...)) |
| Unit: `test_normalize_mixed_and_or_under_or` | `a OR (b AND EXISTS(...))` |
| Unit: `test_normalize_triple_or_sublink` | Three OR+sublink arms (8 branches) |
| Unit: `test_normalize_explosion_guard` | 5+ OR+sublink arms → error |
| Unit: `test_normalize_fixed_point` | Verify loop terminates |
| E2E: `test_nested_or_two_exists` | Two OR arms with EXISTS, differential |
| E2E: `test_nested_or_demorgan` | NOT(AND(NOT EXISTS)) pattern |
| E2E: `test_nested_or_mixed` | `a OR (b AND EXISTS(...))` |
| E2E: `test_nested_or_cdc_cycle` | Insert/delete with nested OR pattern |

##### Effort Estimate

- Multi-pass loop: ~15 lines
- Multiple OR+sublink conjunct handling: ~50 lines
- De Morgan normalization: ~80 lines
- Combinatorial guard: ~15 lines
- Tests: ~250 lines
- Risk: Low–Medium — each transform is well-defined; the main risk is
  SQL round-trip fidelity through deparse → rewrite → reparse cycles.

---

## 4. Implementation Order

Recommended order based on user impact and dependency:

| Phase | Gap | Rationale |
|-------|-----|-----------|
| **Phase 1** | G1: User-defined aggregates | Highest real-world impact (PostGIS, analytics). Lowest risk — reuses proven group-rescan. |
| **Phase 2** | G2: Window-in-expression | Common in ranking/pagination queries. Medium risk — requires careful expression extraction. |
| **Phase 3** | G3: Nested OR+sublinks | Lowest frequency. Some transforms (multi-pass) are simple; De Morgan is more involved. |

Each phase is independently shippable and testable.

---

## 5. Verification Criteria

After all three gaps are closed, the following should hold:

1. **No silent FULL fallback for supported patterns**: Every pattern listed in
   the "Status → ✓" column in each gap section should produce DIFFERENTIAL
   mode when `AUTO` is requested.

2. **`pgtrickle.describe_stream_table()`** reports `refresh_mode =
   'DIFFERENTIAL'` for queries that previously fell back to FULL.

3. **Zero regressions**: All existing E2E tests pass unchanged.

4. **Performance**: Group-rescan UDAs should have comparable performance to
   existing group-rescan aggregates (STRING_AGG, ARRAY_AGG). Window rewrite
   adds one subquery layer but no additional scans.

5. **Error messages**: When a UDA doesn't exist at creation time, or when
   OR+sublink expansion exceeds the branch limit, clear error messages guide
   the user.

---

## 6. Open Questions

1. **UDA FILTER/ORDER BY preservation**: The existing group-rescan SQL
   rendering for built-in aggregates preserves FILTER and ORDER BY clauses.
   Need to verify the rendering code path is generic enough for arbitrary
   function names, or whether it hard-codes built-in names.

2. **Window rewrite + DISTINCT**: If the original query has `SELECT DISTINCT`,
   should DISTINCT move to the outer query only, or should the inner subquery
   also deduplicate? (Answer: outer only — the inner subquery must preserve
   all rows for correct window computation.)

3. **De Morgan depth limit**: How deep should NOT push-down recurse? Suggest
   depth limit of 3, matching the UNION rewrite pass limit.

4. **UDA in HAVING clause**: `HAVING my_agg(x) > 10` — the group-rescan
   strategy recomputes the full GROUP BY expression. Verify that HAVING
   conditions are included in the recomputation query.

5. **UDA with DISTINCT inside**: `my_agg(DISTINCT x)` — verify the DISTINCT
   keyword is preserved in the rendered group-rescan SQL.