# Plan: Native PostgreSQL Syntax for Stream Tables Date: 2026-02-25 Status: PROPOSED Last Updated: 2026-02-25 --- ## Table of Contents - [Executive Summary](#executive-summary) - [Motivation](#motivation) - [Constraints](#constraints) - [Approach Evaluation](#approach-evaluation) - [Approach A — Parser Fork (Rejected)](#approach-a--parser-fork-rejected) - [Approach B — Table Access Method (Rejected)](#approach-b--table-access-method-rejected) - [Approach C — Foreign Data Wrapper (Rejected)](#approach-c--foreign-data-wrapper-rejected) - [Approach D — ProcessUtility_hook + CREATE MATERIALIZED VIEW (Recommended)](#approach-d--processutility_hook--create-materialized-view-recommended) - [Approach E — CALL Procedure Syntax (Complementary)](#approach-e--call-procedure-syntax-complementary) - [Recommended Strategy: Tiered Syntax](#recommended-strategy-tiered-syntax) - [Detailed Design — Tier 2: ProcessUtility_hook](#detailed-design--tier-2-processutility_hook) - [User-Facing Syntax](#user-facing-syntax) - [Hook Registration](#hook-registration) - [CREATE Interception](#create-interception) - [DROP Interception](#drop-interception) - [REFRESH Interception](#refresh-interception) - [ALTER Interception](#alter-interception) - [psql Tab-Completion Compatibility](#psql-tab-completion-compatibility) - [pg_dump / pg_restore Strategy](#pg_dump--pg_restore-strategy) - [Error Handling](#error-handling) - [Detailed Design — Tier 1.5: CALL Procedure Syntax](#detailed-design--tier-15-call-procedure-syntax) - [Implementation Phases](#implementation-phases) - [Module Layout Changes](#module-layout-changes) - [ADR: Native Syntax Approach](#adr-native-syntax-approach) - [Testing Strategy](#testing-strategy) - [Risks and Mitigations](#risks-and-mitigations) - [Comparison with Prior Art](#comparison-with-prior-art) - [References](#references) --- ## Executive Summary This plan proposes a **tiered syntax strategy** for creating and managing stream tables in pg_trickle, evolving from the current function-call API toward a more native-feeling DDL experience without forking PostgreSQL. | Tier | Syntax | Status | Requires | |------|--------|--------|----------| | **Tier 1** | `SELECT pgtrickle.create_stream_table(...)` | Existing | Nothing extra | | **Tier 1.5** | `CALL pgtrickle.create_stream_table(...)` | New (trivial) | PG 11+ | | **Tier 2** | `CREATE MATERIALIZED VIEW ... WITH (pgtrickle.stream = true) AS ...` | New | `shared_preload_libraries` | All tiers produce identical results — a stream table registered in `pgtrickle.pgt_stream_tables` with CDC, DAG, and scheduling fully configured. Tier 1 remains the stable, always-available interface. Tier 2 provides native-feeling DDL syntax for users who prefer standard SQL idioms. **Estimated effort:** ~1,200–1,800 lines of Rust (hook + tests) for Tier 2. --- ## Motivation ### Problem The current API requires function-call syntax: ```sql SELECT pgtrickle.create_stream_table( 'order_totals', 'SELECT region, SUM(amount) FROM orders GROUP BY region', '1m', 'DIFFERENTIAL' ); ``` While functional, this has several UX shortcomings: 1. **Unfamiliar idiom** — PostgreSQL users expect DDL operations (`CREATE`, `DROP`, `ALTER`) for persistent database objects. `SELECT function()` reads as a query, not a definition. 2. **Query-as-string** — The defining query is embedded as a text literal, losing IDE support (syntax highlighting, auto-complete, error detection). 3. **No psql `\d` integration** — Stream tables appear as plain tables in `\dt`, with no indication they are maintained views. 4. **pg_dump gap** — Function calls are not captured by `pg_dump`. Stream table definitions survive only via the catalog tables, requiring custom backup/restore tooling. 5. **Discoverability** — New users searching for "create materialized view postgres" won't find pg_trickle's function-based approach. ### Goal Provide a syntax that: - Feels like native PostgreSQL DDL - Preserves the query as SQL (not a string literal) - Works with existing PostgreSQL tooling where possible - Does NOT require forking PostgreSQL - Maintains full backward compatibility with the function API --- ## Constraints ### Hard Constraints 1. **PostgreSQL's parser is not extensible.** There is no parser hook. The raw parser (`gram.y`) is compiled into the server binary. Extensions cannot add new keywords (`STREAM`) or grammar productions (`CREATE STREAM TABLE`). This rules out true `CREATE STREAM TABLE` syntax. 2. **`shared_preload_libraries` is already required** for the background scheduler worker and shared memory. The `ProcessUtility_hook` falls within the same requirement — no additional deployment burden. 3. **Hook chaining** — Other extensions (TimescaleDB, Citus, pg_stat_statements) may also set `ProcessUtility_hook`. Our hook MUST chain to the previous hook to avoid breaking other extensions. 4. **Backward compatibility** — The existing function API (`pgtrickle.create_stream_table()`) must continue to work unchanged. The new syntax is an additional path, not a replacement. ### Soft Constraints 5. **pg_dump compatibility** — Desirable but not expected to be perfect. A companion `pgtrickle.dump_stream_tables()` function is acceptable. 6. **Minimal surface area** — The hook should intercept only the DDL commands it needs and pass everything else through without modification. --- ## Approach Evaluation ### Approach A — Parser Fork (Rejected) **Concept:** Fork PostgreSQL, add `CREATE STREAM TABLE` as new grammar in `gram.y`, ship a modified PostgreSQL binary. **Why rejected:** - Must maintain a full PostgreSQL fork, rebasing on every major release - Cannot use `CREATE EXTENSION` — must replace the user's entire PostgreSQL - `pg_dump` / `pg_restore` / `psql` all need modifications - User adoption barrier is extreme (replace your database engine) - Maintenance cost: `gram.y` changes significantly between PG majors **Viability:** Only for a PostgreSQL distribution product (like YugabyteDB, Greenplum). Not viable for a loadable extension. ### Approach B — Table Access Method (Rejected) **Concept:** Register a custom table AM (`stream_heap`), intercept `CREATE TABLE ... USING stream_heap WITH (query = '...', schedule = '1m')`. **Why rejected:** - Table AM requires implementing 60+ callbacks (scan, insert, delete, vacuum, etc.) - `CREATE TABLE ... USING` requires explicit column definitions — stream tables derive columns from the query, creating redundancy - No `AS SELECT` support with `USING` clause in PG grammar - Storing the defining query in `WITH` reloptions is hacky and has length limits - pg_dump would dump `CREATE TABLE ... USING stream_heap` but not the associated metadata (query, schedule) - Extreme implementation complexity for marginal UX improvement ### Approach C — Foreign Data Wrapper (Rejected) **Concept:** Register a custom FDW, use `CREATE FOREIGN TABLE ... SERVER pgtrickle_server OPTIONS (query '...', schedule '1m')`. **Why rejected:** - Foreign tables cannot have indexes — stream tables need `__pgt_row_id` unique index - Foreign tables cannot have triggers — breaks user-trigger support - No MVCC snapshot isolation guarantees - "Foreign table" implies external data, confusing the mental model - `EXPLAIN` shows "Foreign Scan" instead of "Seq Scan" ### Approach D — ProcessUtility_hook + CREATE MATERIALIZED VIEW (Recommended) **Concept:** Intercept `CREATE MATERIALIZED VIEW ... WITH (pgtrickle.stream = true) AS SELECT ...` via `ProcessUtility_hook`. When the custom option is detected, route to `create_stream_table_impl()` instead of standard matview creation. **Why recommended:** - **Proven pattern** — TimescaleDB uses this exact approach for continuous aggregates (`WITH (timescaledb.continuous)`), one of the most widely deployed PostgreSQL extensions - **Native feel** — `CREATE MATERIALIZED VIEW` is the closest standard DDL to what a stream table is conceptually (a materialized derived dataset) - **Query is SQL** — The `AS SELECT ...` clause preserves the query as native SQL, enabling IDE syntax highlighting and auto-complete - **`shared_preload_libraries` already required** — No additional deployment burden - **Column derivation** — `CREATE MATERIALIZED VIEW ... AS SELECT` naturally derives column types from the query, exactly like stream tables - **Low risk** — The hook only fires for matviews with our specific option; all other DDL passes through untouched - **`DROP` / `REFRESH` integration** — Can intercept `DROP MATERIALIZED VIEW` and `REFRESH MATERIALIZED VIEW` for matching stream tables **Trade-offs:** - ~1,200–1,800 lines of hook code - Must track `ProcessUtility_hook` signature changes across PG versions (changed in PG14, PG15; stable since) - pg_dump will dump `CREATE MATERIALIZED VIEW` DDL but our hook must be active during restore for it to take effect - Users may be confused that stream tables appear as "materialized views" in some tools ### Approach E — CALL Procedure Syntax (Complementary) **Concept:** Expose `create_stream_table` as a PostgreSQL procedure (not just a function), enabling `CALL pgtrickle.create_stream_table(...)`. **Why complementary:** - Trivial to implement (add `#[pg_extern]` with procedure semantics or create a wrapper procedure) - `CALL` reads as a command/action rather than a query - No `shared_preload_libraries` requirement - Does not solve the query-as-string problem - Quick win that can ship independently --- ## Recommended Strategy: Tiered Syntax ### Tier 1: Function API (Existing — No Changes) ```sql -- Always available, no special requirements SELECT pgtrickle.create_stream_table( 'order_totals', 'SELECT region, SUM(amount) FROM orders GROUP BY region', '1m', 'DIFFERENTIAL' ); SELECT pgtrickle.drop_stream_table('order_totals'); SELECT pgtrickle.refresh_stream_table('order_totals'); SELECT pgtrickle.alter_stream_table('order_totals', schedule => '5m'); ``` ### Tier 1.5: CALL Procedure Syntax (New — Trivial) ```sql -- Same API but reads as a command, not a query CALL pgtrickle.create_stream_table( 'order_totals', 'SELECT region, SUM(amount) FROM orders GROUP BY region', '1m', 'DIFFERENTIAL' ); CALL pgtrickle.drop_stream_table('order_totals'); CALL pgtrickle.refresh_stream_table('order_totals'); ``` ### Tier 2: Native DDL Syntax (New — ProcessUtility_hook) ```sql -- Create a stream table using familiar matview DDL CREATE MATERIALIZED VIEW order_totals WITH ( pgtrickle.stream = true, pgtrickle.schedule = '1m', pgtrickle.mode = 'DIFFERENTIAL' ) AS SELECT region, SUM(amount) FROM orders GROUP BY region WITH NO DATA; -- Or initialize immediately (WITH DATA is the default): CREATE MATERIALIZED VIEW order_totals WITH (pgtrickle.stream = true) AS SELECT region, SUM(amount) FROM orders GROUP BY region; -- Refresh REFRESH MATERIALIZED VIEW order_totals; -- Drop DROP MATERIALIZED VIEW order_totals; DROP MATERIALIZED VIEW IF EXISTS order_totals CASCADE; -- Alter (still a function call — no standard ALTER MATERIALIZED VIEW -- for custom options): SELECT pgtrickle.alter_stream_table('order_totals', schedule => '5m'); ``` --- ## Detailed Design — Tier 2: ProcessUtility_hook ### User-Facing Syntax #### CREATE ```sql CREATE MATERIALIZED VIEW [IF NOT EXISTS] [schema.]name WITH ( pgtrickle.stream = true, -- required: marks this as a stream table pgtrickle.schedule = '1m', -- optional: default '1m' pgtrickle.mode = 'DIFFERENTIAL' -- optional: default 'DIFFERENTIAL' ) AS select_query [WITH DATA | WITH NO DATA]; ``` **Option semantics:** | Option | Type | Default | Description | |--------|------|---------|-------------| | `pgtrickle.stream` | `bool` | — | **Required.** Must be `true` to trigger stream table creation. | | `pgtrickle.schedule` | `text` | `'1m'` | Duration string or cron expression. Set to `'CALCULATED'` for NULL/downstream schedule. | | `pgtrickle.mode` | `text` | `'DIFFERENTIAL'` | `'FULL'` or `'DIFFERENTIAL'`. | - `WITH DATA` (default) → calls `create_stream_table_impl()` with `initialize = true` - `WITH NO DATA` → calls `create_stream_table_impl()` with `initialize = false` #### DROP ```sql DROP MATERIALIZED VIEW [IF EXISTS] [schema.]name [CASCADE | RESTRICT]; ``` The hook checks whether the target is a registered stream table (by OID lookup in `pgtrickle.pgt_stream_tables`). If so, it routes to `drop_stream_table_impl()` instead of standard matview drop. If not, it passes through to the standard handler. **CASCADE behavior:** If a stream table depends on another stream table being dropped, `CASCADE` propagates via pg_trickle's dependency tracking (same as `drop_stream_table_impl()` already handles). #### REFRESH ```sql REFRESH MATERIALIZED VIEW [CONCURRENTLY] [schema.]name; ``` The hook checks whether the target is a stream table. If so, routes to `refresh_stream_table_impl()`. The `CONCURRENTLY` keyword is ignored (stream table refreshes are always concurrent by design — readers see the previous version until the refresh completes). #### ALTER There is no standard `ALTER MATERIALIZED VIEW ... SET (option = value)` syntax that we can meaningfully intercept for custom options. Alter operations remain as function calls: ```sql SELECT pgtrickle.alter_stream_table('order_totals', schedule => '5m'); SELECT pgtrickle.alter_stream_table('order_totals', status => 'SUSPENDED'); ``` Future PostgreSQL versions may add `ALTER MATERIALIZED VIEW ... SET (reloption)` support. If so, we can intercept that. ### Hook Registration ```rust // src/hooks.rs — extended use std::ffi::c_char; static mut PREV_PROCESS_UTILITY_HOOK: pg_sys::ProcessUtility_hook_type = None; /// Register ProcessUtility_hook for native DDL syntax support. /// /// Called from _PG_init() when loaded via shared_preload_libraries. /// Chains with any previously installed hook (e.g., pg_stat_statements, /// TimescaleDB). pub fn register_process_utility_hook() { unsafe { PREV_PROCESS_UTILITY_HOOK = pg_sys::ProcessUtility_hook; pg_sys::ProcessUtility_hook = Some(pg_trickle_process_utility); } } ``` In `_PG_init()` (src/lib.rs): ```rust if in_shared_preload { shmem::init_shared_memory(); scheduler::register_scheduler_worker(); // NEW: Register ProcessUtility_hook for native DDL syntax hooks::register_process_utility_hook(); log!("pg_trickle: initialized (shared_preload_libraries)"); } ``` ### CREATE Interception When `ProcessUtility_hook` fires with a `CreateTableAsStmt` node where `objtype == OBJECT_MATVIEW`: ``` pg_trickle_process_utility() ├── Is it CreateTableAsStmt with OBJECT_MATVIEW? │ ├── No → pass through to prev hook / standard_ProcessUtility │ └── Yes → extract reloptions from IntoClause │ ├── Has pgtrickle.stream = true? │ │ ├── No → pass through (normal matview) │ │ └── Yes → handle_create_stream_table_ddl() │ │ ├── Extract query string from the parse tree │ │ ├── Extract pgtrickle.schedule, pgtrickle.mode │ │ ├── Determine schema.name from IntoClause->rel │ │ ├── Determine WITH DATA / WITH NO DATA │ │ ├── Call create_stream_table_impl() │ │ └── Set QueryCompletion tag │ └── Has only non-pgtrickle options? │ └── Pass through (normal matview with custom storage params) └── Is it something else? └── Check other interceptions (DROP, REFRESH) ``` **Key implementation detail:** The query text for the defining query must be extracted from the `CreateTableAsStmt->query` node. Since this is an analyzed `Query` node (not raw SQL text), we need to either: 1. **Deparse from the parse tree** — Use `pg_sys::nodeToString()` or pgrx deparse utilities to reconstruct the SQL from the `SelectStmt` node. 2. **Extract from the raw query string** — The `queryString` parameter to the hook contains the full original SQL. Parse it to extract the `AS SELECT ...` portion. 3. **Use `pg_get_viewdef`-style deparsing** — Not applicable since the matview doesn't exist yet. Approach (2) is simpler and preserves the user's original formatting.The `AS` keyword position can be found by walking the query string past the `WITH (...)` clause. **Alternatively**, and more robustly: use PostgreSQL's `deparse_query` on the `SelectStmt` subtree to get a canonical SQL string. This avoids fragile string parsing and handles edge cases (comments, dollar-quoting, etc.). ### DROP Interception When `ProcessUtility_hook` fires with a `DropStmt` node where `removeType == OBJECT_MATVIEW`: ``` pg_trickle_process_utility() ├── Is it DropStmt with OBJECT_MATVIEW? │ └── Yes → for each object in the drop list: │ ├── Resolve name to OID │ ├── Is OID in pgtrickle.pgt_stream_tables? │ │ ├── No → pass through (normal matview drop) │ │ └── Yes → handle_drop_stream_table_ddl() │ │ ├── Call drop_stream_table_impl() │ │ └── Set QueryCompletion tag │ └── Mixed list? (some STs, some normal matviews) │ └── Split: handle STs ourselves, pass rest through └── Not a matview drop → pass through ``` **IF EXISTS handling:** If the name doesn't resolve to an OID and `IF EXISTS` is set in the `DropStmt`, emit a `NOTICE` and continue (matching standard PG behavior). ### REFRESH Interception When `ProcessUtility_hook` fires with a `RefreshMatViewStmt` node: ``` pg_trickle_process_utility() ├── Is it RefreshMatViewStmt? │ └── Yes → resolve relation name to OID │ ├── Is OID in pgtrickle.pgt_stream_tables? │ │ ├── No → pass through (normal matview refresh) │ │ └── Yes → handle_refresh_stream_table_ddl() │ │ ├── Call refresh_stream_table_impl() │ │ └── Set QueryCompletion tag │ └── OID not found → pass through (let standard handler error) └── Not a refresh → pass through ``` **CONCURRENTLY:** The `RefreshMatViewStmt` has a `concurrent` field. For stream tables, we log a `DEBUG1` message that concurrency is implicit and proceed with the normal refresh. ### ALTER Interception `ALTER MATERIALIZED VIEW name SET (option = value)` is handled as an `AlterTableStmt` with `objtype == OBJECT_MATVIEW` and subcommand `AT_SetRelOptions`. We could intercept this to support: ```sql ALTER MATERIALIZED VIEW order_totals SET (pgtrickle.schedule = '5m'); ``` However, this interception is complex (must parse subcommands, handle mixed option sets) and provides marginal UX benefit since `ALTER` is less frequent than `CREATE`. **Deferred to a future iteration.** ### psql Tab-Completion Compatibility Since stream tables are created via `CREATE MATERIALIZED VIEW` syntax, they naturally appear in psql's `\dm` (list materialized views) output. However, since we intercept creation and create a regular table (not an actual matview), they will appear in `\dt` (regular tables) instead. **Mitigation options:** 1. **Actually create a matview** — After our custom processing, also create a thin matview entry in `pg_class` with `relkind = 'm'`. This is fragile and conflicts with the storage table. 2. **Create a view alias** — Create a regular view with the same name that reads from the storage table. The storage table gets an internal name like `_pgtrickle_store_`. 3. **Accept the `\dt` listing** — Stream tables appear as regular tables in `\dt`, which is technically accurate (they ARE regular heap tables). The `pgtrickle.stream_tables_info` view provides the authoritative listing. 4. **Register a custom psql command** — Not possible without modifying psql. **Recommendation:** Option 3 (accept `\dt` listing) for the initial implementation. The storage table IS a regular table, and `\dt` is correct. Add `COMMENT ON TABLE` with a descriptive marker so users can identify stream tables: ```sql COMMENT ON TABLE schema.name IS 'pgtrickle: stream table (schedule=1m, mode=DIFFERENTIAL)'; ``` ### pg_dump / pg_restore Strategy This is the most significant challenge with the hook-based approach. #### Problem `pg_dump` dumps objects based on their `relkind` in `pg_class`. Since stream tables are stored as regular tables (`relkind = 'r'`), `pg_dump` will emit `CREATE TABLE` DDL with the full column list — not the `CREATE MATERIALIZED VIEW ... WITH (pgtrickle.stream = true) AS SELECT ...` that our hook expects. The defining query, schedule, and other metadata live in `pgtrickle.pgt_stream_tables`, which `pg_dump` dumps as `INSERT` statements (since it's an extension catalog table in `extension_sql!`). #### Strategy: Companion Dump/Restore Functions Provide explicit dump and restore functions: ```sql -- Generate SQL to recreate all stream tables SELECT pgtrickle.generate_dump(); -- Output: -- SELECT pgtrickle.create_stream_table('order_totals', -- 'SELECT region, SUM(amount) FROM orders GROUP BY region', -- '1m', 'DIFFERENTIAL', false); -- SELECT pgtrickle.create_stream_table('daily_stats', ...); ``` ```sql -- Restore: run after pg_restore has created the extension and base tables -- The function reads pgtrickle.pgt_stream_tables (restored by pg_dump) -- and recreates the storage tables + CDC + triggers SELECT pgtrickle.restore_stream_tables(); ``` #### Strategy: Event Trigger on Extension Load Register an event trigger that fires on `CREATE EXTENSION pg_trickle` and checks if the catalog tables contain orphaned entries (tables exist in catalog but storage tables are missing). If so, automatically recreate them. #### pg_dump --section Behavior | pg_dump section | What it dumps | pg_trickle impact | |----------------|---------------|------------------| | `pre-data` | Schemas, extensions, types | `CREATE EXTENSION pg_trickle` → creates catalog tables | | `data` | Table contents | Catalog table rows (INSERT INTO pgtrickle.pgt_stream_tables) + storage table data | | `post-data` | Indexes, triggers, constraints | Storage table indexes, CDC triggers (but these are extension-managed) | **Recommended restore workflow:** ```bash # 1. Restore schema + extension pg_restore --section=pre-data dump.sql # 2. Restore data (includes catalog entries + storage table data) pg_restore --section=data dump.sql # 3. Rebuild stream tables from catalog (recreates CDC, triggers, indexes) psql -c "SELECT pgtrickle.restore_stream_tables();" # 4. Restore remaining post-data pg_restore --section=post-data dump.sql ``` ### Error Handling Hook errors must be handled carefully to avoid crashing the backend: 1. **Never `panic!()` in the hook** — Use `pg_guard` + `Result<>` patterns 2. **Failed interception → pass through** — If our hook encounters an error during stream table detection (e.g., SPI failure querying catalog), log a warning and pass through to the standard handler 3. **Stream table creation errors** — Report via `ereport(ERROR, ...)` as usual. The transaction rolls back, leaving no artifacts. 4. **Unknown options** — If `pgtrickle.stream = true` is present but other `pgtrickle.*` options are unrecognized, raise an error listing valid options --- ## Detailed Design — Tier 1.5: CALL Procedure Syntax ### Implementation PostgreSQL 11+ supports `CALL procedure(...)` for procedures. Currently pg_trickle exposes functions (callable via `SELECT`). To support `CALL` syntax: **Option A: Wrapper Procedures (SQL layer)** ```sql CREATE PROCEDURE pgtrickle.create_stream_table( name text, query text, schedule text DEFAULT '1m', refresh_mode text DEFAULT 'DIFFERENTIAL', initialize bool DEFAULT true ) LANGUAGE sql AS $$ SELECT pgtrickle.create_stream_table(name, query, schedule, refresh_mode, initialize); $$; ``` This approach is straightforward but creates naming conflicts (function and procedure with same name and signature). PostgreSQL resolves this based on context (`SELECT` calls the function, `CALL` calls the procedure), but some tools may be confused. **Option B: Distinct Procedure Names** ```sql CALL pgtrickle.stream_table_create('order_totals', 'SELECT ...', '1m'); CALL pgtrickle.stream_table_drop('order_totals'); CALL pgtrickle.stream_table_refresh('order_totals'); ``` **Option C: pgrx `#[pg_extern]` with Procedure Support** pgrx 0.17.x does not yet natively generate `CREATE PROCEDURE`. The wrapper approach (Option A) via `extension_sql!()` is the most practical. **Recommendation:** Option A — wrapper procedures with the same name. Add via `extension_sql!()` in `lib.rs`. This preserves the existing function API and adds `CALL` as an alias. ### Effort ~30 lines of SQL in `extension_sql!()`. Minimal Rust changes. --- ## Implementation Phases ### Phase 1: Tier 1.5 — CALL Syntax (1–2 days) **Scope:** Add `CREATE PROCEDURE` wrappers for `create_stream_table`, `drop_stream_table`, `refresh_stream_table`, and `alter_stream_table`. **Files changed:** - `src/lib.rs` — New `extension_sql!()` block with procedure definitions **Tests:** - E2E test: create/refresh/drop via `CALL` syntax - Verify `SELECT` and `CALL` both work for same operations ### Phase 2: Hook Infrastructure (3–5 days) **Scope:** Register `ProcessUtility_hook` in `_PG_init()`, implement the dispatch logic, and handle passthrough for non-pgtrickle DDL. **Files changed:** - `src/hooks.rs` — Add `register_process_utility_hook()`, `pg_trickle_process_utility()` dispatch function, hook chaining - `src/lib.rs` — Call `hooks::register_process_utility_hook()` in `_PG_init()` **Tests:** - Verify hook registration doesn't break existing DDL - Verify non-pgtrickle matviews still work - Verify other extension hooks still chain correctly ### Phase 3: CREATE Interception (5–7 days) **Scope:** Intercept `CREATE MATERIALIZED VIEW ... WITH (pgtrickle.stream = true)` and route to `create_stream_table_impl()`. **Key tasks:** 1. Parse `CreateTableAsStmt` node to extract reloptions 2. Detect `pgtrickle.stream = true` in the option list 3. Extract schema, name, query text from the parse tree 4. Extract `pgtrickle.schedule` and `pgtrickle.mode` options 5. Determine `WITH DATA` / `WITH NO DATA` 6. Call `create_stream_table_impl()` with extracted parameters 7. Set `QueryCompletion` tag to `"CREATE MATERIALIZED VIEW"` 8. Add `COMMENT ON TABLE` with stream table marker **Files changed:** - `src/hooks.rs` — `handle_create_stream_table_ddl()` - `src/api.rs` — Ensure `create_stream_table_impl()` is `pub(crate)` **Tests:** - E2E: Create stream table via matview syntax - E2E: Verify options parsing (schedule, mode, with data/no data) - E2E: Verify normal matviews are unaffected - E2E: Verify error handling (invalid options, bad queries) - E2E: Verify the resulting table matches function-API-created tables ### Phase 4: DROP + REFRESH Interception (3–4 days) **Scope:** Intercept `DROP MATERIALIZED VIEW` and `REFRESH MATERIALIZED VIEW` for registered stream tables. **Files changed:** - `src/hooks.rs` — `handle_drop_stream_table_ddl()`, `handle_refresh_stream_table_ddl()` **Tests:** - E2E: Drop stream table via `DROP MATERIALIZED VIEW` - E2E: Refresh via `REFRESH MATERIALIZED VIEW` - E2E: `IF EXISTS` handling - E2E: `CASCADE` handling - E2E: Verify normal matviews are unaffected ### Phase 5: pg_dump / Restore Support (InProgress) **Status:** In Progress. `generate_dump()` and `restore_stream_tables()` are fully implemented. Next is the event-trigger behavior. **Scope:** Implement `pgtrickle.generate_dump()` and `pgtrickle.restore_stream_tables()` functions. **Files changed:** - `src/api.rs` — New functions `generate_dump()`, `restore_stream_tables()` **Tests:** - Integration: Round-trip dump → restore → verify stream tables work - [x] E2E: Verify restore from pg_dump output ### Phase 6: Documentation + Polish (2–3 days) **Scope:** Update SQL_REFERENCE.md, GETTING_STARTED.md, FAQ.md. Add migration guide for existing users. **Files changed:** - `docs/SQL_REFERENCE.md` — Document Tier 2 syntax - `docs/GETTING_STARTED.md` — Show native syntax in examples - `docs/FAQ.md` — Add "Why CREATE MATERIALIZED VIEW?" Q&A - `CHANGELOG.md` — Feature announcement **Total estimated effort: 17–25 days** --- ## Module Layout Changes ``` src/ ├── hooks.rs # Extended: ProcessUtility_hook registration + dispatch │ ├── (existing) # DDL event triggers (_on_ddl_end, _on_sql_drop) │ ├── NEW # register_process_utility_hook() │ ├── NEW # pg_trickle_process_utility() — main dispatch │ ├── NEW # handle_create_stream_table_ddl() │ ├── NEW # handle_drop_stream_table_ddl() │ ├── NEW # handle_refresh_stream_table_ddl() │ └── NEW # extract_pgtrickle_options() — reloption parser ├── api.rs # Unchanged (impl functions already exist) ├── lib.rs # Extended: hook registration in _PG_init(), CALL procs └── (all other files unchanged) ``` --- ## ADR: Native Syntax Approach ### ADR-012: Native DDL Syntax via ProcessUtility_hook | Field | Value | |-------|-------| | **Status** | Proposed | | **Date** | 2026-02-25 | | **Deciders** | pg_trickle core team | | **Category** | API & Schema Design | #### Context Users expect DDL syntax (`CREATE`, `DROP`, `ALTER`) for persistent database objects. The function-call API works but feels foreign to PostgreSQL users accustomed to standard DDL. Additionally, embedding the defining query as a string literal loses IDE support. #### Decision Implement a tiered syntax strategy: - **Tier 1** (existing): Function API — always available, no special requirements - **Tier 1.5** (new): `CALL` procedure wrappers — trivial addition - **Tier 2** (new): `CREATE MATERIALIZED VIEW ... WITH (pgtrickle.stream = true)` via `ProcessUtility_hook` — native-feeling DDL when `shared_preload_libraries` is configured #### Options Considered | Option | Verdict | Reason | |--------|---------|--------| | Parser fork | Rejected | Requires PostgreSQL fork, not an extension | | Table AM | Rejected | 60+ callbacks, column derivation unsupported | | FDW | Rejected | No indexes, no triggers, wrong semantics | | **ProcessUtility_hook + matview** | **Selected** | Proven by TimescaleDB, native feel | | COMMENT abuse | Rejected | Fragile, poor UX | #### Consequences **Positive:** - Users get native-feeling DDL syntax - Defining query is SQL (not a string literal) - Follows established pattern (TimescaleDB continuous aggregates) - No additional deployment burden (shared_preload_libraries already required) **Negative:** - ~1,200–1,800 lines of new hook code - Must track `ProcessUtility_hook` signature across PG versions - pg_dump requires custom companion functions - Stream tables appear as regular tables in `\dt`, not `\dm` - Potential confusion: users may expect full matview semantics (e.g., `REFRESH ... CONCURRENTLY` with different behavior) --- ## Testing Strategy ### Unit Tests (src/hooks.rs `#[cfg(test)]`) - Option parsing: extract `pgtrickle.stream`, `pgtrickle.schedule`, `pgtrickle.mode` from `DefElem` lists - Schema/name parsing from `RangeVar` - WITH DATA / WITH NO DATA detection ### E2E Tests (tests/e2e_native_syntax_tests.rs) | Test | Description | |------|-------------| | `test_create_via_matview_syntax` | Basic CREATE MATERIALIZED VIEW with pgtrickle.stream | | `test_create_with_all_options` | schedule + mode + WITH NO DATA | | `test_create_default_options` | Only pgtrickle.stream = true, all defaults | | `test_create_schema_qualified` | `CREATE MATERIALIZED VIEW myschema.my_st ...` | | `test_create_if_not_exists` | IF NOT EXISTS handling | | `test_drop_via_matview_syntax` | DROP MATERIALIZED VIEW on a stream table | | `test_drop_if_exists` | DROP MATERIALIZED VIEW IF EXISTS | | `test_drop_cascade` | CASCADE propagation | | `test_refresh_via_matview_syntax` | REFRESH MATERIALIZED VIEW on a stream table | | `test_refresh_concurrently_ignored` | CONCURRENTLY keyword logged but ignored | | `test_normal_matview_passthrough` | Regular matview without pgtrickle.stream works normally | | `test_normal_drop_passthrough` | DROP on non-stream matview works normally | | `test_mixed_create_function_and_ddl` | Create via function, drop via DDL and vice versa | | `test_invalid_pgtrickle_options` | Unknown option → error | | `test_call_syntax` | CALL pgtrickle.create_stream_table(...) | | `test_hook_chaining` | Verify pg_stat_statements still works alongside hook | | `test_dump_restore_roundtrip` | generate_dump() → restore → verify | ### Property Tests - For any stream table created via Tier 1 (function), the resulting catalog entries and storage table should be identical to one created via Tier 2 (DDL) with the same parameters. --- ## Risks and Mitigations | Risk | Likelihood | Impact | Mitigation | |------|-----------|--------|------------| | ProcessUtility_hook signature changes in PG 19+ | Medium | Medium | Conditional compilation with `#[cfg(any(...))]` for PG version. Hook signature has been stable since PG 15. | | Other extension hooks conflict | Low | High | Always chain `prev_ProcessUtility_hook`. Test with pg_stat_statements. Document hook ordering requirements. | | pg_dump produces incorrect restore SQL | High | Medium | Provide `pgtrickle.generate_dump()` and `pgtrickle.restore_stream_tables()`. Document recommended backup workflow. | | User confusion: "why is my matview in `\dt` not `\dm`?" | Medium | Low | Documentation, COMMENT ON TABLE marker, FAQ entry. | | Hook code introduces crashes | Low | High | Extensive `#[pg_guard]` usage, never `unwrap()` in hook path, graceful passthrough on errors. | | Deparse of defining query loses fidelity | Medium | Medium | Use raw query string extraction (approach 2) rather than deparsing from parse tree. Preserve user's original SQL. | | Extension load order matters | Low | Medium | Document that pg_trickle should be listed after pg_stat_statements in `shared_preload_libraries` (or before — test both). | --- ## Comparison with Prior Art | Feature | pg_trickle (proposed) | TimescaleDB | pg_ivm | Citus | |---------|---------------------|-------------|--------|-------| | Creation syntax | `CREATE MATVIEW WITH (pgtrickle.stream)` | `CREATE MATVIEW WITH (timescaledb.continuous)` | `SELECT create_immv(...)` | `SELECT create_distributed_table(...)` | | Function API | Yes (primary) | Yes (policies) | Yes (primary) | Yes (primary) | | ProcessUtility_hook | Yes | Yes (extensive) | No | Yes (extensive) | | pg_dump support | Custom functions | Built-in (complex) | No | Built-in | | Storage type | Regular table | Hypertable | Regular table | Distributed table | | REFRESH integration | `REFRESH MATERIALIZED VIEW` | `REFRESH MATERIALIZED VIEW` | `SELECT refresh_immv(...)` | N/A | | DROP integration | `DROP MATERIALIZED VIEW` | `DROP MATERIALIZED VIEW` | `DROP TABLE` | `SELECT undistribute_table(...)` | --- ## References 1. [Custom SQL Syntax Research Report](REPORT_CUSTOM_SQL_SYNTAX.md) — Comprehensive analysis of all PostgreSQL extension syntax mechanisms 2. TimescaleDB `process_utility.c` — [github.com/timescale/timescaledb](https://github.com/timescale/timescaledb/blob/main/src/process_utility.c) 3. Citus `multi_utility.c` — [github.com/citusdata/citus](https://github.com/citusdata/citus/blob/main/src/backend/distributed/commands/multi_utility.c) 4. pg_ivm `createas.c` — [github.com/sraoss/pg_ivm](https://github.com/sraoss/pg_ivm/blob/main/createas.c) 5. PostgreSQL `ProcessUtility_hook` — `src/backend/tcop/utility.c` in PostgreSQL source 6. PostgreSQL Table AM API — [postgresql.org/docs/18/tableam.html](https://www.postgresql.org/docs/18/tableam.html) 7. pgrx Hooks — pgrx documentation on `ProcessUtility_hook` in Rust 8. [ADR Records](../adrs/PLAN_ADRS.md) — ADR-001 (Triggers), ADR-003 (DVM Engine)