# Engine Composability Analysis — Extractable Components & Internal Decomposition **Date:** 2026-03-03 **Status:** Analysis / Proposal **Type:** REPORT --- ## Executive Summary pg_trickle is a ~48K-line Rust codebase compiled as a single PostgreSQL extension (`cdylib`). Analysis of the architecture, source coupling, and planned features reveals **five components that could be extracted as standalone projects** and **three internal decomposition opportunities** that would make the extension itself more composable. The existing [PLAN_ECO_SYSTEM.md](ecosystem/PLAN_ECO_SYSTEM.md) already covers *integrations* (dbt, Airflow, Grafana, CLI); this analysis focuses on **core engine decomposition** — breaking the monolith into reusable building blocks. ### Why This Matters 1. **Managed PG services** (RDS, Cloud SQL, Neon, Supabase) cannot install C extensions. An external sidecar needs the DVM engine *without* pgrx. 2. **Broader adoption** — a standalone SQL-differencing library is useful far beyond pg_trickle (migration tools, query planners, testing frameworks). 3. **Independent release cadences** — the DAG engine and DVM engine change at different rates than the CDC layer. 4. **Testing & correctness** — pure-Rust crates can be fuzzed and property-tested without a PostgreSQL backend. --- ## Part 1: Extractable Components (Separate Crates / Projects) ### 1.1. `pg-query-diff` — SQL Delta Query Generator (DVM Engine) **What:** The DVM engine (`src/dvm/`) minus the PostgreSQL parser coupling. A library crate that takes an operator tree (OpTree) and produces delta SQL. **Current size:** ~25K lines across `dvm/mod.rs`, `dvm/diff.rs`, `dvm/row_id.rs`, `dvm/parser.rs` (15K lines), and `dvm/operators/` (21 operator files). **Coupling analysis:** | Submodule | pgrx/pg_sys refs | Status | |-----------|-----------------|--------| | `dvm/operators/*.rs` (20 files) | 2 (only in `recursive_cte.rs`: 1 `pgrx::info!`, 1 `Spi`) | **Nearly pure Rust** | | `dvm/diff.rs` | 0 | **Pure Rust** | | `dvm/row_id.rs` | 0 | **Pure Rust** | | `dvm/mod.rs` | 4 (cache management, SPI for volatility check) | Low coupling | | `dvm/parser.rs` | ~15 `Spi::connect` + extensive `pg_sys::raw_parser` usage | **Deeply coupled** | **Extraction strategy:** The operators + diff engine are already nearly decoupled. The blocker is `parser.rs` (15K lines), which uses `pg_sys::raw_parser()` to walk PG's internal C parse tree. Two-phase approach: 1. **Phase 1 — Define a `ParseFrontend` trait** that abstracts the parsing interface. The current `pg_sys::raw_parser` implementation becomes one backend; a `pg_query.rs` (libpg_query) implementation becomes another. This is already identified in [REPORT_EXTERNAL_PROCESS.md](infra/REPORT_EXTERNAL_PROCESS.md) §3.1 Strategy A. 2. **Phase 2 — Extract `pg-query-diff` crate** containing: - `OpTree` and all operator types - `DiffContext` and delta SQL generation - `RowIdStrategy` and row ID generation - `CteRegistry` and CTE handling - The `ParseFrontend` trait (but not the pgrx implementation) - All auto-rewrite passes (they operate on SQL strings, not parse trees) **Value as a standalone project:** - **Migration tools** could use it to generate incremental SQL from view definitions - **Testing frameworks** could diff query outputs - **Other IVM systems** (non-PG) could reuse the operator algebra - **The sidecar architecture** (REPORT_EXTERNAL_PROCESS.md) becomes viable without forking the entire extension **Effort:** ~40–60 hours (trait abstraction + crate extraction + CI) --- ### 1.2. `pg-dag` — Dependency Graph Engine **What:** The DAG module (`src/dag.rs`, ~1960 lines) as a standalone crate for managing dependency graphs with topological ordering, cycle detection, diamond detection, consistency groups, and schedule propagation. **Coupling analysis:** | Concern | pgrx dependency | Extractable? | |---------|----------------|--------------| | Graph algorithms (topological sort, cycle detection, diamond detection) | **None** — pure `HashMap`/`HashSet` | ✅ Immediately | | Consistency group computation | **None** | ✅ Immediately | | Schedule propagation + canonical periods | **None** | ✅ Immediately | | `StDag::build_from_catalog()` | `Spi::connect()` to read `pgt_stream_tables` + `pgt_dependencies` | Stays in extension | The `build_from_catalog()` function (~80 lines) is the only SPI-coupled code. Everything else is pure Rust graph logic that takes a `HashMap` as input. **Extraction strategy:** 1. Split `dag.rs` into: - `pg-dag` crate: `StDag`, `StNode`, `NodeId`, `DiamondConsistency`, `DiamondSchedulePolicy`, `ConsistencyGroup`, topological sort, cycle detection, diamond detection, schedule propagation - In the extension: `build_from_catalog()` adapter that reads SPI and feeds data into the crate's constructors **Value as a standalone project:** - **Any system with a DAG of dependent tasks** (CI pipelines, build systems, data pipeline orchestrators) could reuse the diamond detection + consistency group logic - The canonical-period scheduling algorithm is novel and useful independently - Fuzzable and property-testable without PG **Effort:** ~8–12 hours --- ### 1.3. `pg-cdc-triggers` — Trigger-Based CDC Library **What:** The CDC trigger management code (`src/cdc.rs`, ~1004 lines) as a reusable library for creating, managing, and consuming row-level change capture triggers on PostgreSQL tables. **Coupling analysis:** The CDC module generates and executes SQL DDL (CREATE TRIGGER, CREATE FUNCTION, DROP TRIGGER) via SPI. The *logic* of what SQL to generate is decoupled from SPI — it builds SQL strings and then executes them. **Extraction strategy:** 1. Extract a `pg-cdc-triggers` crate that provides: - Trigger function SQL generation (given a table OID/name, generate the PL/pgSQL trigger function and CREATE TRIGGER DDL) - Change buffer table DDL generation - Change consumption query generation (given a frontier range) - Change buffer cleanup query generation 2. The extension calls the library for SQL generation, then executes via SPI 3. An external sidecar could call the library, then execute via `tokio-postgres` **Value as a standalone project:** - **Any system needing PG trigger-based CDC** (audit logging, event sourcing, cache invalidation) could use the library - Pairs naturally with `pg-dag` for building custom incremental pipelines - The hybrid trigger→WAL transition logic is particularly novel **Effort:** ~16–24 hours --- ### 1.4. `pg-trickle-sidecar` — External Process / Sidecar **What:** The orchestration layer (scheduler + refresh + CDC + DVM) running as an external process connecting to PostgreSQL over standard libpq/tokio-postgres connections. This is the main deliverable of the external process architecture described in [REPORT_EXTERNAL_PROCESS.md](infra/REPORT_EXTERNAL_PROCESS.md). **Dependency on other extractions:** - Requires `pg-query-diff` (1.1) for delta SQL generation - Requires `pg-dag` (1.2) for scheduling - Requires `pg-cdc-triggers` (1.3) for CDC management - Uses `pg_query.rs` (libpg_query) instead of `pg_sys::raw_parser` - Replaces SPI with `tokio-postgres` or `sqlx` - Replaces GUCs with a config file (TOML/YAML) - Replaces `BackgroundWorker` with a Tokio runtime - Replaces shared memory with internal state (single process) **Value as a standalone project:** - **Unlocks managed PG services** (RDS, Cloud SQL, Neon, Supabase) - Can run as a Kubernetes sidecar, Docker compose service, or systemd unit - Independent scaling (run multiple scheduler instances for different DB subsets) **Effort:** ~120–160 hours (the REPORT_EXTERNAL_PROCESS.md estimates 200–300h for a full port, but with pre-extracted crates this is significantly reduced) **Sequencing:** This is the *end goal* — extracting 1.1–1.3 first makes this achievable incrementally rather than as a big-bang rewrite. --- ### 1.5. `pg-trickle-cli` — Command-Line Management Tool Already planned in [PLAN_ECO_SYSTEM.md](ecosystem/PLAN_ECO_SYSTEM.md) Project 7. Including here for completeness — it benefits from `pg-dag` (1.2) for local DAG visualization and from the config crate for shared configuration schemas. **Effort:** ~20 hours (already estimated) --- ## Part 2: Internal Decomposition (Same Crate, Better Boundaries) These changes keep everything in a single `cdylib` but introduce clearer internal boundaries via traits and module reorganization. They reduce coupling, improve testability, and prepare the ground for future crate extraction. ### 2.1. Storage Backend Trait **Current state:** `refresh.rs` (~2355 lines) directly generates SQL for `TRUNCATE`, `INSERT`, `DELETE`, `MERGE` and executes via SPI. The refresh logic (determine action, compute delta, apply, update frontier) is tightly woven with the SQL execution. **Proposal:** Introduce a `StorageBackend` trait: ```rust pub trait StorageBackend { fn execute_full_refresh(&self, st: &StreamTableMeta, query: &str) -> Result; fn execute_delta(&self, st: &StreamTableMeta, delta_sql: &str) -> Result; fn advance_frontier(&self, st: &StreamTableMeta, new_frontier: &Frontier) -> Result<()>; fn record_history(&self, record: &RefreshRecord) -> Result<()>; } ``` - `SpiStorageBackend` — current implementation (in-extension) - `SqlxStorageBackend` — future sidecar implementation (tokio-postgres) - `MockStorageBackend` — for unit-testing refresh orchestration logic **Benefit:** The refresh orchestration (which action to take, adaptive fallback, retry logic) can be unit-tested without a PG backend. **Effort:** ~8–12 hours --- ### 2.2. Parse Frontend Trait (Internal Step Toward 1.1) **Current state:** `parser.rs` calls `pg_sys::raw_parser()` directly ~150 times throughout 15K lines. Pure logic (AST walking, OpTree construction, rewrite passes) is interleaved with FFI calls. **Proposal:** Introduce a `ParseFrontend` trait: ```rust pub trait ParseFrontend { /// Parse SQL and return an abstract parse tree. fn parse(&self, sql: &str) -> Result; /// Look up function volatility. fn function_volatility(&self, schema: &str, name: &str) -> Result; /// Resolve a table's OID and column list. fn resolve_table(&self, schema: &str, name: &str) -> Result; /// Get view definition text. fn get_view_definition(&self, schema: &str, name: &str) -> Result>; } ``` With a `PgrxParseFrontend` (using `pg_sys::raw_parser` + SPI) and a future `PgQueryParseFrontend` (using `pg_query.rs` + client connection). This is the **highest-leverage internal change** — it decouples the entire DVM engine from the PostgreSQL backend in a single abstraction boundary. **Effort:** ~24–40 hours (large due to 15K lines of parser code) --- ### 2.3. Catalog Access Trait **Current state:** `catalog.rs` (~1150 lines) does all CRUD via `Spi::connect()`. Other modules (`dag.rs`, `cdc.rs`, `monitor.rs`, `hooks.rs`, `scheduler.rs`) also call SPI directly for catalog reads. **Proposal:** Introduce a `CatalogAccess` trait: ```rust pub trait CatalogAccess { fn get_stream_table(&self, name: &str) -> Result>; fn list_stream_tables(&self) -> Result>; fn get_dependencies(&self, pgt_id: i64) -> Result>; fn update_status(&self, pgt_id: i64, status: StStatus) -> Result<()>; fn record_refresh(&self, record: RefreshRecord) -> Result<()>; // ... etc } ``` - `SpiCatalogAccess` — current SPI-based implementation - `SqlxCatalogAccess` — future sidecar implementation - `InMemoryCatalogAccess` — for testing **Benefit:** Tests can inject mock catalogs. All SPI access is centralized behind a single trait rather than scattered across 6+ modules. **Effort:** ~12–16 hours --- ## Part 3: Composability Matrix How the extracted components and internal traits relate: ``` ┌──────────────────────────────────────────────┐ │ pg_trickle extension │ │ │ │ ┌──────────┐ ┌────────────┐ ┌─────────┐ │ │ │SpiParse │ │SpiCatalog │ │SpiStore │ │ │ │Frontend │ │Access │ │Backend │ │ │ └────┬─────┘ └─────┬──────┘ └────┬────┘ │ │ │ │ │ │ └───────┼──────────────┼──────────────┼────────┘ │ │ │ Trait boundaries (internal) │ │ │ │ │ ┌───────────────────────┼──────────────┼──────────────┼───────┐ │ ▼ ▼ ▼ │ │ ┌─────────────────────────┐ ┌────────────┐ ┌──────────┐ │ │ │ pg-query-diff (1.1) │ │ pg-dag │ │pg-cdc- │ │ │ │ OpTree, Operators, │ │ (1.2) │ │triggers │ │ │ │ DiffContext, RowId, │ │ Topo sort, │ │(1.3) │ │ │ │ Rewrite passes │ │ Diamonds, │ │Trigger │ │ │ │ │ │ Scheduling │ │SQL gen │ │ │ │ ParseFrontend trait │ │ │ │ │ │ │ └─────────────────────────┘ └────────────┘ └──────────┘ │ │ │ │ Extractable crates (pure Rust) │ └─────────────────────────────────────────────────────────────┘ │ │ │ ▼ ▼ ▼ ┌─────────────────────────────────────────────────────────────┐ │ pg-trickle-sidecar (1.4) │ │ │ │ PgQueryParseFrontend + SqlxCatalogAccess + SqlxStorage │ │ + Tokio scheduler + TOML config │ └─────────────────────────────────────────────────────────────┘ ``` --- ## Part 4: Priority Ranking Ranked by **impact / effort ratio** and **unblocking effect**: | Priority | Component | Type | Effort | Impact | Unblocks | |----------|-----------|------|--------|--------|----------| | **P1** | 1.2 `pg-dag` | Extract | 8–12h | Medium | Sidecar, CLI, testing | | **P2** | 2.1 StorageBackend trait | Internal | 8–12h | High | Unit-testing refresh logic | | **P3** | 2.3 CatalogAccess trait | Internal | 12–16h | High | Centralized SPI, testing | | **P4** | 2.2 ParseFrontend trait | Internal | 24–40h | Very High | DVM extraction, sidecar | | **P5** | 1.1 `pg-query-diff` | Extract | 40–60h | Very High | Sidecar, broader ecosystem | | **P6** | 1.3 `pg-cdc-triggers` | Extract | 16–24h | Medium | Standalone CDC users | | **P7** | 1.4 `pg-trickle-sidecar` | New project | 120–160h | Very High | Managed PG services | **Recommended sequencing:** ``` Phase A (internal traits): P1 → P2 → P3 (~28–40h) Phase B (parse abstraction): P4 (~24–40h) Phase C (crate extraction): P5 → P6 (~56–84h) Phase D (sidecar): P7 (~120–160h) ``` Phase A can happen alongside v0.2.0 work. Phase B should target v0.3.0. Phases C and D are post-1.0 but the internal trait work (A+B) makes them achievable without a big-bang rewrite. --- ## Part 5: What Should NOT Be Extracted Some components are deeply tied to the PostgreSQL extension runtime and extracting them would be counterproductive: | Component | Why Keep In-Extension | |-----------|----------------------| | `hooks.rs` (DDL event triggers) | Requires PG event trigger infrastructure; no equivalent externally | | `shmem.rs` (shared memory) | PG `PgLwLock`/`PgAtomic` — replaced entirely in sidecar mode | | `config.rs` (GUCs) | PG GUC registry — replaced by config file in sidecar | | `hash.rs` (SQL hash functions) | `#[pg_extern]` wrapper around xxHash; the Rust xxHash call is one line | | `scheduler.rs` (bgworker) | PG `BackgroundWorkerBuilder` — replaced by Tokio in sidecar; but the *scheduling algorithm* (canonical periods, retry) should be extracted | | `wal_decoder.rs` | Deeply tied to PG logical replication slot API; sidecar would use `tokio-postgres` replication protocol directly | --- ## Part 6: Comparison With Existing Ecosystem Plan [PLAN_ECO_SYSTEM.md](ecosystem/PLAN_ECO_SYSTEM.md) defines 11 ecosystem projects focused on **integrations** (dbt, Airflow, Prometheus, CLI, Docker, ORM). This analysis focuses on **engine decomposition** — they are complementary: | Ecosystem Plan | This Analysis | |----------------|---------------| | Integration wrappers around SQL API | Core engine as composable crates | | All projects consume pg_trickle as a black box | Projects *are* pg_trickle's internals | | Separate repos, same SQL interface | Workspace crates, trait-based interfaces | | Useful today (0.x) | Phase A today, Phases B–D post-1.0 | The sidecar (1.4) is the bridge: it is both an ecosystem project (separate binary) and a consumer of the extracted core crates. --- ## Part 7: Risks & Mitigations | Risk | Mitigation | |------|-----------| | **Premature extraction** — extracting crates before the API stabilizes causes churn | Start with *internal* traits (Phase A+B) that keep everything in one repo but establish boundaries. Extract to crates only when the trait interfaces are stable. | | **Build complexity** — workspace with multiple crates is harder to build | Use Cargo workspace; pgrx already supports workspace members. Keep the crate graph shallow (max 2 levels). | | **Performance regression from trait dynamism** — `dyn Trait` dispatch overhead | Use generics (`impl Trait`) not `dyn Trait` — monomorphized at compile time, zero runtime cost. | | **Testing burden** — more crates = more CI matrix entries | Extracted crates are pure Rust — `cargo test` without Docker. Actually *reduces* CI cost. | | **Sidecar feature parity** — sidecar lags extension | Share the same crate for core logic; only the "glue" differs. Feature parity is structural, not manual. | --- ## References | Document | Relevance | |----------|-----------| | [REPORT_EXTERNAL_PROCESS.md](infra/REPORT_EXTERNAL_PROCESS.md) | Full sidecar feasibility study; coupling inventory | | [REPORT_PGWIRE_PROXY.md](infra/REPORT_PGWIRE_PROXY.md) | Proxy architecture (alternative to sidecar) | | [PLAN_ECO_SYSTEM.md](ecosystem/PLAN_ECO_SYSTEM.md) | Integration ecosystem plan (complementary) | | [PLAN_ADRS.md](adrs/PLAN_ADRS.md) | Architecture decisions (constraints) | | [docs/ARCHITECTURE.md](../docs/ARCHITECTURE.md) | Current architecture | | [ROADMAP.md](../ROADMAP.md) | Release timeline for sequencing | | [REPORT_DOWNSTREAM_CONSUMERS.md](infra/REPORT_DOWNSTREAM_CONSUMERS.md) | Consumer patterns that benefit from composability |