# Blue-Green Deployment for pg_trickle

**Type:** REPORT · **Status:** Exploration · **Date:** 2026-03-03

## Problem Statement

Propagating changes through a large DAG of stream tables can take significant
time — especially initial `FULL` refreshes on wide or high-cardinality tables.
When users need to modify a pipeline (new query logic, schema changes, operator
upgrades, schedule tuning), the existing pipeline must keep serving
stale-but-consistent data while the replacement pipeline catches up in the
background. Once the new pipeline is current, a hot swap redirects consumers
transparently.

This is analogous to blue-green deployment in application servers, applied to
incremental view maintenance pipelines.

### Use Cases

1. **Query logic change** — rewrite a defining query (e.g., add a new join or
   aggregation) without downtime.
2. **Schema evolution** — upstream `ALTER TABLE` requires rebuilding downstream
   STs; keep the old pipeline serving while the new one reinitializes.
3. **Refresh mode migration** — switch from `FULL` to `DIFFERENTIAL` (or vice
   versa) with a safety net.
4. **DAG restructuring** — split, merge, or reorder stream tables in the
   dependency graph.
5. **Extension upgrade** — new pg_trickle version with different DVM operators
   needs pipeline rebuild.

### Design Goals

| Goal | Priority |
|------|----------|
| Zero (or near-zero) read downtime during swap | Must-have |
| Works for individual STs and whole pipelines | Must-have |
| Automatic convergence detection + manual override | Must-have |
| Rollback capability | Should-have |
| No breaking changes to existing catalog schema | Should-have |
| Minimal scheduler overhead during dual-pipeline phase | Nice-to-have |

---

## Current Architecture — What Already Supports Blue-Green

A thorough analysis of the existing internals reveals that pg_trickle is
**already largely blue-green friendly** at the infrastructure level.

### CDC Sharing (Ready)

Triggers and change buffer tables are keyed by **source OID**, not by stream
table ID. The `pgtrickle.pgt_change_tracking.tracked_by_pgt_ids` array
supports multiple consumers. Both the blue (existing) and green (new) STs can
consume from the same change buffers simultaneously with independent frontiers.

- Trigger naming: `pg_trickle_cdc_{source_oid}` — one trigger per source
  regardless of how many STs depend on it.
- Buffer table: `pgtrickle_changes.changes_{source_oid}` — shared.
- Deferred cleanup uses `MIN(frontier)` across all consumers, so green's
  presence prevents premature deletion of rows it hasn't consumed yet.

### Independent Frontiers (Ready)

Each ST has its own `frontier` JSONB column in `pgt_stream_tables` tracking
per-source LSN watermarks. A green ST with a different `pgt_id` maintains
completely independent frontier state.

### Internal Key Isolation (Ready)

- **MERGE template cache** — keyed by `pgt_id` (thread-local `HashMap`).
- **Prepared statements** — per-session, keyed by `pgt_id`.
- **Advisory locks** — keyed by `pgt_id`.
- **Retry/backoff state** — keyed by `pgt_id`.

Two versions of the same logical ST with different `pgt_id` values will not
collide in any internal data structure.

### DAG Rebuild (Ready)

`signal_dag_rebuild()` is a cheap atomic increment. The scheduler rebuilds the
full DAG from catalog tables on the next cycle, so adding/removing STs is a
live operation with no restart required.

### Status Lifecycle (Ready)

The `INITIALIZING → ACTIVE → SUSPENDED → ERROR` state machine exists. A green
ST can be created in `INITIALIZING` state, manually refreshed to catch up, then
flipped to `ACTIVE` for the scheduler to pick up.

---

## Core Challenge: Identity & Naming

The central tension is the **UNIQUE constraint** on
`(pgt_schema, pgt_name)` in `pgt_stream_tables` and the **UNIQUE on
`pgt_relid`** (storage table OID). Two STs cannot share the same qualified name
or point to the same physical table. The swap mechanism must navigate this.

---

## Approaches

### Approach A: Shadow Tables with Atomic Rename

**Concept:** Create the green ST with a shadow name (e.g.,
`order_totals__pgt_green`), let it catch up, then do an atomic three-way rename
in a single transaction.

**Lifecycle:**

```
1. User calls:  pgtrickle.create_green('order_totals', query => '...', ...)
2. System creates ST:  order_totals__pgt_green  (status: INITIALIZING)
3. System runs FULL refresh on green ST
4. Green ST flips to ACTIVE — scheduler runs differentials alongside blue
5. Convergence detected (or user calls promote)
6. In one transaction:
     ALTER TABLE order_totals         RENAME TO order_totals__pgt_retiring;
     ALTER TABLE order_totals__pgt_green RENAME TO order_totals;
     UPDATE pgtrickle.pgt_stream_tables ...  -- fix names, relids
     DROP the retired ST (or keep for rollback)
7. Signal DAG rebuild
```

**Pros:**
- Simple mental model — two STs, one rename.
- Consumers querying `SELECT * FROM order_totals` see no change.
- Minimal catalog schema changes (only need a `green_of BIGINT` FK column).
- Works for both per-ST and whole-pipeline swaps.

**Cons:**
- `ALTER TABLE ... RENAME` takes `AccessExclusiveLock` — brief (sub-ms
  catalog-only operation) but blocks concurrent queries momentarily.
- For whole-pipeline swap of N STs, N renames in one transaction — lock
  ordering must be deterministic to avoid deadlocks.
- Downstream STs referencing the blue ST by OID in their `defining_query` need
  query text rewriting at promote time.
- Not truly zero-downtime: there is a lock window (typically < 1ms per table).

**Complexity:** Low–Medium.

---

### Approach B: Views as Public Interface

**Concept:** The public-facing name is always a **view**. The physical storage
table has a versioned name. Swap = `CREATE OR REPLACE VIEW` pointing to the new
storage table.

**Lifecycle:**

```
1. On initial ST creation:
     Storage table:  order_totals__pgt_v1
     Public view:    CREATE VIEW order_totals AS SELECT * FROM order_totals__pgt_v1;
2. Green version:    order_totals__pgt_v2  (new ST, separate pgt_id)
3. Once caught up:   CREATE OR REPLACE VIEW order_totals AS SELECT * FROM order_totals__pgt_v2;
4. Drop v1 ST (or retain for rollback)
```

**Pros:**
- `CREATE OR REPLACE VIEW` is lightweight, faster than rename.
- Clean separation: logical name (view) vs physical storage (versioned table).
- Rollback = point the view back.
- View replacement doesn't invalidate open cursors on the old version.

**Cons:**
- **Breaking change for existing deployments** — all STs would now live behind
  views. This changes behavior for `INSERT` (views need `INSTEAD OF` triggers
  or are read-only), `\d` output, `pg_class` catalogue lookups, index
  visibility from `pg_indexes`, etc.
- Adds an indirection layer to every read query (minor but measurable on
  micro-benchmarks).
- Downstream ST `defining_query` that references the view gets rewritten via
  the existing view-inlining code in `api.rs` — the inliner resolves views to
  their base query, so it would see the underlying versioned table. This
  creates a coupling: green STs depending on another green ST would need their
  view pointer updated first (ordering constraint).
- More complex catalog model: need to track both view OID and storage table
  OID per entry.

**Complexity:** Medium–High. The breaking-change aspect makes this unsuitable
as a retrofit unless pg_trickle adopts views-as-public-interface from the start
(e.g., as a major version change).

---

### Approach C: Pipeline Generations (First-Class Concept)

**Concept:** Introduce a "pipeline" entity with a generation counter. STs
belong to a pipeline + generation. Only one generation is "active" at a time.

**New catalog table:**

```sql
CREATE TABLE pgtrickle.pgt_pipelines (
    pipeline_id    BIGSERIAL PRIMARY KEY,
    pipeline_name  TEXT UNIQUE NOT NULL,
    active_gen     INT NOT NULL DEFAULT 1,
    green_gen      INT,        -- NULL when no green is being prepared
    created_at     TIMESTAMPTZ DEFAULT now()
);
```

**Extended `pgt_stream_tables`:**

```sql
ALTER TABLE pgtrickle.pgt_stream_tables
    ADD COLUMN pipeline_id  BIGINT REFERENCES pgtrickle.pgt_pipelines,
    ADD COLUMN generation   INT;
```

The UNIQUE constraint on `(pgt_schema, pgt_name)` would be relaxed to
`UNIQUE (pgt_schema, pgt_name, generation)`.

**Lifecycle:**

```
1. pgtrickle.begin_green('my_pipeline')
   → clones all STs in the pipeline with generation + 1
   → new storage tables, new pgt_ids, inherits queries/config
   → green STs start catching up (FULL → DIFFERENTIAL)

2. pgtrickle.promote_green('my_pipeline')
   → atomically swap active_gen, rename tables, update references
   → mark old generation as retired

3. pgtrickle.rollback_green('my_pipeline')
   → drop green-generation STs, reset green_gen to NULL

4. pgtrickle.cleanup_generation('my_pipeline', gen)
   → drop retired storage tables and catalog entries
```

**Pros:**
- First-class concept with clear semantics and good observability.
- `pgt_status()` can show generation info: "green is 95% caught up."
- Supports both whole-pipeline and per-ST swaps (single-ST "pipeline").
- Enables rollback without keeping shadow tables.
- Generation info is queryable — useful for monitoring and alerting.

**Cons:**
- Largest schema change: new table, altered `pgt_stream_tables`, migration
  path for existing deployments, extension upgrade SQL.
- "Pipeline" grouping may not map cleanly to arbitrary DAG subsets — what if
  a user wants to green-deploy 3 of 10 STs that share sources?
- The rename/swap mechanics under the hood are the same as Approach A — this
  is essentially Approach A with richer metadata.
- More concepts for users to learn.

**Complexity:** High, but most comprehensive.

---

### Approach D: Schema-Based Isolation

**Concept:** Blue pipeline lives in the user's schema. Green pipeline lives in
a staging schema (e.g., `pgtrickle_staging`). Swap = rename schemas or adjust
`search_path`.

**Lifecycle:**

```
1. pgtrickle.prepare_green(staging_schema => 'pgtrickle_staging')
   → recreate all STs in the staging schema
2. Green STs catch up in isolation
3. ALTER SCHEMA public RENAME TO public_old;
   ALTER SCHEMA pgtrickle_staging RENAME TO public;
4. Drop old schema
```

**Pros:**
- Complete isolation during build-up.
- Schema rename is a single catalog operation (fast).

**Cons:**
- Very coarse-grained — swaps the **entire schema**, not just STs.
- Source tables in the original schema → cross-schema query references.
- `ALTER SCHEMA ... RENAME` is `AccessExclusiveLock` on all contained objects.
- Doesn't support per-ST granularity at all.
- Application `search_path` changes required.

**Complexity:** Medium, but too inflexible for most scenarios.

---

## Comparison Matrix

| Criterion | A: Shadow Rename | B: Views | C: Generations | D: Schema |
|-----------|:---:|:---:|:---:|:---:|
| Per-ST granularity | ✅ | ✅ | ✅ | ❌ |
| Whole-pipeline swap | ✅ | ✅ | ✅ | ✅ |
| No breaking changes | ✅ | ❌ | ⚠️ Schema migration | ✅ |
| Rollback support | ⚠️ Manual | ✅ Easy | ✅ Built-in | ⚠️ Manual |
| Lock duration | ~ms | ~0 | ~ms | ~ms × N |
| Implementation effort | Low | Medium-High | High | Medium |
| Observability | Basic | Basic | Rich | Basic |
| Downstream query rewrite | Required | Automatic (via view) | Required | Not needed |
| Catalog schema changes | Minimal | Moderate | Significant | None |

---

## Convergence Detection

For auto-swap, the system needs to determine when the green pipeline has
"caught up" to the blue pipeline. Options:

### 1. Frontier LSN Comparison (Recommended)

Compare `green.frontier.sources[oid].lsn` ≥ `blue.frontier.sources[oid].lsn`
for all shared sources. When all sources pass, green is at least as fresh as
blue.

- **Pro:** Already tracked, no extra queries, precise.
- **Con:** Doesn't guarantee data equivalence (different query logic produces
  different results even at the same LSN).

### 2. Data Timestamp Comparison

`green.data_timestamp ≥ blue.data_timestamp` — simpler but coarser.

- **Pro:** Single scalar comparison.
- **Con:** Timestamp resolution may not match LSN precision.

### 3. Lag Threshold

User-defined maximum acceptable lag: "swap when green is within 5 seconds of
blue." Useful for time-sensitive workloads where exact convergence isn't
required.

### 4. Row Count / Content Hash

Compare `COUNT(*)` or `pg_trickle.content_hash(table)` between blue and green
storage tables.

- **Pro:** Deterministic proof of equivalence.
- **Con:** Expensive on large tables; only meaningful when queries are identical.

**Recommendation:** Use **frontier LSN comparison** (#1) as the primary
mechanism, with an optional **lag threshold** (#3) for user control. Content
hash (#4) can be offered as a `verify` flag for cautious deployments.

---

## Recommendation

**Approach A (Shadow Tables with Atomic Rename)** as the implementation
strategy, augmented with lightweight metadata from **Approach C** for
observability.

### Rationale

- Approach A is the **lowest-risk, lowest-effort** path that solves all
  must-have requirements.
- The internal machinery (CDC sharing, independent frontiers, `pgt_id`
  isolation) already supports dual pipelines — what's missing is just the
  orchestration layer.
- Adding a `green_of BIGINT` FK and `pipeline_group TEXT` column to
  `pgt_stream_tables` gives the observability benefits of Approach C without
  the full pipeline entity overhead.
- Approach B (views) would be a better long-term architecture but is a
  breaking change that warrants its own migration plan.
- Approach D is too coarse for real-world use.

### Proposed API Surface

```sql
-- Create a green version of an existing stream table.
-- Inherits query/schedule/mode from blue unless overridden.
SELECT pgtrickle.create_green(
    'public.order_totals',
    query    => 'SELECT ... FROM orders JOIN ...',  -- optional override
    schedule => '10s',                               -- optional override
    mode     => 'DIFFERENTIAL'                       -- optional override
);

-- Check green progress: frontier lag, row count, status.
SELECT * FROM pgtrickle.green_status('public.order_totals');
-- Returns: green_name, status, frontier_lag_bytes, data_timestamp_lag,
--          blue_row_count, green_row_count, converged (bool)

-- Promote: atomic swap green → active, retire blue.
-- Fails if green hasn't converged (override with force => true).
SELECT pgtrickle.promote_green('public.order_totals');
SELECT pgtrickle.promote_green('public.order_totals', force => true);

-- Promote all green STs that have converged.
SELECT * FROM pgtrickle.promote_all_green();

-- Abort: drop green ST, clean up.
SELECT pgtrickle.cancel_green('public.order_totals');

-- Rollback: swap back to the retired blue (if not yet cleaned up).
SELECT pgtrickle.rollback_green('public.order_totals');
```

### Catalog Changes (Minimal)

```sql
ALTER TABLE pgtrickle.pgt_stream_tables
    ADD COLUMN green_of       BIGINT REFERENCES pgtrickle.pgt_stream_tables(pgt_id),
    ADD COLUMN pipeline_group TEXT;
```

- `green_of` — points to the blue ST's `pgt_id`. `NULL` for normal STs.
- `pipeline_group` — optional label for grouping multi-ST swaps.

### Promote Transaction (Pseudocode)

```sql
BEGIN;

-- 1. Lock both tables deterministically (by OID, ascending) to avoid deadlocks
LOCK TABLE blue_table IN ACCESS EXCLUSIVE MODE;
LOCK TABLE green_table IN ACCESS EXCLUSIVE MODE;

-- 2. Rename
ALTER TABLE blue_table  RENAME TO order_totals__pgt_retired;
ALTER TABLE green_table RENAME TO order_totals;

-- 3. Update catalog
UPDATE pgtrickle.pgt_stream_tables
   SET pgt_name = 'order_totals__pgt_retired',
       status   = 'SUSPENDED'
 WHERE pgt_id   = :blue_id;

UPDATE pgtrickle.pgt_stream_tables
   SET pgt_name  = 'order_totals',
       green_of  = NULL
 WHERE pgt_id    = :green_id;

-- 4. Rewrite defining_query of any downstream STs that reference
--    the old blue OID → new green OID
UPDATE pgtrickle.pgt_dependencies
   SET source_relid = :green_oid
 WHERE source_relid = :blue_oid;

-- 5. Signal DAG rebuild
SELECT pgtrickle.signal_dag_rebuild();

COMMIT;
```

The blue ST remains in the catalog as `SUSPENDED` until explicitly cleaned up,
enabling rollback.

---

## Open Questions

1. **Downstream OID rewrites.** When a green ST replaces a blue ST that other
   STs depend on, the downstream `defining_query` text contains the old OID (or
   table name). How deep does the rewrite need to go? Just the dependency
   edges, or the SQL text too? (The SQL text uses names, not OIDs, so renaming
   the table should suffice — but needs verification.)

2. **Consistency groups during transition.** If blue and green STs participate
   in the same diamond consistency group, could the scheduler produce
   inconsistent results? Likely no — they have independent `pgt_id`s and
   different names, so the consistency group logic treats them as separate
   entities. But edge cases need analysis.

3. **Whole-pipeline ordering during promote.** For a pipeline with 5 STs in a
   DAG, the promote must happen in topological order (upstream first) so that
   downstream STs immediately see the promoted upstream. Or should the promote
   be all-at-once in a single transaction?

4. **CDC trigger sharing with modified queries.** If the green ST has a
   different `defining_query` that references *additional* source tables not in
   the blue ST's dependency set, new CDC triggers must be created for those
   sources. The existing `setup_cdc_for_source()` handles this, but the green
   ST creation flow needs to call it.

5. **Storage table column mismatch.** If the green ST has a different column
   set (due to query changes), the rename alone isn't sufficient — downstream
   queries expecting the blue schema will break. Should `promote_green` verify
   schema compatibility, or is this the user's responsibility?

6. **Advisor lock contention.** Both blue and green STs use independent
   advisory locks (keyed by `pgt_id`), so no contention. But during the promote
   transaction, we need to ensure neither ST is mid-refresh. The promote should
   acquire both advisory locks before proceeding.

7. **Auto-promote timing.** Should the scheduler perform auto-promote, or
   should it be a separate background check? Embedding it in the scheduler
   loop adds complexity; a separate polling function called by cron may be
   simpler.

---

## Related Work

- [PLAN_UPGRADE_MIGRATIONS.md](../sql/PLAN_UPGRADE_MIGRATIONS.md) — extension
  upgrade migrations (overlapping concern for version-to-version transitions).
- [PLAN_HYBRID_CDC.md](../sql/PLAN_HYBRID_CDC.md) — CDC mode transitions
  already implement a form of "hot swap" from trigger to WAL.
- [PLAN_FUSE.md](../sql/PLAN_FUSE.md) — anomalous change volume detection
  could trigger auto-rollback of a green pipeline.
- [REPORT_EXTERNAL_PROCESS.md](REPORT_EXTERNAL_PROCESS.md) — external sidecar
  could orchestrate blue-green at a higher level.
- [REPORT_DOWNSTREAM_CONSUMERS.md](REPORT_DOWNSTREAM_CONSUMERS.md) — downstream
  consumer patterns (change feeds, NOTIFY, logical replication) and how they
  interact with blue-green swap.

---

## Next Steps

1. **Validate Approach A** with a manual proof-of-concept: create two STs on
   the same source, let both catch up, then rename in a transaction. Measure
   lock duration and verify downstream queries.
2. **Prototype `create_green` / `promote_green`** as SQL functions — minimal
   viable implementation.
3. **Design the convergence check** — implement `green_status()` using frontier
   comparison.
4. **Evaluate Approach B (views)** as a longer-term migration path, potentially
   gated behind a GUC (`pg_trickle.use_view_aliases = on`).
5. **Write an ADR** once we commit to an approach.