# v0.87.0 — Enterprise Federation & State Management

> **Status:** Planned  
> **Scope:** Very Large  
> **Driven by:** [Assessment 16](../plans/PLAN_OVERALL_ASSESSMENT_16.md) — LT-5, LT-8

## Theme

Complete the distributed scaling arc with multi-cluster federation (stream
tables spanning multiple PostgreSQL clusters), object-storage state
checkpointing for disaster recovery, and the zero-config adaptive mode that
detects deployment context and configures behavior automatically. This release
establishes pg_trickle as a true enterprise-grade distributed IVM platform.

## Items

### LT-5: Multi-Cluster Federation

Coordinate stream tables across multiple PostgreSQL clusters with a global
DAG and cross-cluster frontier synchronization:

**Architecture:**
```
┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│ Cluster A        │     │ Cluster B        │     │ Cluster C        │
│ • Source: orders │     │ • Source: users  │     │ • ST: dashboard  │
│ • ST: order_agg  │────▶│ • ST: user_stats │────▶│   (joins A + B)  │
│   (local)        │     │   (local)        │     │   (federated)    │
└──────────────────┘     └──────────────────┘     └──────────────────┘
         │                        │                        │
         └────────────────────────┴────────────────────────┘
                          │
                  ┌───────────────┐
                  │ Federation    │
                  │ Coordinator   │
                  │ (global DAG)  │
                  └───────────────┘
```

**Components:**
1. **Federation Coordinator:** A standalone service that maintains the global
   DAG across clusters. Runs as a Kubernetes Deployment or standalone binary.
2. **Cross-cluster change forwarding:** When Cluster A's `order_agg` ST is
   refreshed, its change buffer events are forwarded to Cluster C via the
   change log backend (Kafka/NATS).
3. **Global frontier synchronization:** The coordinator tracks per-cluster
   frontiers and ensures downstream clusters only process events up to the
   consistent frontier of all upstream clusters.
4. **Cluster discovery:** Clusters register with the coordinator via
   `pgtrickle.register_federation(coordinator_url)`.

**SQL API:**
```sql
-- On Cluster C (consumer):
SELECT pgtrickle.create_federated_stream_table(
    'dashboard',
    $$SELECT o.category, u.region, SUM(o.amount)
      FROM cluster_a.order_agg o
      JOIN cluster_b.user_stats u ON o.user_id = u.id
      GROUP BY 1, 2$$,
    sources => ARRAY[
        'nats://coordinator/cluster_a/order_agg',
        'nats://coordinator/cluster_b/user_stats'
    ]
);
```

**Consistency model:**
- **Eventual consistency** (default): Each cluster refreshes independently;
  cross-cluster joins may see temporarily inconsistent snapshots.
- **Causal consistency** (opt-in): The coordinator enforces that downstream
  clusters only see events that are causally consistent across all upstream
  clusters. Adds latency equal to the slowest upstream cluster.

### LT-8: State Checkpointing to Object Storage

Periodically checkpoint DVM state to S3/GCS for disaster recovery
independent of pg_basebackup:

**What is checkpointed:**
- Stream table metadata (defining queries, schedules, configurations)
- Frontier positions (per-source, per-ST)
- Template cache (delta SQL templates, Aho-Corasick automata serialized)
- DAG topology (edges, SCC groupings, diamond consistency groups)
- Cost model summaries (refresh history aggregates)
- Worker registry state

**Checkpoint format:** A single JSONL file per checkpoint:
```
s3://my-bucket/pgtrickle/checkpoints/2026-05-30T12:00:00Z.jsonl.zst
```

**Checkpoint lifecycle:**
- Triggered every `pg_trickle.checkpoint_interval` (default 1h)
- Also triggered before extension upgrade (`ALTER EXTENSION ... UPDATE`)
- Retained for `pg_trickle.checkpoint_retention_days` (default 7)
- Compressed with Zstandard (typically <1MB for 1000-ST deployments)

**Recovery from checkpoint:**
```sql
SELECT pgtrickle.restore_from_checkpoint(
    's3://my-bucket/pgtrickle/checkpoints/2026-05-30T12:00:00Z.jsonl.zst'
);
```

This recreates all stream tables, dependencies, and configurations but marks
all STs for `REINITIALIZE` (since the actual data may have diverged since the
checkpoint). Useful for:
- Restoring pg_trickle configuration after a database restore from a backup
  that doesn't include the `pgtrickle` schema
- Migrating pg_trickle state to a new PostgreSQL cluster
- Disaster recovery when the primary is lost and only a replica (without
  pg_trickle extension) is available

### Global Priority Queue & Resource Governor

For multi-database PostgreSQL clusters (multi-tenant SaaS), implement a
global resource governor:

**Architecture:**
- Shared-memory priority queue across all per-database schedulers
- Priority based on: (1) SLA tier, (2) current lag, (3) last refresh age
- Global worker budget: `pg_trickle.global_max_workers` across all databases
- Per-database quotas: `pg_trickle.per_db_max_workers` (default unlimited)
- Priority preemption: HIGH-SLA STs can preempt LOW-SLA refreshes in progress

**GUCs:**
- `pg_trickle.global_max_workers = 16`
- `pg_trickle.per_db_max_workers = -1` (unlimited)
- `pg_trickle.priority_preemption = false`

### Zero-Config Adaptive Mode

The system detects its deployment context and configures behavior
automatically without any GUC changes:

| Context | Detection Method | Auto-Configuration |
|---------|-----------------|-------------------|
| **Developer laptop** | `max_connections < 20` AND no `pgt_worker_registry` entries | 1 worker, trigger CDC, 5s interval, 32MB memory |
| **Small server** | `max_connections < 100` AND no external workers | 4 workers, trigger CDC, 1s interval, 64MB memory |
| **Production server** | External workers in `pgt_worker_registry` | Coordinator mode, dispatch to external workers |
| **Kubernetes** | `PG_TRICKLE_K8S_MODE=true` env var OR operator CRD detected | Full distributed: external CDC, external workers, log backend |

**Override:** Any GUC explicitly set by the user takes precedence over
adaptive detection. The adaptive mode only sets defaults for GUCs that are
at their compile-time default values.

**Reporting:** `SELECT pgtrickle.detected_mode()` returns the current adaptive
mode and the reasoning for the detection.