# v0.85.0 — Kubernetes-Native Deployment

> **Status:** Planned  
> **Scope:** Very Large  
> **Driven by:** [Assessment 16](../plans/PLAN_OVERALL_ASSESSMENT_16.md) — LT-1, LT-2, LT-6

## Theme

Enable pg_trickle to run as a fully managed, auto-scaling system on
Kubernetes. A custom operator manages worker deployments, CDC consumers,
and HPA policies via Custom Resource Definitions. An external change log
backend (Kafka, NATS JetStream, or local WAL files) provides durable,
distributed change event storage. Declarative SQL syntax makes stream table
creation feel native to PostgreSQL.

## Items

### LT-1: Kubernetes Operator

A Kubernetes operator (Rust, using `kube-rs` controller-runtime) that manages
pg_trickle infrastructure:

**Custom Resource Definitions:**

```yaml
apiVersion: trickle-labs.io/v1alpha1
kind: StreamTableCluster
metadata:
  name: production
spec:
  postgresql:
    host: pg-primary.default.svc
    port: 5432
    database: myapp
    secretRef:
      name: pg-credentials
  workers:
    min: 2
    max: 32
    targetCdcLagMs: 100
    targetCpuPercent: 70
    image: ghcr.io/trickle-labs/pg-trickle-worker:0.85.0
  cdc:
    mode: external
    replicas: 1
    logBackend: nats
    nats:
      url: nats://nats.default.svc:4222
      stream: pgtrickle-changes
  monitoring:
    prometheus: true
    serviceMonitor: true
    otlp:
      endpoint: otel-collector.monitoring.svc:4317
```

**Operator responsibilities:**
- Deploys worker pods as a Deployment (stateless, horizontally scalable)
- Deploys CDC consumer as a StatefulSet (one pod per replication slot)
- Configures HPA based on CDC lag custom metric
- Watches PostgreSQL catalog for new/dropped stream tables
- Manages NATS/Kafka stream lifecycle
- Health checks and automatic restart on failure
- Graceful draining on pod termination (finish in-progress refresh)

**Helm chart:** `charts/pg-trickle/` with values for common configurations.

### LT-2: External Change Log Backend

Support pluggable change log backends between CDC consumers and DVM workers:

| Backend | Durability | Ordering | Deployment |
|---------|-----------|----------|------------|
| **PostgreSQL tables** (default) | WAL-logged | Per-source total order | Zero infrastructure |
| **Local WAL files** | fsync'd | Per-source total order | Single-node, no external deps |
| **NATS JetStream** | Replicated | Per-subject (source) order | Lightweight, K8s-native |
| **Apache Kafka** | Replicated | Per-partition order | Enterprise, existing infra |

**Change event format** (common across all backends):
```json
{
  "source_oid": 16384,
  "lsn": "0/1A3B4C0",
  "xid": 12345,
  "action": "INSERT",
  "columns": {"id": 1, "name": "Alice", "amount": 100.00},
  "old_columns": null,
  "timestamp": "2026-05-30T12:00:00Z"
}
```

**Backend interface trait** (in `crates/change-log/`):
```rust
#[async_trait]
pub trait ChangeLogBackend: Send + Sync {
    async fn publish(&self, source_oid: u32, events: &[ChangeEvent]) -> Result<()>;
    async fn consume(&self, source_oid: u32, from_lsn: Lsn) -> Result<Vec<ChangeEvent>>;
    async fn acknowledge(&self, source_oid: u32, up_to_lsn: Lsn) -> Result<()>;
    async fn lag(&self, source_oid: u32) -> Result<Duration>;
}
```

### LT-6: Declarative SQL Syntax

Implement `CREATE STREAM TABLE` syntax via `ProcessUtility_hook`:

```sql
-- Basic usage (mirrors CREATE MATERIALIZED VIEW)
CREATE STREAM TABLE my_aggregates AS
SELECT category, SUM(amount) as total
FROM orders
GROUP BY category
WITH (schedule = '5s', refresh_mode = 'differential');

-- With explicit options
CREATE STREAM TABLE IF NOT EXISTS realtime_dashboard AS
SELECT ...
WITH (
    schedule = '1s',
    refresh_mode = 'auto',
    preset = 'real-time',
    cdc_mode = 'external'
);

-- ALTER syntax
ALTER STREAM TABLE my_aggregates SET (schedule = '10s');
ALTER STREAM TABLE my_aggregates SET QUERY AS SELECT ...;

-- DROP syntax
DROP STREAM TABLE my_aggregates;
DROP STREAM TABLE IF EXISTS my_aggregates CASCADE;
```

Implementation:
- `ProcessUtility_hook` intercepts unrecognized utility statements
- Parses `CREATE STREAM TABLE` / `ALTER STREAM TABLE` / `DROP STREAM TABLE`
- Translates to corresponding `pgtrickle.*` function calls
- Reports errors with standard PostgreSQL error formatting
- `pg_dump` support via custom `COMMENT` tags (same as current approach)
- Tab-completion in psql via `CREATE STREAM` prefix

### Metrics Adapter for HPA

A lightweight sidecar (`pg_trickle_metrics_adapter`) that:
- Queries `pgtrickle.pg_stat_pgtrickle` and Prometheus metrics
- Exposes them as Kubernetes custom metrics via the Custom Metrics API
- Enables HPA to scale workers based on:
  - `pg_trickle_cdc_lag_p95_ms` (target: 100ms)
  - `pg_trickle_parallel_queue_depth` (target: 0)
  - `pg_trickle_worker_cpu_percent` (target: 70%)

### CDC Consumer StatefulSet

The `pg_trickle_cdc` binary deployed as a Kubernetes StatefulSet:
- One pod per logical replication slot (stable network identity for slot)
- Exactly-once delivery to the change log via two-phase acknowledgment
- Automatic slot creation/drop on pod start/stop
- Graceful handling of PostgreSQL failover (reconnect to new primary)
- Liveness probe: slot lag not exceeding threshold
- Readiness probe: consuming events (not blocked)