# pg_trickle Monitoring Stack This directory provides a complete observability setup for pg_trickle using **postgres_exporter**, **Prometheus**, and **Grafana**. ## Quick Start ```bash docker compose up -d ``` Then open Grafana at (default credentials: `admin` / `admin`). The **pg_trickle Overview** dashboard is pre-provisioned. ## Architecture ``` PostgreSQL + pg_trickle │ │ custom SQL queries ▼ postgres_exporter (:9187) │ │ /metrics (Prometheus format) ▼ Prometheus (:9090) │ │ data source ▼ Grafana (:3000) ``` ## Files | File | Purpose | |------|---------| | `docker-compose.yml` | Demo stack: PG + pg_trickle + exporter + Prometheus + Grafana | | `prometheus/prometheus.yml` | Prometheus scrape configuration | | `prometheus/pg_trickle_queries.yml` | postgres_exporter custom SQL queries (OBS-1) | | `prometheus/alerts.yml` | Alerting rules: staleness, errors, CDC lag (OBS-2) | | `grafana/provisioning/datasources/prometheus.yml` | Auto-provisioned data source | | `grafana/provisioning/dashboards/provider.yml` | Dashboard provisioning config | | `grafana/dashboards/pg_trickle_overview.json` | Overview dashboard (OBS-3) | ## Connecting to an Existing PostgreSQL Instance If you already have PostgreSQL + pg_trickle running, set these environment variables before `docker compose up`: ```bash export PG_HOST=your-pg-host export PG_PORT=5432 export PG_USER=postgres export PG_PASSWORD=yourpassword export PG_DATABASE=yourdb ``` Or edit the `DATA_SOURCE_NAME` in `docker-compose.yml` directly. ## Metrics Exposed All metrics are prefixed `pg_trickle_`. | Metric | Type | Description | |--------|------|-------------| | `pg_trickle_stream_tables_total` | gauge | Total stream tables by status | | `pg_trickle_stale_tables_total` | gauge | Tables with data older than schedule | | `pg_trickle_consecutive_errors` | gauge | Per-table consecutive error count | | `pg_trickle_refresh_duration_ms` | gauge | Average refresh duration (ms) | | `pg_trickle_total_refreshes` | counter | Total refresh count per table | | `pg_trickle_failed_refreshes` | counter | Failed refresh count per table | | `pg_trickle_rows_inserted_total` | counter | Rows inserted per table | | `pg_trickle_rows_deleted_total` | counter | Rows deleted per table | | `pg_trickle_staleness_seconds` | gauge | Seconds since last successful refresh | | `pg_trickle_cdc_pending_rows` | gauge | Pending rows in CDC change buffer | | `pg_trickle_cdc_buffer_bytes` | gauge | CDC change buffer size in bytes | | `pg_trickle_scheduler_running` | gauge | 1 if scheduler background worker is alive | | `pg_trickle_health_status` | gauge | Overall health: 0=OK, 1=WARNING, 2=CRITICAL | ## Alerting Alerts are defined in `prometheus/alerts.yml`: | Alert | Condition | Severity | |-------|-----------|----------| | `PgTrickleTableStale` | Staleness > 5 minutes past schedule | warning | | `PgTrickleConsecutiveErrors` | ≥ 3 consecutive refresh failures | warning | | `PgTrickleTableSuspended` | Any table in SUSPENDED status | critical | | `PgTrickleCdcBufferLarge` | CDC buffer > 1 GB | warning | | `PgTrickleSchedulerDown` | Scheduler not running for > 2 minutes | critical | | `PgTrickleHighRefreshDuration` | Avg refresh > 30 s | warning | ## Requirements - Docker 24+ with Compose v2 - pg_trickle 0.10.0+ installed in the target database - PostgreSQL user with `SELECT` on `pgtrickle.*` schema