---
title: Troubleshooting
description: Common issues and diagnostic steps for pg_stat_ch
---

## Extension won't load

```
WARNING: pg_stat_ch must be loaded via shared_preload_libraries
```

pg_stat_ch must be loaded at server startup. Add it to `postgresql.conf`:

```ini
shared_preload_libraries = 'pg_stat_ch'
```

Then restart PostgreSQL:

```bash
sudo systemctl restart postgresql
```

## Background worker not running

Check whether the exporter process is visible:

```sql
SELECT pid, application_name, state, wait_event
FROM pg_stat_activity
WHERE application_name = 'pg_stat_ch exporter';
```

If no row appears:

- Verify `shared_preload_libraries` includes `pg_stat_ch`
- Check the PostgreSQL logs for startup errors
- The worker restarts automatically after 10 seconds if it crashes; check the logs for repeated crash/restart cycles

## Events not appearing in ClickHouse

First, confirm capture is enabled:

```sql
SHOW pg_stat_ch.enabled;
```

This should return `on`. If it returns `off`, enable it:

```sql
ALTER SYSTEM SET pg_stat_ch.enabled = on;
SELECT pg_reload_conf();
```

Next, check the exporter statistics:

```sql
SELECT enqueued_events, exported_events, dropped_events,
       send_failures, last_error_text, last_error_ts
FROM pg_stat_ch_stats();
```

- `enqueued_events = 0`: no queries are being captured. Run some queries and check again.
- `enqueued_events > 0` but `exported_events = 0`: the background worker cannot reach ClickHouse.
- `send_failures > 0`: check `last_error_text` for the specific error.
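The triage rules above can be sketched as a small shell helper. This is a hypothetical script (the function name `triage` is not part of pg_stat_ch); it takes the three counters as arguments rather than querying PostgreSQL, so it can be used offline against values copied from `pg_stat_ch_stats()`:

```shell
#!/bin/sh
# triage: map pg_stat_ch_stats() counters to a diagnosis.
# Arguments: enqueued_events exported_events send_failures
triage() {
    enqueued=$1; exported=$2; failures=$3
    if [ "$enqueued" -eq 0 ]; then
        echo "NO EVENTS: run some queries and re-check"
    elif [ "$exported" -eq 0 ]; then
        echo "EXPORT STALLED: worker cannot reach ClickHouse"
    elif [ "$failures" -gt 0 ]; then
        echo "SEND FAILURES: inspect last_error_text"
    else
        echo "OK"
    fi
}

# Example: events are captured but nothing has been exported yet.
triage 100 0 0
```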
Verify the configured connection settings:

```sql
SHOW pg_stat_ch.clickhouse_host;
SHOW pg_stat_ch.clickhouse_port;
SHOW pg_stat_ch.clickhouse_database;
```

Verify ClickHouse is reachable from the PostgreSQL host:

```bash
clickhouse-client -h <host> --port <port> -q "SELECT 1"
```

Confirm the target database and tables exist:

```sql
-- In ClickHouse
SHOW TABLES FROM pg_stat_ch;
```

If the database or tables don't exist, apply the schema:

```bash
clickhouse-client < docker/init/00-schema.sql
```

Look for connection errors or export failures in the PostgreSQL logs:

```bash
grep -i "pg_stat_ch" /var/log/postgresql/postgresql-*.log | tail -20
```

## High queue usage

If `queue_usage_pct` from `pg_stat_ch_stats()` is consistently above 80%:

1. **Decrease the flush interval** -- export more frequently:

   ```sql
   ALTER SYSTEM SET pg_stat_ch.flush_interval_ms = 100;
   SELECT pg_reload_conf();
   ```

2. **Increase the batch size** -- drain more events per cycle:

   ```sql
   ALTER SYSTEM SET pg_stat_ch.batch_max = 500000;
   SELECT pg_reload_conf();
   ```

3. **Check ClickHouse health** -- slow inserts cause backpressure:

   ```sql
   -- In ClickHouse: check for merge backlog
   SELECT table, count() AS parts
   FROM system.parts
   WHERE database = 'pg_stat_ch' AND active
   GROUP BY table;
   ```

4. **Increase the queue capacity** (requires a restart):

   ```ini
   pg_stat_ch.queue_capacity = 262144   # must be a power of 2
   ```

## Dropped events

```sql
SELECT dropped_events FROM pg_stat_ch_stats();
```

A non-zero `dropped_events` count means the queue filled up before the background worker could drain it. This is safe -- queries continue running unaffected -- but some telemetry is lost.
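To reason about whether these settings can keep up, compare the worker's maximum drain rate (`batch_max` events per flush, `1000 / flush_interval_ms` flushes per second) against your workload's event rate. A rough back-of-the-envelope check in shell, using the values from the examples above (the workload rate is illustrative, not a recommendation):

```shell
#!/bin/sh
# Estimate whether the exporter can keep up with the workload.
flush_interval_ms=100     # pg_stat_ch.flush_interval_ms
batch_max=500000          # pg_stat_ch.batch_max
workload_eps=20000        # observed events (queries) per second -- measure yours

# Maximum sustained drain rate, in events per second.
drain_eps=$(( batch_max * 1000 / flush_interval_ms ))
echo "max drain rate: ${drain_eps} events/s"

if [ "$drain_eps" -le "$workload_eps" ]; then
    echo "WARNING: queue will grow; lower flush_interval_ms or raise batch_max"
fi

# queue_capacity must be a power of two: n & (n-1) == 0 for n > 0.
n=262144
if [ $(( n & (n - 1) )) -eq 0 ]; then
    echo "${n} is a valid queue_capacity"
fi
```

If the drain rate only barely exceeds the workload rate, bursts will still fill the queue; that is when raising `queue_capacity` helps.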
Common causes:

- ClickHouse is down or unreachable (events queue up with nowhere to go)
- A burst of queries exceeded the queue capacity
- The flush interval is too long for the workload

To reset the counter after resolving the issue:

```sql
SELECT pg_stat_ch_reset();
```

## DSA string area full

Check the `dsa_oom_count` counter:

```sql
SELECT dsa_oom_count FROM pg_stat_ch_stats();
```

If it increases, the fixed-size ring entry was captured but the variable-length query or error text could not be stored. Increase `pg_stat_ch.string_area_size` and restart PostgreSQL.

## Missing I/O timing data

If `shared_blk_read_time_us` and related columns are always zero, check:

```sql
SHOW track_io_timing;
```

I/O timing must be enabled in `postgresql.conf`:

```ini
track_io_timing = on
```

This adds a small overhead (one `gettimeofday()` call per block I/O operation) but provides valuable data for distinguishing CPU-bound from I/O-bound queries.

## Missing query_id values

If `query_id` is always 0, check:

```sql
SHOW compute_query_id;
```

Enable query ID computation:

```ini
compute_query_id = on
```

This is required for grouping queries by their normalized form. Without it, every execution gets `query_id = 0` and cannot be aggregated.

## TLS connection errors

If you see TLS-related errors in the PostgreSQL logs:

1. Verify `pg_stat_ch.clickhouse_use_tls = on` is set
2. For self-signed certificates in testing, set `pg_stat_ch.clickhouse_skip_tls_verify = on`
3. In production, ensure the ClickHouse server certificate is trusted by the system CA store

## Extension health check query

Run this in PostgreSQL for a quick health summary:

```sql
SELECT pg_stat_ch_version() AS version,
       s.enqueued_events,
       s.exported_events,
       s.dropped_events,
       s.send_failures,
       round(s.queue_usage_pct, 1) AS queue_pct,
       s.last_error_text,
       CASE
         WHEN s.send_failures > 0 THEN 'EXPORT ERRORS'
         WHEN s.dropped_events > 0 THEN 'DROPS DETECTED'
         WHEN s.queue_usage_pct > 80 THEN 'QUEUE PRESSURE'
         WHEN s.enqueued_events = 0 THEN 'NO EVENTS'
         ELSE 'OK'
       END AS status
FROM pg_stat_ch_stats() s;
```
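For automated monitoring, the health query's `status` column can be fed to a small shell function that exits non-zero when anything other than `OK` is reported. This is a sketch, not part of pg_stat_ch: the `check_status` name is hypothetical, and the commented-out invocation assumes `psql` can reach your database:

```shell
#!/bin/sh
# check_status: return 0 if pg_stat_ch reports OK, 1 otherwise.
# Kept as a standalone function so the classification can be tested offline.
check_status() {
    case "$1" in
        OK) return 0 ;;
        *)  echo "pg_stat_ch unhealthy: $1" >&2
            return 1 ;;
    esac
}

# In a cron job, pass in the status column from the health check query, e.g.:
#   status=$(psql -Atq -c "SELECT CASE ... END FROM pg_stat_ch_stats()")
#   check_status "$status" || alert-your-oncall
check_status "OK" && echo "healthy"
```

A non-zero exit code makes the script easy to wire into cron, systemd timers, or any alerting tool that watches exit status.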