---
title: Materialized views
description: Pre-built ClickHouse materialized views for common analytics patterns
---

The pg_stat_ch ClickHouse schema includes four materialized views that aggregate `events_raw` data automatically. All views are populated by ClickHouse as events arrive -- no manual refresh is needed.

## Overview

| View | Purpose | Granularity | Retention |
|---|---|---|---|
| `events_recent_1h` | Real-time debugging | Per event | 1 hour TTL |
| `query_stats_5m` | Query performance dashboards | 5-minute buckets | Unbounded |
| `db_app_user_1m` | Load by application/user | 1-minute buckets | Unbounded |
| `errors_recent` | Error investigation | Per error event | 7-day TTL |

The typical workflow is: use materialized views to find patterns, then drill into `events_raw` for specific events.

## `events_recent_1h`

A copy of `events_raw` with a 1-hour TTL. ClickHouse automatically deletes events older than 1 hour from this table.

**Use cases:**

- "What queries ran in the last 5 minutes?"
- Real-time dashboards with sub-second refresh
- Quick debugging of ongoing issues

```sql
SELECT
    ts_start,
    db,
    cmd_type,
    duration_us / 1000 AS ms,
    rows,
    substring(query, 1, 100) AS query_preview
FROM pg_stat_ch.events_recent_1h
WHERE ts_start > now() - INTERVAL 5 MINUTE
ORDER BY ts_start DESC
LIMIT 50;
```

Because this table has a 1-hour TTL, queries against it are fast even if `events_raw` has weeks of data.

## `query_stats_5m`

Pre-aggregated query statistics in 5-minute buckets. This is the primary view for dashboards and performance monitoring.
**Use cases:**

- QPS trends over time
- Latency percentiles (p95/p99) per query
- Identifying slow queries and regressions
- Capacity planning

### Schema

| Column | Type | Description |
|---|---|---|
| `bucket` | `DateTime` | 5-minute bucket start time |
| `db` | `LowCardinality(String)` | Database name |
| `query_id` | `Int64` | Normalized query identifier |
| `cmd_type` | `LowCardinality(String)` | Command type |
| `calls_state` | `AggregateFunction(count)` | Call count |
| `duration_sum_state` | `AggregateFunction(sum, UInt64)` | Total duration (microseconds) |
| `duration_min_state` | `AggregateFunction(min, UInt64)` | Minimum duration |
| `duration_max_state` | `AggregateFunction(max, UInt64)` | Maximum duration |
| `duration_q_state` | `AggregateFunction(quantilesTDigest(0.95, 0.99), UInt64)` | Latency percentile digest |
| `rows_sum_state` | `AggregateFunction(sum, UInt64)` | Total rows |
| `shared_hit_sum_state` | `AggregateFunction(sum, Int64)` | Total buffer hits |
| `shared_read_sum_state` | `AggregateFunction(sum, Int64)` | Total buffer reads |

### Querying aggregate states

This view uses ClickHouse's `-State` / `-Merge` pattern. Columns store intermediate aggregate states that must be finalized with the corresponding `-Merge` function:

| To get | Use |
|---|---|
| Call count | `countMerge(calls_state)` |
| Total duration | `sumMerge(duration_sum_state)` |
| Min duration | `minMerge(duration_min_state)` |
| Max duration | `maxMerge(duration_max_state)` |
| p95 and p99 | `quantilesTDigestMerge(0.95, 0.99)(duration_q_state)` |
| Total rows | `sumMerge(rows_sum_state)` |
| Buffer hits | `sumMerge(shared_hit_sum_state)` |
| Buffer reads | `sumMerge(shared_read_sum_state)` |

This pattern allows correct re-aggregation across multiple 5-minute buckets. For example, a 1-hour p99 is computed correctly from 12 five-minute digests, rather than being an average of 12 p99 values.
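As a sketch of what that re-aggregation looks like in practice, the 5-minute digests can be rolled up into hourly buckets: `quantilesTDigestMerge` combines the digest states within each hour before the percentile is extracted, rather than averaging per-bucket percentiles.

```sql
SELECT
    toStartOfHour(bucket) AS hour,
    -- Merges up to 12 five-minute t-digest states per hour,
    -- then takes p99 from the combined digest.
    round(quantilesTDigestMerge(0.95, 0.99)(duration_q_state)[2] / 1000, 2) AS p99_ms
FROM pg_stat_ch.query_stats_5m
WHERE bucket >= now() - INTERVAL 24 HOUR
GROUP BY hour
ORDER BY hour;
```

The same `GROUP BY` rewrite works for any wider window: group by `toStartOfDay(bucket)` for daily percentiles, and the states merge correctly regardless of how many buckets fall into each group.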
### Top queries by p99 latency

```sql
SELECT
    query_id,
    cmd_type,
    countMerge(calls_state) AS calls,
    round(sumMerge(duration_sum_state) / countMerge(calls_state) / 1000, 2) AS avg_ms,
    round(quantilesTDigestMerge(0.95, 0.99)(duration_q_state)[1] / 1000, 2) AS p95_ms,
    round(quantilesTDigestMerge(0.95, 0.99)(duration_q_state)[2] / 1000, 2) AS p99_ms
FROM pg_stat_ch.query_stats_5m
WHERE bucket >= now() - INTERVAL 1 HOUR
GROUP BY query_id, cmd_type
ORDER BY p99_ms DESC
LIMIT 10;
```

### QPS over time

Each bucket is 5 minutes (300 seconds), so divide by 300 for a per-second rate:

```sql
SELECT
    bucket,
    countMerge(calls_state) / 300 AS qps
FROM pg_stat_ch.query_stats_5m
WHERE bucket >= now() - INTERVAL 24 HOUR
GROUP BY bucket
ORDER BY bucket;
```

### Cache hit ratio trend

```sql
SELECT
    bucket,
    round(100 * sumMerge(shared_hit_sum_state)
        / (sumMerge(shared_hit_sum_state) + sumMerge(shared_read_sum_state) + 1), 2) AS hit_ratio_pct
FROM pg_stat_ch.query_stats_5m
WHERE bucket >= now() - INTERVAL 24 HOUR
GROUP BY bucket
ORDER BY bucket;
```

## `db_app_user_1m`

Load breakdown by database, application, user, and command type in 1-minute buckets.

**Use cases:**

- Which application is generating the most load?
- Per-tenant or per-user resource tracking
- Error rates by application
- Identifying misbehaving clients

### Schema

| Column | Type | Description |
|---|---|---|
| `bucket` | `DateTime` | 1-minute bucket start time |
| `db` | `LowCardinality(String)` | Database name |
| `app` | `LowCardinality(String)` | Application name |
| `username` | `LowCardinality(String)` | PostgreSQL username |
| `cmd_type` | `LowCardinality(String)` | Command type |
| `calls_state` | `AggregateFunction(count)` | Query count |
| `duration_sum_state` | `AggregateFunction(sum, UInt64)` | Total duration (microseconds) |
| `duration_q_state` | `AggregateFunction(quantilesTDigest(0.95, 0.99), UInt64)` | Latency percentile digest |
| `errors_sum_state` | `AggregateFunction(sum, UInt64)` | Error count |

### Load by application

```sql
SELECT
    app,
    countMerge(calls_state) AS total_queries,
    round(sumMerge(duration_sum_state) / 1000000, 2) AS total_seconds,
    round(quantilesTDigestMerge(0.95, 0.99)(duration_q_state)[2] / 1000, 2) AS p99_ms,
    sumMerge(errors_sum_state) AS errors
FROM pg_stat_ch.db_app_user_1m
WHERE bucket >= now() - INTERVAL 24 HOUR
GROUP BY app
ORDER BY total_seconds DESC;
```

### Error rate by database and user

```sql
SELECT
    db,
    username,
    countMerge(calls_state) AS queries,
    sumMerge(errors_sum_state) AS errors,
    round(100 * sumMerge(errors_sum_state) / countMerge(calls_state), 2) AS error_pct
FROM pg_stat_ch.db_app_user_1m
WHERE bucket >= now() - INTERVAL 1 HOUR
GROUP BY db, username
HAVING errors > 0
ORDER BY error_pct DESC;
```

## `errors_recent`

Recent error events with a 7-day TTL. This view filters `events_raw` to only rows where `err_elevel > 0`.

**Use cases:**

- "What errors happened in the last hour?"
- Incident investigation with query context
- Error rate alerting
- Identifying recurring error patterns by SQLSTATE

### Schema

This view stores the full event row (not aggregated):

`ts_start`, `db`, `username`, `app`, `client_addr`, `pid`, `query_id`, `err_sqlstate`, `err_elevel`, `err_message`, `query`

### Recent errors with context

```sql
SELECT
    ts_start,
    db,
    username,
    app,
    err_sqlstate,
    err_message,
    substring(query, 1, 200) AS query_preview
FROM pg_stat_ch.errors_recent
WHERE ts_start > now() - INTERVAL 1 HOUR
ORDER BY ts_start DESC
LIMIT 100;
```

### Error breakdown by SQLSTATE

```sql
SELECT
    err_sqlstate,
    count() AS occurrences,
    uniq(query_id) AS unique_queries,
    any(err_message) AS sample_message
FROM pg_stat_ch.errors_recent
WHERE ts_start > now() - INTERVAL 24 HOUR
GROUP BY err_sqlstate
ORDER BY occurrences DESC;
```

## Custom views

You can create your own materialized views on top of `events_raw` for project-specific analytics. For example, a view that tracks queries by table name, or a view that computes per-minute error rates for alerting.

See the [ClickHouse documentation on materialized views](https://clickhouse.com/docs/en/guides/developer/cascading-materialized-views) for details on creating and managing views.
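As a sketch of the pattern (the `error_rate_1m` names below are illustrative, not part of the shipped schema), a per-minute error-rate view for alerting could pair an `AggregatingMergeTree` target table with a materialized view over `events_raw`, mirroring how the built-in views are structured:

```sql
-- Hypothetical target table; holds intermediate aggregate states.
CREATE TABLE pg_stat_ch.error_rate_1m
(
    bucket DateTime,
    db LowCardinality(String),
    errors_state AggregateFunction(sum, UInt64),
    calls_state AggregateFunction(count)
)
ENGINE = AggregatingMergeTree
ORDER BY (bucket, db);

-- Populates the target as events arrive in events_raw.
CREATE MATERIALIZED VIEW pg_stat_ch.error_rate_1m_mv
TO pg_stat_ch.error_rate_1m
AS SELECT
    toStartOfMinute(ts_start) AS bucket,
    db,
    sumState(toUInt64(err_elevel > 0)) AS errors_state,
    countState() AS calls_state
FROM pg_stat_ch.events_raw
GROUP BY bucket, db;
```

A view like this would be queried with the same `-Merge` pattern as the built-in views, e.g. `sumMerge(errors_state) / countMerge(calls_state)` for the per-minute error ratio.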