[← Back to Blog Index](https://trickle-labs.github.io/pg-trickle/blog/) | [Documentation](https://trickle-labs.github.io/pg-trickle/)

# Monitoring Your DuckLake with pg_trickle

## DuckLake's 28 metadata tables are rich with operational signals. Build live alerts over them with stream tables.

---

Every DuckLake deployment comes with an operational intelligence layer built in:
the ~28 metadata tables that DuckLake maintains in PostgreSQL. These tables
record every snapshot, every data file, every schema change, every compaction
event, and every write operation. Together they are a complete operational audit
log of your data lake.

The problem is that these tables are not optimised for monitoring. Counting
"how many small files appeared in the last hour" requires a join across
`ducklake_data_file`, `ducklake_table_changes`, and `ducklake_snapshot` with
aggregate functions — a query you wouldn't want running on every Grafana
request.

pg_trickle solves this by computing those aggregations incrementally and
materialising the results. The monitoring queries become instant reads from a
pre-computed stream table instead of expensive multi-table scans.

---

## DuckLake's Key Metadata Tables

| Table | What it records |
|-------|----------------|
| `ducklake_snapshot` | Each committed transaction — the fundamental unit of DuckLake history |
| `ducklake_table_changes` | Which tables changed in each snapshot |
| `ducklake_data_file` | All Parquet data files: path, size, row count, min/max stats |
| `ducklake_deleted_file` | Files removed by compaction or deletion |
| `ducklake_schema` | Schema versions |
| `ducklake_table` | Table definitions |
| `ducklake_column_stats` | Per-column statistics for query planning |
| `ducklake_tag` | Named snapshot tags (time-travel anchors) |

The monitoring scenarios below use these tables as stream table sources.

---

## Scenario 1: Small-File Proliferation Alert

Parquet files under ~128 MB are typically a compaction problem: too many small
writes created too many small files, and file-open overhead will slow down
future scans. Track the small-file count per table and alert when it exceeds
a threshold.

```sql
-- Count Parquet files under 10 MB per table, updated every minute
SELECT pgtrickle.create_stream_table(
    name         => 'ducklake_small_file_counts',
    query        => $$
        SELECT
            dt.schema_name,
            dt.table_name,
            COUNT(*)                    AS small_file_count,
            SUM(df.file_size_bytes)     AS total_small_file_bytes,
            MAX(df.created_snapshot_id) AS latest_snapshot_id
        FROM ducklake_data_file df
        JOIN ducklake_table dt ON dt.table_id = df.table_id
        WHERE df.file_size_bytes < 10 * 1024 * 1024   -- < 10 MB
          AND df.deleted_snapshot_id IS NULL            -- still active
        GROUP BY dt.schema_name, dt.table_name
    $$,
    schedule     => '1m',
    refresh_mode => 'DIFFERENTIAL'
);
```

In Grafana, create an alert rule:

```sql
SELECT
    schema_name || '.' || table_name AS table_name,
    small_file_count
FROM ducklake_small_file_counts
WHERE small_file_count > 50
ORDER BY small_file_count DESC;
```

When any table accumulates more than 50 small files, Grafana fires an alert
telling you to run `CALL ducklake_compact('schema_name.table_name')` in DuckDB.

---

## Scenario 2: Snapshot Rate Monitoring

A sudden spike in snapshot creation rate usually indicates a runaway write loop,
a misconfigured ETL job, or a schema-migration script that created a new
snapshot per row. This stream table shows commits per minute per table.

```sql
-- Snapshot creation rate: commits per minute and per table
SELECT pgtrickle.create_stream_table(
    name         => 'ducklake_snapshot_rate',
    query        => $$
        SELECT
            date_trunc('minute', s.created_at)  AS minute,
            dt.schema_name,
            dt.table_name,
            COUNT(DISTINCT s.snapshot_id)        AS snapshots_in_minute,
            SUM(tc.rows_added)                   AS rows_added,
            SUM(tc.rows_deleted)                 AS rows_deleted
        FROM ducklake_snapshot s
        JOIN ducklake_table_changes tc ON tc.snapshot_id = s.snapshot_id
        JOIN ducklake_table dt ON dt.table_id = tc.table_id
        GROUP BY
            date_trunc('minute', s.created_at),
            dt.schema_name,
            dt.table_name
    $$,
    schedule     => '30s',
    refresh_mode => 'DIFFERENTIAL'
);
```

Grafana time-series panel:

```sql
SELECT
    $__timeGroupAlias(minute, '1m'),
    schema_name || '.' || table_name AS table_label,
    snapshots_in_minute AS "Snapshots / min"
FROM ducklake_snapshot_rate
WHERE $__timeFilter(minute)
ORDER BY 1, 2;
```

---

## Scenario 3: Storage Growth Curves

Track cumulative storage per table over time. Useful for capacity planning and
for identifying tables that are growing faster than expected.

```sql
-- Cumulative active storage per table, sampled every 5 minutes
SELECT pgtrickle.create_stream_table(
    name         => 'ducklake_storage_growth',
    query        => $$
        SELECT
            dt.schema_name,
            dt.table_name,
            SUM(df.file_size_bytes)  AS active_bytes,
            COUNT(*)                 AS active_file_count,
            SUM(df.row_count)        AS active_row_count,
            now()                    AS sampled_at
        FROM ducklake_data_file df
        JOIN ducklake_table dt ON dt.table_id = df.table_id
        WHERE df.deleted_snapshot_id IS NULL   -- active files only
        GROUP BY dt.schema_name, dt.table_name
    $$,
    schedule     => '5m',
    refresh_mode => 'DIFFERENTIAL'
);
```

---

## Scenario 4: Per-Tenant Write Activity (Multi-Tenant Deployments)

For multi-tenant DuckLake deployments where each tenant writes to a separate
schema, track write volume per tenant to identify heavy writers, enforce
quotas, and bill accurately.

```sql
-- Tenant write activity: rows written per tenant per hour
SELECT pgtrickle.create_stream_table(
    name         => 'ducklake_tenant_write_activity',
    query        => $$
        SELECT
            dt.schema_name                          AS tenant_schema,
            date_trunc('hour', s.created_at)        AS hour,
            SUM(tc.rows_added)                      AS rows_added,
            SUM(tc.rows_deleted)                    AS rows_deleted,
            COUNT(DISTINCT s.snapshot_id)            AS commit_count,
            SUM(
                COALESCE(df.file_size_bytes, 0)
            ) AS bytes_written
        FROM ducklake_snapshot s
        JOIN ducklake_table_changes tc  ON tc.snapshot_id = s.snapshot_id
        JOIN ducklake_table dt          ON dt.table_id = tc.table_id
        LEFT JOIN ducklake_data_file df ON df.created_snapshot_id = s.snapshot_id
                                       AND df.table_id = tc.table_id
        GROUP BY
            dt.schema_name,
            date_trunc('hour', s.created_at)
    $$,
    schedule     => '5m',
    refresh_mode => 'DIFFERENTIAL'
);
```

---

## Scenario 5: Compaction Efficiency Tracking

After each compaction run, track how much space was reclaimed and how many
small files were merged.

```sql
-- Compaction events: files created vs. deleted per snapshot
SELECT pgtrickle.create_stream_table(
    name         => 'ducklake_compaction_events',
    query        => $$
        WITH deleted_in_snapshot AS (
            SELECT
                df.deleted_snapshot_id AS snapshot_id,
                dt.schema_name,
                dt.table_name,
                COUNT(*)               AS files_compacted,
                SUM(df.file_size_bytes) AS bytes_compacted
            FROM ducklake_data_file df
            JOIN ducklake_table dt ON dt.table_id = df.table_id
            WHERE df.deleted_snapshot_id IS NOT NULL
            GROUP BY df.deleted_snapshot_id, dt.schema_name, dt.table_name
        ),
        created_in_snapshot AS (
            SELECT
                df.created_snapshot_id AS snapshot_id,
                COUNT(*)               AS files_created
            FROM ducklake_data_file df
            JOIN ducklake_table dt ON dt.table_id = df.table_id
            WHERE df.created_snapshot_id IS NOT NULL
            GROUP BY df.created_snapshot_id
        )
        SELECT
            d.schema_name,
            d.table_name,
            s.created_at,
            d.files_compacted,
            c.files_created,
            d.bytes_compacted
        FROM deleted_in_snapshot d
        JOIN ducklake_snapshot s ON s.snapshot_id = d.snapshot_id
        LEFT JOIN created_in_snapshot c ON c.snapshot_id = d.snapshot_id
        WHERE d.files_compacted > c.files_created   -- net reduction = compaction
    $$,
    schedule     => '5m',
    refresh_mode => 'DIFFERENTIAL'
);
```

---

## The Grafana Dashboard

Import the pre-built dashboard JSON from
[demos/ducklake-observability/grafana/](https://github.com/trickle-labs/pg-trickle/tree/main/demos/ducklake-observability/grafana/)
or create panels manually:

| Panel | Query | Alert |
|-------|-------|-------|
| Small files per table | `ducklake_small_file_counts` | > 50 files → WARN, > 200 → CRIT |
| Snapshot rate | `ducklake_snapshot_rate` | > 60/min for any table → WARN |
| Storage growth | `ducklake_storage_growth` | Projected > quota in 7 days → WARN |
| Tenant write volume | `ducklake_tenant_write_activity` | None (billing only) |
| Compaction events | `ducklake_compaction_events` | None (informational) |

---

## Initialisation Script

For a quick start, use the pre-packaged initialisation script from the
[DuckLake Observability in a Box demo](https://github.com/trickle-labs/pg-trickle/tree/main/demos/ducklake-observability/):

```bash
git clone https://github.com/trickle-labs/pg-trickle
cd pg-trickle/demos/ducklake-observability

# Edit .env with your PostgreSQL connection details
cp .env.example .env
$EDITOR .env

# Start the monitoring stack (PostgreSQL must already have pg_trickle + DuckLake)
docker compose up -d grafana

# Initialise the stream tables
psql "$DATABASE_URL" -f init_monitoring.sql
```

Five minutes from `git clone` to a production-quality Grafana observability
dashboard over your DuckLake deployment.

---

## Performance Notes

DuckLake's metadata tables are small relative to the data they describe. A
DuckLake deployment with 10,000 snapshots and 100,000 data files will have
metadata tables totalling under 50 MB. pg_trickle's differential refresh
processes only new rows since the last tick, so the per-cycle cost is
proportional to the write rate, not the catalog size.

For extremely active DuckLake deployments (thousands of commits per minute),
set the stream table schedule to `'5m'` or longer and consider adding a partial
index on `ducklake_snapshot.created_at` to speed up the time-range filters.

---

## Summary

DuckLake's metadata tables are a ready-made operational intelligence layer.
pg_trickle turns them into a live monitoring system in minutes. The combination
gives every DuckLake operator:

- **Real-time alerts** for compaction, snapshot rate spikes, and storage anomalies
- **Historical trend charts** for capacity planning
- **Per-tenant write metrics** for billing and quota enforcement
- **Zero external infrastructure** — everything runs inside the PostgreSQL instance
  that already hosts the DuckLake catalog

---

*See also:*
- [Demo: DuckLake Observability in a Box](../demos/ducklake-observability/README.md)
- [Tutorial: Real-Time Dashboards on Your Data Lake](ducklake-real-time-dashboards.md)
- [Tutorial: The Modern Data Stack in One Box](ducklake-modern-data-stack.md)
- [Blog: Why pg_trickle + DuckLake Is the Missing Piece for Lakehouse IVM](ducklake-ivm-missing-piece.md)