[← Back to Blog Index](README.md)

# Incremental Aggregates in PostgreSQL: No ETL Required

## Running SUM, COUNT, AVG, and vector_avg over live tables without batch jobs

---

Every analytics system eventually needs pre-aggregated data. Raw tables get too big. Query latency becomes unacceptable. You start maintaining a separate summary table.

Then you need to keep it fresh. That's where things go wrong.

The standard playbook is to build an ETL pipeline: detect changes, compute new aggregates, write them back. This works. It also means you now own an ETL pipeline — a background process, a job scheduler, failure handling, backlog monitoring, and a latency SLA that's measured in minutes rather than seconds.

The question this post answers: do you actually need the ETL layer? For most SQL-expressible aggregates, the answer is no. pg_trickle can maintain running aggregates directly in PostgreSQL, incrementally, using the algebra of the aggregate functions themselves.

---

## Why Batch Aggregation Is the Default

Aggregate queries are inherently read-heavy. Computing `SUM(revenue)` over 100 million orders requires reading 100 million rows. Computing it over 1 billion rows requires reading 1 billion rows. The compute cost scales linearly with the data.

When you need that aggregate continuously — refreshed every minute, every 10 seconds — you hit a ceiling. At some table size, the scan takes longer than the interval. You can't run a 30-second aggregate every 10 seconds.

The usual response is to compute the aggregate less often (accept stale data), or to push the computation out of PostgreSQL into a streaming system better suited to continuous processing (accept operational complexity).

Both choices feel like giving up something real. Incremental aggregation avoids that tradeoff.

---

## The Algebraic Trick

The reason incremental aggregation is possible is that most aggregates have a well-defined delta function.

For `SUM(x)`:
- Insert a row with value `v`: delta = `+v`
- Delete a row with value `v`: delta = `-v`
- Update a row from `old_v` to `new_v`: delta = `+(new_v - old_v)`

For `COUNT(*)`:
- Insert: delta = `+1`
- Delete: delta = `-1`
- Update: delta = `0` (row count unchanged)

For `AVG(x)`:
- Maintain `running_sum` and `running_count` separately
- AVG = `running_sum / running_count`
- Update `running_sum` and `running_count` with the delta
- Recompute `AVG`

This is O(1) per change, regardless of table size. The key insight is that `SUM`, `COUNT`, and `AVG` are all expressible in terms of a running state that can be updated with each delta. Mathematicians call this a *semigroup* with an *inverse* — each operation has a reverse (subtraction for addition, decrement for increment) that lets you both add and remove elements from the aggregate.

This is the mathematics underlying pg_trickle's DVM (differential view maintenance) engine. The engine carries a ruleset of delta functions for each supported aggregate and applies them when changes arrive.

---

## Simple Aggregates: SUM, COUNT, AVG

```sql
-- Maintain real-time revenue metrics by region and day
SELECT pgtrickle.create_stream_table(
  name         => 'daily_revenue',
  query        => $$
    SELECT
      c.region,
      date_trunc('day', o.created_at) AS day,
      SUM(o.total)                    AS revenue,
      COUNT(*)                        AS order_count,
      AVG(o.total)                    AS avg_order_value,
      COUNT(DISTINCT o.customer_id)   AS unique_customers
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    GROUP BY c.region, date_trunc('day', o.created_at)
  $$,
  schedule     => '5 seconds',
  refresh_mode => 'DIFFERENTIAL'
);
```

When an order is inserted:
1. pg_trickle identifies the affected group `(region='europe', day='2026-04-27')`.
2. The DVM engine computes `delta_revenue = +order.total`, `delta_count = +1`, `new_avg = (old_sum + order.total) / (old_count + 1)`.
3. A single UPDATE is applied to `daily_revenue` for that one row.

An order placed at 14:32 is reflected in `daily_revenue` within 5 seconds. No batch job. No ETL. No Kafka.

Note the `COUNT(DISTINCT customer_id)` — this is a case where the delta is more complex. DISTINCT counts require tracking membership, not just a running tally. pg_trickle handles this with a set-based approach, but for very high-cardinality distinct counts, `HyperLogLog` approximation is often more practical and is on the roadmap.

---

## Conditional Aggregates

Real summary tables always have conditional logic — active vs. inactive, successful vs. failed, above/below threshold.

```sql
SELECT pgtrickle.create_stream_table(
  name         => 'support_dashboard',
  query        => $$
    SELECT
      team_id,
      COUNT(*)                                             AS total_tickets,
      COUNT(*) FILTER (WHERE status = 'open')             AS open_tickets,
      COUNT(*) FILTER (WHERE status = 'urgent')           AS urgent_tickets,
      COUNT(*) FILTER (WHERE status = 'resolved'
                       AND resolved_at > NOW() - INTERVAL '24 hours')
                                                          AS resolved_today,
      AVG(EXTRACT(EPOCH FROM (resolved_at - created_at)))
        FILTER (WHERE resolved_at IS NOT NULL)            AS avg_resolution_secs,
      MAX(created_at)                                     AS latest_ticket_at
    FROM tickets
    GROUP BY team_id
  $$,
  schedule     => '3 seconds',
  refresh_mode => 'DIFFERENTIAL'
);
```

The `FILTER (WHERE ...)` construct is a standard SQL conditional aggregate. pg_trickle handles it by carrying the condition into the delta computation. A ticket transitioning from `open` to `resolved` generates:
- `delta_open_tickets = -1`
- `delta_resolved_today = +1` (if within the 24-hour window)
- A recomputation of `avg_resolution_secs`

The time-window filter (`resolved_at > NOW() - INTERVAL '24 hours'`) introduces a subtlety: rows can age out of the window without any DML. This is handled by pg_trickle's time-window eviction logic, which runs during each refresh cycle to evict rows that have moved outside their window bounds.

---

## Multi-Table Aggregates With Joins

The place where batch ETL is hardest to replace is multi-table aggregation — computing aggregates over data that spans multiple source tables.

```sql
-- Per-author engagement metrics: articles + reactions + comments
SELECT pgtrickle.create_stream_table(
  name         => 'author_engagement',
  query        => $$
    SELECT
      u.id            AS author_id,
      u.display_name,
      COUNT(DISTINCT a.id)                        AS article_count,
      SUM(r.count)                                AS total_reactions,
      COUNT(DISTINCT c.id)                        AS total_comments,
      ROUND(SUM(r.count)::numeric /
        NULLIF(COUNT(DISTINCT a.id), 0), 2)       AS avg_reactions_per_article
    FROM users u
    LEFT JOIN articles a ON a.author_id = u.id
      AND a.published = true
    LEFT JOIN article_reactions r ON r.article_id = a.id
    LEFT JOIN comments c ON c.article_id = a.id
    GROUP BY u.id, u.display_name
  $$,
  schedule     => '10 seconds',
  refresh_mode => 'DIFFERENTIAL'
);
```

When a reaction is added to an article by author 42:
1. The CDC trigger on `article_reactions` fires, recording the INSERT.
2. The DVM engine evaluates the join delta: this reaction belongs to an article by author 42.
3. The engine updates the `author_id = 42` row in `author_engagement`: `total_reactions += 1`, recompute `avg_reactions_per_article`.

No other rows are touched. Author 43's metrics are unaffected.

This is the join-propagation problem that makes hand-rolled incremental ETL so fragile: the change in `article_reactions` affects a row in the aggregate keyed by `author_id`, through an intermediate join on `articles`. The DVM engine handles this join-delta propagation automatically.

---

## Vector Aggregates: vector_avg

Here's where things get interesting for ML workloads.

The same algebraic principle that applies to `AVG(price)` applies to `AVG(embedding)` — a vector average is just an element-wise sum divided by a count. It's the same delta function applied to each dimension.

Arriving in v0.37:

```sql
-- Per-user taste vector: average embedding of liked items
SELECT pgtrickle.create_stream_table(
  name         => 'user_taste',
  query        => $$
    SELECT
      ul.user_id,
      vector_avg(i.embedding)   AS taste_vec,
      COUNT(*)                  AS like_count,
      MAX(ul.liked_at)          AS last_interaction
    FROM user_likes ul
    JOIN items i ON i.id = ul.item_id
    GROUP BY ul.user_id
  $$,
  schedule     => '5 seconds',
  refresh_mode => 'DIFFERENTIAL'
);

CREATE INDEX ON user_taste USING hnsw (taste_vec vector_cosine_ops);
```

When user 42 likes item 1701:
1. CDC captures the INSERT on `user_likes`.
2. The DVM engine retrieves `item 1701's embedding`.
3. Delta: `new_sum_vec = old_sum_vec + item_1701.embedding`, `new_count = old_count + 1`, `new_taste_vec = new_sum_vec / new_count`.
4. One UPDATE to `user_taste` for `user_id = 42`.

One million users, thousands of likes per second. Each refresh cycle touches only the users whose likes changed in that cycle. The HNSW index on `taste_vec` receives targeted updates, not a full rebuild.

This replaces the "recompute all user taste vectors every night" batch job with a continuously maintained table. Users get personalization that reflects their most recent action, not last night's.

---

## Percentile and Rank Aggregates: The Hard Cases

Not every aggregate has a clean delta function. Some require knowing the full distribution.

**`PERCENTILE_CONT` / `PERCENTILE_DISC`**: Median and other percentiles require sorted access to the full set of values. There's no O(1) way to update a percentile when a single value changes. These always require a full recomputation.

**`RANK()` / `DENSE_RANK()`**: Inserting one row can change the rank of every other row. This is an O(n) update in the worst case.

**`NTILE()`**: Same problem as rank.

For these, pg_trickle falls back to `refresh_mode => 'FULL'` — which is still useful if you want scheduled maintenance, monitoring, and the other benefits of stream tables, just without the differential speedup.

```sql
SELECT pgtrickle.create_stream_table(
  name         => 'customer_revenue_ranks',
  query        => $$
    SELECT
      customer_id,
      total_revenue,
      RANK() OVER (ORDER BY total_revenue DESC) AS revenue_rank
    FROM customer_totals
  $$,
  schedule     => '1 minute',
  refresh_mode => 'FULL'   -- RANK() requires full recompute
);
```

This is still better than an unmanaged materialized view: the refresh is scheduled, monitored, and the table's freshness is visible via `pgtrickle.stream_table_status()`.

---

## Monitoring What You've Built

Once stream tables exist, pg_trickle exposes their state through a monitoring view:

```sql
SELECT
  name,
  refresh_mode,
  last_refresh_at,
  EXTRACT(EPOCH FROM (NOW() - last_refresh_at))::int AS staleness_secs,
  rows_changed_last_cycle,
  avg_refresh_ms,
  schedule
FROM pgtrickle.stream_table_status()
ORDER BY staleness_secs DESC;
```

This is what replaces the ETL job monitoring dashboard. You're not watching a Celery worker queue or a Kafka consumer lag counter — you're watching a PostgreSQL view that reports directly from the extension's catalog. One `SELECT` tells you everything.

---

## The Tradeoff

Incremental aggregation inside PostgreSQL is not free. The DVM engine adds overhead to the source write path — CDC triggers add roughly 50–200 microseconds per modified row, depending on how many stream tables reference that table.

For high-write workloads (thousands of writes per second), this overhead is worth measuring. pg_trickle's backpressure mechanism can be configured to shed load by falling back to full refresh when the write rate exceeds a configurable threshold.

But for the vast majority of OLTP workloads — hundreds to low thousands of writes per second — the overhead is negligible, and the elimination of the external ETL layer more than compensates.

The ETL pipeline that takes 40 minutes of engineering per incident and lives in a separate repository, separate monitoring system, and separate deployment pipeline? That's gone. The aggregate data is where it should have been all along: in the database, maintained by the database, queryable with SQL.

---

*pg_trickle is an open-source PostgreSQL extension for incremental view maintenance. Source and documentation at [github.com/grove/pg-trickle](https://github.com/grove/pg-trickle).*