# storage_engine — Benchmarks

Benchmark results for **storage_engine 1.0.4**, comparing four PostgreSQL table access methods on identical data:

| AM | Description |
|---|---|
| **heap** | Standard PostgreSQL heap (baseline) |
| **colcompress** | storage_engine column store, `lz4` compression, `orderby = 'event_date ASC'` |
| **rowcompress** | storage_engine row-level compressed store, `zstd` |
| **citus_columnar** | Citus columnar extension (`columnar` AM) |

---

## Test Environment

| | |
|---|---|
| **CPU** | AMD Ryzen 7 5800H — 8 cores / 16 threads |
| **RAM** | 40 GB DDR4 |
| **OS** | Ubuntu 24.04 LTS (x86_64) |
| **PostgreSQL** | 18.3 |
| **storage_engine** | 1.0.4 |
| **citus** | 14.0.0 |
| **shared_buffers** | 10 GB |
| **work_mem** | 256 MB |
| **Dataset** | 1 000 000 rows — heap 388 MB · colcompress 95 MB (lz4) · rowcompress 106 MB · citus_columnar 48 MB |

All queries run **3 times**; the **median** is reported. Results in milliseconds — lower is better.

---

## Serial — JIT off, no parallelism

Single-core storage baseline: isolates raw decompression and I/O cost per AM without interference from the parallel executor or LLVM JIT compilation.

![Serial benchmark](tests/bench/benchmark_serial.png)

| Query | heap | colcompress | rowcompress | citus_columnar |
|---|---:|---:|---:|---:|
| Q1 `COUNT(*)` | 38.6 ms | 43.7 ms | 305 ms | 36.9 ms |
| Q2 `SUM/AVG` numeric + double | 182.3 ms | **118.3 ms** | 356 ms | 121.4 ms |
| Q3 `GROUP BY` country (10 vals) | 214.4 ms | **162.3 ms** | 382 ms | 141.4 ms |
| Q4 `GROUP BY` event_type + p95 | 538.2 ms | **452.5 ms** | 680 ms | 469.9 ms |
| Q5 date range 1 month | 21.1 ms | **23.5 ms** | 60.0 ms | 21.1 ms |
| Q6 JSONB `@>` GIN | **121.7 ms** | 371.4 ms | 322 ms | 236.9 ms |
| Q7 JSONB key + GROUP BY | 386.2 ms | **309.0 ms** | 537 ms | 354.4 ms |
| Q8 array `@>` GIN | **61.3 ms** | 329.2 ms | 272 ms | 143.8 ms |
| Q9 LIKE text scan | 147.0 ms | **88.3 ms** | 333 ms | 90.3 ms |
| Q10 heavy multi-agg | **1908 ms** | 1902 ms | 2067 ms | 1914 ms |

### Highlights

- **Q5 (date range):** colcompress matches heap (23.5 ms vs 21.1 ms) because stripe pruning skips 6 of 7 stripes — data is physically sorted by `event_date` via the `orderby` option and globally compacted with `colcompress_merge`.
- **Q2, Q3, Q4, Q9:** colcompress wins through column projection — only the referenced columns are decompressed, reducing effective I/O versus the heap's full-row reads.
- **Q6, Q8 (GIN index queries):** heap wins — GIN index seeks return scattered TIDs that map to random stripe reads, negating columnar compression advantages.

---

## Parallel — JIT on, 16 parallel workers

Real-world simulation: all sessions on a multi-core server competing for CPU. JIT compilation is enabled and the PostgreSQL parallel executor dispatches up to 16 workers per query.

![Parallel benchmark](tests/bench/benchmark_parallel.png)

| Query | heap | colcompress | rowcompress | citus_columnar |
|---|---:|---:|---:|---:|
| Q1 `COUNT(*)` | 17.8 ms | **16.3 ms** | 144 ms | 37.0 ms |
| Q2 `SUM/AVG` numeric + double | 50.1 ms | **30.9 ms** | 142 ms | 121.7 ms |
| Q3 `GROUP BY` country (10 vals) | **57.6 ms** | 171 ms | 151 ms | 138 ms |
| Q4 `GROUP BY` event_type + p95 | 539 ms | **329 ms** | 686 ms | 473 ms |
| Q5 date range 1 month | **21.2 ms** | 242 ms | 69.5 ms | 21.0 ms |
| Q6 JSONB `@>` GIN | 84.5 ms | **42.8 ms** | 465 ms | 235 ms |
| Q7 JSONB key + GROUP BY | 391 ms | **87.7 ms** | 692 ms | 349 ms |
| Q8 array `@>` GIN | 61.7 ms | **33.3 ms** | 275 ms | 147 ms |
| Q9 LIKE text scan | 48.7 ms | **26.8 ms** | 140 ms | 91.0 ms |
| Q10 heavy multi-agg | 1951 ms | **691 ms** | 2085 ms | 1958 ms |

### Highlights

- **Q10 (heavy multi-agg):** colcompress achieves **691 ms vs 1951 ms heap** — a **×2.8 speedup** — through vectorized aggregate execution in each parallel worker. Each worker processes column chunks in SIMD-friendly batches of 10 000 values rather than one row at a time.
- **Q6, Q7, Q8 (JSONB / array):** colcompress wins vs heap in parallel mode thanks to column projection reducing the data each worker decompresses.
- **Q5 (date range) in parallel:** colcompress reads all stripes (242 ms) while heap stays at 21 ms. Each parallel worker receives an independent block range and scans it without the global stripe-pruning pass; stripe pruning only works in the sequential single-process path. For date-range workloads, run with parallelism disabled or rely on a GIN / B-tree index on a non-colcompress table.
- **Q3 (GROUP BY country):** heap wins in parallel (57.6 ms vs 171 ms colcompress) because the heap parallel path decompresses full rows at memory bandwidth speed while colcompress's per-column decompression adds per-worker overhead for low-column-count projections.

---

## Query Definitions

| Query | SQL pattern |
|---|---|
| Q1 | `SELECT COUNT(*) FROM t` |
| Q2 | `SELECT SUM(amount), AVG(amount), SUM(price), AVG(price) FROM t` |
| Q3 | `SELECT country_code, COUNT(*), AVG(score) FROM t GROUP BY country_code ORDER BY COUNT(*) DESC` |
| Q4 | `SELECT event_type, COUNT(*), SUM(amount), AVG(duration_ms), percentile_disc(0.95) WITHIN GROUP (ORDER BY duration_ms) FROM t GROUP BY event_type` |
| Q5 | `SELECT event_date, COUNT(*), SUM(amount), AVG(price) FROM t WHERE event_date BETWEEN '2024-01-01' AND '2024-01-31' GROUP BY event_date ORDER BY event_date` |
| Q6 | `SELECT COUNT(*), AVG(amount) FROM t WHERE metadata @> '{"os":"android"}'` |
| Q7 | `SELECT metadata->>'campaign', COUNT(*), SUM(amount) FROM t WHERE metadata ? 'campaign' GROUP BY 1 ORDER BY 3 DESC` |
| Q8 | `SELECT COUNT(*), AVG(price) FROM t WHERE tags @> ARRAY['tag_5']` |
| Q9 | `SELECT COUNT(*), SUM(amount) FROM t WHERE url LIKE '/page/1%'` |
| Q10 | `SELECT browser, is_mobile, COUNT(*), SUM(amount), AVG(amount), MIN(amount), MAX(amount), SUM(price*quantity), AVG(duration_ms), COUNT(DISTINCT user_id), SUM(CASE WHEN event_type='purchase' THEN amount END) FROM t GROUP BY browser, is_mobile` |

---

## Reproducing

```bash
createdb bench_am
psql -d bench_am -f tests/bench/setup.sql

# Serial
bash tests/bench/run.sh 3
python3 tests/bench/chart.py        # → tests/bench/benchmark_serial.png

# Parallel
bash tests/bench/run_parallel.sh 3
python3 tests/bench/chart_parallel.py  # → tests/bench/benchmark_parallel.png
```

See [tests/README.md](tests/README.md) for the full environment description.