# Demo: Multi-Engine Leaderboard

*A live game leaderboard maintained once by pg_trickle, queryable from both PostgreSQL and DuckLake — operational and analytical, no duplication*

---

## What you'll build

You'll run a real-time game leaderboard that does two things simultaneously:

1. **Live leaderboard in Grafana** — player rankings refresh every 5 seconds,
   reading directly from PostgreSQL.
2. **Historical analytics in DuckDB** — every delta is written to MinIO as
   Parquet via the DuckLake sink. A JupyterLab notebook queries the ranking
   history over time.

Both views come from the **same two stream tables** maintained by pg_trickle.
There's no separate ETL job, no data duplication, and no consistency lag between
the operational and analytical views.

**Time to first leaderboard: under 5 minutes.**

---

## Background: Why is this interesting?

The classic problem in data engineering is keeping an OLTP system (fast writes,
low latency) and an analytics system (historical queries, large scans) in sync.
The usual solution is a CDC pipeline (Debezium) feeding a Kafka topic, consumed
by a stream processor, writing to a data lake, eventually queryable from a BI
tool. That's 4–6 systems.

This demo shows you can collapse all of that into PostgreSQL + pg_trickle:

- **PostgreSQL** is the OLTP source and the operational query layer.
- **pg_trickle stream tables** maintain the leaderboard incrementally using only
  the score events that arrived in the last 5 seconds.
- **DuckLake sink** appends each delta as a Parquet file to MinIO.
- **DuckDB** queries the Parquet files for time-travel analytics.

The `RANK()` window function inside the stream table query is particularly
noteworthy: pg_trickle's DVM engine handles differential refresh of ranked
results correctly, so the rank column stays accurate without a full recompute.

---

## Prerequisites

- **Docker Engine 24+** and **Docker Compose v2**
  Verify: `docker compose version`
- **4 GB free RAM**
- No local installation of pg_trickle or DuckDB needed — everything runs in
  containers

---

## First startup: Image build

**⚠️ The first `docker compose up` will build the pg_trickle image from source.** This takes 2–3 minutes and requires an internet connection (Rust toolchain download).

**Subsequent restarts will use Docker cache and start in seconds.**

Once built, the image is cached locally.

---

## Architecture

```
Python score generator
  │  50 score events/second
  │  columns: player_id, player_name, game_id, score
  ▼
PostgreSQL 18 + pg_trickle  (port 5432, db: postgres)
  │  game_scores table  ← trigger-based CDC
  │
  ├─ stream table: top_players
  │    SELECT player_id, player_name,
  │           SUM(score) AS total_score,
  │           RANK() OVER (ORDER BY SUM(score) DESC) AS rank
  │    5-second DIFFERENTIAL refresh
  │    DuckLake sink → MinIO: s3://pg-trickle-demo/top_players/
  │
  └─ stream table: scores_by_game
       SELECT game_id, SUM(score), COUNT(DISTINCT player_id)
       5-second DIFFERENTIAL refresh
       DuckLake sink → MinIO: s3://pg-trickle-demo/scores_by_game/
       ▼
  MinIO (port 9000/9001)    ← Parquet delta files, one per refresh cycle
       ▼
  DuckLake catalog          ← ducklake_snapshot, ducklake_view
       ▼
  ┌─────────────────────────────┬─────────────────────────────────┐
  │  Grafana (port 3000)        │  JupyterLab (port 8888)         │
  │  Live leaderboard           │  Historical ranking analysis     │
  │  Reads from PostgreSQL      │  Reads from MinIO via DuckDB     │
  └─────────────────────────────┴─────────────────────────────────┘
```

---

## Step 1: Start the demo

```bash
cd demos/ducklake-leaderboard
docker compose up
```

Five containers start:

| Container | Port | Purpose |
|-----------|------|---------|
| `postgres` | 5432 | PostgreSQL 18 + pg_trickle |
| `minio` | 9000 / 9001 | S3-compatible object storage |
| `minio-setup` | — | One-shot container that creates the `pg-trickle-demo` bucket |
| `generator` | — | Python script producing 50 score events/s |
| `grafana` | 3000 | Live leaderboard dashboard |

> **Note:** this demo does not include a JupyterLab container in the compose
> file. To run the analytics notebook, install DuckDB locally:
> `pip install duckdb jupyterlab` and open `notebooks/leaderboard_analysis.ipynb`.

Wait until all containers are healthy. You'll see:
```
postgres  | LOG: database system is ready to accept connections
```

---

## Step 2: Watch the live leaderboard in Grafana

Open **http://localhost:3000** in your browser.

- Login: `admin` / `admin`
- Navigate to **Dashboards → Multi-Engine Leaderboard**

The dashboard refreshes every 5 seconds. Watch player rankings change in
real time as the score generator inserts events.

---

## Step 3: Connect to PostgreSQL and explore

In a second terminal:

```bash
psql postgresql://postgres:postgres@localhost:5432/postgres
```

See the stream tables:

```sql
SELECT pgt_name, pgt_schema, refresh_mode, schedule, status, is_populated
FROM pgtrickle.pgt_stream_tables
ORDER BY pgt_name;
```

Query the current leaderboard:

```sql
SELECT rank, player_name, total_score, games_played
FROM top_players
ORDER BY rank
LIMIT 10;
```

Check that ranks are updating correctly after each refresh cycle:

```sql
-- Watch this query — rank 1 may change as scores accumulate
SELECT rank, player_name, total_score
FROM top_players
WHERE rank <= 3
ORDER BY rank;
```

See the provenance trail — every Parquet delta that was written to MinIO:

```sql
SELECT
    stream_table_name,
    ducklake_snapshot_id,
    delta_row_count,
    written_at
FROM pgtrickle.pgt_ducklake_provenance
ORDER BY written_at DESC
LIMIT 10;
```

---

## Step 4: Browse Parquet files in MinIO

Open the MinIO console at **http://localhost:9001**.

- Login: `minioadmin` / `minioadmin`
- Navigate to `pg-trickle-demo` → `top_players/`

A new `.parquet` file appears every 5 seconds (whenever there are score changes).
Each file is a delta — only the player rows whose total score changed in that
cycle.

---

## Step 5: Understand the stream table SQL

The leaderboard is defined by a single `create_stream_table` call
(pre-loaded by `postgres/init.sql`):

```sql
SELECT pgtrickle.create_stream_table(
    name => 'top_players',
    query => $$
        SELECT
            player_id,
            player_name,
            SUM(score)                             AS total_score,
            COUNT(1)                               AS games_played,
            RANK() OVER (ORDER BY SUM(score) DESC) AS rank
        FROM game_scores
        GROUP BY player_id, player_name
    $$,
    schedule                => '5s',
    refresh_mode            => 'DIFFERENTIAL',
    sink                    => 'ducklake',
    ducklake_sink_path      => 's3://pg-trickle-demo/top_players/',
    ducklake_sink_table_id  => (SELECT table_id FROM ducklake_table WHERE table_name = 'top_players')
);
```

**Why `RANK()` works in a stream table:**
`RANK()` is a window function applied over the aggregated result. pg_trickle's
DVM engine computes the `SUM(score) GROUP BY player_id` differentially (only
over new score events), then applies the `RANK()` over the full refreshed
result set. This means ranks are always globally consistent — no partial
recompute artifacts.

**Note on `COUNT(1)` vs `COUNT(*)`:**
When defining stream table queries with dollar-quoted strings (`$$ ... $$`),
PostgreSQL's parser has a quirk where bare `COUNT(*)` is rejected. Use
`COUNT(1)` (or any literal) instead — it's semantically identical and
pg_trickle's DVM engine will handle it correctly.

---

## Step 6: Connect from DuckDB (optional)

Query the stream tables and provenance trail directly from DuckDB without leaving
the Python REPL:

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL postgres; LOAD postgres;")

# Attach PostgreSQL (no ducklake: protocol — direct postgresql://)
con.execute("""
    ATTACH 'dbname=postgres user=postgres password=postgres host=localhost port=5432'
        AS lake (TYPE POSTGRES)
""")

# Query stream tables
print(con.execute("""
    SELECT pgt_name, pgt_schema, schedule, refresh_mode, status
    FROM lake.pgtrickle.pgt_stream_tables
    ORDER BY pgt_name
""").df())

# Check Parquet files written to MinIO
print(con.execute("""
    SELECT stream_table_name, delta_row_count, written_at
    FROM lake.pgtrickle.pgt_ducklake_provenance
    ORDER BY written_at DESC
    LIMIT 10
""").df())
```

DuckDB will automatically recognize the `pgtrickle` schema and the DuckLake
catalog tables. No separate notebook container is needed.

---

## Step 7: Run the analytics notebook (optional)

If you have DuckDB and JupyterLab installed locally:

```bash
pip install duckdb jupyterlab
jupyter lab notebooks/leaderboard_analysis.ipynb
```

In the notebook, attach to PostgreSQL (DuckDB will automatically recognize the
DuckLake catalog tables) and query the ranking history:

```python
import duckdb

con = duckdb.connect()

# Install and load the postgres extension
con.execute("INSTALL postgres; LOAD postgres;")

# Attach PostgreSQL directly (connection string goes in ATTACH)
con.execute("""
    ATTACH 'dbname=postgres user=postgres password=postgres host=localhost port=5432'
        AS lake (TYPE POSTGRES)
""")

# Query the stream tables via the pgtrickle schema
result = con.execute("""
    SELECT
        pgt_name,
        COUNT(*) as rows,
        MAX(updated_at) as last_updated
    FROM lake.pgtrickle.pgt_stream_tables
    GROUP BY pgt_name
""").df()
print(result)

# Query the provenance trail to see when Parquet files were written
provenance = con.execute("""
    SELECT * FROM lake.pgtrickle.pgt_ducklake_provenance
    ORDER BY written_at DESC
    LIMIT 20
""").df()
print(provenance)
```

---

## Troubleshooting

| Symptom | Likely cause | Fix |
|---------|-------------|-----|
| Postgres container exits with "syntax error at or near '*'" | `COUNT(*)` syntax in dollar-quoted stream table queries | Use `COUNT(1)` instead; PostgreSQL parser rejects bare `*` in this context |
| Grafana shows "No data" | Stream tables not yet populated | Wait 15 s; check `docker compose logs postgres` |
| MinIO console is empty | `minio-setup` container failed | Run `docker compose up minio-setup` to retry bucket creation |
| `pgt_ducklake_provenance` has no rows | S3 write failed or sink not configured | Check `docker compose logs postgres` for S3 errors; verify `sink => 'ducklake'` in stream table SQL |
| Rankings not changing | Score generator not running or not inserting rows | Check `docker compose logs generator` and `docker compose logs postgres` |
| Port 3000 conflict | Another service using port 3000 | Edit `docker-compose.yml` to change Grafana port from `3000:3000` to desired `PORT:3000` |
| DuckDB connection fails "Table does not exist" | Using old DuckLake attachment syntax | Use direct `postgresql://` connection in ATTACH, not `ducklake:` protocol

---

## What you've built

- A **live ranked leaderboard** that updates in under 5 seconds using only
  PostgreSQL — no Kafka, no stream processor, no separate OLAP database.
- **Bidirectional access**: the same data is queryable operationally from
  PostgreSQL (sub-millisecond reads) and analytically from DuckDB via the
  Parquet history on MinIO.
- **RANK() inside a stream table**: window functions work correctly under
  differential refresh — each cycle produces a fully consistent ranked result.

---

## Stop and clean up

```bash
docker compose down -v
```

---

## Related resources

- [Tutorial 3: The Modern Data Stack in One Box](../../docs/tutorial-modern-data-stack-one-box.md)
- [Tutorial 4: Streaming PostgreSQL to a Data Lake](../../docs/tutorial-streaming-postgres-to-data-lake.md)
- [Blog: Why pg_trickle + DuckLake Is the Missing Piece for Lakehouse IVM](../../blog/ducklake-ivm-missing-piece.md)


A self-contained `docker compose up` demo that runs a real-time game leaderboard
simultaneously maintained as a pg_trickle stream table in PostgreSQL **and**
published as a DuckLake table on MinIO.

The same data — maintained once by pg_trickle — is available to both operational
and analytical workloads without duplication.

---

## What It Does

```
Score generator (Python)
  │  50 score events/second
  ▼
PostgreSQL 18 + pg_trickle
  ├─ stream table: top_players   (5s DIFFERENTIAL refresh)
  │    └─ DuckLake sink → MinIO Parquet deltas
  ├─ stream table: scores_by_game (5s DIFFERENTIAL refresh)
  │    └─ DuckLake sink → MinIO Parquet deltas
  └─ DuckLake view registration: top_players, scores_by_game
  ▼
┌─────────────────────┬───────────────────────────────┐
│  Grafana (port 3000) │  DuckDB notebook (port 8888)  │
│  Live leaderboard    │  Historical analytics          │
│  (reads from PG)     │  (reads from DuckLake/MinIO)   │
└─────────────────────┴───────────────────────────────┘
```

---

## Quick Start

```bash
cd demos/ducklake-leaderboard
docker compose up
```

- **Live leaderboard**: http://localhost:3000 (Grafana, admin/admin)
- **Analytics notebook**: http://localhost:8888 (JupyterLab, token: `demo`)
- **MinIO console**: http://localhost:9001 (minioadmin/minioadmin)

---

## What to Watch

1. **Grafana**: The top-players panel refreshes every 5 seconds. Watch scores
   change in real time as the generator inserts events.

2. **MinIO console**: Navigate to the `pg-trickle-demo` bucket. New Parquet
   files appear under `top_players/` and `scores_by_game/` every 5 seconds.

3. **JupyterLab**: Open `leaderboard_analysis.ipynb` and run all cells. DuckDB
   queries the historical Parquet files and plots ranking trends over time.

---

## Services

| Service | Port | Purpose |
|---------|------|---------|
| PostgreSQL 18 + pg_trickle | 5432 | Stream tables, DuckLake catalog |
| MinIO | 9000/9001 | Object storage (S3-compatible) |
| Grafana | 3000 | Live leaderboard dashboard |
| JupyterLab | 8888 | Historical analytics (DuckDB) |
| Score generator | — | Python script producing 50 events/s |

---

## Key SQL

The leaderboard stream table:

```sql
SELECT pgtrickle.create_stream_table(
    'top_players',
    query => $$
        SELECT
            player_id,
            player_name,
            SUM(score) AS total_score,
            COUNT(*) AS games_played,
            RANK() OVER (ORDER BY SUM(score) DESC) AS rank
        FROM game_scores
        GROUP BY player_id, player_name
    $$,
    schedule           => '5s',
    refresh_mode       => 'DIFFERENTIAL',
    sink               => 'ducklake',
    ducklake_sink_path => 's3://pg-trickle-demo/top_players/'
);
```

After creation, pg_trickle automatically registers `top_players` as a native
view in the DuckLake catalog — so the JupyterLab notebook can query it from
DuckDB with a simple `SELECT * FROM lake.top_players`.

---

## Provenance trail

Query the lineage from PostgreSQL:

```sql
SELECT
    stream_table_name,
    ducklake_snapshot_id,
    delta_row_count,
    written_at
FROM pgtrickle.pgt_ducklake_provenance
ORDER BY written_at DESC
LIMIT 10;
```