# What Happens When You TRUNCATE a Table?

This tutorial explains what happens when a `TRUNCATE` statement hits a base table that is referenced by a stream table. Unlike INSERT, UPDATE, and DELETE — which are fully tracked by the CDC trigger — TRUNCATE is a special case that **bypasses row-level triggers entirely**. Understanding this gap is essential for operating pg_trickle correctly.

> **Prerequisite:** Read [WHAT_HAPPENS_ON_INSERT.md](WHAT_HAPPENS_ON_INSERT.md) first — it introduces the 7-phase lifecycle. This tutorial explains why TRUNCATE breaks that lifecycle and how to recover.

## Setup

Same e-commerce example used throughout the series:

```sql
CREATE TABLE orders (
    id       SERIAL PRIMARY KEY,
    customer TEXT NOT NULL,
    amount   NUMERIC(10,2) NOT NULL
);

SELECT pgtrickle.create_stream_table(
    name     => 'customer_totals',
    query    => $$
      SELECT customer, SUM(amount) AS total, COUNT(*) AS order_count
      FROM orders GROUP BY customer
    $$,
    schedule => '1m'
);

-- Seed some data
INSERT INTO orders (customer, amount) VALUES
    ('alice', 50.00),
    ('alice', 30.00),
    ('bob',   75.00),
    ('bob',   25.00);
```

After the first refresh, the stream table contains:

```
customer | total  | order_count
---------|--------|------------
alice    | 80.00  | 2
bob      | 100.00 | 2
```

---

## Case 1: TRUNCATE the Base Table (DIFFERENTIAL Mode)

```sql
TRUNCATE orders;
```

All four rows are removed instantly.

### What Happens at the Trigger Level: TRUNCATE Marker

> **Updated in v0.2.0:** pg_trickle now installs a **statement-level AFTER TRUNCATE** trigger on tracked source tables. This trigger writes a single marker row to the change buffer with `action = 'T'`.

Unlike the per-row DML triggers, the TRUNCATE trigger cannot capture individual row data (PostgreSQL's `TRUNCATE` does not provide `OLD` records). Instead, it writes a sentinel:

```
pgtrickle_changes.changes_16384
┌───────────┬─────────────┬────────┬──────────┬──────────┐
│ change_id │ lsn         │ action │ new_*    │ old_*    │
├───────────┼─────────────┼────────┼──────────┼──────────┤
│ 5         │ 0/1A3F4000  │ T      │ NULL     │ NULL     │
└───────────┴─────────────┴────────┴──────────┴──────────┘
```

The `'T'` action marker tells the refresh engine: "a TRUNCATE happened — a full refresh is required."

### What Happens at the Scheduler: Automatic Full Refresh

On the next refresh cycle, the scheduler:
1. Checks the change buffer for rows in the LSN window
2. Finds the `action = 'T'` marker row
3. **Falls back to a FULL refresh** — regardless of the stream table's configured `refresh_mode`
4. TRUNCATEs the stream table
5. Re-executes the defining query against the current base table state
6. Inserts all results

Since the `orders` table is now empty, the defining query returns zero rows:

```sql
-- After the next scheduled refresh:
SELECT * FROM customer_totals;
 customer | total | order_count
----------|-------|------------
 (0 rows)                        ← correct: orders is empty
```

**No manual intervention required.** The TRUNCATE marker ensures the stream table is automatically brought back into consistency on the next refresh cycle.

> **Note:** In versions before v0.2.0, TRUNCATE was not captured at all — the change buffer stayed empty and the stream table became silently stale. If you're running an older version, you still need to call `pgtrickle.refresh_stream_table()` manually after a TRUNCATE.

---

## Case 2: Manual Refresh (Explicit Recovery)

Although TRUNCATE is now automatically handled on the next refresh cycle, you can force an immediate recovery without waiting:

```sql
SELECT pgtrickle.refresh_stream_table('customer_totals');
```

This executes a **full refresh** regardless of the stream table's configured refresh mode:

1. **TRUNCATE** the stream table itself (clearing the stale data)
2. **Re-execute** the defining query
3. **INSERT** the results into the stream table
4. **Update the frontier** so future differential refreshes start from the current LSN

This is useful when you can't wait for the next scheduled refresh cycle and need the stream table consistent immediately.

---

## Case 3: TRUNCATE Then INSERT (Common ETL Pattern)

A common data loading pattern is:

```sql
BEGIN;
TRUNCATE orders;
INSERT INTO orders (customer, amount) VALUES
    ('charlie', 100.00),
    ('charlie', 200.00),
    ('dave',    150.00);
COMMIT;
```

### What the Change Buffer Sees

- TRUNCATE: **1 marker event** (`action = 'T'`) — captured by the statement-level trigger
- INSERT charlie 100.00: **1 event** (captured)
- INSERT charlie 200.00: **1 event** (captured)
- INSERT dave 150.00: **1 event** (captured)

The change buffer has 4 rows — the TRUNCATE marker plus 3 INSERT events.

### What the Scheduler Does

The scheduler sees the `action = 'T'` marker and triggers a **full refresh**, ignoring the individual INSERT events. The full refresh re-executes the defining query against the current state of `orders`, which now contains only charlie and dave:

```sql
-- After the next scheduled refresh:
SELECT * FROM customer_totals;
 customer | total  | order_count
----------|--------|------------
 charlie  | 300.00 | 2            ← correct
 dave     | 150.00 | 1            ← correct
```

The old data (alice, bob) is gone because the full refresh recomputed from scratch. This is correct — the TRUNCATE marker ensures consistency regardless of what other changes occurred in the same window.

---

## Case 4: TRUNCATE a Dimension Table in a JOIN

Consider a stream table that joins two tables:

```sql
CREATE TABLE customers (
    id   SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    tier TEXT NOT NULL DEFAULT 'standard'
);

CREATE TABLE orders (
    id          SERIAL PRIMARY KEY,
    customer_id INT REFERENCES customers(id),
    amount      NUMERIC(10,2)
);

SELECT pgtrickle.create_stream_table(
    name         => 'order_details',
    query        => $$
      SELECT c.name, c.tier, o.amount
      FROM orders o
      JOIN customers c ON o.customer_id = c.id
    $$,
    schedule => '1m'
);
```

Now truncate the dimension table:

```sql
TRUNCATE customers CASCADE;
```

The `CASCADE` also truncates `orders` (due to the foreign key). Both tables have TRUNCATE triggers installed, so both write a `'T'` marker to their respective change buffers.

On the next refresh cycle, the scheduler detects the TRUNCATE markers and performs a full refresh. The stream table is recomputed from the now-empty base tables:

```sql
-- After the next scheduled refresh:
SELECT * FROM order_details;
-- (0 rows) — correct
```

---

## Case 5: FULL Mode Stream Tables Are Immune

If the stream table uses `FULL` refresh mode instead of `DIFFERENTIAL`:

```sql
SELECT pgtrickle.create_stream_table(
    name         => 'customer_totals_full',
    query        => $$
      SELECT customer, SUM(amount) AS total, COUNT(*) AS order_count
      FROM orders GROUP BY customer
    $$,
    schedule     => '1m',
    refresh_mode => 'FULL'
);
```

A FULL-mode stream table doesn't use the change buffer at all. Every refresh cycle:
1. TRUNCATEs the stream table
2. Re-executes the defining query
3. Inserts all results

So after a TRUNCATE of the base table, the **next scheduled refresh** automatically picks up the correct state — no manual intervention needed. The trade-off is that every refresh recomputes from scratch, which is more expensive for large result sets.

---

## Why PostgreSQL Doesn't Fire Row Triggers on TRUNCATE

Understanding the PostgreSQL internals helps explain why per-row capture is impossible:

| Operation | Mechanism | Row triggers fired? | Statement triggers fired? |
|-----------|-----------|-------------------|-------------------------|
| `DELETE FROM t` | Scans and removes rows one by one | Yes — AFTER DELETE per row | Yes |
| `TRUNCATE t` | Removes all heap files and reinitializes the table storage | No — no per-row processing | Yes — AFTER TRUNCATE |
| `DELETE FROM t WHERE true` | Same as `DELETE FROM t` (full scan) | Yes — AFTER DELETE per row | Yes |

`TRUNCATE` is fundamentally different from `DELETE`. It's an O(1) operation that replaces the table's storage files, while DELETE is O(N) — scanning every row and recording each removal in WAL.

pg_trickle uses a **statement-level** `AFTER TRUNCATE` trigger to detect the event and write a `'T'` marker to the change buffer. This marker does not contain per-row data (PostgreSQL's TRUNCATE trigger doesn't provide OLD records), but it's sufficient to signal that a full refresh is needed.

---

## Alternative: DELETE FROM Instead of TRUNCATE

For **DIFFERENTIAL** mode, `TRUNCATE` is now handled automatically (via the `'T'` marker and full refresh fallback). However, using `DELETE FROM` instead of `TRUNCATE` has its own advantages:

```sql
-- Instead of: TRUNCATE orders;
DELETE FROM orders;
```

This fires the row-level DELETE trigger for every row. The change buffer captures all removals, and the next differential refresh correctly decrements all reference counts through the standard algebraic delta path — avoiding the need for a full refresh fallback.

| Approach | Speed | Stream table consistent? | Refresh type |
|----------|-------|------------------------|-------------|
| `TRUNCATE orders` | O(1) — instant | Yes — automatic full refresh on next cycle | FULL (fallback) |
| `DELETE FROM orders` | O(N) — scans all rows | Yes — per-row triggers fire | DIFFERENTIAL |
| `TRUNCATE` + manual refresh | O(1) + O(query) | Yes — immediately | FULL (manual) |

For tables with millions of rows, `DELETE FROM` can be slow and generate significant WAL. TRUNCATE is generally the better choice — the automatic full refresh fallback makes it safe to use.

---

## Best Practices

### 1. TRUNCATE Is Safe to Use

As of v0.2.0, TRUNCATE on tracked source tables is automatically detected and triggers a full refresh on the next scheduler cycle. No manual intervention is required for standard operation.

### 2. Use Manual Refresh for Immediate Consistency

If you need the stream table to be consistent immediately (not on the next cycle), call refresh explicitly:

```sql
TRUNCATE orders;
SELECT pgtrickle.refresh_stream_table('customer_totals');
```

### 3. Consider IMMEDIATE Mode for Real-Time Needs

For stream tables that need to reflect TRUNCATE instantly (within the same transaction), use IMMEDIATE mode. The TRUNCATE trigger automatically performs a full refresh synchronously.

### 4. Consider FULL Mode for ETL-Heavy Tables

If a table is routinely truncated and reloaded, FULL refresh mode may be simpler than DIFFERENTIAL — it naturally handles TRUNCATE because it recomputes from scratch every cycle.

### 5. Use `trigger_inventory()` to Verify Triggers

You can verify that both the DML trigger and the TRUNCATE trigger are installed and enabled:

```sql
SELECT * FROM pgtrickle.trigger_inventory();
```

This shows one row per (source table, trigger type) confirming both `pg_trickle_cdc_<oid>` (DML) and `pg_trickle_cdc_truncate_<oid>` (TRUNCATE) triggers are present.

---

## How TRUNCATE Compares to Other Operations

| Aspect | INSERT | UPDATE | DELETE | TRUNCATE |
|--------|--------|--------|--------|----------|
| **Row trigger fires?** | Yes (per row) | Yes (per row) | Yes (per row) | No |
| **Statement trigger fires?** | Yes | Yes | Yes | Yes (writes `'T'` marker) |
| **Change buffer** | 1 row per INSERT | 1 row per UPDATE | 1 row per DELETE | 1 marker row (`action='T'`) |
| **Stream table updated?** | Yes (next refresh) | Yes (next refresh) | Yes (next refresh) | Yes (full refresh on next cycle) |
| **Recovery** | Automatic (differential) | Automatic (differential) | Automatic (differential) | Automatic (full refresh fallback) |
| **FULL mode affected?** | N/A (recomputes) | N/A (recomputes) | N/A (recomputes) | N/A (recomputes) |
| **IMMEDIATE mode?** | Synchronous delta | Synchronous delta | Synchronous delta | Synchronous full refresh |
| **Speed** | O(1) per row | O(1) per row | O(1) per row | O(1) + O(query) for refresh |

---

## What About IMMEDIATE Mode?

In **IMMEDIATE** mode, TRUNCATE is handled synchronously within the same transaction:

1. The **BEFORE TRUNCATE** trigger acquires an advisory lock on the stream table
2. The **AFTER TRUNCATE** trigger calls `pgt_ivm_handle_truncate(pgt_id)`
3. This function TRUNCATEs the stream table and re-populates it by re-executing the defining query
4. The stream table is immediately consistent — within the same transaction

```sql
SELECT pgtrickle.create_stream_table(
    name         => 'customer_totals_live',
    query        => $$
      SELECT customer, SUM(amount) AS total, COUNT(*) AS order_count
      FROM orders GROUP BY customer
    $$,
    refresh_mode => 'IMMEDIATE'
);

BEGIN;
TRUNCATE orders;
-- customer_totals_live is already empty here!
SELECT * FROM customer_totals_live;  -- (0 rows)
COMMIT;
```

No waiting for a scheduler cycle, no stale data — TRUNCATE is fully handled in real-time.

---

## Summary

As of v0.2.0, TRUNCATE is fully tracked by pg_trickle across all three refresh modes. While it cannot be captured as per-row DELETE events (PostgreSQL's TRUNCATE doesn't process individual rows), pg_trickle uses a statement-level trigger to detect the event and respond appropriately.

The key takeaways:

1. **TRUNCATE is automatically handled** — a statement-level AFTER TRUNCATE trigger writes a `'T'` marker to the change buffer
2. **DIFFERENTIAL mode: automatic full refresh** — the scheduler detects the marker and falls back to a full refresh on the next cycle
3. **IMMEDIATE mode: synchronous full refresh** — the stream table is rebuilt within the same transaction
4. **FULL mode: naturally immune** — every refresh recomputes from scratch regardless
5. **Manual refresh for instant consistency** — call `pgtrickle.refresh_stream_table()` if you can't wait for the next cycle
6. **`DELETE FROM` remains an alternative** — fires per-row triggers, enabling incremental delta processing instead of full refresh fallback

---

## Next in This Series

- **[What Happens When You INSERT a Row?](WHAT_HAPPENS_ON_INSERT.md)** — The full 7-phase lifecycle (start here if you haven't already)
- **[What Happens When You UPDATE a Row?](WHAT_HAPPENS_ON_UPDATE.md)** — D+I split, group key changes, net-effect for multiple UPDATEs
- **[What Happens When You DELETE a Row?](WHAT_HAPPENS_ON_DELETE.md)** — Reference counting, group deletion, INSERT+DELETE cancellation