# What Happens When You UPDATE a Row? This tutorial traces what happens when an `UPDATE` statement hits a base table that is referenced by a stream table. It covers the trigger capture, the scan-level decomposition into DELETE + INSERT, and how each DVM operator propagates the change — including cases where the group key changes, where JOINs are involved, and where multiple UPDATEs happen within a single refresh window. > **Prerequisite:** Read [WHAT_HAPPENS_ON_INSERT.md](WHAT_HAPPENS_ON_INSERT.md) first — it introduces the full 7-phase lifecycle. This tutorial focuses on how UPDATE differs. ## Setup Same e-commerce example: ```sql CREATE TABLE orders ( id SERIAL PRIMARY KEY, customer TEXT NOT NULL, amount NUMERIC(10,2) NOT NULL ); SELECT pgtrickle.create_stream_table( name => 'customer_totals', query => $$ SELECT customer, SUM(amount) AS total, COUNT(*) AS order_count FROM orders GROUP BY customer $$, schedule => '1m' ); -- Seed some data INSERT INTO orders (customer, amount) VALUES ('alice', 49.99), ('alice', 30.00), ('bob', 75.00); ``` After the first refresh, the stream table contains: ``` customer | total | order_count ---------|-------|------------ alice | 79.99 | 2 bob | 75.00 | 1 ``` --- ## Case 1: Simple Value UPDATE (Same Group Key) ```sql UPDATE orders SET amount = 59.99 WHERE id = 1; ``` Alice's first order changes from 49.99 to 59.99. The customer (group key) stays the same. ### Phase 1: Trigger Capture The AFTER UPDATE trigger fires and writes **one row** to the change buffer with both OLD and NEW values: ``` pgtrickle_changes.changes_16384 ┌───────────┬─────────────┬────────┬──────────┬──────────┬────────────┬──────────┬────────────┐ │ change_id │ lsn │ action │ new_cust │ new_amt │ old_cust │ old_amt │ pk_hash │ ├───────────┼─────────────┼────────┼──────────┼──────────┼────────────┼──────────┼────────────┤ │ 4 │ 0/1A3F3000 │ U │ alice │ 59.99 │ alice │ 49.99 │ -837291 │ └───────────┴─────────────┴────────┴──────────┴──────────┴────────────┴──────────┴────────────┘ ``` Key difference from INSERT: the trigger writes **both** `new_*` and `old_*` columns. The pk_hash is computed from `NEW.id`. ### Phase 2–4: Scheduler, Frontier, Change Detection Identical to the INSERT flow. The scheduler detects one change row in the LSN window. ### Phase 5: Scan Differentiation — The U → D+I Split This is where UPDATE handling diverges fundamentally. The scan delta operator **decomposes the UPDATE into two events**: ``` __pgt_row_id | __pgt_action | customer | amount -------------|--------------|----------|------- -837291 | D | alice | 49.99 ← old values (DELETE) -837291 | I | alice | 59.99 ← new values (INSERT) ``` **Why split into D+I?** This is a core IVM principle. Downstream operators (aggregates, joins, filters) don't have special "update" logic — they only understand insertions and deletions. By decomposing the UPDATE: - The **DELETE** event subtracts the old values from running aggregates - The **INSERT** event adds the new values This algebraic approach handles arbitrary operator trees without operator-specific update logic. ### Phase 5 (continued): Aggregate Differentiation The aggregate operator processes both events against the `alice` group: ```sql -- DELETE event: subtract old values alice: total += CASE WHEN action='D' THEN -49.99 END → -49.99 alice: count += CASE WHEN action='D' THEN -1 END → -1 -- INSERT event: add new values alice: total += CASE WHEN action='I' THEN +59.99 END → +59.99 alice: count += CASE WHEN action='I' THEN +1 END → +1 ``` Net effect on alice's group: ``` total delta: -49.99 + 59.99 = +10.00 count delta: -1 + 1 = 0 ``` The aggregate emits this as an INSERT (because the group still exists and its value changed): ``` customer | total | order_count | __pgt_row_id | __pgt_action ---------|--------|-------------|--------------|------------- alice | +10.00 | 0 | 7283194 | I ``` ### Phase 6: MERGE The MERGE updates the existing row: ```sql -- MERGE WHEN MATCHED AND action = 'I' THEN UPDATE: -- alice's total: 79.99 + 10.00 = 89.99 (via reference counting) -- alice's count: 2 + 0 = 2 ``` Wait — that's not right. The MERGE doesn't add deltas; it **replaces** the row. The aggregate delta query actually computes the **new absolute value** by combining the stored state with the delta: ```sql COALESCE(existing.total, 0) + delta.total → 79.99 + 10.00 = 89.99 COALESCE(existing.__pgt_count, 0) + delta.__pgt_count → 2 + 0 = 2 ``` Result: ```sql SELECT * FROM customer_totals; customer | total | order_count ----------|-------|------------ alice | 89.99 | 2 ← was 79.99 bob | 75.00 | 1 ``` --- ## Case 2: Group Key Change (Customer Reassignment) ```sql UPDATE orders SET customer = 'bob' WHERE id = 2; ``` Alice's second order (amount=30.00) is reassigned to Bob. The **group key itself changes**. ### Trigger Capture ``` change_id | lsn | action | new_cust | new_amt | old_cust | old_amt | pk_hash 5 | 0/1A3F3100 | U | bob | 30.00 | alice | 30.00 | 4521038 ``` The old and new customer values differ. ### Scan Delta: D+I Split ``` __pgt_row_id | __pgt_action | customer | amount -------------|--------------|----------|------- 4521038 | D | alice | 30.00 ← removes from alice's group 4521038 | I | bob | 30.00 ← adds to bob's group ``` ### Aggregate Delta The aggregate groups by `customer`, so the DELETE and INSERT land in **different groups**: ``` Group "alice": total delta: -30.00 count delta: -1 Group "bob": total delta: +30.00 count delta: +1 ``` ### After MERGE ```sql SELECT * FROM customer_totals; customer | total | order_count ----------|--------|------------ alice | 59.99 | 1 ← lost one order (-30.00) bob | 105.00 | 2 ← gained one order (+30.00) ``` This is why the D+I decomposition is essential. Without it, you'd need special "move between groups" logic. With it, the standard aggregate differentiation handles group key changes naturally. --- ## Case 3: UPDATE That Deletes a Group ```sql -- Alice only has one order left. Reassign it to bob. UPDATE orders SET customer = 'bob' WHERE id = 1; ``` ### Aggregate Delta ``` Group "alice": total delta: -59.99 count delta: -1 new __pgt_count: 1 - 1 = 0 → group vanishes! Group "bob": total delta: +59.99 count delta: +1 ``` When `__pgt_count` reaches 0, the aggregate emits a **DELETE** for alice's group: ``` customer | total | __pgt_row_id | __pgt_action ---------|-------|--------------|------------- alice | — | 7283194 | D ← group removed bob | ... | 9182734 | I ← group updated ``` The MERGE deletes alice's row entirely: ```sql SELECT * FROM customer_totals; customer | total | order_count ----------|--------|------------ bob | 165.00 | 3 ``` --- ## Case 4: Multiple UPDATEs on the Same Row (Within One Refresh Window) What if a row is updated multiple times before the next refresh? ```sql UPDATE orders SET amount = 10.00 WHERE id = 3; -- bob: 75 → 10 UPDATE orders SET amount = 20.00 WHERE id = 3; -- bob: 10 → 20 UPDATE orders SET amount = 30.00 WHERE id = 3; -- bob: 20 → 30 ``` The change buffer now has 3 rows for pk_hash of order #3: ``` change_id | action | old_amt | new_amt 6 | U | 75.00 | 10.00 7 | U | 10.00 | 20.00 8 | U | 20.00 | 30.00 ``` ### Net-Effect Computation The scan delta uses a **split fast-path** design. Since order #3 has multiple changes (cnt > 1), it takes the multi-change path with window functions: ```sql FIRST_VALUE(action) OVER (PARTITION BY pk_hash ORDER BY change_id) → 'U' LAST_VALUE(action) OVER (...) → 'U' ``` Both first and last actions are 'U', so: - **DELETE**: emits using old values from the **earliest** change (change_id=6): `old_amt = 75.00` - **INSERT**: emits using new values from the **latest** change (change_id=8): `new_amt = 30.00` Net delta: ``` __pgt_row_id | __pgt_action | amount -------------|--------------|------- pk_hash_3 | D | 75.00 ← original value before all changes pk_hash_3 | I | 30.00 ← final value after all changes ``` The aggregate sees `-75.00 + 30.00 = -45.00`. This is correct regardless of the intermediate values. The intermediate rows (10.00, 20.00) are never seen. --- ## Case 5: INSERT + UPDATE in Same Window ```sql INSERT INTO orders (customer, amount) VALUES ('charlie', 100.00); UPDATE orders SET amount = 200.00 WHERE customer = 'charlie'; ``` Both happen before the next refresh. The buffer has: ``` change_id | action | old_amt | new_amt 9 | I | NULL | 100.00 10 | U | 100.00 | 200.00 ``` Net-effect analysis: - `first_action = 'I'` (row didn't exist before this window) - `last_action = 'U'` (row exists after) Result: - **No DELETE** emitted (first_action = 'I' means the row was born in this window) - **INSERT** with final values: `(charlie, 200.00)` The aggregate sees a pure insertion of `(charlie, 200.00)` — the intermediate value of 100.00 never appears. --- ## Case 6: UPDATE + DELETE in Same Window ```sql UPDATE orders SET amount = 999.99 WHERE id = 3; DELETE FROM orders WHERE id = 3; ``` Net-effect: - `first_action = 'U'` (row existed before) - `last_action = 'D'` (row no longer exists) Result: - **DELETE** with original old values from the first change - **No INSERT** (last_action = 'D') The aggregate correctly sees only a removal. --- ## Case 7: UPDATE with JOINs Consider a stream table that joins two tables: ```sql CREATE TABLE customers ( id SERIAL PRIMARY KEY, name TEXT NOT NULL, tier TEXT NOT NULL DEFAULT 'standard' ); CREATE TABLE orders ( id SERIAL PRIMARY KEY, customer_id INT REFERENCES customers(id), amount NUMERIC(10,2) ); SELECT pgtrickle.create_stream_table( name => 'order_details', query => $$ SELECT c.name, c.tier, o.amount FROM orders o JOIN customers c ON o.customer_id = c.id $$, schedule => '1m' ); ``` Now update a customer's tier: ```sql UPDATE customers SET tier = 'premium' WHERE name = 'alice'; ``` ### How the JOIN Delta Works The join differentiation follows the formula: $$\Delta(L \bowtie R) = (\Delta L \bowtie R) \cup (L \bowtie \Delta R) - (\Delta L \bowtie \Delta R)$$ Since only the `customers` table changed: - $\Delta L$ = changes to orders (empty) - $\Delta R$ = changes to customers (alice's tier: standard → premium) So: - **Part 1**: $\Delta\text{orders} \bowtie \text{customers}$ = empty (no order changes) - **Part 2**: $\text{orders} \bowtie \Delta\text{customers}$ = all of alice's orders joined with her tier change - **Part 3**: $\Delta\text{orders} \bowtie \Delta\text{customers}$ = empty (no order changes) Part 2 produces the delta: for each of alice's orders, DELETE the old row (with tier='standard') and INSERT a new row (with tier='premium'). The stream table is updated to reflect the new tier across all of alice's order rows. --- ## Performance Summary | Scenario | Buffer rows | Delta rows emitted | Work | |----------|------------|-------------------|------| | Simple value change | 1 | 2 (D+I) | O(1) per group | | Group key change | 1 | 2 (D+I, different groups) | O(1) per affected group | | Group deletion | 1 | 1 (D) + 1 (I) or 1 (D) | O(1) | | N updates same row | N | 2 (D first-old + I last-new) | O(N) scan, O(1) aggregate | | INSERT+UPDATE same window | 2 | 1 (I only) | O(1) | | UPDATE+DELETE same window | 2 | 1 (D only) | O(1) | In all cases, the work is proportional to the number of **changed rows**, not the total table size. A single UPDATE on a billion-row table produces the same delta cost as on a 10-row table. --- ## What About IMMEDIATE Mode? Everything above describes **DIFFERENTIAL** mode — changes accumulate in a buffer and are applied on a schedule. As of v0.2.0, pg_trickle also supports **IMMEDIATE** mode, where the stream table is updated synchronously within the same transaction as your UPDATE. ### How IMMEDIATE Mode Differs for UPDATE | Phase | DIFFERENTIAL | IMMEDIATE | |-------|-------------|----------| | **Trigger type** | Row-level AFTER trigger | Statement-level AFTER trigger with `REFERENCING OLD TABLE, NEW TABLE` | | **What's captured** | One buffer row with old_* and new_* | Two transition tables: `__pgt_oldtable` and `__pgt_newtable` | | **When delta runs** | Next scheduler tick | Immediately, in the same transaction | | **D+I decomposition** | In the scan delta CTE | Same algebra, but reading from transition temp tables | | **Concurrency** | No locking between writers | Advisory lock per stream table | When you run `UPDATE orders SET amount = 59.99 WHERE id = 1`: 1. A **BEFORE UPDATE** trigger acquires an advisory lock on the stream table 2. The **AFTER UPDATE** trigger captures both `OLD TABLE AS __pgt_oldtable` and `NEW TABLE AS __pgt_newtable` into temp tables 3. The DVM engine generates the same D+I decomposition, reading old values from the old-table and new values from the new-table 4. The delta is applied to the stream table immediately 5. Any query within the same transaction sees the updated stream table ```sql BEGIN; UPDATE orders SET amount = 59.99 WHERE id = 1; -- customer_totals already reflects the new amount here! SELECT * FROM customer_totals WHERE customer = 'alice'; COMMIT; ``` The same D+I split, aggregate differentiation, and net-effect logic applies — the only difference is the data source (transition tables vs change buffer) and timing (synchronous vs scheduled). --- ## Next in This Series - **[What Happens When You INSERT a Row?](WHAT_HAPPENS_ON_INSERT.md)** — The full 7-phase lifecycle (start here if you haven't already) - **[What Happens When You DELETE a Row?](WHAT_HAPPENS_ON_DELETE.md)** — Reference counting, group deletion, INSERT+DELETE cancellation - **[What Happens When You TRUNCATE a Table?](WHAT_HAPPENS_ON_TRUNCATE.md)** — Why TRUNCATE bypasses triggers and how to recover