# v0.65.0 — DuckLake Phase 2: Change-Feed Adapter **Status:** Released **Scope:** Large **Theme:** DuckLake-Optimised Polling **Plan reference:** [plans/ecosystem/PLAN_DUCKLAKE.md — Phase 2](../plans/ecosystem/PLAN_DUCKLAKE.md#phase-2-ducklake-optimised-polling-v064v065) --- ## Why This Release v0.64.0 (Phase 1) established pg_trickle as a DuckLake ecosystem citizen through documentation, tutorials, and demos — all without touching extension code. Phase 2 is where that relationship becomes a first-class engineering story. Today, using a DuckLake-backed foreign table as a stream table source works via the generic polling path: pg_trickle issues a full `EXCEPT ALL` scan on every refresh cycle, which does O(N) work regardless of how many rows actually changed. For a DuckLake table with millions of rows and only tens of new rows per batch, this is wasteful by orders of magnitude. DuckLake exposes an exact remedy: the `table_changes()` function returns only the rows that changed between two snapshot IDs. This release wires that API directly into pg_trickle's CDC pipeline so that refreshes do O(Δ) work — proportional to the change, not the table size. Everything else follows from that: a snapshot-based frontier to track progress, a fast trigger path for inlined data, DuckLake `rowid` threading for O(1) delta application, and compaction-safety guards so the system degrades gracefully when old snapshots are expired. --- ## Deliverables ### Features (extension code) | ID | Feature | Effort | |----|---------|--------| | F-1 | DuckLake change-feed adapter (`CdcMode::DuckLakeChangeFeed`) | 1 week | | F-3 | Snapshot-based frontier model (snapshot IDs alongside WAL LSNs) | 3 days | | F-5 | Inlined-data trigger adapter (understands `begin_snapshot`/`end_snapshot`) | 3 days | | F-7 | Row-ID plumbing (pass DuckLake `rowid` through to DVM engine) | 3 days | | F-8 | Snapshot-window compaction safety (`pg_trickle.ducklake_compaction_policy`) | 2 days | #### F-1: DuckLake Change-Feed Adapter Replace the generic `EXCEPT ALL` polling for DuckLake foreign tables with a snapshot-aware adapter that calls `table_changes(from_snapshot, to_snapshot)` and produces O(Δ) work per refresh. The adapter tracks `last_consumed_snapshot_id` rather than LSN. Internally this introduces `CdcMode::DuckLakeChangeFeed` as a new CDC mode variant in `cdc.rs`, detected automatically when the source table's foreign data wrapper is identified as DuckLake. #### F-3: Snapshot-Based Frontier Extend the frontier model to carry DuckLake snapshot IDs alongside WAL LSNs and clock-based markers. A frontier row for a mixed-source stream table looks like: ```json { "ducklake:lake.events": { "snapshot_id": 42 }, "ducklake:lake.users": { "snapshot_id": 38 }, "wal:postgres": { "lsn": "0/16A4F08" } } ``` This lets a single stream table join a PostgreSQL OLTP table with a DuckLake analytics table and still have a coherent, single-transaction consistency guarantee. #### F-5: Inlined-Data Trigger Adapter DuckLake can store small tables directly in PostgreSQL as `ducklake_inlined_data_table__`. These tables are native PostgreSQL tables, so pg_trickle's standard row-level AFTER trigger CDC applies — but the schema convention (virtual columns `row_id`, `begin_snapshot`, `end_snapshot`, `is_deleted`) differs from a plain OLTP table. This feature adds a specialised trigger function that translates those columns into standard INSERT/DELETE change-buffer rows, plus a DDL watcher that recreates the trigger each time DuckLake rotates the inlined table to a new schema version. #### F-7: Row-ID Plumbing The DVM engine's row-identity layer today always derives a stable identifier from the row's content or primary key. For DuckLake sources, DuckLake already provides a stable `rowid` virtual column — we should use it directly. This feature extends the row-identity interface to accept a caller-supplied stable identifier, eliminating the hash computation for DuckLake sources and enabling exact O(1) delta application. #### F-8: Snapshot-Window Compaction Safety DuckLake compaction can expire old snapshots, after which `table_changes()` no longer has access to earlier history. If pg_trickle's `last_consumed_snapshot_id` falls before the compaction horizon, the next refresh will fail. This feature detects that condition and applies the configured policy: - `fallback` (default): automatically fall back to a full refresh and log a warning so the user knows to extend `ducklake_snapshot_retention`. - `error`: raise a clear, actionable error message rather than silently re-scanning. Configure via `ALTER STREAM TABLE … SET (ducklake_compaction_policy = 'error')`. ### Tests | Item | Effort | |------|--------| | Integration test suite: DuckDB + ducklake extension | 3 days | | Unit tests for F-1 adapter, F-3 frontier, F-8 policy | 2 days | The integration tests use Testcontainers to spin up a PostgreSQL + DuckDB environment, install the DuckLake extension, write batches of synthetic events, and assert that stream tables refresh with exactly the expected delta size (not a full scan). A property-based test confirms that the snapshot frontier never moves backwards. ### Tutorials | Item | Effort | |------|--------| | Tutorial 2: "IVM for DuckLake before v2.0" | 2 days | | Tutorial 6: "Sub-millisecond inlined-data CDC" | 2 days | **Tutorial 2** walks through creating a stream table on a DuckLake foreign table and shows the performance difference between the generic polling path (before v0.65.0) and the change-feed adapter (after v0.65.0): same result, 100× less work on large tables. **Tutorial 6** targets the inlined-data fast path: a DuckLake table small enough to be kept in PostgreSQL, monitored with a trigger adapter, refreshed in sub-millisecond time. Useful for high-frequency event streams that do not yet need to move to Parquet. ### Demo | Item | Effort | |------|--------| | Demo B: Time-travel debugging | 3 days | Demo B shows a stream table over a DuckLake change-feed being queried at a specific past snapshot ID. The demo highlights the snapshot-based frontier: roll back `last_consumed_snapshot_id` and the stream table rewinds deterministically. Ships as a self-contained `docker-compose up` demo with a README and a short screen recording. --- ## Release Gate This release does **not** ship until: 1. The change-feed adapter integration tests pass with DuckDB 1.x + DuckLake 1.x. 2. The O(Δ) property is verified by a benchmark: refresh latency must scale with `|Δ|`, not `|table|`, across at least three table sizes (10K, 1M, 100M rows). 3. `just lint` passes with zero warnings. 4. `just test-integration` passes. --- ## What Comes Next [v0.66.0](v0.66.0.md) — DuckLake Phase 3a: Parquet delta export and the DuckLake sink output mode. pg_trickle starts writing results *back into* DuckLake.