# v0.67.0 — DuckLake Phase 3b: View Registration, Provenance & Ecosystem **Status:** Planned **Scope:** Medium **Theme:** DuckLake Sink — Discoverability and Ecosystem Polish **Plan reference:** [plans/ecosystem/PLAN_DUCKLAKE.md — Phase 3](../plans/ecosystem/PLAN_DUCKLAKE.md#phase-3-ducklake-sink-and-view-registration-v066v068) --- ## Why This Release v0.66.0 delivered the DuckLake sink plumbing — pg_trickle can now write its incrementally computed results into a DuckLake-managed Parquet table. But writing the data is only half the story: users also need to *find* it and *trust* it. v0.67.0 adds two features that turn the sink from a low-level write mechanism into a fully integrated DuckLake citizen: - **View registration (F-6)** automatically inserts a `ducklake_view` entry for every stream table that has a DuckLake sink, so the result set is immediately visible to every DuckDB, Spark, and Trino client that queries the DuckLake catalog — no manual catalog surgery required. - **Snapshot provenance (INT-11)** records which pg_trickle stream table produced each DuckLake snapshot, enabling end-to-end lineage queries: from the raw event in PostgreSQL, through the differential computation, to the Parquet file on S3. Four tutorials and two containerised demos ship alongside the code, making the Phase 3 story accessible to a broad audience ranging from data engineers to analytics platform builders. --- ## Deliverables ### Features (extension code) | ID | Feature | Effort | |----|---------|--------| | F-6 | DuckLake view registration | 1 week | | INT-11 | Snapshot provenance & audit trails | 2 days | #### F-6: DuckLake View Registration When a stream table with `sink => 'ducklake'` is created or altered, pg_trickle automatically inserts (or updates) a matching row in `ducklake_view` so the result is queryable from every DuckLake client as a native catalog object. The view entry includes: - `view_name` — matches the stream table name. - `view_definition` — the original `SELECT` query of the stream table, stored as a reference for human readers. - `source_table_ids` — the OIDs of the source tables, so DuckLake clients can reason about dependencies. When the stream table is dropped, the `ducklake_view` row is removed in the same transaction. This feature means that a DuckDB user who has never heard of pg_trickle will simply see a new view in their DuckLake catalog and query it — the incremental maintenance is completely transparent. #### INT-11: Snapshot Provenance & Audit Trails DuckLake's `ducklake_snapshot` table records who wrote each snapshot, but the `created_by` field is a free-form string. pg_trickle populates this field with a structured identifier: ``` pg_trickle//stream_table// ``` In addition, pg_trickle inserts a row into `pgtrickle.pgt_ducklake_provenance` for each snapshot it writes: | Column | Description | |--------|-------------| | `stream_table_oid` | OID of the producing stream table | | `stream_table_name` | Human-readable name | | `ducklake_snapshot_id` | The DuckLake snapshot ID | | `refresh_id` | pg_trickle's internal refresh sequence number | | `delta_row_count` | Rows in the Parquet delta | | `written_at` | Timestamp | This table is itself a PostgreSQL table, so it can be monitored with a stream table and surfaced in the DuckLake observability dashboard from Demo D. ### Tutorials | Item | Effort | |------|--------| | Tutorial 3: "The Modern Data Stack in One Box" | 2 days | | Tutorial 4: "Streaming PostgreSQL to a Data Lake without Kafka" | 2 days | | INT-10: pg-tide DuckLake pipeline tutorial | 2 days | **Tutorial 3** demonstrates the full Phase 3 stack in a single `docker-compose` environment: OLTP writes in PostgreSQL, pg_trickle stream tables computing rolling aggregations, DuckLake storing the results in Parquet on MinIO, and DuckDB querying the results for ad-hoc analytics. No Kafka, no Airflow, no separate stream processor. **Tutorial 4** shows how to replicate a subset of a PostgreSQL OLTP table into a DuckLake data lake using the sink output mode. The tutorial covers schema mapping, handling deletes (tombstone records), and configuring the `ducklake_sink_mode` to either accumulate deltas or replace the full snapshot. **INT-10 Tutorial** documents the already-working [pg-tide](https://github.com/trickle-labs/pg-tide) DuckLake relay pipeline with a step-by-step walkthrough: install pg-tide and pg_trickle, call `pgtrickle.attach_outbox()`, configure `tide.relay_set_outbox()` with `sink_type => 'ducklake'`, and watch data flow from a stream table into DuckLake in real time. Since this integration works today (as of pg-tide v0.22.0), the tutorial ships as documentation rather than new code. ### Demos | Item | Effort | |------|--------| | Demo C: Multi-engine leaderboard | 4 days | | Demo E: OLTP-to-lake loop | 3 days | **Demo C** runs a real-time game leaderboard that is simultaneously maintained as a pg_trickle stream table in PostgreSQL (for the game backend) and published as a DuckLake table on MinIO (for the analytics team). It demonstrates that the same data, maintained once by pg_trickle, is available to both operational and analytical workloads without duplication. Ships as a self-contained `docker-compose up` demo. **Demo E** shows the full OLTP-to-lake loop: synthetic e-commerce orders land in a PostgreSQL `orders` table, a pg_trickle stream table computes `revenue_by_region` incrementally, the sink publishes each delta to DuckLake, and a DuckDB notebook queries the historical Parquet files for trend analysis. The screen recording shows end-to-end latency from order insert to DuckLake snapshot visibility: typically under two seconds on a laptop. --- ## Release Gate This release does **not** ship until: 1. View registration is tested: creating and dropping a stream table with `sink => 'ducklake'` correctly inserts and removes the `ducklake_view` row in the same transaction. 2. Provenance table is populated correctly across at least 10 consecutive refresh cycles in the E2E test environment. 3. Tutorials 3 and 4 are reviewed by at least one person who is not the author. 4. Both demos run to completion with `docker-compose up` on a clean macOS and Linux machine. 5. `just lint` passes with zero warnings. 6. `just test-e2e` passes. --- ## What Comes Next [v1.0.0](../ROADMAP.md) — Stable release targeting PostgreSQL 19. After Phase 3, pg_trickle has a complete bidirectional DuckLake integration story and is ready to pursue the v1.0 stable release milestones. Phase 4 (INT-5 IVM partnership with DuckDB Labs) is a post-v1.0 initiative and is tracked separately.