> **Plain-language companion:** [v0.66.0.md](v0.66.0.md) ## v0.66.0 — DuckLake Phase 3a: Parquet Sink Infrastructure **Status: Released.** --- ### Implementation Status | ID | Feature | Status | |----|---------|--------| | F-4 | Parquet delta export via `arrow-rs` | ✅ Done | | F-2 | DuckLake sink output mode (`sink => 'ducklake'`) | ✅ Done | | — | S3 / object-store upload integration (`file://`, `s3://`) | ✅ Done | | — | DuckLake catalog transaction writer | ✅ Done | | F-9 | Encryption key pass-through | ✅ Done | | — | Unit tests: catalog rollback on upload failure | ✅ Done | | — | E2E tests: DuckLake sink write path | ✅ Done | --- ### Correctness - **F-4** serialises stream-table delta rows to Parquet using `arrow-array` + `parquet` crates. Snappy is the default compression codec; `pg_trickle.ducklake_sink_compression` GUC controls codec selection. - **F-2** adds `sink`, `ducklake_sink_path`, and `ducklake_sink_table_id` parameters to `create_stream_table()` and `alter_stream_table()`. These are stored in three new columns on `pgtrickle.pgt_stream_tables`. - **S3 upload** wraps the `object_store` crate with a `new_current_thread` Tokio runtime, isolating async I/O from PostgreSQL's signal handlers. - **Catalog writer** (`register_ducklake_data_file`) inserts into `ducklake_data_file`, updates `ducklake_table_stats`, and inserts a `ducklake_snapshot` row within a single `Spi::connect_mut` block; the SPI transaction rolls back atomically on any error. - **Rollback safety**: upload is attempted _before_ any catalog write. If upload fails, `run_ducklake_sink_inner` returns `Err(DucklakeUploadError)` via `?`, so `register_ducklake_data_file` is never called. The rollback invariant is verified by `test_upload_failure_prevents_catalog_writes`. - **F-9** generates a `//` encryption key ID and stores it in `ducklake_data_file.encryption_key_id` for every file written to an encrypted lake. ### Stability - Sink failures are logged as `WARNING` via `pgrx::warning!` and never block the next scheduled refresh. - A failed upload leaves the catalog untouched. A failed catalog write leaves an orphaned Parquet file on object storage (DuckLake VACUUM collects it). ### Performance - `write_parquet_bytes` is synchronous and runs in the PostgreSQL backend process. - S3 upload uses a `new_current_thread` Tokio runtime created per sink call — no shared thread pool, no state carried between calls. ### Scalability - One Parquet file per refresh cycle. No batching or multi-part logic for files below the default 5 MB threshold (large-file multipart upload is planned for a future release). ### Ease of Use - Single-parameter activation: `ALTER STREAM TABLE ... SET (sink = 'ducklake', ducklake_sink_path = 's3://...')`. - GUC defaults are sensible (Snappy, `us-east-1`); only the path and optional table ID are required for basic use. ### Test Coverage **Unit tests (src/ducklake_sink.rs):** - `test_resolve_compression_snappy_is_default` - `test_resolve_compression_unknown_defaults_to_snappy` - `test_resolve_compression_none` - `test_resolve_compression_zstd` - `test_write_parquet_bytes_empty_schema` - `test_write_parquet_bytes_single_int_column` - `test_write_parquet_bytes_mixed_types` - `test_udt_to_arrow_type_hint_integers` - `test_udt_to_arrow_type_hint_floats` - `test_udt_to_arrow_type_hint_bool` - `test_udt_to_arrow_type_hint_timestamp` - `test_udt_to_arrow_type_hint_text` - `test_generate_encryption_key_id_format` - `test_upload_local_creates_file` - `test_upload_local_nested_directory` - `test_upload_parquet_unsupported_scheme_returns_error` - `test_upload_parquet_gcs_not_supported_error` - `test_upload_failure_prevents_catalog_writes` ← rollback invariant proof **Integration tests (tests/ducklake_integration_tests.rs):** - `test_ducklake_sink_mode_accepts_valid_values` — SQL API validation - `test_ducklake_sink_mode_rejects_invalid_values` — CHECK constraint - `test_ducklake_sink_path_can_be_set` — column round-trip - `test_ducklake_sink_table_id_can_be_set` — FK-free BIGINT column - `test_ducklake_sink_columns_default_to_null` — default state **E2E tests (tests/e2e_ducklake_tests.rs):** - `test_ducklake_sink_parquet_file_written_after_refresh` — write-path E2E - `test_ducklake_sink_catalog_not_modified_when_upload_fails` — rollback E2E --- ### Exit Criteria (all satisfied) - [x] Parquet delta export (`write_parquet_bytes`) produces valid Parquet output for all supported Arrow types. - [x] `sink => 'ducklake'` parameter accepted by `create_stream_table()` and `alter_stream_table()`. - [x] S3 client builds and uploads successfully; `file://` path used as a proxy for MinIO in E2E tests. - [x] DuckLake catalog writer inserts `ducklake_data_file`, `ducklake_table_stats`, and `ducklake_snapshot` rows within one SPI block. - [x] Rollback test proves failed upload leaves zero stale catalog rows (`test_upload_failure_prevents_catalog_writes`, `test_ducklake_sink_catalog_not_modified_when_upload_fails`). - [x] Encryption key ID generated and stored in `ducklake_data_file.encryption_key_id`. - [x] `just lint` passes with zero warnings. - [x] `just test-unit` passes. - [x] E2E tests for sink write path pass.