# Changelog All notable changes to pgclone are documented in this file. ## [3.6.0] ### Added - **GDPR/Compliance Masking Report**: `pgclone_masking_report(schema)` generates an audit report listing all sensitive columns, their masking status, and recommendations - Detects sensitive columns using shared `sensitivity_rules` (~40 patterns across 10 categories) - Checks if a masked view (`table_masked`) exists for each table - `mask_status`: `MASKED (view)` or `UNMASKED` - `recommendation`: either "OK - masked via view" or "Apply mask strategy: X" - Returns SET OF (schema_name, table_name, column_name, sensitivity, mask_status, recommendation) - Useful for GDPR, HIPAA, SOX compliance audits — "prove no PII exists unmasked" ### Changed - Refactored sensitivity rules into shared `SensitivityRule` struct and `pgclone_match_sensitivity()` helper — used by both `pgclone_discover_sensitive` and `pgclone_masking_report` - Version bumped to 3.6.0 ## [3.5.0] ### Added - **Clone Verification**: `pgclone_verify()` compares row counts between source and target databases, table by table, returning a side-by-side comparison - Two overloads: `pgclone_verify(conninfo)` for all schemas, `pgclone_verify(conninfo, schema)` for a specific schema - Returns SET OF (schema_name, table_name, source_rows, target_rows, match) - Match indicators: `✓` (equal), `✗` (different), `✗ (missing)` (table not on target) - Uses `pg_class.reltuples` for fast approximate counts without full table scans - Works with regular and partitioned tables - 5 new pgTAP tests (84 total): verify function runs, row counts match after clone, correct column count in result ### Changed - Version bumped to 3.5.0 ## [3.4.0] ### Added - **Clone Roles with Permissions and Passwords**: `pgclone_clone_roles()` clones database roles from source to local, including encrypted passwords, role attributes, and all privilege grants - `pgclone_clone_roles(conninfo)` — clone all non-system roles - `pgclone_clone_roles(conninfo, 'role1,role2')` — clone specific roles (comma-separated) - Clones role attributes: LOGIN, SUPERUSER, CREATEDB, CREATEROLE, REPLICATION, INHERIT, CONNECTION LIMIT, VALID UNTIL - Copies encrypted passwords from `pg_authid` (requires superuser on both source and target) - If target role already exists: updates password, attributes, and applies permissions additively - Clones role memberships (GRANT role TO role) - Clones schema-level privileges (USAGE, CREATE) via `aclexplode()` - Clones table-level privileges (SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER) - Clones sequence privileges (USAGE, SELECT, UPDATE) - Clones function/procedure EXECUTE privileges - `test_reader`, `test_writer`, `test_admin` test fixture roles with graduated permissions - 6 new pgTAP tests (79 total): role creation, LOGIN attribute, CREATEDB attribute verification ### Changed - Version bumped to 3.4.0 ## [3.3.0] ### Added - **Dynamic Data Masking via Views**: Role-based masking policies that preserve original data while presenting masked views to unprivileged users - `pgclone_create_masking_policy(schema, table, mask_json, privileged_role)`: creates a masked view (`table_masked`), revokes base table SELECT from PUBLIC, grants view access to PUBLIC, grants direct table access to the privileged role - `pgclone_drop_masking_policy(schema, table)`: drops the masked view and restores base table access to PUBLIC - Uses the same `pgclone_build_mask_expr()` engine as clone-time and static masking — all 8 strategies work - Queries local `pg_attribute` for column names, applies mask expressions to matching columns - 7 new pgTAP tests (73 total): policy creation, view existence, masked data verification, privileged role bypass, policy drop, view cleanup ### Changed - Version bumped to 3.3.0 ## [3.2.0] ### Added - **Static Data Masking on Local Tables**: `pgclone_mask_in_place(schema, table, mask_json)` applies masking to already-cloned tables via UPDATE statements — no source connection needed - Uses the same mask JSON format as clone-time masking: `{"email": "email", "name": "name", "ssn": "null"}` - Reuses `pgclone_build_mask_expr()` for consistent masking between clone-time and post-clone paths - Wraps user-provided JSON into clone options format for unified parsing - Returns summary: `OK: masked N rows in schema.table (M columns)` - 7 new pgTAP tests (66 total): clone-then-mask workflow, verify original data removed, names/SSNs masked, row count preserved ### Changed - Version bumped to 3.2.0 ## [3.1.0] ### Added - **Auto-Discovery of Sensitive Data**: `pgclone_discover_sensitive(conninfo, schema_name)` scans the source catalog for columns matching ~40 sensitive data patterns and returns suggested mask rules as JSON - Pattern categories: email, name, phone, SSN/national ID, financial (salary/income), secrets/credentials (password/token/api_key), address, date of birth, credit card, IP address - Case-insensitive matching with LIKE-style wildcards against column names - Output is a JSON object grouped by table, ready to paste into the `"mask"` option: `{"employees": {"email": "email", "salary": "random_int", "ssn": "null"}}` - Scans regular tables and partitioned tables (relkind `r` and `p`) - 6 new pgTAP tests (59 total): discovery function runs, detects email/full_name/phone/salary/ssn columns ### Changed - Version bumped to 3.1.0 ## [3.0.0] ### Added - **Data Masking / Anonymization**: Column-level data masking applied during cloning via the `"mask"` JSON option - 8 masking strategies: `email`, `name`, `phone`, `partial`, `hash`, `null`, `random_int`, `constant` - `email`: preserves domain, masks local part (`alice@example.com` → `a***@example.com`) - `name`: replaces with `XXXX` - `phone`: keeps last 4 digits (`+1-555-123-4567` → `***-4567`) - `partial`: configurable prefix/suffix retention (`Johnson` → `Jo***on` with prefix=2, suffix=2) - `hash`: deterministic MD5 hash — preserves referential integrity across tables - `null`: replaces with NULL - `random_int`: random integer in configurable `[min, max]` range - `constant`: fixed replacement value - All strategies are NULL-safe (NULL inputs produce NULL outputs, except `constant` and `random_int`) - Masking is applied server-side as SQL expressions inside `COPY (SELECT ...) TO STDOUT` — no row-by-row processing overhead - Fully composable with existing `columns`, `where`, `indexes`, `constraints`, `triggers` options - `MaskRule` struct and `pgclone_build_mask_expr()` internal functions for extensible mask strategy architecture - 15 new pgTAP tests (53 total): email masking, name masking, null masking, hash masking, constant masking, combined masks with WHERE filter - `test_schema.employees` test fixture table with realistic sensitive data (names, emails, phones, salaries, SSNs) ### Changed - `CloneOptions` struct extended with `masks[]` array and `num_masks` counter - `pgclone_parse_options()` handles `"mask"` JSON key with both simple string values (`"email"`) and object values (`{"type":"partial", "prefix":2, "suffix":3}`) - `pgclone_copy_data()` queries source catalog for column names when masks are active (without explicit column list) to apply per-column mask expressions - Version bumped to 3.0.0 ## [2.2.1] ### Added - **WHERE clause SQL injection protection**: Two-layer defense against malicious input in the `"where"` JSON option - Layer 1: Keyword validation rejects DDL/DML keywords (`DROP`, `INSERT`, `UPDATE`, `DELETE`, `CREATE`, `ALTER`, `TRUNCATE`, etc.) and semicolons before the query is sent - Layer 2: Source connection runs inside `BEGIN TRANSACTION READ ONLY` when a WHERE clause is present — PostgreSQL itself blocks any write operations even if validation is bypassed - Word-boundary-aware matching avoids false positives on column names like `created_at`, `update_count`, `drop_rate` - 4 new pgTAP tests (37 total): semicolon rejection, DROP keyword rejection, INSERT keyword rejection, false-positive safety with `created_at` column ## [2.2.0] ### Changed - **Worker Pool Architecture**: Parallel cloning (`"parallel": N`) now uses a fixed-size worker pool instead of spawning one background worker per table - Exactly N workers are launched, each pulling tasks from a shared queue - Dynamic load balancing: faster workers automatically handle more tables - Resource usage reduced from O(tables) to O(N) for bgworkers and DB connections - No longer risk exhausting `max_worker_processes` on large schemas ### Added - `PgclonePoolQueue` struct in shared memory for task queue management - `pgclone_pool_worker_main()` background worker entry point - Pool worker test in `test/test_async.sh` (TEST 8) - `PGCLONE_MAX_POOL_TASKS` limit (512 tables per pool operation) - Guard against concurrent pool operations ### Removed - Per-table background worker launch in parallel mode (replaced by pool) - Per-table job slot allocation in parallel mode (pool workers share fewer slots) ## [2.1.4] ### Changed - Local loopback connections now use Unix domain sockets (from `unix_socket_directories` GUC) instead of TCP `127.0.0.1` - `pg_hba.conf` `trust` entry for `127.0.0.1` is no longer required for async operations — default `local all all peer` is sufficient - Falls back to TCP `127.0.0.1` automatically if Unix sockets are unavailable ### Fixed - Security: removed unnecessary `trust` authentication requirement for background worker connections ## [2.1.3] ### Fixed - Async bgworker: COPY pipeline error handling — failures now logged with `PQerrorMessage`, source COPY result consumed on error path to prevent connection leak - Async bgworker: `WaitForBackgroundWorkerStartup` added to all async functions — jobs no longer stuck in 'pending' state - `pgclone_schema_async` parallel mode: fixed hardcoded `jobs[0]` write that corrupted slot 0 in shared memory - `pgclone_schema_async` parallel mode: parent job now correctly transitions to COMPLETED after child workers finish ### Added - `pgclone_clear_jobs()` function to free completed/cancelled job slots from shared memory - Async test suite (`test/test_async.sh`) covering `pgclone_table_async`, `pgclone_schema_async`, `pgclone_progress`, `pgclone_jobs_view`, `pgclone_clear_jobs` - `CONTRIBUTING.md` — development setup, code guidelines, PR process - `SECURITY.md` — vulnerability reporting, security considerations - Documentation restructured: `docs/USAGE.md`, `docs/ASYNC.md`, `docs/TESTING.md`, `docs/ARCHITECTURE.md`, `CHANGELOG.md` ## [2.1.2] ### Added - Elapsed time column in `pgclone_jobs_view` - `elapsed_time` field in progress detail output ## [2.1.1] ### Changed - Visual progress bar in `pgclone_jobs_view` replaces verbose NOTICE messages - Per-table/per-row NOTICE messages moved to DEBUG1 level ## [2.1.0] ### Added - `pgclone_jobs_view` — query async job progress as a standard PostgreSQL view - `pgclone_progress_detail()` — table-returning function for detailed progress ## [2.0.1] ### Added - `pgclone_database_create()` — create a new database and clone into it - Automatic pgclone extension installation in the target database - Idempotent behavior: clones into existing database if it already exists ## [2.0.0] ### Added - True multi-worker parallel cloning with `"parallel": N` option - Each table gets its own background worker - Parent worker monitors child workers via shared memory ### Changed - Shared memory layout expanded for parallel job tracking ## [1.2.0] ### Added - Materialized view cloning during schema clone (with `"matviews": false` opt-out) - Exclusion constraint support (cloned alongside PK, UNIQUE, CHECK, FK) - Materialized view indexes and data are preserved ## [1.1.0] ### Added - Selective column cloning with `"columns": [...]` JSON option - Data filtering with `"where": "..."` JSON option - Automatic filtering of constraints/indexes referencing excluded columns ## [1.0.0] ### Added - Async clone operations via background workers (`pgclone_table_async`, `pgclone_schema_async`) - Job progress tracking (`pgclone_progress`, `pgclone_jobs`) - Job cancellation (`pgclone_cancel`) - Job resume from checkpoint (`pgclone_resume`) - Job cleanup (`pgclone_clear_jobs`) - Conflict resolution strategies: error, skip, replace, rename ## [0.3.0] ### Added - Background worker infrastructure - Shared memory allocation for job state - `_PG_init` with shmem hooks ## [0.2.0] ### Added - Index cloning (including expression and partial indexes) - Constraint cloning (PRIMARY KEY, UNIQUE, CHECK, FOREIGN KEY) - Trigger cloning with trigger functions - JSON options format for controlling indexes/constraints/triggers - Boolean parameter variants (`pgclone_table_ex`, `pgclone_schema_ex`) ## [0.1.0] ### Added - Initial release - `pgclone_table()` — clone a single table with or without data - `pgclone_schema()` — clone an entire schema - `pgclone_functions()` — clone functions only - `pgclone_database()` — clone all user schemas - COPY protocol for fast data transfer - `pgclone_version()` — version string - Support for PostgreSQL 14–18 - pgTAP test suite (33 tests) - Docker Compose multi-version test infrastructure - GitHub Actions CI pipeline