# Partitioned Tables as Sources This tutorial shows how pg_trickle works with PostgreSQL's declarative table partitioning. It covers RANGE, LIST, and HASH partitioned source tables, explains what happens when you add or remove partitions, and documents known caveats. ## Background PostgreSQL lets you split large tables into smaller "partitions" — for example one partition per month for an `orders` table. This is a common technique for managing very large datasets. pg_trickle handles partitioned source tables transparently: - **CDC triggers fire on all partitions.** PostgreSQL 13+ automatically clones row-level triggers from the parent to every child partition. All DML (INSERT, UPDATE, DELETE) on any partition is captured in a single change buffer keyed by the parent table's OID. - **ATTACH PARTITION is detected automatically.** When you add a new partition with pre-existing data, pg_trickle's DDL event trigger detects the change and marks affected stream tables for reinitialization. No manual intervention required. - **WAL-based CDC works correctly.** When using WAL mode, publications are created with `publish_via_partition_root = true` so all partition changes appear under the parent table's identity. ## Example: Monthly Sales Partitions (RANGE) ```sql -- Create a RANGE-partitioned source table CREATE TABLE sales ( id SERIAL, sale_date DATE NOT NULL, region TEXT NOT NULL, amount NUMERIC NOT NULL, PRIMARY KEY (id, sale_date) ) PARTITION BY RANGE (sale_date); -- Create partitions for each half of the year CREATE TABLE sales_h1_2025 PARTITION OF sales FOR VALUES FROM ('2025-01-01') TO ('2025-07-01'); CREATE TABLE sales_h2_2025 PARTITION OF sales FOR VALUES FROM ('2025-07-01') TO ('2026-01-01'); -- Insert data across partitions INSERT INTO sales (sale_date, region, amount) VALUES ('2025-02-15', 'US', 100.00), ('2025-05-20', 'EU', 250.00), ('2025-08-10', 'US', 175.00), ('2025-11-30', 'EU', 300.00); -- Create a stream table over the partitioned source SELECT pgtrickle.create_stream_table( name => 'regional_sales', query => $$ SELECT region, SUM(amount) AS total, COUNT(*) AS cnt FROM sales GROUP BY region $$, schedule => '1 minute', refresh_mode => 'DIFFERENTIAL' ); -- Refresh to populate SELECT pgtrickle.refresh_stream_table('regional_sales'); -- Verify — aggregates span all partitions: SELECT * FROM regional_sales ORDER BY region; -- region | total | cnt -- --------+--------+----- -- EU | 550.00 | 2 -- US | 275.00 | 2 ``` ## Adding New Partitions When you add a new partition, any new rows inserted through the parent are automatically captured by CDC triggers. The trigger on the parent is cloned to the new partition by PostgreSQL. ```sql -- Add a new partition for 2026 CREATE TABLE sales_h1_2026 PARTITION OF sales FOR VALUES FROM ('2026-01-01') TO ('2026-07-01'); -- Inserts into the new partition are captured normally INSERT INTO sales (sale_date, region, amount) VALUES ('2026-03-15', 'US', 400.00); -- Next refresh picks up the new row SELECT pgtrickle.refresh_stream_table('regional_sales'); SELECT * FROM regional_sales ORDER BY region; -- region | total | cnt -- --------+--------+----- -- EU | 550.00 | 2 -- US | 675.00 | 3 ``` ## ATTACH PARTITION with Pre-Existing Data The most important edge case: attaching a table that already contains rows. These rows were never seen by CDC triggers, so the stream table would be stale. pg_trickle detects this automatically. ```sql -- Create a standalone table with existing data CREATE TABLE sales_h2_2026 ( id SERIAL, sale_date DATE NOT NULL, region TEXT NOT NULL, amount NUMERIC NOT NULL, PRIMARY KEY (id, sale_date) ); INSERT INTO sales_h2_2026 (sale_date, region, amount) VALUES ('2026-08-01', 'EU', 500.00), ('2026-09-15', 'US', 200.00); -- Attach it to the partitioned table ALTER TABLE sales ATTACH PARTITION sales_h2_2026 FOR VALUES FROM ('2026-07-01') TO ('2027-01-01'); -- pg_trickle detects the partition change and marks the stream table -- for reinitialize. Check: SELECT pgt_name, needs_reinit FROM pgtrickle.pgt_stream_tables WHERE pgt_name = 'regional_sales'; -- pgt_name | needs_reinit -- -----------------+-------------- -- regional_sales | t -- The next refresh reinitializes — re-reading all data from scratch: SELECT pgtrickle.refresh_stream_table('regional_sales'); SELECT * FROM regional_sales ORDER BY region; -- region | total | cnt -- --------+---------+----- -- EU | 1050.00 | 3 -- US | 875.00 | 4 ``` ## DETACH PARTITION When you detach a partition, the detached table's data is no longer visible through the parent. pg_trickle detects this too and marks stream tables for reinitialize. ```sql -- Archive the old partition ALTER TABLE sales DETACH PARTITION sales_h1_2025; -- Stream table is marked for reinit: SELECT pgt_name, needs_reinit FROM pgtrickle.pgt_stream_tables WHERE pgt_name = 'regional_sales'; -- pgt_name | needs_reinit -- -----------------+-------------- -- regional_sales | t -- After refresh, the detached partition's rows are gone: SELECT pgtrickle.refresh_stream_table('regional_sales'); SELECT * FROM regional_sales ORDER BY region; -- (only rows from remaining partitions) ``` ## LIST Partitioning LIST partitioning splits rows by discrete values. It works identically: ```sql CREATE TABLE events ( id SERIAL, region TEXT NOT NULL, payload TEXT, PRIMARY KEY (id, region) ) PARTITION BY LIST (region); CREATE TABLE events_us PARTITION OF events FOR VALUES IN ('US'); CREATE TABLE events_eu PARTITION OF events FOR VALUES IN ('EU'); CREATE TABLE events_ap PARTITION OF events FOR VALUES IN ('AP'); SELECT pgtrickle.create_stream_table( name => 'event_counts', query => 'SELECT region, count(*) AS cnt FROM events GROUP BY region', schedule => '1 minute' ); ``` ## HASH Partitioning HASH partitioning distributes rows across a fixed number of partitions. Useful for spreading write load evenly: ```sql CREATE TABLE metrics ( id SERIAL PRIMARY KEY, sensor_id INT NOT NULL, value DOUBLE PRECISION ) PARTITION BY HASH (id); CREATE TABLE metrics_0 PARTITION OF metrics FOR VALUES WITH (MODULUS 4, REMAINDER 0); CREATE TABLE metrics_1 PARTITION OF metrics FOR VALUES WITH (MODULUS 4, REMAINDER 1); CREATE TABLE metrics_2 PARTITION OF metrics FOR VALUES WITH (MODULUS 4, REMAINDER 2); CREATE TABLE metrics_3 PARTITION OF metrics FOR VALUES WITH (MODULUS 4, REMAINDER 3); SELECT pgtrickle.create_stream_table( name => 'sensor_avg', query => $$ SELECT sensor_id, AVG(value) AS avg_val, COUNT(*) AS cnt FROM metrics GROUP BY sensor_id $$, schedule => '1 minute' ); ``` ## Foreign Tables Tables from other databases (via `postgres_fdw`) can be used as sources, but with restrictions: - **No trigger-based CDC** — foreign tables don't support row-level triggers. - **No WAL-based CDC** — foreign tables don't generate local WAL. - **FULL refresh works** — `SELECT *` executes a remote query each time. - **Polling-based CDC works** — when `pg_trickle.foreign_table_polling` is enabled, pg_trickle creates a local snapshot table and detects changes via `EXCEPT ALL` comparison. When you use a foreign table as a source, pg_trickle emits an info message explaining the limitations: ```sql CREATE EXTENSION postgres_fdw; CREATE SERVER remote_db FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'remote-host', dbname 'analytics'); CREATE USER MAPPING FOR CURRENT_USER SERVER remote_db OPTIONS (user 'reader'); CREATE FOREIGN TABLE remote_orders ( id INT, amount NUMERIC ) SERVER remote_db OPTIONS (table_name 'orders'); -- Only FULL refresh is available: SELECT pgtrickle.create_stream_table( name => 'remote_totals', query => 'SELECT SUM(amount) AS total FROM remote_orders', schedule => '5 minutes', refresh_mode => 'FULL' ); -- INFO: pg_trickle: source table remote_orders is a foreign table. -- Foreign tables cannot use trigger-based or WAL-based CDC — -- only FULL refresh mode or polling-based change detection is supported. ``` ## Known Caveats | Caveat | Description | |--------|-------------| | **PostgreSQL 13+ required** | Parent-table triggers only propagate to child partitions on PG 13+. pg_trickle targets PostgreSQL 18, so this is always satisfied. | | **Partition key in PRIMARY KEY** | PostgreSQL requires the partition key to be part of any unique constraint. This means your `PRIMARY KEY` must include the partition column. | | **ATTACH with data = reinitialize** | Attaching a partition with pre-existing rows triggers a full reinitialize on the next refresh. For very large tables, this may be slow. Consider gating the source with `pgtrickle.gate_source()` during bulk partition operations. | | **Sub-partitioning** | Multi-level partitioning (partitions of partitions) works in principle because triggers propagate through the entire hierarchy, but it is not extensively tested. | | **pg_partman compatibility** | `pg_partman` dynamically creates and drops partitions. Since pg_trickle detects ATTACH/DETACH via DDL event triggers, it should work, but this combination is not yet tested. | | **Partitioned storage tables** | Using a partitioned table as the stream table's *storage* is not supported. This is tracked for a future release. | | **DETACH PARTITION CONCURRENTLY** | `DETACH PARTITION ... CONCURRENTLY` is a two-phase operation. The DDL event trigger fires after the first phase; the partition is not fully detached until the second phase commits. The stream table may briefly reflect the old partition count. |