# REPORT_TIMESCALEDB.md — TimescaleDB Synergy & Lessons for pg_trickle Date: 2026-03-01 Status: RESEARCH --- ## 1. Executive Summary TimescaleDB and pg_trickle are **complementary** PostgreSQL extensions solving different problems — partitioned time-series storage vs incremental view maintenance. They don't conflict at the storage layer and can coexist. However, there is a concrete **synergy opportunity**: pg_trickle can provide rich IVM over hypertables where TimescaleDB's Continuous Aggregates are limited (no complex JOINs, no subqueries, no window functions, no cross-table aggregation). Several TimescaleDB design patterns are also worth studying for adoption in pg_trickle. This document catalogues: (1) where the projects complement each other, (2) specific synergy scenarios, (3) engineering challenges for combined use, (4) design ideas worth borrowing, and (5) a phased action plan. --- ## 2. Comparison | | TimescaleDB | pg_trickle | |---|---|---| | Core problem | Efficient time-series storage & querying | Keeping materialized views fresh incrementally | | Key mechanism | Hypertables — auto-partitioning by time, compression, retention policies | Differential dataflow (DBSP) — delta query derivation from operator trees | | IVM approach | Continuous Aggregates — restricted to `time_bucket` GROUP BY on a single hypertable | Stream Tables — broad SQL (JOINs, CTEs, window functions, subqueries, set operations) over any table | | Change tracking | Invalidation log per hypertable chunk/time-bucket | Row-level triggers → WAL logical replication | | Workload model | Append-heavy time-series (INSERT-dominant) | General OLTP (INSERT/UPDATE/DELETE) | | Refresh model | Materialization policies on invalidated buckets | Scheduled or on-demand delta refresh over change buffers | | Pipeline support | Cagg-on-cagg (hierarchical rollups) | Stream-table-on-stream-table DAG with demand-driven scheduling | | Storage | Compressed columnar chunks for old data | Standard PostgreSQL heap tables | | License | Apache-2.0 (community) / Timescale License (enterprise features) | Apache-2.0 | --- ## 3. Synergy Scenarios ### 3.1. Rich SQL over Hypertables Continuous Aggregates are limited to `GROUP BY time_bucket(...)` on a single hypertable with scalar aggregates. Users cannot: - JOIN a hypertable with a dimension table (e.g., `sensor_readings JOIN devices`) - Use window functions (`ROW_NUMBER`, `LAG`, `LEAD`) - Use subqueries or CTEs in the aggregate definition - Combine data from multiple hypertables - Use `DISTINCT ON`, `UNION`, `INTERSECT`, `EXCEPT` pg_trickle stream tables support all of these. A user could define: ```sql SELECT pgtrickle.create_stream_table( 'enriched_hourly_readings', $$ SELECT time_bucket('1 hour', r.ts) AS bucket, d.facility, d.region, avg(r.temperature) AS avg_temp, max(r.temperature) - min(r.temperature) AS temp_spread, count(*) AS reading_count FROM sensor_readings r JOIN devices d ON d.device_id = r.device_id WHERE d.is_active GROUP BY 1, 2, 3 $$, schedule := '5 minutes' ); ``` This is impossible with a Continuous Aggregate alone. ### 3.2. Cross-Hypertable Aggregation Organizations often have multiple hypertables (e.g., `cpu_metrics`, `memory_metrics`, `disk_metrics`) that need to be combined for dashboards. Continuous Aggregates can't span tables. A pg_trickle stream table can JOIN and aggregate across them. ### 3.3. Hierarchical Pipelines Mixing Both A Continuous Aggregate could provide a pre-materialized time-bucketed rollup that feeds into a pg_trickle stream table for enrichment: ``` sensor_readings (hypertable) └─▶ hourly_readings (Continuous Aggregate — fast time-bucket rollup) └─▶ enriched_hourly_readings (pg_trickle stream table — JOIN with devices) └─▶ regional_summary (pg_trickle stream table — further aggregation) ``` ### 3.4. Real-Time Alerting on Time-Series Data pg_trickle's `NOTIFY`-based alerting combined with stream tables over hypertables could enable threshold-based alerts that fire when an aggregated metric crosses a boundary — without polling. --- ## 4. Engineering Challenges ### 4.1. Trigger-Based CDC on Hypertables — HIGH Hypertables are the parent table; actual data lives in **chunk** child tables (`_timescaledb_internal._hyper___chunk`). TimescaleDB redirects INSERT operations to the appropriate chunk based on the time dimension. **Impact:** Row-level `AFTER INSERT` triggers defined on the parent hypertable **do** fire for inserts — TimescaleDB propagates triggers to chunks. However, this behavior needs validation under: - Chunk creation (new chunks appear as data arrives for new time ranges) - Chunk compression (rows become read-only columnar data) - Chunk decompression (for updates to compressed ranges) - Chunk migration (move/copy between tablespaces) **Investigation needed:** - [ ] Verify trigger propagation works for newly created chunks - [ ] Test that `pg_current_wal_lsn()` inside a trigger on a chunk returns a sensible value - [ ] Benchmark trigger overhead on high-throughput hypertable inserts (time- series workloads often do 100K+ inserts/sec) ### 4.2. WAL-Based CDC on Hypertables — MEDIUM Logical replication decoding sees DML on chunk tables, not the parent hypertable. pg_trickle's WAL decoder would need to: 1. Map chunk OIDs back to the parent hypertable OID 2. Handle DDL from chunk management (CREATE TABLE for new chunks, ALTER for compression transitions) 3. Ignore or correctly handle the internal metadata changes that TimescaleDB writes alongside user data The mapping can be done via `_timescaledb_catalog.chunk` → `_timescaledb_catalog.hypertable` lookups, but this adds a dependency on TimescaleDB's internal schema. ### 4.3. Compression Interactions — MEDIUM When TimescaleDB compresses a chunk: 1. Data is READ from the uncompressed chunk 2. Compressed data is WRITTEN to a compressed chunk (internal format) 3. The uncompressed chunk is TRUNCATED Steps 1–3 appear in WAL as normal DML. The DELETE + INSERT pattern would be misinterpreted by pg_trickle's CDC as real data changes, triggering a spurious full delta computation. **Mitigation options:** - Detect compression operations via `_timescaledb_catalog` metadata and skip the corresponding WAL events - Use TimescaleDB's compression hooks (if exposed) to suppress CDC during compression - Only track uncompressed (recent) chunks — this naturally aligns with the time-series pattern where only recent data changes ### 4.4. `time_bucket()` Function in Delta Queries — LOW `time_bucket()` is a TimescaleDB function, not a core PostgreSQL function. When pg_trickle parses a defining query containing `time_bucket()`, it needs to: 1. Recognize it as a deterministic, immutable function (safe for IVM) 2. Not attempt to "differentiate" through it — treat it as a scalar transform pg_trickle already handles arbitrary immutable functions as pass-through in delta queries, so this should work without changes. Needs validation. ### 4.5. Continuous Aggregate Internals as Source — LOW If a user defines a stream table over a Continuous Aggregate's materialized view, pg_trickle needs to handle the fact that the underlying object is a materialized hypertable (with its own chunk structure), not a plain table. Triggers on the cagg's materialized hypertable would capture the deltas that TimescaleDB's materializer produces. --- ## 5. Design Ideas Worth Borrowing ### 5.1. Time-Bucketed Invalidation Tracking **What TimescaleDB does:** Instead of tracking every changed row, the invalidation log records which time-bucket ranges were modified: ``` invalidation_log: (hypertable_id, lowest_modified_value, greatest_modified_value) ``` On refresh, only the affected time buckets are recomputed — not the entire table, and not individual row deltas. **Applicability to pg_trickle:** For source tables with a time/sequence column used in GROUP BY, pg_trickle could maintain a coarser invalidation log alongside (or instead of) per-row change buffers. This would dramatically reduce change buffer size for time-series-style workloads. **Complexity:** HIGH — requires detecting temporal grouping patterns in the defining query and implementing range-based invalidation in the refresh engine. **Priority:** Phase 3+ (optimization, not correctness) ### 5.2. Real-Time Aggregation Mode **What TimescaleDB does:** Continuous Aggregates support a "real-time" mode that at query time combines: - The materialized (potentially stale) aggregate data - A live query over unmaterialized recent data (since last refresh) The result is always-fresh reads without requiring a refresh. **Applicability to pg_trickle:** A stream table could expose a view that UNIONs: 1. The materialized storage table 2. A delta query over the unflushed change buffer ```sql -- Conceptual: auto-generated view CREATE VIEW st_realtime AS SELECT * FROM st_materialized UNION ALL SELECT * FROM (/* delta query over change buffer */); ``` **Complexity:** MEDIUM-HIGH — the delta query derivation already exists (it's the refresh path). The challenge is making it performant at query time without a full refresh cycle, and handling deduplication correctly for non-append-only workloads (UPDATEs/DELETEs require MERGE-like semantics, not UNION ALL). **Priority:** Phase 2 — high user value; aligns with the "freshness without latency" goal. **See also:** [PLAN_TRANSACTIONAL_IVM.md](../sql/PLAN_TRANSACTIONAL_IVM.md) for transactionally updated views (immediate IVM), which overlaps with this idea. ### 5.3. Append-Only Fast Path **What TimescaleDB does:** Time-series workloads are insert-only. TimescaleDB optimizes for this by not tracking deletes/updates in the invalidation log. **Applicability to pg_trickle:** When pg_trickle detects that a source table is append-only (configurable hint or detected via trigger — no UPDATE/DELETE events observed), the delta query can be simplified: - **JOINs:** Skip the bilinear expansion. For `A JOIN B` where A is append-only and B is static, the delta is just `ΔA JOIN B` (one term instead of three). - **Aggregates:** Skip maintaining DELETE counters in auxiliary state. - **Change buffer:** Skip the `op` column (always 'I'); use a simpler schema. **Complexity:** LOW-MEDIUM — this is a specialization of existing paths. **Priority:** Phase 2 — significant performance win for time-series use cases. ### 5.4. Coarse-Grained Change Tracking for Partitioned Tables **What TimescaleDB does:** Invalidation is tracked per-chunk (partition), not per-row. Chunks map to time ranges, so "which chunks changed?" is a O(1) check per chunk. **Applicability to pg_trickle:** For PostgreSQL native partitioned tables (not just hypertables), pg_trickle could maintain a partition-level change flag: 1. On first change to a partition since last refresh, set a dirty flag 2. On refresh, only scan change buffers for dirty partitions 3. Clear dirty flags after refresh This doesn't change the delta computation — it's a filter that avoids scanning empty change buffers. **Complexity:** LOW — partition OIDs are available from `pg_inherits`. **Priority:** Phase 2 — especially valuable when combined with TimescaleDB. ### 5.5. Declarative Policy Framework **What TimescaleDB does:** Policies are first-class objects: ```sql SELECT add_continuous_aggregate_policy('hourly_readings', start_offset => INTERVAL '3 hours', end_offset => INTERVAL '1 hour', schedule_interval => INTERVAL '1 hour' ); ``` Policies have built-in monitoring (`timescaledb_information.job_stats`), retry logic, exponential backoff, and can be paused/resumed. **Applicability to pg_trickle:** pg_trickle already has schedule and status management via `alter_stream_table`. The lesson is more about **UX**: - Expose refresh policies as queryable objects (already done via `pgtrickle.pgt_stream_tables`) - Add retry/backoff configuration (partially done via `consecutive_errors`) - Consider a `pgtrickle.add_refresh_policy()` wrapper that mirrors the TimescaleDB ergonomics for users coming from that ecosystem **Priority:** Phase 3 — ergonomic polish. ### 5.6. Chunk-Aware Retention **What TimescaleDB does:** `add_retention_policy()` automatically drops chunks older than a threshold. **Applicability to pg_trickle:** Stream table change buffers grow over time. A built-in retention policy for change buffers (drop change buffer entries older than N refreshes or N hours) would prevent unbounded growth. This is partially handled by the post-refresh cleanup, but a dedicated policy would be more robust for error scenarios where refreshes fail repeatedly. **Priority:** Phase 3. --- ## 6. Action Plan ### Phase 1 — Validation (1–2 weeks) **Goal:** Confirm that pg_trickle works at all with hypertable source tables. | # | Task | Effort | |---|------|--------| | 1.1 | Set up a test environment with both extensions on PG 18 | 2h | | 1.2 | Create a hypertable and define a pg_trickle stream table over it (simple GROUP BY) | 2h | | 1.3 | Verify trigger-based CDC fires correctly on hypertable inserts | 2h | | 1.4 | Test with chunk creation (insert data spanning multiple time ranges) | 2h | | 1.5 | Test with `time_bucket()` in defining query | 1h | | 1.6 | Measure trigger overhead on high-throughput inserts (10K–100K rows/sec) | 4h | | 1.7 | Document findings and any blockers | 2h | **Deliverable:** Test report confirming basic compatibility or listing blockers. ### Phase 2 — Core Optimizations (2–4 weeks) **Goal:** Implement the highest-value patterns learned from TimescaleDB. | # | Task | Effort | Depends on | |---|------|--------|------------| | 2.1 | Append-only source hint (`source_mode := 'append_only'`) | 3d | — | | 2.2 | Simplified delta queries for append-only JOINs | 3d | 2.1 | | 2.3 | Partition-level dirty tracking for partitioned/hyper tables | 2d | Phase 1 | | 2.4 | Investigate real-time aggregation mode (design doc) | 3d | — | | 2.5 | E2E test suite: pg_trickle + TimescaleDB scenarios | 3d | Phase 1 | ### Phase 3 — Advanced Integration (4–8 weeks) **Goal:** Handle edge cases and polish the combined experience. | # | Task | Effort | Depends on | |---|------|--------|------------| | 3.1 | WAL decoder: map chunk OIDs → parent hypertable OID | 3d | Phase 1 | | 3.2 | WAL decoder: filter compression-related DML | 2d | 3.1 | | 3.3 | Stream table over Continuous Aggregate source | 3d | Phase 1 | | 3.4 | Time-bucketed invalidation tracking (design + prototype) | 2w | 2.4 | | 3.5 | Documentation: "Using pg_trickle with TimescaleDB" tutorial | 2d | Phase 2 | | 3.6 | Refresh policy ergonomics (`add_refresh_policy` wrapper) | 2d | — | | 3.7 | Change buffer retention policy | 2d | — | --- ## 7. Non-Goals - **Replacing Continuous Aggregates.** For simple `time_bucket` GROUP BY on a single hypertable, Continuous Aggregates are faster and simpler. pg_trickle targets the cases caggs can't handle. - **Depending on TimescaleDB.** All TimescaleDB integration must be optional. pg_trickle detects TimescaleDB at runtime (`SELECT 1 FROM pg_extension WHERE extname = 'timescaledb'`) and enables optimizations conditionally. - **Modifying TimescaleDB internals.** All integration works through public APIs and catalog tables. --- ## 8. Open Questions 1. **Trigger propagation on new chunks** — Does TimescaleDB propagate existing parent-table triggers to chunks created *after* the trigger was defined? (Likely yes, but needs verification.) 2. **Compression hook** — Is there a way to detect that a chunk is being compressed (hook, event trigger, catalog flag) so CDC can ignore the resulting DML? 3. **Cagg materialized hypertable OID stability** — When a Continuous Aggregate is refreshed, does the materialized hypertable's OID remain stable? (Almost certainly yes, but worth confirming for CDC registration.) 4. **`time_bucket` immutability** — Is `time_bucket()` marked `IMMUTABLE` in the catalog? pg_trickle requires source functions to be immutable for correct IVM. (Likely yes for fixed-interval overloads; the timezone-aware overloads may be `STABLE`.) 5. **Licensing** — Some TimescaleDB features (e.g., compression, continuous aggregates) are under the Timescale License, not Apache-2.0. Does depending on their catalog tables or behavior create any licensing concerns for pg_trickle? (Likely no — pg_trickle would only query public catalog views, not link to TSL code.) --- ## 9. References - [TimescaleDB documentation — Continuous Aggregates](https://docs.timescale.com/use-timescale/continuous-aggregates/) - [TimescaleDB documentation — Hypertables](https://docs.timescale.com/use-timescale/hypertables/) - [TimescaleDB documentation — Compression](https://docs.timescale.com/use-timescale/compression/) - [TimescaleDB source — Invalidation log](https://github.com/timescale/timescaledb/blob/main/src/ts_catalog/continuous_agg.h) - [DBSP paper (Budiu et al., 2022)](https://arxiv.org/abs/2203.16684) - pg_trickle [ARCHITECTURE.md](../../docs/ARCHITECTURE.md) - pg_trickle [PLAN_TRANSACTIONAL_IVM.md](../sql/PLAN_TRANSACTIONAL_IVM.md) - pg_trickle [PLAN_CITUS.md](../infra/PLAN_CITUS.md) — similar extension-compat plan