[← Back to Blog Index](README.md) # Cycles in Your Dependency Graph? That's Fine. ## Fixed-point iteration for monotone queries in pg_trickle --- Dependency cycles are normally a fatal error in data pipelines. Airflow rejects them. dbt rejects them. Most IVM systems reject them. The logic is sound: if A depends on B and B depends on A, which one do you refresh first? pg_trickle takes a different position. Some cycles are legitimate. A graph-reachability table that feeds a scoring table that feeds a threshold filter that feeds back into the reachability table — that's a cycle, but it has a well-defined fixed point. The system can iterate until nothing changes. The key constraint: the queries in the cycle must be **monotone**. pg_trickle detects this at creation time and rejects non-monotone cycles. Monotone cycles converge. Non-monotone cycles can oscillate forever. --- ## What Monotone Means A query is monotone if adding rows to its input can only add rows to its output — never remove them. In practical terms: **Monotone operations:** - `SELECT ... WHERE ...` (filter) — adding a row that passes the filter adds it to the output - `JOIN` (inner, left, right, full) — adding a row that matches adds join results - `UNION ALL` — adding to either side adds to the output - `GROUP BY ... HAVING ...` with monotone aggregates (COUNT, SUM of non-negatives) **Non-monotone operations:** - `EXCEPT` — adding to the right side *removes* from the output - `NOT EXISTS` — adding a matching row removes the anti-joined row - Aggregates that can decrease (`MIN`, `MAX` with deletions) - `DISTINCT` (adding a duplicate doesn't add to the output) pg_trickle checks monotonicity when you create a stream table that would form a cycle. If any table in the cycle uses a non-monotone operator, the creation fails: ``` ERROR: stream table 'filtered_scores' would create a non-monotone cycle DETAIL: NOT EXISTS in query is not monotone HINT: Remove the circular dependency or use a non-cyclic design ``` --- ## Enabling Circular Dependencies Cycles are disabled by default: ```sql -- This fails with cycles disabled SHOW pg_trickle.allow_circular; -- off SELECT pgtrickle.create_stream_table( name => 'scores', query => $$ SELECT ... FROM reachable $$, schedule => '5s' ); -- ERROR: would create circular dependency (reachable → scores → reachable) ``` To allow monotone cycles: ```sql SET pg_trickle.allow_circular = on; ``` Or in `postgresql.conf`: ``` pg_trickle.allow_circular = on ``` --- ## How Fixed-Point Iteration Works When the scheduler encounters a cycle (a strongly connected component, or SCC, in the dependency graph), it doesn't try to topologically sort it — that's impossible for a cycle. Instead, it iterates: 1. **Refresh all tables in the SCC once**, in an arbitrary order. 2. **Check for net changes.** If any table in the SCC produced new rows, go to step 1. 3. **If no table produced new rows**, the SCC has converged. Move on. This is fixed-point iteration. The monotonicity constraint guarantees it terminates: each iteration can only add rows (never remove them), and the total result is bounded (finite source tables, finite joins, finite query results). Eventually, no new rows are produced. --- ## A Concrete Example Consider a fraud detection pipeline where: 1. `suspicious_accounts` flags accounts based on transaction patterns. 2. `risky_transactions` flags transactions involving suspicious accounts. 3. `suspicious_accounts` also considers accounts that have many risky transactions. This is circular: suspicious accounts → risky transactions → suspicious accounts. ```sql SET pg_trickle.allow_circular = on; -- Table 1: Accounts flagged by direct pattern matching SELECT pgtrickle.create_stream_table( name => 'suspicious_accounts', query => $$ SELECT account_id, 'pattern' AS reason FROM transactions GROUP BY account_id HAVING COUNT(*) FILTER (WHERE amount > 10000) > 5 UNION ALL -- Also flag accounts with many risky transactions SELECT DISTINCT account_id, 'association' AS reason FROM risky_transactions GROUP BY account_id HAVING COUNT(*) > 10 $$, schedule => '10s' ); -- Table 2: Transactions involving suspicious accounts SELECT pgtrickle.create_stream_table( name => 'risky_transactions', query => $$ SELECT t.* FROM transactions t JOIN suspicious_accounts sa ON t.account_id = sa.account_id WHERE t.amount > 1000 $$, schedule => '10s' ); ``` On the first iteration: - `suspicious_accounts` finds accounts with >5 high-value transactions (pattern match). - `risky_transactions` flags transactions by those accounts. On the second iteration: - `suspicious_accounts` now also sees the risky-transaction counts. Some new accounts cross the >10 threshold. - `risky_transactions` picks up transactions from the newly flagged accounts. On the third iteration (typically): - No new accounts are flagged. No new transactions are flagged. Convergence. --- ## The Iteration Limit To prevent runaway iteration (in case of a bug or a degenerate data pattern), pg_trickle enforces a limit: ```sql SHOW pg_trickle.max_fixpoint_iterations; -- 10 ``` If a cycle doesn't converge within 10 iterations, pg_trickle stops, logs a warning, and marks the SCC as "not converged." The tables are still queryable — they just contain the result after 10 iterations, which may not be the complete fixed point. ``` WARNING: SCC {suspicious_accounts, risky_transactions} did not converge after 10 iterations (12 new rows in last iteration) ``` You can increase the limit if your domain requires deeper iteration: ```sql SET pg_trickle.max_fixpoint_iterations = 50; ``` --- ## Monitoring SCCs pg_trickle exposes SCC status through the monitoring API: ```sql SELECT * FROM pgtrickle.scc_status(); ``` ``` scc_id | tables | last_iterations | converged | last_refresh --------+--------------------------------------------+-----------------+-----------+------------------- 1 | {suspicious_accounts,risky_transactions} | 3 | t | 2026-04-27 10:15:03 ``` Key fields: - `last_iterations`: How many rounds it took to converge. - `converged`: Whether the last refresh reached a fixed point. - `last_refresh`: When the SCC was last fully resolved. If `converged` is consistently `false`, your cycle may not be monotone in practice (even if the query structure passes the static check). Check whether data patterns cause oscillation. --- ## Why Most Systems Reject Cycles The standard argument against cycles is that they're a design smell — if your data model has circular dependencies, you should restructure it. That's usually right. But some domains genuinely have circular relationships: - **Graph algorithms:** Reachability, PageRank, label propagation — all defined as fixed points. - **Constraint propagation:** Scheduling constraints that reference each other. - **Multi-phase classification:** Where the output of one classifier feeds into another, and vice versa. - **Supply chain:** Demand forecasts that depend on inventory, which depends on demand forecasts. For these cases, "restructure your schema" is unhelpful. The circularity is in the domain, not in a modeling mistake. --- ## Cycles and IMMEDIATE Mode IMMEDIATE mode (synchronous in-transaction refresh) and cycles don't mix. If A triggers a refresh of B, which triggers a refresh of A, you get an infinite loop inside a single transaction. pg_trickle rejects this at creation time: ``` ERROR: IMMEDIATE mode is not allowed for stream tables in a cycle HINT: Use DIFFERENTIAL or AUTO mode with a schedule ``` Cyclic stream tables must use scheduled (asynchronous) refresh. The fixed-point iteration runs in the background scheduler, not inside a user transaction. --- ## Performance Considerations Each iteration of a cycle is a full refresh cycle for all tables in the SCC. The total cost is: ``` cost = iterations × Σ(cost per table in SCC) ``` For most practical cycles, convergence happens in 2–4 iterations. But if your cycle has 10 tables that each take 200ms to refresh, one convergence pass takes 2–8 seconds. **Optimization tips:** - Keep cycles small. Factor non-cyclic portions of the query out of the SCC. - Use DIFFERENTIAL mode. Each iteration after the first typically processes only the new rows from the previous iteration. - Set a tight `max_fixpoint_iterations` to fail fast if convergence is slow. - Monitor `scc_status()` to track iteration counts. If they're consistently high, the cycle may need restructuring. --- ## Summary Circular dependencies aren't always a bug. pg_trickle allows them for monotone queries — queries where adding input rows can only add output rows. The system iterates until convergence (no new rows) or until the iteration limit. Enable with `pg_trickle.allow_circular = on`. Monitor with `scc_status()`. Keep cycles small and monotone. And if convergence takes too many iterations, that's a signal to reconsider the design — not a reason to ban cycles entirely.