# SubPlan Detection and Cost Tracking ## Overview The `pg_track_optimizer` extension now detects and tracks expensive SubPlans (correlated subqueries) by calculating a "worst SubPlan factor" metric. This helps identify queries where correlated subqueries execute many times, causing performance issues. ## How It Works ### SubPlans in PostgreSQL SubPlans are correlated subqueries that appear in expression contexts (WHERE clauses, SELECT lists, etc.). Unlike regular plan nodes, SubPlans: - Are not direct children in the plan tree - Execute once per outer row (stored in `pstate->subPlan` list) - Have their effective cost multiplied by the number of executions ### Detection Algorithm When walking the plan tree in `prediction_walker()` (plan_error.c:35-77): 1. After recursively walking child nodes, check if the current node has SubPlans: `if (pstate->subPlan != NIL)` 2. For each SubPlan attached to this node: - Get instrumentation data: `instr = sps->planstate->instrument` - Get execution loops: `nloops = instr->nloops` - Get planned cost: `cost = sps->planstate->plan->total_cost` - Calculate effective cost: `nloops × cost` 3. Track the worst (maximum) SubPlan factor across all SubPlans in the query ### Cost Factor Calculation ```c f_worst_splan = max(nloops × total_cost) ``` This metric represents: - **nloops**: How many times the SubPlan executed (once per outer row) - **total_cost**: PostgreSQL's estimated cost for the SubPlan - **Product**: The actual cost burden of the SubPlan on query performance ## Database Schema The new field appears in three places: ### 1. C Structure (pg_track_optimizer.c:110) ```c RStats f_worst_splan; /* Worst SubPlan cost factor (nloops × cost) */ ``` ### 2. SQL Function (pg_track_optimizer--0.1.sql:173) ```sql OUT f_worst_splan rstats ``` ### 3. View Columns (pg_track_optimizer--0.1.sql:234-237) ```sql t.f_worst_splan -> 'min' AS sp_min, t.f_worst_splan -> 'max' AS sp_max, t.f_worst_splan -> 'count' AS sp_cnt, t.f_worst_splan -> 'mean' AS sp_avg, t.f_worst_splan -> 'stddev' AS sp_dev ``` ## Usage Examples ### Example 1: Find Queries with Expensive SubPlans ```sql SELECT queryid, LEFT(query, 50) AS query_preview, ROUND((f_worst_splan -> 'mean')::numeric, 2) AS avg_sp_factor, ROUND((f_worst_splan -> 'max')::numeric, 2) AS max_sp_factor, nexecs FROM pg_track_optimizer() WHERE (f_worst_splan -> 'mean')::numeric > 1000 ORDER BY (f_worst_splan -> 'mean')::numeric DESC LIMIT 10; ``` This finds queries where SubPlans have an average cost factor over 1000, indicating they execute many times with significant cost. ### Example 2: Compare SubPlan Cost to Total Execution Time ```sql SELECT queryid, ROUND((f_worst_splan -> 'mean')::numeric, 2) AS sp_factor, ROUND((exec_time -> 'mean')::numeric, 2) AS avg_time_ms, nexecs FROM pg_track_optimizer() WHERE (f_worst_splan -> 'mean')::numeric > 0 ORDER BY (f_worst_splan -> 'mean')::numeric DESC; ``` Queries with high `sp_factor` relative to `exec_time` suggest the SubPlan dominates execution time. ### Example 3: Detect SubPlan Loops from Test From `sql/subplan.sql` regression test: ```sql SELECT ROUND((avg_error -> 'mean')::numeric, 2) AS error, ROUND((f_join_filter -> 'mean')::numeric, 2) AS jf, ROUND((f_scan_filter -> 'mean')::numeric, 2) AS lf, ROUND((f_worst_splan -> 'mean')::numeric, 2) AS sp_factor, evaluated_nodes, plan_nodes, nexecs FROM pg_track_optimizer() WHERE query LIKE '%FROM outer_table%'; ``` Expected output shows `sp_factor` of ~1900 for a SubPlan that executed 380 loops. ## Interpreting Results ### Low Values (< 100) SubPlans are not a significant performance concern. The correlated subquery either: - Executes few times (low nloops) - Has low individual cost ### Medium Values (100 - 1000) SubPlans contribute measurably to query cost. Consider: - Can the subquery be rewritten as a JOIN? - Is the SubPlan indexed properly? - Could LATERAL join help? ### High Values (> 1000) SubPlans are likely a major performance bottleneck. Optimization strategies: 1. **Rewrite as JOIN**: ```sql -- Before (SubPlan) SELECT * FROM orders o WHERE total > (SELECT AVG(total) FROM orders WHERE customer_id = o.customer_id); -- After (JOIN) SELECT o.* FROM orders o JOIN (SELECT customer_id, AVG(total) as avg_total FROM orders GROUP BY customer_id) avg ON o.customer_id = avg.customer_id WHERE o.total > avg.avg_total; ``` 2. **Use LATERAL join** (PostgreSQL 9.3+): ```sql SELECT * FROM outer_table o JOIN LATERAL ( SELECT threshold FROM reference_table r WHERE r.category = o.category ) sub ON o.val > sub.threshold; ``` 3. **Add indexes** to SubPlan's inner table scan 4. **Materialize subquery** if results are reused: ```sql WITH thresholds AS MATERIALIZED ( SELECT category, threshold FROM reference_table ) SELECT * FROM outer_table o JOIN thresholds t ON o.category = t.category WHERE o.val > t.threshold; ``` ## Implementation Details ### Key Code Locations - **Detection**: `plan_error.c:35-77` - Iterates through `pstate->subPlan` list - **Context**: `plan_error.h:59` - Field in `PlanEstimatorContext` - **Storage**: `pg_track_optimizer.c:110` - Field in `DSMOptimizerTrackerEntry` - **Initialization**: `plan_error.c:312` - Set to 0.0 before tree walk - **Accumulation**: `pg_track_optimizer.c:447-448` - Add value to running stats ### Special Cases 1. **InitPlans**: SubPlans with `setParam != NIL` execute once before the main query. These are not currently tracked separately but would show `nloops = 1`. 2. **Parallel Workers**: Current implementation asserts `sps->worker_instrument == NULL`, meaning SubPlans don't parallelize (as expected - they're parallel-restricted). 3. **No SubPlans**: Queries without SubPlans have `f_worst_splan = 0.0`. ## Testing The `sql/subplan.sql` regression test demonstrates SubPlan detection: - Creates tables with correlated subquery in JOIN clause - SubPlan executes 380 times (shown in EXPLAIN) - `f_worst_splan` captures the cost burden - Test validates the metric appears in output Run the test: ```bash make installcheck REGRESS=subplan ``` ## Future Enhancements Potential improvements to SubPlan tracking: 1. **Track SubPlan depth**: Nested SubPlans (SubPlans within SubPlans) 2. **Separate InitPlan stats**: Different metrics for once-executed InitPlans 3. **Per-SubPlan breakdown**: Track multiple SubPlans individually, not just worst 4. **Correlation type**: Distinguish EXISTS, ANY, ALL, scalar subqueries 5. **Execution time**: Track actual SubPlan execution time, not just cost estimate ## References - PostgreSQL SubPlan documentation: https://www.postgresql.org/docs/current/using-explain.html - Correlated subquery optimization: https://wiki.postgresql.org/wiki/Subquery_scan_removal - Parallel query limitations: https://www.postgresql.org/docs/current/parallel-safety.html