# Documentation TODO: case-study coverage `feature-coverage.md` (in this directory) lists user-guide features that are not exercised by the tutorial or any of the five existing case studies. This file plans how to close that gap, by extending CS1-CS5 where the existing scenario fits, and by sketching a new CS6 around upcoming ProvSQL features. ## Out of scope The following features are documented elsewhere and do not need case-study real estate: - `provsql.active`, `provsql.verbose_level`: covered by `user/configuration.rst` and `dev/debugging.rst`. - `get_infos`, `get_extra`, `get_prob`: covered by the SQL API reference. ## Plan ### CS1: Intelligence Agency - Add the `'independent'` and `'weightmc'` probability methods to the benchmark in Step 13. The step already compares possible-worlds, monte-carlo, tree-decomposition, and compilation/d4; `'independent'` is a useful teaching point because it succeeds or errors depending on circuit shape. ### CS2: Open Science Database - `COUNT(DISTINCT study)` and `string_agg(study, ', ')` per (exposure, outcome): a natural addition before or after Step 4 (single-source claims). - `FILTER` clause: `COUNT(*) FILTER (WHERE effect = 'beneficial')` per exposure to rank exposures by net beneficial evidence. - Window functions: rank exposures by reliability-weighted study count with `RANK() OVER (PARTITION BY outcome ORDER BY ...)`. - `UNION ALL`: merge "beneficial" and "harmful" findings into a single signed-effect view, illustrating that ProvSQL combines provenance via ⊕. - `aggregation_evaluate`: extend the evidence-grade semiring (Step 6) to a `GROUP BY outcome` query with a custom semimodule that aggregates per-finding grades into a per-outcome grade. ### CS3: Île-de-France Public Transit - `LATERAL`: for each route, find the next reachable stop with a `LATERAL (SELECT ... LIMIT 1)`. Reads naturally as "what comes after Bagneux on this line". - Window functions: `ROW_NUMBER() OVER (PARTITION BY trip_id ORDER BY stop_sequence)` to enumerate Bagneux's position along each trip. ### CS4: Government Ministers - Add a step that calls `get_valid_time` directly on a single row. Currently only the higher-level `union_tstzintervals`, `timeslice`, `timetravel`, and `history` are exercised. - Demonstrate `UPDATE` explicitly: replace the DELETE + INSERT pair in Step 6 with an `UPDATE holds SET ...`, then `undo` it. UPDATE is documented but not shown in any case study. ### CS5: Wildlife Photo Archive - Add `'independent'` to Step 4 (before the `repair_key` step): the naive conjunctive query is independent-shaped, so the explicit method call succeeds and produces the instructive wrong answer that motivates Step 5. - `INSERT ... SELECT` with provenance propagation: materialise a `confident_detections` provenance-tracked table from a high-confidence filter, showing that inserted rows inherit source provenance rather than fresh tokens. ### CS6: Upcoming features (new case study, blocked on implementation) A new case study to be written once the corresponding ProvSQL features land. Targets: - **UDFs**: provenance propagation through user-defined functions. - **Joining on an aggregate value**: each row joined with the per-group aggregate it belongs to, provenance correctly tracked through the aggregate. - **`choose` for UDF uncertainty**: when a UDF returns one of several candidate outputs, `choose` models the alternatives as mutually exclusive in the provenance circuit (analogous to `repair_key`'s role in CS5, but for derived rather than ingested data). ## Priorities 1. **Quick wins on existing case studies** : the single-bullet additions to CS1 and CS3, and the small CS5 additions, can land independently and immediately close coverage gaps in `feature-coverage.md`. 2. **Larger CS2 / CS4 extensions** : CS2 grows by five bullets and CS4 adds a UPDATE / `undo` round-trip. These are the biggest single-CS coverage wins. 3. **CS6** : blocked on the upstream features (UDFs, aggregate joins, `choose`) landing in ProvSQL; revisit when those ship.