# ProvSQL TODO

Planning material for upcoming ProvSQL work, kept alongside the source
tree so the plans evolve with the code that implements them.

Each plan document follows a consistent layout:

1. **Intro** : one paragraph stating the scope of the plan and the
   reference material it is anchored on.
2. **Out of scope** (optional) : items deliberately excluded, with a
   pointer to where they are handled instead.
3. **Plan** : the proposals themselves, each self-contained.
4. **Priorities** : the order in which the proposals should ship.
5. **Implementation observations** (optional) : reusable notes from
   prior work in the same area.
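
As a rough illustration of that layout, a new plan document could start from a skeleton like the one below. The heading levels, the `<plan title>` placeholder, and the bracketed placeholder text are assumptions made for this sketch; the existing plan documents may render the sections differently (for instance, with the intro as an unheaded opening paragraph).

```markdown
# <plan title>

## Intro

[One paragraph: scope of the plan and the reference material it is
anchored on.]

## Out of scope

[Optional. Items deliberately excluded, each with a pointer to where it
is handled instead.]

## Plan

[The proposals themselves, each self-contained.]

## Priorities

[The order in which the proposals should ship.]

## Implementation observations

[Optional. Reusable notes from prior work in the same area.]
```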

## Contents

- [`bounded-treewidth-data.md`](bounded-treewidth-data.md) :
  feasibility study for exploiting bounded treewidth of the input data
  (Courcelle's theorem and its provenance refinement, ABS 2015 / 2017).
  Open: extending independent-product factoring for the relational
  pathologies (threshold / separator recognition), a decomposition-aligned
  (cycluit) construction for recursive reachability, the full
  data-decomposition + tree-automaton pipeline, and a treewidth-aware
  general m-semiring evaluator.
- [`conditioning.md`](conditioning.md) : plan for a conditioning
  primitive, unifying discrete tuple-correlation (MarkoViews, Jha &
  Suciu PVLDB 2012) and continuous random variables as one operation at
  two carriers -- the `gate_conditioned` design, conditional `P(Q|C)`, a
  conditioned distribution that flows onward, arbitrary denial
  constraints, Shapley over evidence, and soft/weighted conditioning.
- [`case-studies.md`](case-studies.md) : plan for closing the
  feature-coverage gaps in the user tutorial and the existing case
  studies by extending CS1-CS5, plus a future UDF / aggregate-join
  study (CS8).
- [`continuous_distributions.md`](continuous_distributions.md) : roadmap
  for extending the continuous random-variable surface beyond the shipped
  Normal/Uniform/Exponential/Erlang/Categorical/Mixture baseline (further
  parametric families, quantiles / function application / order
  statistics, empirical & structural distributions, conditioning, copulas).
- [`probability-evaluation.md`](probability-evaluation.md) : the
  **remaining** probability-method-selection work, atop the now-landed
  method catalog + three-path (exact / relative / additive) chooser:
  the d-tree's remaining pieces (BID / multivalued circuits, non-DNF
  exact auto-selection, memoising the approximate path), the SUM-safe
  rounding FPTRAS, the exact residual HAVING shapes (branch-spanning SUM,
  BID disjoint blocks, UNION/EXCEPT over a shared-tuple join), catalog
  follow-ups (lazy Boolean build, guarantee propagation, independence-cert
  cache), RV-probability transparency, and d-tree research polish. Borders
  [`safe-query-followups.md`](safe-query-followups.md).
- [`safe-query-followups.md`](safe-query-followups.md) : deferred ideas
  bordering the `provsql.boolean_provenance` work -- further Boolean-only
  optimisations (independent-subtree detection, Möbius / Monet…), the
  inversion-free `UCQ(OBDD)` extensions (UNION in a view, FD-aware orders),
  the planner-side `CERT_SAFE_AGG_PLAN` certificate for BID-disjoint
  HAVING, discrete `random_variable` extensions, and the
  hierarchical-detector follow-ups (FD-induced nested rewrite, soft keys,
  view-descent FD chases, data-safe plans).
- [`scalar-subqueries.md`](scalar-subqueries.md) : the remaining
  unsupported scalar-/correlated-subquery forms -- scalar sublinks nested
  in arithmetic (today a passthrough-with-warning; the decorrelation
  follow-up now has its `agg_token`-arithmetic prerequisite in place),
  different-`(Q, corr)` multi-sublinks, and `GROUP BY` bodies.
- [`studio.md`](studio.md) : open ProvSQL Studio work -- the
  Contributions (Shapley / Banzhaf heat-map) and Time-travel / Temporal
  modes, batch result-table evaluation, multi-user demo deployment,
  and Notebook-mode polish (collapse / clear output, run-from-here,
  per-cell row cap).