Continuous Distributions
========================

This page describes the architecture of ProvSQL's continuous
random-variable surface: the on-disk gates, the SQL composite
type, the planner-hook rewriter's classifier, the Monte Carlo
sampler, the RangeCheck / AnalyticEvaluator / Expectation chain,
the HybridEvaluator's simplifier and island decomposer, the
conditional-inference path, the aggregate dispatch, and the
Studio rendering hooks. The user-facing description lives in
:doc:`../user/continuous-distributions`.

Gate Types
----------

Three gate types are appended to the :cfunc:`gate_type` enum in
:cfile:`provsql_utils.h`, before the ``gate_invalid``
sentinel, with no renumbering of the existing values. The
companion ``provenance_gate`` SQL enum in
``sql/provsql.common.sql`` mirrors the C enum identically.

``gate_rv``
    Random-variable leaf. The gate's ``extra`` blob carries the
    distribution kind and parameters as text, parsed at load
    time:

    - ``"normal:μ,σ"`` for ``Normal(μ, σ)``;
    - ``"uniform:a,b"`` for ``Uniform[a, b]``;
    - ``"exponential:λ"`` for ``Exponential(λ)``;
    - ``"erlang:k,λ"`` for ``Erlang(k, λ)``.

    Categorical random variables share no ``gate_rv`` encoding;
    they are encoded as a block of ``gate_mulinput`` gates
    under a ``gate_mixture`` (see below).

``gate_arith``
    ``N``-ary arithmetic over scalar children. The operator tag
    lives in ``info1`` of the gate's ``GateInformation``:
    ``provsql_arith_op`` is ``PLUS = 0``, ``TIMES = 1``,
    ``MINUS = 2``, ``DIV = 3``, ``NEG = 4``. The enum is
    append-only: the values are persisted on disk and must not
    be renumbered.

``gate_mixture``
    Probabilistic mixture. The wire vector is
    ``[p, x, y]`` for a Bernoulli mixture (with ``p`` a Boolean
    gate and ``x``, ``y`` scalar RV roots) or
    ``[key, mul_1, …, mul_n]`` for a categorical block (with
    ``key`` a fresh ``gate_input`` anchoring the block and
    each ``mul_i`` a ``gate_mulinput`` carrying the
    outcome value in its ``extra`` and its probability via
    :sqlfunc:`set_prob`).

In addition, ``gate_value`` gains a *float8 mode*: the
``extra`` blob is parsed via ``extract_constant_double``
(:cfile:`having_semantics.cpp`) rather than the existing
``extract_constant_C`` integer-only path. Both paths coexist
so ``gate_value`` covers both the deterministic-numeric
mode used in HAVING sub-circuits and the
random-variable-constant mode used by :sqlfunc:`as_random`.

SQL Surface
-----------

The type ``random_variable`` is a thin wrapper around the UUID of
the provenance gate behind the variable: the UUID is the single
source of truth, and every downstream evaluator (``MonteCarloSampler``,
``AnalyticEvaluator``, ``Expectation``, ``RangeCheck``,
``HybridEvaluator``) dispatches on the gate it points at, parsing
the distribution from the gate's ``extra`` blob. Its IO functions
live in :cfile:`random_variable_type.c`; the C++-side helpers
(:cfunc:`parse_distribution_spec`, :cfunc:`analytical_mean`
and the matching ``analytical_variance`` /
``analytical_raw_moment`` overloads) live in
:cfile:`RandomVariable.cpp` and are the parsers consumed by
every downstream evaluator.

Constructors are PL/pgSQL functions in
``sql/provsql.common.sql``:
:sqlfunc:`normal`, :sqlfunc:`uniform`,
:sqlfunc:`exponential`, :sqlfunc:`erlang`,
:sqlfunc:`categorical`,
:sqlfunc:`mixture` (two overloads), and
:sqlfunc:`as_random` (three numeric overloads via the
``double precision`` form). They validate parameters and mint
the appropriate gate via :sqlfunc:`create_gate`,
``set_extra``, :sqlfunc:`set_prob`,
``set_infos``. The fresh-randomness constructors
(:sqlfunc:`normal`, :sqlfunc:`uniform`, :sqlfunc:`exponential`,
:sqlfunc:`erlang`, :sqlfunc:`categorical`, and the n-array
:sqlfunc:`mixture` overload that mints an anonymous gate) are
``VOLATILE`` to prevent constant-folding under ``STABLE`` or
``IMMUTABLE`` from collapsing two independent draws into a
single shared gate. The deterministic constructors
(:sqlfunc:`as_random` and the three-argument
``mixture(p uuid, x random_variable, y random_variable)``
overload, both of which mint a v5-derived UUID keyed on their
inputs) are ``IMMUTABLE``.

Arithmetic operators ``+ - * / -`` on
``(random_variable, random_variable)`` resolve to
``random_variable_plus`` and siblings, each a
one-line SQL function that calls ``provenance_arith``
with the appropriate ``provsql_arith_op`` tag. Comparison
operators ``< <= = <> >= >`` resolve to placeholder procedures
that *raise* if executed; the planner hook intercepts every such
``OpExpr`` and rewrites it before the executor sees it (see the
classifier section below).

Implicit casts ``integer → random_variable``,
``numeric → random_variable``, ``double precision →
random_variable`` are declared explicitly so that
``WHERE rv > 2`` and ``WHERE 2.5 > rv`` resolve uniformly via the
``(rv, rv)`` operator declarations.

Planner-Hook Rewriting
----------------------

The transformation that lifts ``WHERE`` and join predicates on
``random_variable`` columns into the row's provenance circuit
lives in the same :cfunc:`provsql_planner` hook in
:cfile:`provsql.c` that handles deterministic
provenance tracking and the :cfunc:`agg_token` HAVING surface.

The central walker is :cfunc:`migrate_probabilistic_quals`. It
walks every qual in the input query and routes each into one of
four mutually-exclusive classes (the ``qual_class`` enum):

- ``QUAL_PURE_AGG``: the qual is built only from ``agg_token``
  comparators (the pre-existing HAVING pathway).
- ``QUAL_PURE_RV``: the qual is built only from
  ``random_variable`` comparators.
- ``QUAL_DETERMINISTIC``: the qual contains no probabilistic
  comparator and stays in the WHERE clause as ordinary SQL.
- A short tail of *mixed-error* classes flagged so the rewriter
  raises a clean diagnostic rather than producing a malformed
  circuit (e.g. a qual that conjoins a ``random_variable``
  comparator and an ``agg_token`` comparator in the same node).

For ``QUAL_PURE_RV`` quals, the rewriter mints a
``gate_cmp`` per comparator and conjoins its UUID into the
row's provenance via ``provenance_times``. The
comparator's float8-comparator OID is recovered via
``random_variable_cmp_oid``. The original
``OpExpr`` is dropped from the WHERE so the executor never reaches
the placeholder procedure.

For ``QUAL_PURE_AGG`` quals, the existing HAVING pathway
(:cfunc:`make_aggregation_expression`, dispatched on
``aggtype``) is reused with one extension: when the aggregate's
result type is ``OID_TYPE_RANDOM_VARIABLE``, the rewriter routes
through ``make_rv_aggregate_expression`` so the per-row
argument is wrapped in ``rv_aggregate_semimod`` (a
:sqlfunc:`mixture` over the row's provenance and the
identity for the aggregate, see *Aggregate Dispatch* below).

A short-cut handles the corner case of ``WHERE rv > 2`` on a
query that touches no provenance-tracked relation: there is
nothing to conjoin into, so the rewriter synthesises a
single-row FROM-less SELECT to host the gate_cmp, and the
result is a circuit that :sqlfunc:`probability_evaluate` reads
directly.

Monte Carlo Sampler
-------------------

The sampler implementation lives in
:cfile:`MonteCarloSampler.cpp`. The entry point
:cfunc:`monteCarloRV` runs ``N`` iterations over a
:cfunc:`GenericCircuit`; per iteration it draws every reachable
``gate_rv`` leaf once (memoised in ``rv_cache_``) and
evaluates every reachable ``gate_input`` Bernoulli once
(memoised in ``bool_cache_``). The two caches ensure that
shared leaves are correctly coupled within an iteration.

The RNG is ``std::mt19937_64``, seeded from the
``provsql.monte_carlo_seed`` GUC: ``-1`` seeds from
``std::random_device``; any other value (including ``0``) is
used as a literal seed for reproducibility. The same RNG drives
the Bernoulli and continuous (gate_rv) sampling paths, so a
pinned seed reproduces both the discrete and continuous
components of a circuit's randomness.

``Sampler::evalScalar`` is the scalar dispatcher: it knows
how to sample ``gate_rv``, ``gate_value`` (float8 mode
parsed via ``extract_constant_double``),
``gate_arith`` (recursing on children and combining per
``info1``), and ``gate_mixture`` (sampling the Boolean
selector once via ``evalBool``, then recursing into the
chosen branch). The ``gate_agg`` arm calls back into the
aggregate evaluator with the per-iteration sampled values; this
is what unlocks HAVING+RV under Monte Carlo.

``Sampler::evalBool`` is the Boolean dispatcher: it walks the
Boolean wrappers (plus / times / monus / cmp / input / mulinput /
project / eq), and treats ``gate_delta`` as transparent: the
gate exists for the structural δ-semiring algebra but adds
no event to the rv_* event walker. The same transparency is
asserted in ``walkAndConjunctIntervals`` so the AND-conjunct
pass that backs RangeCheck behaves consistently.

RangeCheck
----------

:cfile:`RangeCheck.cpp` propagates support intervals
through ``gate_arith`` and tests every ``gate_cmp``
against the propagated interval. A comparator that is decidable
from the support alone (e.g. a Normal restricted to
``x > μ + 10σ``: the support of the LHS is
:math:`(-\infty, +\infty)` but every realisation is
overwhelmingly to the right of the RHS; or a bounded uniform
``x > b``: the support cap is ``b``, so the cmp is identically
false) collapses to a Bernoulli ``gate_input`` with
probability ``0`` or ``1``, transparent to every downstream
consumer.

The AND-conjunction pass ``walkAndConjunctIntervals`` walks a
``WHERE`` clause's conjunction and intersects the per-RV intervals
across conjuncts before running the per-comparator decision.
``reading > 1 AND reading < 3`` thus constrains a single normal
once, with the analytic CDF call evaluating both endpoints in a
single pass.

``compute_support`` is exposed as ``rv_support`` for
SQL-side use (:sqlfunc:`support` polymorphically dispatches on
type and routes ``random_variable`` here).

AnalyticEvaluator
-----------------

:cfile:`AnalyticEvaluator.cpp` computes the exact
probability of a ``gate_cmp`` whose two children are a
single-distribution scalar and a constant (or two
single-distribution scalars whose joint distribution is
analytically tractable, e.g. two independent normals via
``X − Y ∼ Normal(μ_X − μ_Y, σ_X² + σ_Y²)``).

The kernel is ``std::erf`` for the standard-normal CDF;
``std::log1p`` / ``std::expm1`` for the exponential CDF;
linear arithmetic for the uniform CDF; the regularised lower
incomplete gamma for the Erlang CDF. Equality and inequality on
continuous distributions collapse correctly: ``X = X`` is
identically true (handled in ``RangeCheck`` as a
zero-width interval identity), ``X = Y`` for any two sub-circuits
of which at least one has a continuous distribution is identically
false. ``hasOnlyContinuousSupport`` (in :cfile:`RangeCheck.cpp`)
is the predicate
behind the second case: a recursive walk that returns true on
``gate_rv`` leaves, on ``gate_arith`` whose every wire is
continuous, and on Bernoulli mixtures whose two branches are both
continuous; false on ``gate_value`` (Dirac), on categorical
mixtures (point masses at each outcome), and on Boolean / agg
gates. The widened test catches heterogeneous-rate exponential
sums (``Exp(λ_1) + Exp(λ_2)`` with ``λ_1 ≠ λ_2``, no Erlang
closure), products of two independent continuous RVs, and mixed
``gate_arith`` composites that the simplifier cannot fold to a
single ``gate_rv`` -- their equality predicate would otherwise
have flowed all the way down to the MC marginalisation only to
return 0 in finite precision anyway.

When neither side is purely continuous, a second analytical path
in ``RangeCheck`` fires: ``collectDiracMassMap`` extracts the
``(value → mass)`` map from each side (recursing into
categoricals and Bernoulli mixtures of ``as_random`` /
``gate_value`` branches), and the cmp resolves exactly via the
independent-Dirac sum-product

  ``P(X = Y) = Σ_{v ∈ M_X ∩ M_Y} M_X(v) · M_Y(v)``.

Continuous components on either side contribute zero by
measure-zero arguments (continuous vs Dirac and continuous vs
continuous), so they need not appear in the sum. Independence is
required for the factoring to hold; ``collectRandomLeaves`` walks
both sides for the union of ``gate_rv`` + ``gate_input`` leaves
and the shortcut bails on any overlap (e.g. two mixtures sharing
a Bernoulli ``p_token``). Bernoulli mixtures whose ``p_token`` is
a compound Boolean (whose static probability would require a
recursive :sqlfunc:`probability_evaluate` call) also bail. The
sum-product subsumes the disjoint-Dirac case as its boundary
(empty intersection ⇒ ``P(X = Y) = 0``).

Analytical Moment Evaluator
---------------------------

:cfile:`Expectation.cpp` implements the closed-form moment
evaluator for continuous-RV circuits. It is *not* a
``Semiring`` subclass: the
:cfunc:`provenance_evaluate_compiled_internal` dispatcher
special-cases ``semiring == "expectation"`` and calls
:cfunc:`compute_expectation` directly on the
``GenericCircuit``, bypassing the template-based
``GenericCircuit::evaluate<S>`` machinery used by the proper
semirings. The same entry point is reached by
:sqlfunc:`expected` over a ``random_variable`` and by the
``rv_moment`` C helper.

The algorithm runs analytical moment computation per distribution
at leaves, then propagates through ``gate_arith`` by closed-form
rules:

- ``E[X + Y] = E[X] + E[Y]`` (always),
- ``E[X − Y] = E[X] − E[Y]`` (always),
- ``E[a · X] = a · E[X]`` (when one operand is a constant),
- ``E[X · Y] = E[X] · E[Y]`` (only when ``X`` and ``Y`` are
  structurally independent),
- ``Var[X + Y] = Var[X] + Var[Y]`` (independent),
- etc.

Structural independence is detected via a per-evaluation
``FootprintCache`` that memoises, per gate, the set of base
``gate_rv`` leaves reachable from it. Two gates whose
footprints are disjoint are independent; the cache speeds up the
check from quadratic to linear by sharing the leaf-set
computation across the recursion.

When no closed form applies, :cfunc:`compute_expectation` falls
back to a Monte-Carlo estimate using ``MonteCarloSampler``. The
sample count is ``provsql.rv_mc_samples``; setting it to ``0``
turns the fallback into an exception so callers that need
analytical answers can detect the silent fallback.

HybridEvaluator
---------------

:cfile:`HybridEvaluator.cpp` is the orchestrator. Given a
circuit, it runs:

1. **Universal peephole pass** (:cfunc:`runHybridSimplifier`)
   that folds family-preserving combinations into a single leaf:
   linear combinations of independent normals (``a·X + b·Y + c``
   into a single normal), sums of i.i.d. exponentials with the
   same rate (into an Erlang), affine combinations of a single
   uniform (``a·U(p, q) + c`` into ``U(a·p + c, a·q + c)`` with
   bounds reordered when ``a < 0``), closed-form negation of a
   bare Normal or Uniform (``-N(μ, σ)`` into ``N(-μ, σ)``,
   ``-U(a, b)`` into ``U(-b, -a)``), MINUS-to-PLUS canonicalisation
   so subtraction shapes flow through the same PLUS pipeline,
   DIV-to-TIMES canonicalisation for division by a constant
   (``X / c`` rewritten as ``(1/c) · X`` so the existing
   normal-family and uniform-family scaling rules apply),
   shift-and-scale of a single normal through mixtures and
   categoricals, single-child arith roots, semiring-identity drops
   (``gate_one`` in TIMES, ``gate_zero`` in PLUS, …). The pass is
   invariant-preserving: every transformation produces a
   semantically equivalent circuit. Out of scope: subtraction
   between two distinct continuous RVs of the same family
   (``U - U`` is triangular, not uniform), shifting an exponential
   or Erlang (``Exp + c`` has the wrong support), and negating an
   exponential or Erlang (``-Exp`` flips the support to
   ``(-∞, 0]``); these shapes stay as ``gate_arith`` and the MC
   sampler handles them per-iteration.

   *Peephole* is borrowed from compiler engineering (McKeeman,
   CACM 8(7), 1965): a small sliding window over consecutive
   instructions / gates, a fixed list of local pattern ->
   replacement rules, iterated to a fixed point. Each rule here
   looks at one ``gate_arith`` plus its immediate children,
   never further, matching the original scope. Contrast with
   ``RangeCheck`` (:cfile:`RangeCheck.cpp`), which propagates a
   data-flow fact (the support interval) through the whole
   circuit, and with the island decomposer below, which uses a
   global union-find over base-RV footprints.

2. **Island decomposition** (:cfunc:`runHybridDecomposer`)
   that splits a multi-cmp query into independent islands
   (connected components on a union-find over base-RV
   footprints). A single-cmp island marginalises to a Bernoulli
   ``gate_input`` via ``AnalyticEvaluator``. A multi-cmp
   island whose cmps share base RVs is enumerated via the joint
   table (the joint distribution of the shared base RVs is
   evaluated explicitly).

3. **Monotone-shared-scalar fast path** for the common shape of
   a single ``gate_rv`` shared across multiple monotone
   comparators (typical of range queries on a single column):
   the joint event reduces to an interval on the underlying
   scalar and one analytical CDF call per endpoint.

4. **Universal semiring-identity collapse** after RangeCheck
   has decided every decidable cmp.

The simplifier is gated by ``provsql.simplify_on_load`` for the
universal pass run at load time, and by the debug-only
``provsql.hybrid_evaluation`` (``GUC_NO_SHOW_ALL``) for the
in-evaluator hybrid path. End users have no reason to flip
``hybrid_evaluation``; it exists for developer A/B against the
unfolded path and as a bisection knob.

Conditional Evaluation
----------------------

:sqlfunc:`expected`, :sqlfunc:`variance`, :sqlfunc:`moment`,
:sqlfunc:`central_moment`, :sqlfunc:`support`,
:sqlfunc:`rv_sample`, and :sqlfunc:`rv_histogram` all accept an
optional ``prov uuid DEFAULT gate_one()`` argument. When ``prov``
resolves to anything other than ``gate_one()``, evaluation routes
through the joint-circuit loader.

``getJointCircuit`` (:cfile:`MMappedCircuit.cpp`)
builds a multi-rooted BFS over the union of the reachable gates
from both ``input`` and ``prov`` so shared ``gate_rv``
leaves between the two are loaded into a single
:cfunc:`GenericCircuit` and consequently couple correctly in the
Monte Carlo sampler's ``rv_cache_``. This is the *shared-atom
coupling invariant*: a conditioning event ``prov`` is only
meaningful relative to the random variables it references, and
those must be the same leaves the moment's evaluator sees.

The closed-form table is exhaustive for the single-distribution
shapes:

- **Normal**, truncated to any one-sided or two-sided interval,
  via the Mills-ratio formula and integration by parts.
- **Uniform** on the intersection of the support and the
  conditioning interval (mean and variance trivial).
- **Exponential** by memorylessness on a lower bound, plus
  truncation to a finite interval via the lower incomplete gamma.

For shapes outside the closed-form table, the conditional moment
is estimated by rejection sampling at ``provsql.rv_mc_samples``;
:sqlfunc:`rv_sample` emits a ``NOTICE`` (and
:sqlfunc:`rv_histogram` / :sqlfunc:`expected` raise) when the
acceptance rate drops below the requested ``n`` within the
budget, so the caller can either widen the budget or loosen the
conditioning.

``matchTruncatedSingleRv`` (in :cfile:`RangeCheck.cpp`) is the
single-RV shape-detection helper used by the moment surface
(``try_truncated_closed_form`` in :cfile:`Expectation.cpp`) and
the rejection-free sampler
(:cfunc:`try_truncated_closed_form_sample`). It
runs the four common gates -- ``gate_rv`` root check,
``parse_distribution_spec``, ``collectRvConstraints``, and the
empty-intersection / ``gate_zero`` event guards -- so the
supported-shape set stays in sync between moments and sampling
and adding a fifth (e.g. truncated Erlang via the regularised
incomplete gamma) only touches one detection site.

``matchClosedFormDistribution`` (same file) generalises the
single-RV matcher to the four-arm variant
``std::variant<TruncatedSingleRv, DiracShape, CategoricalShape,
BernoulliMixtureShape>`` consumed by
:sqlfunc:`rv_analytical_curves`. The variant covers, in addition
to the bare-RV case, ``as_random(c)`` Diracs (``gate_value``
roots), categorical-form ``gate_mixture`` roots, and classic
Bernoulli ``gate_mixture`` roots over any recursively-matched
shape. Conditioning is honoured uniformly across all four arms:
non-trivial events are routed through ``collectRvConstraints``
to extract a ``[lo, hi]`` interval on the root variable, then
``truncateShape`` is applied recursively -- bare RVs intersect
their bounds and renormalise via the truncated CDF; Diracs are
kept iff the value falls inside the interval (otherwise the
event is infeasible); categoricals drop outcomes outside the
interval and renormalise surviving masses; Bernoulli mixtures
recursively truncate their arms and reweight the Bernoulli by
the ratio of arm masses
(:math:`\pi' = \pi Z_L / (\pi Z_L + (1-\pi) Z_R)`), with the
arm masses computed by ``shape_mass`` (a parallel recursive
pass that integrates the unconditional CDF over the interval).
An arm with zero post-truncation mass is eliminated and the
mixture degenerates to the surviving arm.

``eventIsProvablyInfeasible`` (also :cfile:`RangeCheck.cpp`) is
the conditional-moment dispatcher's pre-MC short-circuit:
``gate_zero`` events are detected for every root type, and
``gate_rv`` roots additionally surface
``collectRvConstraints``-empty intersections. The dispatcher in
``conditional_raw_moment`` / ``conditional_central_moment``
(:cfile:`Expectation.cpp`) calls it after
``try_truncated_closed_form`` returns ``nullopt`` and raises a
"conditioning event is infeasible" error directly when the
predicate fires, avoiding a 100 000-sample MC round whose
acceptance probability is exactly zero.

:sqlfunc:`rv_sample` and :sqlfunc:`rv_histogram` share
``MonteCarloSampler::try_truncated_closed_form_sample``: a
direct exact-sampling fast path that fires on the same shape as
the moment surface (bare ``gate_rv`` of Uniform / Normal /
Exponential with an interval-extractable event). Uniform draws
``U(lo, hi)`` on the intersected truncation; Exponential
one-sided uses memorylessness (``X | X > c = c + Exp(λ)``),
two-sided uses inverse-CDF via ``std::log1p`` / ``std::expm1``
for numerical accuracy near the support boundary; Normal uses
inverse-CDF transform with ``std::erf`` for the forward CDF and
the Beasley-Springer-Moro rational approximation for the
inverse. The fast path delivers exactly ``n`` samples with 100%
acceptance even for tight tail events that previously starved
the rejection budget. Erlang truncation and ``gate_arith``
composite roots fall through to MC rejection unchanged.

:sqlfunc:`rv_analytical_curves` (in :cfile:`RvAnalyticalCurves.cpp`)
exposes the closed-form PDF, CDF, and discrete stems as sampled
data for ProvSQL Studio's Distribution profile overlay. Returns
NULL when the root sub-circuit is not a closed-form shape, so
callers can dispatch it in parallel with :sqlfunc:`rv_histogram`
without a structural pre-check. The payload has three optional
fields:

* ``pdf`` -- evenly-spaced ``{x, p}`` samples of the continuous
  density. Absent when the shape has no continuous component
  (pure Dirac, pure categorical, or nested mixture of those).
* ``cdf`` -- same grid as ``pdf``, cumulative probability.
  Always emitted (well-defined for any supported shape; a pure-
  discrete shape produces a staircase, a continuous shape a
  smooth curve, and a mixed shape a smooth curve with jumps at
  the stem positions).
* ``stems`` -- ``{x, p}`` point masses produced by Dirac roots,
  categorical roots, or Dirac / categorical arms inside a
  Bernoulli mixture. Bernoulli weights propagate down the path
  (a Dirac inside ``mixture(0.3, X, c)`` appears at ``(c, 0.7)``).

The supported shape set is the union of
``matchClosedFormDistribution`` 's variant arms (see above):
bare ``gate_rv`` of Normal / Uniform / Exponential /
integer-shape Erlang, ``as_random(c)`` Diracs, categorical
mixtures, and Bernoulli mixtures over any recursively-matched
shape -- all four arms accept a non-trivial conditioning event.
``gate_arith`` composites and non-integer Erlang shapes return
NULL; the frontend renders histogram-only in those cases.

Before matching, :sqlfunc:`rv_analytical_curves` runs
:cfunc:`runHybridSimplifier` so the curves see the same folded tree
that :sqlfunc:`simplified_circuit_subgraph` exposes to Studio's circuit
view: ``c·Exp(λ)`` folded to ``Exp(λ/c)``, sums of independent
normals folded to a single normal, ``c + mixture(p, X, Y)``
pushed inside the mixture, etc. Without this pass a circuit
that displays as a single ``Exp(0.5)`` node would still be
seen by the matcher as a ``gate_arith`` composite of
``value(2)`` and ``gate_rv:Exp(1)`` and would silently fall
back to histogram-only.

Truncation under a bare RV normalises the PDF by
:math:`Z = \text{CDF}(\text{hi}) - \text{CDF}(\text{lo})` and
rescales the CDF to ``[0, 1]`` over the conditioning interval.
Under a Bernoulli mixture the truncated PDF is
:math:`f_{M|A}(x) = (\pi \cdot Z_L \cdot f_{L|A}(x) +
(1-\pi) \cdot Z_R \cdot f_{R|A}(x)) / (\pi Z_L + (1-\pi) Z_R)`
with the per-arm normalisers
:math:`Z_L, Z_R` computed by ``shape_mass``. Under a categorical
the conditional masses are :math:`p_i \cdot
\mathbb{1}\{v_i \in A\} / \sum_j p_j \cdot \mathbb{1}\{v_j \in A\}`.
A Dirac is invariant under any feasible event.

The load-time pass ``runConstantFold`` (in ``HybridEvaluator.cpp``,
invoked from :cfunc:`CircuitFromMMap::applyLoadTimeSimplification`
alongside ``runRangeCheck`` and ``foldSemiringIdentities``)
folds deterministic ``gate_arith`` subtrees to ``gate_value``
at load time. This lifts the common parser shape
``arith(NEG, value:c)`` (produced when SQL parses
``-c::random_variable`` as ``-(c::random_variable)``) into a
clean ``value:-c``, so ``asRvVsConstCmp`` and friends recognise
the comparator's constant side without callers having to
parenthesise. The pass runs only the constant-fold rule from the
hybrid simplifier, never the family closures or identity drops,
because the result is always a ``gate_value`` that carries no
random identity, so no shared-RV coupling is decoupled by the
rewrite. The family closures stay behind the separate
``provsql.hybrid_evaluation`` GUC, which gates :cfunc:`runHybridSimplifier`
inside the probability and view paths where the simplifier owns
the rewritten subtree.

Aggregate Dispatch
------------------

The :sqlfunc:`sum`, :sqlfunc:`avg`, and
:sqlfunc:`product` aggregates over ``random_variable``
all share ``sum_rv_sfunc`` as their state-transition
function (a ``uuid[]`` accumulator) and an
``INITCOND = '{}'`` so the FFUNC runs even on an empty group.
The per-aggregate FFUNCs differ in how they fold the accumulated
UUIDs into a final ``random_variable`` root:

- ``sum_rv_ffunc`` builds a single
  ``gate_arith`` ``PLUS`` over the per-row mixtures (or
  returns the single child for a singleton group, or
  :sqlfunc:`as_random` ``(0)`` for an empty group).
- ``avg_rv_ffunc`` constructs a parallel denominator
  array: for each mixture in the state, it pulls out the
  per-row provenance gate (the mixture's first child) and builds
  a matching ``mixture(prov_i, as_random(1), as_random(0))`` so
  the denominator sums to the count of *included* rows, then
  divides via a ``gate_arith`` ``DIV``. Empty group returns
  SQL ``NULL`` to match standard ``AVG``.
- ``product_rv_ffunc`` re-walks each mixture in the state
  and patches its else-branch from :sqlfunc:`as_random`
  ``(0)`` to :sqlfunc:`as_random` ``(1)`` (going through
  :sqlfunc:`mixture` rather than
  :sqlfunc:`create_gate` so the v5 hash of mixtures
  sharing the same ``(prov_i, X_i, as_random(1))`` triple
  collides correctly), then builds a
  ``gate_arith`` ``TIMES`` root.

The planner hook routes RV-returning aggregates through
``make_rv_aggregate_expression`` so each per-row argument
becomes :sqlfunc:`mixture` ``(prov_i, X_i,
provsql.as_random(0))`` *before* the SFUNC sees it; the
:sqlfunc:`avg` and :sqlfunc:`product` FFUNCs unpack this shape
to recover ``prov_i``. The dispatch is keyed on ``aggtype``
(the aggregate's result type OID) rather than ``aggfnoid`` so
the same routing works for any future RV-returning aggregate.

An earlier design considered an *M-polymorphic ``gate_agg``*
that would carry the full semimodule lift directly. We rejected
it because the mixture-of-mixtures shape composes through every
existing gate_arith / gate_mixture rule, while a new
M-polymorphic gate would have required a parallel evaluation
path in every analytical evaluator. The semimodule-of-mixtures
shape reuses what's there.

Studio Rendering
----------------

ProvSQL Studio's circuit canvas is class-based: every node
carries a ``node--<type>`` CSS class derived from the gate kind,
and the stylesheet at ``studio/provsql_studio/static/app.css``
gives each class its colour, glyph, and inline-text layout.
Adding the three new gate types meant:

- one entry per type in the server-side JSON serialiser at
  ``studio/provsql_studio/circuit.py`` (the ``label`` and
  ``children`` fields, plus the distribution-blob parsing for
  the per-leaf inline glyph);
- one branch in the client renderer at
  ``studio/provsql_studio/static/circuit.js`` to pick the
  ``node--rv`` / ``node--arith`` / ``node--mixture`` class and
  to emit the correct edge labels (``p`` / ``x`` / ``y`` for
  mixtures, the ``provsql_arith_op`` glyph for arith);
- a Circuit-mode fetch that consumes the
  :sqlfunc:`simplified_circuit_subgraph` SRF (a thin
  ``compute_simplified_circuit_subgraph`` wrapper that runs
  the universal peephole pass on a sub-BFS and returns the
  result as ``jsonb``), with the ``provsql.simplify_on_load``
  Config-panel toggle switching between the raw and folded
  views.

The eval-strip *Distribution profile*, *Sample*, *Moment*, and
*Support* entries call :sqlfunc:`rv_histogram`,
:sqlfunc:`rv_sample`, ``rv_moment`` and
``rv_support`` directly. The *Condition on* row-prov
auto-preset is a client-side feature: clicking a result cell
stamps the row's provenance UUID into the input and toggles the
*Conditioned by* badge active. Manual edits stick within a row;
row navigation resets the input to the new row's prov.