{
 "nbformat": 4,
 "nbformat_minor": 5,
 "metadata": {
  "kernelspec": {
   "name": "provsql-studio",
   "display_name": "ProvSQL (SQL)",
   "language": "sql"
  },
  "language_info": {
   "name": "sql"
  },
  "provsql": {
   "scheme": "semiring",
   "database": "cs1",
   "generated_from": "doc/source/user/casestudy1.rst"
  }
 },
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Case Study: Intelligence Agency\n",
    "\n",
    "This case study, largely extending the scenario introduced when ProvSQL was first presented, demonstrates ProvSQL’s custom semiring capability, where-provenance, probability computation with multiple algorithms, and circuit export through a security-classification scenario.\n",
    "\n",
    "## The Scenario\n",
    "\n",
    "An intelligence agency maintains a database of seven employees spread across three cities. Every employee holds a security clearance ranging from *unclassified* to *top secret*. Your tasks:\n",
    "\n",
    "- identify which cities are served by more than one agent,\n",
    "- determine the minimum clearance level needed to infer each result,\n",
    "- find cities with exactly one agent (sensitive: if the city leaks, the sole agent is exposed),\n",
    "- track where in the database each output value originated,\n",
    "- compute the probability that a city remains a single-agent post after accounting for possible-world uncertainty.\n",
    "\n",
    "## Setup"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*The following cells set up the database with all the content this notebook requires; run them first, ideally on a fresh database.*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "DROP TYPE IF EXISTS classification_level CASCADE;\n",
    "CREATE TYPE classification_level AS ENUM (\n",
    "    'unclassified', 'restricted', 'confidential', 'secret', 'top_secret',\n",
    "    'unavailable'\n",
    ");"
   ],
   "outputs": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "DROP TABLE IF EXISTS personnel CASCADE;\n",
    "CREATE TABLE personnel (\n",
    "    id SERIAL PRIMARY KEY,\n",
    "    name TEXT NOT NULL,\n",
    "    position TEXT NOT NULL,\n",
    "    city TEXT NOT NULL,\n",
    "    classification classification_level NOT NULL\n",
    ");"
   ],
   "outputs": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "TRUNCATE personnel;\n",
    "INSERT INTO personnel (name, position, city, classification) VALUES\n",
    "    ('Juma',   'Director',     'Nairobi', 'unclassified'),\n",
    "    ('Paul',   'Janitor',      'Nairobi', 'restricted'),\n",
    "    ('David',  'Analyst',      'Paris',   'confidential'),\n",
    "    ('Ellen',  'Field agent',  'Beijing', 'secret'),\n",
    "    ('Aaheli', 'Double agent', 'Paris',   'top_secret'),\n",
    "    ('Nancy',  'HR',           'Paris',   'restricted'),\n",
    "    ('Jing',   'Analyst',      'Beijing', 'secret');"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This creates:\n",
    "\n",
    "- `classification_level` – an ordered ENUM (`unclassified` \\< `restricted` \\< `confidential` \\< `secret` \\< `top_secret` \\< `unavailable`) where `unavailable` is a sentinel representing the semiring 𝟘 (no derivation possible)\n",
    "- `personnel` – 7 agents with name, position, city, and clearance level\n",
    "\n",
    "## Step 1: Explore the Database\n",
    "\n",
    "Inspect the `personnel` table:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "SELECT * FROM personnel ORDER BY id;"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You should see seven rows: Juma (Director, Nairobi), Paul (Janitor, Nairobi), David (Analyst, Paris), Ellen (Field agent, Beijing), Aaheli (Double agent, Paris), Nancy (HR, Paris), and Jing (Analyst, Beijing).\n",
    "\n",
    "## Step 2: Enable Provenance and Create a Name Mapping\n",
    "\n",
    "Enable provenance tracking on `personnel` and create a mapping so that provenance tokens can be labelled with agent names:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "SELECT add_provenance('personnel');\n",
    "DROP TABLE IF EXISTS personnel_name;\n",
    "SELECT create_provenance_mapping('personnel_name', 'personnel', 'name');"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After [`add_provenance`](https://provsql.org/doxygen-sql/html/group__table__management.html#ga33ff696dabb05d813f2c1f914cb97d9a), every row of `personnel` has a unique UUID token in its hidden `provsql` column. The mapping `personnel_name` associates each token with the corresponding agent’s name.\n",
    "\n",
    "## Step 3: Cities Shared by Multiple Agents\n",
    "\n",
    "Which cities have at least two agents? Use a self-join with an `id` inequality to generate all unordered pairs, then apply [`sr_formula`](https://provsql.org/doxygen-sql/html/group__compiled__semirings.html#ga3aad4775805b92307f3ca9e1b9fbd4c5) to see which agents contribute to each city:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "SELECT p1.city,\n",
    "       sr_formula(provenance(), 'personnel_name') AS formula\n",
    "FROM personnel p1\n",
    "JOIN personnel p2 ON p1.city = p2.city AND p1.id < p2.id\n",
    "GROUP BY p1.city\n",
    "ORDER BY p1.city;"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The formula for Nairobi is `Juma ⊗ Paul`: both agents must be present for Nairobi to appear. Beijing similarly shows `Ellen ⊗ Jing`. Paris, with three agents, shows all three pairwise products joined by `⊕`.\n",
    "\n",
    "## Step 4: Minimum Security Clearance (sr_minmax)\n",
    "\n",
    "For each shared city, what is the *minimum clearance level* required to have inferred that the city has multiple agents? An analyst who knows the city only needs to see the lowest-cleared agent there.\n",
    "\n",
    "This is the security shape of the min-max m-semiring, computed by the compiled function [`sr_minmax`](https://provsql.org/doxygen-sql/html/group__compiled__semirings.html#ga21bc3ee408f01de06e82ff21e7a01458) over the `classification_level` enum:\n",
    "\n",
    "- `⊕` (OR combination) = `min`: to infer *either* agent was involved, you only need clearance for the less-classified one (one witness suffices to establish the disjunction).\n",
    "- `⊗` (AND/join) = `max`: to confirm *both* agents are present, you need clearance for the more-classified one (you must be able to access both records to establish the join).\n",
    "\n",
    "The third argument is a sample value of the carrier enum (used only for type inference); its value is ignored:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "DROP TABLE IF EXISTS personnel_level;\n",
    "SELECT create_provenance_mapping('personnel_level',\n",
    "                                 'personnel', 'classification');\n",
    "\n",
    "SELECT p1.city,\n",
    "       sr_minmax(provenance(), 'personnel_level',\n",
    "                 'unclassified'::classification_level) AS min_clearance\n",
    "FROM personnel p1\n",
    "JOIN personnel p2 ON p1.city = p2.city AND p1.id < p2.id\n",
    "GROUP BY p1.city\n",
    "ORDER BY p1.city;"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Results: Nairobi requires `restricted` (Paul is the more-classified of the two agents, and both must be accessed to confirm the pair). Beijing requires `secret` (both Ellen and Jing hold the same level). Paris requires `confidential`: the pair David–Nancy has MAX clearance `confidential`, which is the lowest maximum among all Paris pairs, so `confidential` clearance suffices to confirm at least one pair.\n",
    "\n",
    "## Step 5: Cities with Exactly One Agent (EXCEPT / Monus)\n",
    "\n",
    "A city with a single agent is sensitive: knowing the city immediately identifies the agent. Find cities where *all* agents are alone using `EXCEPT`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "SELECT city,\n",
    "       sr_formula(provenance(), 'personnel_name') AS formula\n",
    "FROM (\n",
    "    SELECT DISTINCT city FROM personnel\n",
    "  EXCEPT\n",
    "    SELECT p1.city\n",
    "    FROM personnel p1\n",
    "    JOIN personnel p2 ON p1.city = p2.city AND p1.id < p2.id\n",
    "    GROUP BY p1.city\n",
    ") t\n",
    "ORDER BY city;"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Note:**\n",
    "\n",
    "> ProvSQL’s `EXCEPT` uses the *monus* operator `⊖` of the provenance semiring rather than plain set difference. Every city appears in the result with a provenance formula; the formula evaluates to `𝟘` for cities that are definitely shared and to a non-trivial expression for cities that *could* be single-agent in some possible world. Nairobi’s formula `(Juma ⊕ Paul) ⊖ (Juma ⊗ Paul)` reads: “at least one of Juma or Paul is present, minus the event where both are.”\n",
    "\n",
    "## Step 6: Where-Provenance\n",
    "\n",
    "Where-provenance tracks *which column of which input row* produced each output value. Enable it and re-run the shared-city query:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "SET provsql.provenance = 'where';\n",
    "\n",
    "SELECT p1.city,\n",
    "       where_provenance(provenance()) AS source\n",
    "FROM personnel p1\n",
    "JOIN personnel p2 ON p1.city = p2.city AND p1.id < p2.id\n",
    "GROUP BY p1.city\n",
    "ORDER BY p1.city;\n",
    "\n",
    "SET provsql.provenance = 'semiring';"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each city output value is traced back to the `city` column (column 4) of the `personnel` table for every agent in that city. The notation `[personnel:〈token〉:4;...]` shows the token and column index of each contributing row; the trailing `[]` is the untracked `source` column itself.\n",
    "\n",
    "## Step 7: Assign Probabilities\n",
    "\n",
    "Suppose the existence of each agent in the database is uncertain. Assign each agent a probability equal to `id / 10.0`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "ALTER TABLE personnel ADD COLUMN IF NOT EXISTS probability DOUBLE PRECISION;\n",
    "UPDATE personnel SET probability = id / 10.0;\n",
    "\n",
    "DO $$ BEGIN\n",
    "  PERFORM set_prob(provenance(), probability) FROM personnel;\n",
    "END $$;"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now Juma has probability 0.1, Paul 0.2, …, Jing 0.7.\n",
    "\n",
    "## Step 8: Probability – Exact\n",
    "\n",
    "Compute the exact probability that each city is a single-agent city:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "SELECT city,\n",
    "       ROUND(probability_evaluate(provenance())::numeric, 4) AS prob\n",
    "FROM (\n",
    "    SELECT DISTINCT city FROM personnel\n",
    "  EXCEPT\n",
    "    SELECT p1.city\n",
    "    FROM personnel p1\n",
    "    JOIN personnel p2 ON p1.city = p2.city AND p1.id < p2.id\n",
    "    GROUP BY p1.city\n",
    ") t\n",
    "ORDER BY city;"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Nairobi (agents with probabilities 0.1 and 0.2) has probability `0.1 × 0.8 + 0.9 × 0.2 = 0.26` of having exactly one agent. Beijing (0.4 and 0.7) scores `0.54`. Paris (0.3, 0.5, 0.6) gives `0.41`.\n",
    "\n",
    "## Step 9: Probability – Monte Carlo\n",
    "\n",
    "For larger circuits, exact evaluation can be expensive. Monte Carlo sampling gives an approximate answer:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "SELECT city,\n",
    "       probability_evaluate(\n",
    "           provenance(), 'monte-carlo', '10000') AS prob\n",
    "FROM (\n",
    "    SELECT DISTINCT city FROM personnel\n",
    "  EXCEPT\n",
    "    SELECT p1.city\n",
    "    FROM personnel p1\n",
    "    JOIN personnel p2 ON p1.city = p2.city AND p1.id < p2.id\n",
    "    GROUP BY p1.city\n",
    ") t\n",
    "ORDER BY city;"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With 10 000 samples the result is accurate to roughly ±0.01 (see [Margin of error](https://en.wikipedia.org/wiki/Margin_of_error)). Results will vary slightly between runs due to sampling. Compare the runtime against the exact method (`\\timing` in psql, or the per-query timing Studio displays).\n",
    "\n",
    "## Step 10: Probability – Knowledge Compiler\n",
    "\n",
    "**Note:**\n",
    "\n",
    "> This step uses an external knowledge compiler such as `d4` or `dsharp`, which must be on your `PATH` (or in a directory listed in the `provsql.tool_search_path` GUC, see [the configuration chapter](https://provsql.org/docs/user/configuration.html)). The query below guards the call with [`tool_available`](https://provsql.org/doxygen-sql/html/group__provenance__output.html#ga7be235a742f0afc6ddb56ee2b1c025c1), so where no such compiler is installed – in particular in the Playground, which bundles none – it reports that instead of failing, and you can read on.\n",
    "\n",
    "A knowledge compiler converts the provenance circuit to a *d-DNNF* representation, which enables efficient exact probability evaluation on large circuits of specific forms:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "SELECT city,\n",
    "       CASE WHEN tool_available('d4')\n",
    "            THEN ROUND(probability_evaluate(\n",
    "                     provenance(), 'compilation', 'd4')::numeric, 4)::text\n",
    "            ELSE '(needs external compiler d4; unavailable here)'\n",
    "       END AS prob\n",
    "FROM (\n",
    "    SELECT DISTINCT city FROM personnel\n",
    "  EXCEPT\n",
    "    SELECT p1.city\n",
    "    FROM personnel p1\n",
    "    JOIN personnel p2 ON p1.city = p2.city AND p1.id < p2.id\n",
    "    GROUP BY p1.city\n",
    ") t\n",
    "ORDER BY city;"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Compare the runtime (`\\timing` in psql, or Studio’s per-query timing) against the possible-worlds and Monte Carlo methods. On this small example, the external knowledge compiler will be slower than the other methods: invoking an external process and compiling the circuit carries significant overhead that only pays off on much larger circuits.\n",
    "\n",
    "## Step 11: Visualise a Provenance Circuit\n",
    "\n",
    "[`view_circuit`](https://provsql.org/doxygen-sql/html/group__provenance__output.html#ga9c312a8b72b968e3669432ab5ae49928) renders the provenance circuit as an ASCII box-art diagram using [graph-easy](https://metacpan.org/dist/Graph-Easy) (which must be on your `PATH`). The query guards the call with [`tool_available`](https://provsql.org/doxygen-sql/html/group__provenance__output.html#ga7be235a742f0afc6ddb56ee2b1c025c1), so without `graph-easy` – in particular in the Playground, where Studio’s interactive [circuit mode](https://provsql.org/docs/user/studio.html) is the better view anyway – it says so rather than failing:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "SELECT city,\n",
    "       CASE WHEN tool_available('graph-easy')\n",
    "            THEN view_circuit(provenance(), 'personnel_name')\n",
    "            ELSE '(needs the graph-easy renderer; use Studio circuit mode)'\n",
    "       END AS circuit\n",
    "FROM (\n",
    "    SELECT DISTINCT city FROM personnel\n",
    "  EXCEPT\n",
    "    SELECT p1.city\n",
    "    FROM personnel p1\n",
    "    JOIN personnel p2 ON p1.city = p2.city AND p1.id < p2.id\n",
    "    GROUP BY p1.city\n",
    ") t\n",
    "WHERE city = 'Nairobi';"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The result shows the monus gate at the top, with `Juma` and `Paul` as leaf inputs, rendered in box-art notation in the terminal.\n",
    "\n",
    "## Step 12: Export to XML\n",
    "\n",
    "[`to_provxml`](https://provsql.org/doxygen-sql/html/group__provenance__output.html#ga60e8ffbac337ba6cb9e32b64672b802f) serialises the provenance circuit to [PROV-XML](https://www.w3.org/TR/prov-xml/), the W3C standard for provenance interchange, which can be processed by any standard XML tool:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "SELECT city,\n",
    "       to_provxml(provenance(), 'personnel_name') AS provxml\n",
    "FROM (\n",
    "    SELECT DISTINCT city FROM personnel\n",
    "  EXCEPT\n",
    "    SELECT p1.city\n",
    "    FROM personnel p1\n",
    "    JOIN personnel p2 ON p1.city = p2.city AND p1.id < p2.id\n",
    "    GROUP BY p1.city\n",
    ") t\n",
    "WHERE city = 'Nairobi';"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 13: Large Circuit Benchmark\n",
    "\n",
    "To compare the three probability algorithms at scale, create a synthetic 100 × 100 random-probability matrix and enable provenance tracking on it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "DROP TABLE IF EXISTS matrix CASCADE;\n",
    "CREATE TABLE matrix AS\n",
    "SELECT ones.n + 10 * tens.n AS x,\n",
    "       other.n + 10 * tens2.n AS y,\n",
    "       random() AS prob\n",
    "FROM (VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) ones(n),\n",
    "     (VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) tens(n),\n",
    "     (VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) other(n),\n",
    "     (VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) tens2(n);\n",
    "\n",
    "SELECT add_provenance('matrix');\n",
    "DO $$ BEGIN\n",
    "  PERFORM set_prob(provenance(), prob) FROM matrix;\n",
    "END $$;"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now run the same path query with each method in turn, timing each (`\\timing` in psql, or Studio’s per-query timing):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "-- Default method (independent evaluation, tree decomposition, or d4)\n",
    "SELECT m1.x, m2.y,\n",
    "       probability_evaluate(provenance()) AS prob\n",
    "FROM matrix m1, matrix m2\n",
    "WHERE m2.x = m1.y AND m1.x > 90 AND m2.x > 90 AND m2.y > 90\n",
    "GROUP BY m1.x, m2.y\n",
    "ORDER BY m1.x, m2.y;"
   ],
   "outputs": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "-- Exact enumeration over all possible worlds\n",
    "SELECT m1.x, m2.y,\n",
    "       probability_evaluate(provenance(), 'possible-worlds') AS prob\n",
    "FROM matrix m1, matrix m2\n",
    "WHERE m2.x = m1.y AND m1.x > 90 AND m2.x > 90 AND m2.y > 90\n",
    "GROUP BY m1.x, m2.y\n",
    "ORDER BY m1.x, m2.y;"
   ],
   "outputs": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "-- Monte Carlo sampling (9604 samples ≈ 1 % error at 95 % confidence)\n",
    "SELECT m1.x, m2.y,\n",
    "       probability_evaluate(provenance(), 'monte-carlo', '9604') AS prob\n",
    "FROM matrix m1, matrix m2\n",
    "WHERE m2.x = m1.y AND m1.x > 90 AND m2.x > 90 AND m2.y > 90\n",
    "GROUP BY m1.x, m2.y\n",
    "ORDER BY m1.x, m2.y;"
   ],
   "outputs": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "-- Tree-decomposition-based exact compilation (built-in, no external tool)\n",
    "SELECT m1.x, m2.y,\n",
    "       probability_evaluate(provenance(), 'tree-decomposition') AS prob\n",
    "FROM matrix m1, matrix m2\n",
    "WHERE m2.x = m1.y AND m1.x > 90 AND m2.x > 90 AND m2.y > 90\n",
    "GROUP BY m1.x, m2.y\n",
    "ORDER BY m1.x, m2.y;"
   ],
   "outputs": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "-- Knowledge compilation via external tool d4 (guarded with\n",
    "-- tool_available, so it reports rather than fails where d4 is\n",
    "-- absent, e.g. in the Playground)\n",
    "SELECT m1.x, m2.y,\n",
    "       CASE WHEN tool_available('d4')\n",
    "            THEN probability_evaluate(provenance(), 'compilation', 'd4')::text\n",
    "            ELSE '(d4 unavailable here)'\n",
    "       END AS prob\n",
    "FROM matrix m1, matrix m2\n",
    "WHERE m2.x = m1.y AND m1.x > 90 AND m2.x > 90 AND m2.y > 90\n",
    "GROUP BY m1.x, m2.y\n",
    "ORDER BY m1.x, m2.y;"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The Monte Carlo query uses `9604` samples, which gives roughly 1 % additive error with 95 % confidence (by the formula $n = z^2 / (4\\varepsilon^2)$ with $z = 1.96$, $\\varepsilon = 0.01$; see [Margin of error](https://en.wikipedia.org/wiki/Margin_of_error)).\n",
    "\n",
    "The `'tree-decomposition'` method is exact and built into ProvSQL (no external binary required). It is often the fastest exact method on simple queries, but it fails on circuits with high treewidth – when that happens, fall back to `'compilation'` or one of the other methods.\n",
    "\n",
    "## Step 14: The Boolean Expression Behind a Token\n",
    "\n",
    "[`sr_boolexpr`](https://provsql.org/doxygen-sql/html/group__compiled__semirings.html#gaf221001c4d22d033906fdcaa82864f7a) returns the abstract Boolean formula of a provenance circuit. Without a mapping it uses internal variable names `x0`, `x1`…; with an optional second argument naming a provenance mapping table the leaves are labelled by the mapping’s `value` column instead. This is the same expression ProvSQL hands to its d-DNNF compilers internally to compute probabilities."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "SELECT city, sr_boolexpr(provenance()) AS boolexpr\n",
    "FROM (\n",
    "    SELECT DISTINCT city FROM personnel\n",
    "  EXCEPT\n",
    "    SELECT p1.city\n",
    "    FROM personnel p1\n",
    "    JOIN personnel p2 ON p1.city = p2.city AND p1.id < p2.id\n",
    "    GROUP BY p1.city\n",
    ") t\n",
    "WHERE city = 'Nairobi';"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For Nairobi, the result is the circuit `(Juma ⊕ Paul) ⊖ (Juma ⊗ Paul)` from Step 5, interpreted in the Boolean function semiring – every provenance gate is mapped to its Boolean counterpart (`⊕` to `∨`, `⊗` to `∧`, `⊖` to `∧¬`) – and the resulting Boolean function rendered as a formula over anonymous variables. Unlike [`sr_formula`](https://provsql.org/doxygen-sql/html/group__compiled__semirings.html#ga3aad4775805b92307f3ca9e1b9fbd4c5), the provenance mapping is optional: the expression captures the circuit’s logical structure independently of any naming, and a mapping can be supplied later if you want the leaves labelled.\n",
    "\n",
    "## Step 15: Programmatic Circuit Inspection\n",
    "\n",
    "What [`view_circuit`](https://provsql.org/doxygen-sql/html/group__provenance__output.html#ga9c312a8b72b968e3669432ab5ae49928) renders, you can also walk programmatically with the low-level circuit API. Capture Nairobi’s monus token first:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "CREATE TEMP TABLE nairobi_token AS\n",
    "SELECT provenance() AS prov\n",
    "FROM (\n",
    "    SELECT DISTINCT city FROM personnel\n",
    "  EXCEPT\n",
    "    SELECT p1.city\n",
    "    FROM personnel p1\n",
    "    JOIN personnel p2 ON p1.city = p2.city AND p1.id < p2.id\n",
    "    GROUP BY p1.city\n",
    ") t\n",
    "WHERE city = 'Nairobi';"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[`get_nb_gates`](https://provsql.org/doxygen-sql/html/group__gate__manipulation.html#gacc16e47ab21d01de696c6323453d3515) reports how many gates have been materialized in the current database’s circuit:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "SELECT get_nb_gates();"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[`get_gate_type`](https://provsql.org/doxygen-sql/html/group__gate__manipulation.html#ga1731181eef2acf0b0590edd23b3ccb49) and [`get_children`](https://provsql.org/doxygen-sql/html/group__gate__manipulation.html#gad21ff35d95b5d782bf594eec99c1ee03) give a single-step view of the gate structure: they return the operator and direct children of a gate."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "SELECT get_gate_type(prov)            AS root_type,\n",
    "       get_children(prov)             AS root_children\n",
    "FROM nairobi_token;"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For Nairobi, the root is a `monus` gate with two children: the ⊕ sub-circuit (`Juma ⊕ Paul`) and the ⊗ sub-circuit (`Juma ⊗ Paul`). Recurse to inspect the children:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "SELECT (get_children(prov))[1]                        AS plus_token,\n",
    "       get_gate_type((get_children(prov))[1])         AS plus_type,\n",
    "       get_children((get_children(prov))[1])          AS plus_children\n",
    "FROM nairobi_token;"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The leaves of the circuit are input gates that originate from the `personnel` table. [`identify_token`](https://provsql.org/doxygen-sql/html/group__circuit__introspection.html#ga5d3eca5ba1f2078973291c96cccae7e5) performs a reverse lookup, returning the table and column count for an input token:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "SELECT identify_token(child) AS source\n",
    "FROM nairobi_token, unnest(get_children((get_children(prov))[1])) AS child;"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Both leaves resolve to `(personnel, 6)` – the `personnel` table with its six non-provenance columns (`id`, `name`, `position`, `city`, `classification`, and the `probability` column added in Step 7). This is exactly the traversal [`view_circuit`](https://provsql.org/doxygen-sql/html/group__provenance__output.html#ga9c312a8b72b968e3669432ab5ae49928) performs to render the box-art diagram."
   ]
  }
 ]
}