{
 "nbformat": 4,
 "nbformat_minor": 5,
 "metadata": {
  "kernelspec": {
   "name": "provsql-studio",
   "display_name": "ProvSQL (SQL)",
   "language": "sql"
  },
  "language_info": {
   "name": "sql"
  },
  "provsql": {
   "scheme": "semiring",
   "database": "cs7",
   "generated_from": "doc/source/user/casestudy7.rst"
  }
 },
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Case Study: Peer-Review Assignment and Knowledge Compilation\n",
    "\n",
    "A program chair has a database of uncertain facts about a conference – who bid on what, who is expert in what, which papers are assigned – and keeps asking *probability* questions of it: how likely is it that every paper is competently covered? that this reviewer is conflicted? Each question is a SQL query whose answer carries a probability, and the interesting thing is that the same kind of question can be easy or $\\\\#P$-hard **depending on its exact shape, the schema’s keys, and the data** – and ProvSQL routes each to a different mechanism.\n",
    "\n",
    "This case study walks that landscape over one reviewing dataset, driven through [ProvSQL Studio](https://provsql.org/docs/user/studio.html), organised by *where the tractability comes from*:\n",
    "\n",
    "- **Part A – the query is safe.** Some questions are tractable whatever the data, because the query (with the schema’s keys) is *safe*. Four steps cover the four ways that happens.\n",
    "- **Part B – the query is hard.** When no query-side route applies the answer is genuinely $\\\\#P$-hard, and ProvSQL hands it to a knowledge compiler.\n",
    "- **Part C – the data is well-structured.** A hard *query* can still be easy on *this* data, because ProvSQL compiles along the structure of the data itself.\n",
    "\n",
    "## The data\n",
    "\n",
    "Several relations carry per-tuple probabilities. Three drive the coverage questions:\n",
    "\n",
    "- `bid(reviewer, paper)` – a reviewer offered to review a paper; the probability is how firm the bid is.\n",
    "- `expertise(reviewer, topic)` – the reviewer’s area of competence.\n",
    "- `topic_of(paper, topic)` – the paper is about a topic.\n",
    "\n",
    "The instance has 14 reviewers, 4 topics and 7 papers. **One modelling choice matters throughout:** `expertise` has a **primary key on** `reviewer` – each reviewer has exactly one area (the functional dependency `reviewer` $\\to$ `topic`). Several reviewers share each area on purpose (five do databases), so a paper’s coverage is genuinely entangled: the same `topic_of` tuple is shared by every co-expert who bid on the paper. The remaining relations – `recommend` / `champion`, an external-review pool, `assignment`, and two citation / collaboration graphs – are introduced where first used.\n",
    "\n",
    "## Setup"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*The following cells set up the database with all the content this notebook requires; run them first, ideally on a fresh database.*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "DROP TABLE IF EXISTS bid CASCADE;\n",
    "DROP TABLE IF EXISTS expertise CASCADE;\n",
    "DROP TABLE IF EXISTS topic_of CASCADE;\n",
    "DROP TABLE IF EXISTS extends CASCADE;\n",
    "DROP TABLE IF EXISTS coreview CASCADE;\n",
    "DROP TABLE IF EXISTS assignment CASCADE;\n",
    "DROP TABLE IF EXISTS reviewers CASCADE;\n",
    "DROP TABLE IF EXISTS papers CASCADE;\n",
    "DROP TABLE IF EXISTS topics CASCADE;\n",
    "DROP TABLE IF EXISTS lead_chair, urgent_sub, prescreen, score_pass, flag_pass CASCADE;\n",
    "DROP TABLE IF EXISTS bid_label, expertise_label, topic_of_label,\n",
    "                     extends_label, coreview_label,\n",
    "                     lead_chair_label, urgent_sub_label, prescreen_label,\n",
    "                     score_pass_label, flag_pass_label;"
   ],
   "outputs": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "-- Dimension tables (deterministic, no provenance).\n",
    "DROP TABLE IF EXISTS reviewers CASCADE;\n",
    "CREATE TABLE reviewers (id TEXT PRIMARY KEY, name TEXT NOT NULL);\n",
    "INSERT INTO reviewers VALUES\n",
    "  ('r1','Alice'), ('r2','Bob'),   ('r3','Carol'), ('r4','Dave'),\n",
    "  ('r5','Eve'),   ('r6','Frank'), ('r7','Grace'), ('r8','Heidi'),\n",
    "  ('r9','Ivan'),  ('r10','Judy'), ('r11','Karl'), ('r12','Lara'),\n",
    "  ('r13','Mona'), ('r14','Nick');"
   ],
   "outputs": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "DROP TABLE IF EXISTS papers CASCADE;\n",
    "CREATE TABLE papers (id TEXT PRIMARY KEY, title TEXT NOT NULL);\n",
    "INSERT INTO papers VALUES\n",
    "  ('p1', 'A Provenance Circuit Calculus'),\n",
    "  ('p2', 'Treewidth Bounds for Safe Queries'),\n",
    "  ('p3', 'Sampling Beats Compilation, Sometimes'),\n",
    "  ('p4', 'Functional Dependencies and Read-Once Lineage'),\n",
    "  ('p5', 'A Dichotomy for Conjunctive Coverage'),\n",
    "  ('p6', 'Knowledge Compilation in Practice'),\n",
    "  ('p7', 'Repair-Key Semantics for Assignments');"
   ],
   "outputs": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "DROP TABLE IF EXISTS topics CASCADE;\n",
    "CREATE TABLE topics (id TEXT PRIMARY KEY, name TEXT NOT NULL);\n",
    "INSERT INTO topics VALUES\n",
    "  ('t1','databases'), ('t2','logic'), ('t3','systems'), ('t4','theory');"
   ],
   "outputs": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "-- bid(reviewer, paper): the reviewer offered to review the paper; conf\n",
    "-- is how firm the bid is.  Several databases-experts bid on p1, which is\n",
    "-- what makes p1's coverage interesting.\n",
    "DROP TABLE IF EXISTS bid CASCADE;\n",
    "CREATE TABLE bid (\n",
    "  reviewer TEXT NOT NULL REFERENCES reviewers(id),\n",
    "  paper    TEXT NOT NULL REFERENCES papers(id),\n",
    "  conf     DOUBLE PRECISION NOT NULL,\n",
    "  lbl      TEXT,\n",
    "  PRIMARY KEY (reviewer, paper)   -- a reviewer bids on a paper once\n",
    ");\n",
    "INSERT INTO bid(reviewer, paper, conf) VALUES\n",
    "  ('r1','p1',0.5), ('r1','p4',0.35),\n",
    "  ('r2','p1',0.45),('r2','p2',0.4),\n",
    "  ('r5','p2',0.4), ('r5','p4',0.3),\n",
    "  ('r10','p1',0.4),\n",
    "  ('r13','p2',0.35),('r13','p4',0.45),\n",
    "  ('r3','p1',0.5), ('r3','p3',0.35),('r3','p5',0.4),\n",
    "  ('r6','p3',0.4), ('r6','p5',0.35),\n",
    "  ('r9','p7',0.45),('r9','p3',0.3),\n",
    "  ('r4','p5',0.4),\n",
    "  ('r7','p5',0.35),\n",
    "  ('r11','p6',0.45),('r11','p7',0.35),\n",
    "  ('r14','p6',0.4),\n",
    "  ('r8','p4',0.35);\n",
    "UPDATE bid SET lbl = format('bid(%s,%s)', reviewer, paper);\n",
    "SELECT add_provenance('bid');\n",
    "SELECT set_prob(provenance(), conf) FROM bid;"
   ],
   "outputs": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "-- expertise(reviewer, topic): the reviewer's area of competence.\n",
    "-- KEY ON reviewer (not (reviewer, topic)): one area per reviewer.  This\n",
    "-- functional dependency reviewer -> topic is exactly what makes a single\n",
    "-- paper's coverage query read-once (the safe plan groups on the topic and\n",
    "-- projects the reviewer out); widen the key to (reviewer, topic) and that\n",
    "-- query stops being safe.  Several reviewers share each area on purpose\n",
    "-- (five in databases), so a paper's coverage lineage is genuinely\n",
    "-- entangled (the same topic_of leaf is shared across co-experts).\n",
    "DROP TABLE IF EXISTS expertise CASCADE;\n",
    "CREATE TABLE expertise (\n",
    "  reviewer TEXT NOT NULL REFERENCES reviewers(id),\n",
    "  topic    TEXT NOT NULL REFERENCES topics(id),\n",
    "  conf     DOUBLE PRECISION NOT NULL,\n",
    "  lbl      TEXT,\n",
    "  PRIMARY KEY (reviewer)\n",
    ");\n",
    "INSERT INTO expertise(reviewer, topic, conf) VALUES\n",
    "  ('r1','t1',0.55),('r2','t1',0.5), ('r5','t1',0.5), ('r10','t1',0.45),('r13','t1',0.45),\n",
    "  ('r3','t2',0.55),('r6','t2',0.5), ('r9','t2',0.45),\n",
    "  ('r4','t3',0.5), ('r7','t3',0.5), ('r11','t3',0.45),('r14','t3',0.45),\n",
    "  ('r8','t4',0.55),('r12','t4',0.5);\n",
    "UPDATE expertise SET lbl = format('exp(%s,%s)', reviewer, topic);\n",
    "SELECT add_provenance('expertise');\n",
    "SELECT set_prob(provenance(), conf) FROM expertise;"
   ],
   "outputs": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "-- topic_of(paper, topic): the paper is about the topic.  Papers overlap\n",
    "-- on topics (p1, p2, p4, p6 all touch databases), which is what couples\n",
    "-- their coverage when the paper variable is left free.\n",
    "DROP TABLE IF EXISTS topic_of CASCADE;\n",
    "CREATE TABLE topic_of (\n",
    "  paper TEXT NOT NULL REFERENCES papers(id),\n",
    "  topic TEXT NOT NULL REFERENCES topics(id),\n",
    "  conf  DOUBLE PRECISION NOT NULL,\n",
    "  lbl   TEXT,\n",
    "  PRIMARY KEY (paper, topic)      -- a paper lists a topic once\n",
    ");\n",
    "INSERT INTO topic_of(paper, topic, conf) VALUES\n",
    "  ('p1','t1',0.6), ('p1','t2',0.55),\n",
    "  ('p2','t1',0.55),('p2','t3',0.5),\n",
    "  ('p3','t2',0.55),('p3','t4',0.45),\n",
    "  ('p4','t1',0.5), ('p4','t4',0.5),\n",
    "  ('p5','t3',0.55),('p5','t2',0.45),\n",
    "  ('p6','t1',0.55),('p6','t3',0.5), ('p6','t4',0.4),\n",
    "  ('p7','t2',0.5), ('p7','t3',0.45);\n",
    "UPDATE topic_of SET lbl = format('topof(%s,%s)', paper, topic);\n",
    "SELECT add_provenance('topic_of');\n",
    "SELECT set_prob(provenance(), conf) FROM topic_of;"
   ],
   "outputs": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "-- extends(citing, cited): paper `citing` builds on the earlier paper\n",
    "-- `cited`.  ACYCLIC by construction (a paper only extends older ones), so\n",
    "-- the recursive \"what does p transitively build on?\" query is read-once\n",
    "-- per ancestor and works for any semiring (used in the recursive section\n",
    "-- with sr_formula and probability).  Requires PostgreSQL 15+ to query.\n",
    "DROP TABLE IF EXISTS extends CASCADE;\n",
    "CREATE TABLE extends (\n",
    "  citing TEXT NOT NULL REFERENCES papers(id),\n",
    "  cited  TEXT NOT NULL REFERENCES papers(id),\n",
    "  conf   DOUBLE PRECISION NOT NULL,\n",
    "  lbl    TEXT,\n",
    "  PRIMARY KEY (citing, cited)\n",
    ");\n",
    "INSERT INTO extends(citing, cited, conf) VALUES\n",
    "  ('p4','p1',0.8), ('p5','p2',0.7), ('p5','p3',0.6),\n",
    "  ('p6','p4',0.9), ('p6','p5',0.7), ('p7','p3',0.6);\n",
    "UPDATE extends SET lbl = format('ext(%s,%s)', citing, cited);\n",
    "SELECT add_provenance('extends');\n",
    "SELECT set_prob(provenance(), conf) FROM extends;"
   ],
   "outputs": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "-- coreview(a, b): reviewers a and b have served on a committee together.\n",
    "-- The relation is SYMMETRIC (both directions are stored), so the\n",
    "-- collaboration graph is CYCLIC -- the recursive \"who is reviewer r\n",
    "-- connected to?\" query only terminates under provsql.boolean_provenance,\n",
    "-- where it computes connection *reliability* (a network-reliability /\n",
    "-- #P-hard flavour).  Requires PostgreSQL 15+ to query.\n",
    "DROP TABLE IF EXISTS coreview CASCADE;\n",
    "CREATE TABLE coreview (\n",
    "  a    TEXT NOT NULL REFERENCES reviewers(id),\n",
    "  b    TEXT NOT NULL REFERENCES reviewers(id),\n",
    "  conf DOUBLE PRECISION NOT NULL,\n",
    "  lbl  TEXT,\n",
    "  PRIMARY KEY (a, b)\n",
    ");\n",
    "INSERT INTO coreview(a, b, conf) VALUES\n",
    "  ('r1','r2',0.8),('r2','r1',0.8),\n",
    "  ('r2','r3',0.7),('r3','r2',0.7),\n",
    "  ('r1','r3',0.5),('r3','r1',0.5),\n",
    "  ('r3','r4',0.6),('r4','r3',0.6),\n",
    "  ('r4','r5',0.7),('r5','r4',0.7),\n",
    "  ('r2','r5',0.4),('r5','r2',0.4);\n",
    "UPDATE coreview SET lbl = format('co(%s,%s)', a, b);\n",
    "SELECT add_provenance('coreview');\n",
    "SELECT set_prob(provenance(), conf) FROM coreview;"
   ],
   "outputs": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "-- assignment(reviewer, paper): the candidate papers a reviewer could be\n",
    "-- assigned to.  repair_key on `reviewer` makes the rows for one reviewer\n",
    "-- mutually exclusive (each reviewer ends up on exactly one paper), so a\n",
    "-- query over this table carries correlated provenance.\n",
    "DROP TABLE IF EXISTS assignment CASCADE;\n",
    "CREATE TABLE assignment (\n",
    "  reviewer TEXT NOT NULL REFERENCES reviewers(id),\n",
    "  paper    TEXT NOT NULL REFERENCES papers(id)\n",
    ");\n",
    "INSERT INTO assignment(reviewer, paper) VALUES\n",
    "  ('r1','p1'), ('r1','p4'),\n",
    "  ('r2','p1'), ('r2','p2'),\n",
    "  ('r3','p1'), ('r3','p3'),\n",
    "  ('r4','p2'), ('r4','p5');\n",
    "SELECT repair_key('assignment', 'reviewer');"
   ],
   "outputs": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "-- Inversion-free demo fixture (Step 7).  A self-join witness that is NOT\n",
    "-- read-once yet is inversion-free, so a single global variable order\n",
    "-- compiles it in linear time while a generic compiler / tree decomposition\n",
    "-- chokes.  Olga (r15) is a prolific bidder who skimmed a 24-paper\n",
    "-- submission batch (q01..q24); `recommend` and `champion` are two\n",
    "-- post-review signals (she recommended a paper for acceptance, and would\n",
    "-- champion it at the PC meeting).  The witness query asks for a reviewer\n",
    "-- whose bids overlap both a recommendation and a championing; grouping on\n",
    "-- the reviewer shares the bid(r15,*) leaves between the two sides, so the\n",
    "-- lineage is not read-once but stays inversion-free (root = reviewer, a\n",
    "-- consistent-unification self-join on `bid`).\n",
    "--\n",
    "-- Kept isolated from the core instance so Steps 1-6 and 8 are unchanged:\n",
    "-- Olga has no `expertise` row and the submission papers carry no\n",
    "-- `topic_of`, so none of this data reaches the coverage / recursive\n",
    "-- queries (which all join through expertise / topic_of / the graphs).\n",
    "DELETE FROM bid WHERE reviewer = 'r15';\n",
    "DELETE FROM papers WHERE id LIKE 'q%';\n",
    "DELETE FROM reviewers WHERE id = 'r15';\n",
    "INSERT INTO reviewers VALUES ('r15','Olga');\n",
    "INSERT INTO papers SELECT 'q'||to_char(g,'FM00'), format('Submission %s', g)\n",
    "  FROM generate_series(1,24) g;\n",
    "INSERT INTO bid(reviewer, paper, conf, lbl)\n",
    "  SELECT 'r15', p.id, 0.5, format('bid(r15,%s)', p.id)\n",
    "  FROM papers p WHERE p.id LIKE 'q%';\n",
    "SELECT set_prob(provenance(), 0.5) FROM bid WHERE reviewer = 'r15';"
   ],
   "outputs": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "DROP TABLE IF EXISTS recommend CASCADE;\n",
    "CREATE TABLE recommend (\n",
    "  reviewer TEXT NOT NULL REFERENCES reviewers(id),\n",
    "  paper    TEXT NOT NULL REFERENCES papers(id),\n",
    "  lbl      TEXT,\n",
    "  PRIMARY KEY (reviewer, paper)\n",
    ");\n",
    "INSERT INTO recommend(reviewer, paper, lbl)\n",
    "  SELECT 'r15', p.id, format('rec(r15,%s)', p.id)\n",
    "  FROM papers p WHERE p.id LIKE 'q%';\n",
    "SELECT add_provenance('recommend');\n",
    "SELECT set_prob(provenance(), 0.4) FROM recommend;\n",
    "SELECT create_provenance_mapping('recommend_label', 'recommend', 'lbl');\n",
    "ALTER TABLE recommend DROP COLUMN lbl;"
   ],
   "outputs": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "DROP TABLE IF EXISTS champion CASCADE;\n",
    "CREATE TABLE champion (\n",
    "  reviewer TEXT NOT NULL REFERENCES reviewers(id),\n",
    "  paper    TEXT NOT NULL REFERENCES papers(id),\n",
    "  lbl      TEXT,\n",
    "  PRIMARY KEY (reviewer, paper)\n",
    ");\n",
    "INSERT INTO champion(reviewer, paper, lbl)\n",
    "  SELECT 'r15', p.id, format('champ(r15,%s)', p.id)\n",
    "  FROM papers p WHERE p.id LIKE 'q%';\n",
    "SELECT add_provenance('champion');\n",
    "SELECT set_prob(provenance(), 0.3) FROM champion;\n",
    "SELECT create_provenance_mapping('champion_label', 'champion', 'lbl');\n",
    "ALTER TABLE champion DROP COLUMN lbl;"
   ],
   "outputs": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "-- Möbius-cancellation demo fixture (Part A, \"safe by cancellation\").  The\n",
    "-- canonical Dalvi-Suciu witness q9 / QW: a UNION of conjunctive queries\n",
    "-- that is SAFE -- PTIME data complexity -- yet only because the #P-hard\n",
    "-- term of its inclusion-exclusion expansion carries a zero Möbius\n",
    "-- coefficient and cancels.  No circuit-level method keeps up (q9 has no\n",
    "-- polynomial-size OBDD / FBDD / d-DNNF), and on dense data its *joint*\n",
    "-- treewidth is unbounded too, so the joint-width route of Part C declines\n",
    "-- and the Möbius route -- which reads the structure off the QUERY, not the\n",
    "-- data -- is the only exact one.  That is why this fixture must be DENSE:\n",
    "-- on a sparse instance the joint width is bounded and Part C's compiler\n",
    "-- would handle it, hiding the route this step exists to show.\n",
    "--\n",
    "-- Dressed as a second-tier external review pool, isolated from the core\n",
    "-- instance (its own c*/e* domain, no foreign keys, so it reaches none of\n",
    "-- the coverage / recursive queries): four area chairs (c1..c4) each run\n",
    "-- three independent assessment passes -- a prescreen, a score and a flag --\n",
    "-- over four embargoed submissions (e1..e4), COMPLETE bipartite so the joint\n",
    "-- width genuinely grows; `lead_chair` marks the senior chairs and\n",
    "-- `urgent_sub` the time-critical submissions.  Mapped onto q9's\n",
    "-- R(x), S1(x,y), S2(x,y), S3(x,y), T(y) as\n",
    "-- lead_chair / prescreen / score_pass / flag_pass / urgent_sub.  The union\n",
    "-- query has no tidy English gloss -- its safety is purely structural -- and\n",
    "-- that is exactly the point.  All tuples carry probability 0.1, low enough\n",
    "-- that the exact answer stays away from saturation.\n",
    "DROP TABLE IF EXISTS lead_chair, urgent_sub, prescreen, score_pass, flag_pass CASCADE;\n",
    "CREATE TABLE lead_chair (chair TEXT NOT NULL, conf DOUBLE PRECISION, lbl TEXT, PRIMARY KEY(chair));\n",
    "CREATE TABLE urgent_sub (sub   TEXT NOT NULL, conf DOUBLE PRECISION, lbl TEXT, PRIMARY KEY(sub));\n",
    "CREATE TABLE prescreen  (chair TEXT, sub TEXT, conf DOUBLE PRECISION, lbl TEXT, PRIMARY KEY(chair, sub));\n",
    "CREATE TABLE score_pass (chair TEXT, sub TEXT, conf DOUBLE PRECISION, lbl TEXT, PRIMARY KEY(chair, sub));\n",
    "CREATE TABLE flag_pass  (chair TEXT, sub TEXT, conf DOUBLE PRECISION, lbl TEXT, PRIMARY KEY(chair, sub));\n",
    "INSERT INTO lead_chair(chair, conf) SELECT 'c'||g, 0.1 FROM generate_series(1,4) g;\n",
    "INSERT INTO urgent_sub(sub,   conf) SELECT 'e'||g, 0.1 FROM generate_series(1,4) g;\n",
    "INSERT INTO prescreen(chair, sub, conf)\n",
    "  SELECT 'c'||i, 'e'||j, 0.1 FROM generate_series(1,4) i, generate_series(1,4) j;\n",
    "INSERT INTO score_pass(chair, sub, conf)\n",
    "  SELECT 'c'||i, 'e'||j, 0.1 FROM generate_series(1,4) i, generate_series(1,4) j;\n",
    "INSERT INTO flag_pass(chair, sub, conf)\n",
    "  SELECT 'c'||i, 'e'||j, 0.1 FROM generate_series(1,4) i, generate_series(1,4) j;\n",
    "UPDATE lead_chair SET lbl = format('lead(%s)', chair);\n",
    "UPDATE urgent_sub SET lbl = format('urgent(%s)', sub);\n",
    "UPDATE prescreen  SET lbl = format('pre(%s,%s)', chair, sub);\n",
    "UPDATE score_pass SET lbl = format('score(%s,%s)', chair, sub);\n",
    "UPDATE flag_pass  SET lbl = format('flag(%s,%s)', chair, sub);\n",
    "SELECT add_provenance('lead_chair'); SELECT add_provenance('urgent_sub');\n",
    "SELECT add_provenance('prescreen');  SELECT add_provenance('score_pass');\n",
    "SELECT add_provenance('flag_pass');\n",
    "SELECT set_prob(provenance(), conf) FROM lead_chair;\n",
    "SELECT set_prob(provenance(), conf) FROM urgent_sub;\n",
    "SELECT set_prob(provenance(), conf) FROM prescreen;\n",
    "SELECT set_prob(provenance(), conf) FROM score_pass;\n",
    "SELECT set_prob(provenance(), conf) FROM flag_pass;\n",
    "SELECT create_provenance_mapping('lead_chair_label', 'lead_chair', 'lbl');\n",
    "SELECT create_provenance_mapping('urgent_sub_label', 'urgent_sub', 'lbl');\n",
    "SELECT create_provenance_mapping('prescreen_label',  'prescreen',  'lbl');\n",
    "SELECT create_provenance_mapping('score_pass_label', 'score_pass', 'lbl');\n",
    "SELECT create_provenance_mapping('flag_pass_label',  'flag_pass',  'lbl');"
   ],
   "outputs": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "-- Label mappings so the Studio eval-strip's sr_formula / sr_why / sr_how\n",
    "-- and PROV-XML export name the leaves instead of showing raw UUIDs. The\n",
    "-- `lbl` columns exist only to feed create_provenance_mapping, which copies\n",
    "-- their values into standalone (value, provenance) tables.\n",
    "SELECT create_provenance_mapping('bid_label',      'bid',      'lbl');\n",
    "SELECT create_provenance_mapping('expertise_label','expertise','lbl');\n",
    "SELECT create_provenance_mapping('topic_of_label', 'topic_of', 'lbl');\n",
    "SELECT create_provenance_mapping('extends_label',  'extends',  'lbl');\n",
    "SELECT create_provenance_mapping('coreview_label', 'coreview', 'lbl');"
   ],
   "outputs": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "-- A single `label` mapping: the union of every per-tuple label above, so a\n",
    "-- text-based semiring (sr_formula / sr_why / sr_how) can name the leaves of\n",
    "-- a query spanning several relations through one mapping argument, rather\n",
    "-- than picking the relation-specific mapping each time.\n",
    "DROP TABLE IF EXISTS label CASCADE;\n",
    "CREATE TABLE label AS\n",
    "            SELECT value, provenance FROM bid_label\n",
    "  UNION ALL SELECT value, provenance FROM expertise_label\n",
    "  UNION ALL SELECT value, provenance FROM topic_of_label\n",
    "  UNION ALL SELECT value, provenance FROM extends_label\n",
    "  UNION ALL SELECT value, provenance FROM coreview_label\n",
    "  UNION ALL SELECT value, provenance FROM recommend_label\n",
    "  UNION ALL SELECT value, provenance FROM champion_label\n",
    "  UNION ALL SELECT value, provenance FROM lead_chair_label\n",
    "  UNION ALL SELECT value, provenance FROM urgent_sub_label\n",
    "  UNION ALL SELECT value, provenance FROM prescreen_label\n",
    "  UNION ALL SELECT value, provenance FROM score_pass_label\n",
    "  UNION ALL SELECT value, provenance FROM flag_pass_label;\n",
    "CREATE INDEX ON label(provenance);"
   ],
   "outputs": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "-- `conf` only fed set_prob and `lbl` only fed create_provenance_mapping;\n",
    "-- both jobs are now done, so drop the artifact columns from the base tables.\n",
    "ALTER TABLE bid       DROP COLUMN conf, DROP COLUMN lbl;\n",
    "ALTER TABLE expertise DROP COLUMN conf, DROP COLUMN lbl;\n",
    "ALTER TABLE topic_of  DROP COLUMN conf, DROP COLUMN lbl;\n",
    "ALTER TABLE extends   DROP COLUMN conf, DROP COLUMN lbl;\n",
    "ALTER TABLE coreview  DROP COLUMN conf, DROP COLUMN lbl;\n",
    "ALTER TABLE lead_chair DROP COLUMN conf, DROP COLUMN lbl;\n",
    "ALTER TABLE urgent_sub DROP COLUMN conf, DROP COLUMN lbl;\n",
    "ALTER TABLE prescreen  DROP COLUMN conf, DROP COLUMN lbl;\n",
    "ALTER TABLE score_pass DROP COLUMN conf, DROP COLUMN lbl;\n",
    "ALTER TABLE flag_pass  DROP COLUMN conf, DROP COLUMN lbl;"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The script seeds each tuple’s probability and tags every uncertain relation in Studio’s schema panel with a `prov-tid` pill (tuple-independent), or `prov-bid` for the block-correlated `assignment`. The provenance class is the `Boolean` / `Absorptive` / `Semiring` / `Where` toggle (see `studio-query-toggles`): most steps want `Boolean`, where ProvSQL’s safe-query rewriter and data compilers are active; `Semiring` turns them off and shows the literal circuit; `Absorptive` is needed only for the cyclic recursion at the very end.\n",
    "\n",
    "## Part A: The Query Is Safe\n",
    "\n",
    "A *safe* query is one whose exact probability is PTIME in the data – whatever the data looks like. ProvSQL recognises safety at planning time and answers without any compiler. There are exactly four ways a (self-join-free) query can be safe, and the four steps below are one of each.\n",
    "\n",
    "### Safe by shape\n",
    "\n",
    "*We need a coverage shortlist: which papers have at least one plausibly qualified reviewer – someone who bid on the paper and has some area of expertise?*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "SELECT p.id, p.title\n",
    "FROM bid b, expertise e, papers p\n",
    "WHERE b.reviewer = e.reviewer AND b.paper = p.id\n",
    "GROUP BY p.id, p.title\n",
    "ORDER BY p.id"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Click into `p1`’s `provsql` cell and, in the eval strip, pick `Marginal probability` with the `independent` method. It returns `≈ 0.666` instantly – and *Compiled d-D circuit* with *interpret as d-D*, or *Tree decomposition*, all agree.\n",
    "\n",
    "Why it works: the query is **hierarchical** – the atoms mentioning `topic` (just `expertise`) sit inside those mentioning `reviewer` (`bid` and `expertise`) – so a paper’s coverage is an `OR` over reviewers of independent terms, with no tuple shared. The circuit is **read-once**, and `independent` is exact on read-once circuits. This is the easiest corner: safe by the query’s shape alone, no key needed.\n",
    "\n",
    "### Safe by a key\n",
    "\n",
    "*Now the question that matters for assignment: is paper* `p1` *competently covered – did someone bid on it who is expert in one of* `p1`\\*’s own topics?\\*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "SELECT DISTINCT 1\n",
    "FROM bid b, expertise e, topic_of t\n",
    "WHERE b.reviewer = e.reviewer\n",
    "  AND e.topic    = t.topic\n",
    "  AND b.paper = 'p1' AND t.paper = 'p1'"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Try `independent` with the toggle on `Semiring`: it **errors** – *“Not an independent circuit”*. Switch the toggle to `Boolean` and try again: it succeeds, `≈ 0.4259`, matching *Tree decomposition* exactly.\n",
    "\n",
    "The difference is the **key**. This query is non-hierarchical – `bid` mentions only `reviewer`, `topic_of` only `topic`, `expertise` both – so its literal lineage reuses the shared `topic_of(p1, t1)` tuple (Alice, Bob and Judy are all database experts who bid on `p1`) and is *not* read-once. But because `expertise` is keyed on `reviewer`, each reviewer has a single topic, so the safe-query rewriter can group the experts by topic and factor that shared tuple out:\n",
    "\n",
    "$$\n",
    "\\bigvee_{t} \\\\; \\mathit{topic\\\\_of}(p_1, t) \\\\;\\wedge\\\\;\n",
    "  \\Bigl(\\bigvee_{r:\\\\,\\mathit{exp}(r,t)} \\mathit{bid}(r, p_1) \\wedge\n",
    "    \\mathit{exp}(r, t)\\Bigr).\n",
    "$$\n",
    "\n",
    "Each leaf now appears once – read-once again, and `independent` is exact. The lesson: **safety depends on the query and the keys together**. Drop the key (`ALTER TABLE expertise DROP CONSTRAINT expertise_pkey`) and `independent` still returns `0.4259`, but the route changes – the query is now genuinely hard and falls through to Part C’s data compiler. Add it back before continuing:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "ALTER TABLE expertise ADD PRIMARY KEY (reviewer);"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Safe by a query-derived order\n",
    "\n",
    "For this step the fixture adds **Olga** (`r15`), a prolific reviewer who skimmed a 24-paper batch, with two post-review signals: `recommend` (she recommended a paper) and `champion` (she would champion it at the meeting).\n",
    "\n",
    "*Which reviewers have bids that back up both a recommendation and a championing – a sign they engaged deeply?*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "SELECT r.id, r.name\n",
    "FROM bid b1, recommend a, bid b2, champion c, reviewers r\n",
    "WHERE b1.reviewer = a.reviewer AND b1.paper = a.paper\n",
    "  AND b1.reviewer = b2.reviewer\n",
    "  AND b2.reviewer = c.reviewer AND b2.paper = c.paper\n",
    "  AND b1.reviewer = r.id\n",
    "GROUP BY r.id, r.name"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "On Olga’s row, try the heavy methods first: `tree-decomposition` gives up (*“Treewidth greater than 10”*), `possible-worlds` refuses (over 64 inputs), and a real compiler (`d4`) is cut off by `statement_timeout` on the 24-paper instance. Then try `inversion-free` – or just the default method – and it returns `0.975314` in milliseconds.\n",
    "\n",
    "Why the gap: grouping on the reviewer makes the two evidence sides share the `bid(r15, *)` tuples, so the lineage is **not** read-once and every circuit-level method treats it as hard. But the query is a *consistent-unification self-join* with a single root variable, which is the **inversion-free** condition: it admits a linear-size OBDD over a variable order *read from the query*. ProvSQL finds that order at planning time (the teal `IF` badge on the root is the certificate) and builds a deep, chain-like d-DNNF – tractability that is invisible in the materialised circuit and visible only in the query.\n",
    "\n",
    "### Safe by cancellation (Möbius inversion)\n",
    "\n",
    "The fourth corner is the subtlest, and it needs its own little world: an **external-review pool**, isolated from the main instance, where four area chairs (`c1`–`c4`) each run three independent assessment passes – a *prescreen*, a *score* and a *flag* – over four embargoed submissions (`e1`–`e4`), with `lead_chair` marking senior chairs and `urgent_sub` the time-critical submissions.\n",
    "\n",
    "*Is the pool in a “well-attended” state?* – a union of four overlapping patterns of who-assessed-what. This is the textbook query $q_9$ / $Q_W$ (Dalvi & Suciu); its four patterns have no tidy English gloss, because its tractability is *purely structural* – which is exactly the point."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "SELECT 1 FROM lead_chair r, prescreen a1, flag_pass a3, urgent_sub t3\n",
    "  WHERE r.chair = a1.chair AND a3.sub = t3.sub\n",
    "UNION\n",
    "SELECT 1 FROM prescreen b1, score_pass b2, flag_pass b3, urgent_sub tb\n",
    "  WHERE b1.chair = b2.chair AND b1.sub = b2.sub AND b3.sub = tb.sub\n",
    "UNION\n",
    "SELECT 1 FROM score_pass c2, flag_pass c3, flag_pass c3b, urgent_sub tc\n",
    "  WHERE c2.chair = c3.chair AND c2.sub = c3.sub AND c3b.sub = tc.sub\n",
    "UNION\n",
    "SELECT 1 FROM lead_chair d, prescreen d1, prescreen d1b,\n",
    "              score_pass d2, score_pass d2b, flag_pass d3\n",
    "  WHERE d.chair = d1.chair AND d1b.chair = d2.chair AND d1b.sub = d2.sub\n",
    "    AND d2b.chair = d3.chair AND d2b.sub = d3.sub"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Try `Marginal probability` with the toggle on `Boolean`: it returns the exact `0.056923` with no method named (the chooser routes it through the Möbius compiler automatically; once the μ root is rendered you can also pick the `mobius` method explicitly). Try any circuit compiler instead and it blows up: $q_9$ provably has no polynomial OBDD / FBDD / decision-DNNF.\n",
    "\n",
    "Why it is nonetheless safe: writing the probability by inclusion-exclusion, the one $\\\\#P$-hard term – the conjunction of all four patterns – gets a **zero Möbius coefficient** and cancels, leaving only easy terms. ProvSQL’s Möbius compiler computes exactly that signed combination. Click the existence row’s `provsql` cell: the circuit is large – the **μ** (Möbius-function) root carries the whole literal lineage as a transparent child, so Studio shows a *Circuit too large* card – choose `Render at depth 1` and the root is that single **μ** gate, each child edge labelled with its integer coefficient, the hard term among them cancelled to zero. (The pool is **dense** on purpose: on sparse data Part C’s compiler would also handle it and hide the point. Like every Part A route, the gate keeps the literal lineage, so `shapley` and `sr_formula` still work on it.)\n",
    "\n",
    "### The four routes at a glance\n",
    "\n",
    "These four steps are the complete set of exact query-side routes – the Dalvi-Suciu dichotomy made operational:\n",
    "\n",
    "| The query is safe because… | ProvSQL route | Witness in this Part |\n",
    "|----|----|----|\n",
    "| its **shape** is hierarchical | read-once lineage → `independent` (no rewrite) | coverage, all papers |\n",
    "| a **key** makes it read-once | safe-query rewrite (FD) | `p1` competent coverage |\n",
    "| a query-derived **order** | inversion-free certificate | Olga’s bid self-join |\n",
    "| its $\\\\#P$-hard term **cancels** | Möbius compiler | the review pool ($q_9$) |\n",
    "\n",
    "All four are PTIME, need no external tool, and assume tuple-independent inputs. When none applies, the query is genuinely hard (Part B) – unless the data rescues it (Part C). See the full tractability table.\n",
    "\n",
    "## Part B: The Query Is Hard\n",
    "\n",
    "Ask the competent-coverage question of the **whole program** instead of one paper and every Part A route fails. This Part follows that hard query through knowledge compilation.\n",
    "\n",
    "### The hard query, and what a compiler does with it\n",
    "\n",
    "*Is* **any** *paper competently covered?*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "SELECT DISTINCT 1\n",
    "FROM bid b, expertise e, topic_of t\n",
    "WHERE b.reviewer = e.reviewer\n",
    "  AND e.topic    = t.topic\n",
    "  AND t.paper    = b.paper"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Set the toggle to `Semiring` so no Boolean shortcut fires, and the provenance is the literal circuit of the cyclic join – Studio draws it, visibly bushy. Try `independent`: it **errors**. Try `tree-decomposition` or a compiler: `≈ 0.8818`.\n",
    "\n",
    "Why it is hard: with `paper` free, `reviewer`, `paper` and `topic` form a cycle with no nesting – non-hierarchical, not inversion-free, and a single conjunct so nothing cancels. All of Part A is exhausted, and the probability is $\\\\#P$-hard. The circuit’s treewidth is **4** (against **1** for a safe query), so a real compiler (*Compiled d-D circuit* with `d4`) turns it into a d-DNNF of order a thousand nodes – and the number `0.8818`.\n",
    "\n",
    "That compiled circuit is the object the rest of Part B inspects.\n",
    "\n",
    "### Reading the CNF back against the data\n",
    "\n",
    "Pick *Tseytin CNF*: the panel shows the DIMACS CNF ProvSQL streams to an external compiler, with one `c input` comment per variable. Studio annotates each with the source tuple it stands for, so a model or weighted count returned by an outside tool reads back against the reviewing data (the same mapping is a table through [`tseytin_cnf_mapping`](https://provsql.org/doxygen-sql/html/group__provenance__output.html#ga22a56449dce072ad0bdabbb526096cd6)).\n",
    "\n",
    "### Comparing compilers and methods\n",
    "\n",
    "Pick *Probability benchmark*: it times every probability method on the hard circuit, one row each, with the compiled d-DNNF size beside the run time. Observe the spread: the exact methods that finish (`tree-decomposition`, the compilers, the model counters) all agree to full precision; `monte-carlo` lands in its confidence band; `independent` shows its error; and methods that do not scale – `possible-worlds` enumeration – hit the `statement_timeout`. [`ddnnf_stats`](https://provsql.org/doxygen-sql/html/group__provenance__output.html#gaf04888a5ff2ef7668754b5a4a2c1bf3a) exposes the same sizes as `jsonb` for comparing one circuit across compilers.\n",
    "\n",
    "Which compilers appear depends on what is installed; the **Tools** panel (the wrench icon) lists the registry and flags each as available / enabled, read live from `provsql.tools` (see [the external-tool registry](https://provsql.org/docs/user/tool-registry.html)).\n",
    "\n",
    "### A shortcut that skips the compiler\n",
    "\n",
    "Not every hard-looking query reaches a compiler. *Which papers have at least two bidding experts?*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "SELECT p.id, p.title\n",
    "FROM bid b, expertise e, papers p\n",
    "WHERE b.reviewer = e.reviewer AND b.paper = p.id\n",
    "GROUP BY p.id, p.title\n",
    "HAVING count(*) >= 2\n",
    "ORDER BY p.id"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Compute the probability for `p1`: it comes back at once, and ProvSQL emits a NOTICE that the `count(*) >= 2` comparison gate was *shortcut*. The `HAVING` threshold over independent contributors is a [Poisson-binomial](https://en.wikipedia.org/wiki/Poisson_binomial_distribution) distribution, which [`probability_evaluate`](https://provsql.org/doxygen-sql/html/group__probability.html#gad377e94cb1fff57141b1950cc4169c5e) folds in closed form – replacing the whole provenance with one Bernoulli gate before any compiler runs, so even `independent` answers it.\n",
    "\n",
    "## Part C: The Data Is Well-Structured\n",
    "\n",
    "Part B’s whole-program coverage is $\\\\#P$-hard in its *shape* – yet on *this* data it is tractable. When no query-side route applies, ProvSQL still avoids a compiler if the **data** is well-structured: it compiles a certified circuit along a tree decomposition of the data itself, exactly and in linear time – over independent data, correlated data, and through recursion.\n",
    "\n",
    "### The same hard query, now easy\n",
    "\n",
    "Set the toggle back to `Boolean` and re-run Part B’s hard query:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "SELECT DISTINCT 1\n",
    "FROM bid b, expertise e, topic_of t\n",
    "WHERE b.reviewer = e.reviewer AND e.topic = t.topic AND t.paper = b.paper"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now `independent` **succeeds**, the same `≈ 0.8818` – and no compiler ran. The query is still $\\\\#P$-hard in shape, but on this data the **joint treewidth** (the data graph together with its correlations) is bounded, so ProvSQL’s **joint-width compiler** Amarilli2016thesis recognises the shape and compiles the provenance along a tree decomposition of the *data* into a certified d-D that `independent` reads in linear time. This is the route the dropped-key query of Part A fell through to.\n",
    "\n",
    "Read it per paper – *for each paper, how competently is it covered?*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "SELECT t.paper\n",
    "FROM bid b, expertise e, topic_of t\n",
    "WHERE b.reviewer = e.reviewer AND e.topic = t.topic AND t.paper = b.paper\n",
    "GROUP BY t.paper ORDER BY t.paper"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`Marginal probability` gives each group’s value – `p1` `0.425869`, `p4` `0.300776` – matching a compiler to full precision, from the data’s structure with no external tool.\n",
    "\n",
    "### Correlated inputs\n",
    "\n",
    "So far every relation was tuple-independent. The `assignment` table lists, per reviewer, the papers they *could* be assigned to, made mutually exclusive by `repair_key` on `reviewer` (each reviewer ends up on one paper). *Which papers get an assigned reviewer?*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "SELECT p.id, p.title\n",
    "FROM assignment a JOIN papers p ON a.paper = p.id\n",
    "GROUP BY p.id, p.title\n",
    "ORDER BY p.id"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Try `independent` – it **agrees** with the exact methods (`p1` `0.875`, `p2` `0.75`). The circuit encodes the mutual exclusion as `mulinput` gates sharing a block key, and `independent`’s evaluator gives mutually exclusive siblings special treatment (it *sums* their probabilities within a block instead of multiplying). So this kind of block correlation, unlike the cycle of Part B, stays tractable with no compiler at all – because the *query* here is safe; only the *inputs* are correlated.\n",
    "\n",
    "### Hard *and* correlated\n",
    "\n",
    "The joint-width route earns its keep where the two regimes meet. *Is any paper covered by its assigned expert reviewer?* – Part B’s hard cyclic shape, now over the correlated `assignment` table."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "SELECT DISTINCT 1\n",
    "FROM assignment a, expertise e, topic_of t\n",
    "WHERE a.reviewer = e.reviewer AND e.topic = t.topic AND t.paper = a.paper"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Try `independent` with the toggle on `Semiring` (the literal circuit): it **rejects** it (a reviewer’s candidate papers are mutually exclusive, so the lineage is neither independent nor read-once), and Part A’s routes do not apply to the cyclic shape. Switch the toggle back to `Boolean` and the joint-width route compiles it anyway: the joint treewidth – data graph *plus* the `repair_key` exclusion blocks – is bounded, so ProvSQL builds a certified d-D (each block stick-broken into shared independent events) that `independent` evaluates to the exact `0.735868`. This is the one cell of the tractability table nothing else fills: $\\\\#P$-hard *and* correlated, exact and linear in the data.\n",
    "\n",
    "### Recursion: reachability and reliability\n",
    "\n",
    "**Note:**\n",
    "\n",
    "> Recursive queries (`WITH RECURSIVE`) require **PostgreSQL ≥ 15**.\n",
    "\n",
    "ProvSQL tracks provenance through `WITH RECURSIVE` too: a reachability answer’s provenance is the disjunction over the paths that reach it. Two more relations exercise the two regimes: `extends(citing, cited)` is an **acyclic** citation graph, `coreview(a, b)` a **symmetric** (hence cyclic) collaboration graph.\n",
    "\n",
    "*What does paper* `p6` *transitively build on?*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {}
   },
   "source": [
    "WITH RECURSIVE anc(paper) AS (\n",
    "    SELECT 'p6'\n",
    "  UNION\n",
    "    SELECT e.cited FROM extends e JOIN anc a ON e.citing = a.paper\n",
    ")\n",
    "SELECT p.id, p.title\n",
    "FROM anc JOIN papers p ON anc.paper = p.id\n",
    "WHERE anc.paper <> 'p6' ORDER BY p.id"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`sr_formula` shows each ancestor’s lineage as the conjunction along the path (`p1` is `ext(p4,p1) ⊗ ext(p6,p4)`) and `Marginal\n",
    "probability` gives its value (`p1` `0.72`). Because the graph is acyclic the lineage is read-once per ancestor, so *every* semiring evaluation works, not just probability.\n",
    "\n",
    "*Who is reviewer* `r1` *connected to through co-reviewing?* – now a cyclic walk:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "provsql": {
     "scheme": "absorptive"
    }
   },
   "source": [
    "WITH RECURSIVE conn(node) AS (\n",
    "    SELECT 'r1'\n",
    "  UNION\n",
    "    SELECT e.b FROM coreview e JOIN conn c ON e.a = c.node\n",
    ")\n",
    "SELECT r.id, r.name\n",
    "FROM conn JOIN reviewers r ON conn.node = r.id\n",
    "WHERE conn.node <> 'r1' ORDER BY r.id"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With the toggle on `Semiring` (the default) the fixpoint never stabilises – *“no fixpoint after 1000 rounds (cyclic data?)”* – because a cycle keeps producing new derivations. Switch the toggle to `Absorptive` and it converges: $1 \\oplus a = 1$, so a longer cycle-revisiting path is absorbed by the shorter one inside it, and the fixpoint is the set of minimal paths.\n",
    "\n",
    "The probability is then **two-terminal network reliability** – that `r1` stays connected when each edge is present independently – which is $\\\\#P$-hard in general. Yet `Marginal probability` reads `r1` reaches `r5` with reliability `0.5496` straight off, and even `independent` evaluates it. Why: ProvSQL recognises the reachability shape and, by the provenance form of Courcelle’s theorem, compiles along a tree decomposition of the **collaboration graph itself** into a certified d-D, one per reachable vertex, linear in the edges when the graph has bounded treewidth (see `network-reliability-btw`). That is the case study in miniature: Part B’s hardness lived in the *query* and needed a compiler; here – as throughout Part C – it is dissolved by the structure of the *data*, with no external tool. (The `'absorptive'` marker on these tokens makes multiplicity-counting or why-provenance – genuinely infinite on cycles – refuse rather than return an unjustified value.)\n",
    "\n",
    "**See also:**\n",
    "\n",
    "> - [The knowledge-compilation chapter](https://provsql.org/docs/user/knowledge-compilation.html) for the full pipeline and every function used here.\n",
    "> - [The chapter on probabilities](https://provsql.org/docs/user/probabilities.html) for the probability methods and the `tractability table\n",
    ">   <tractable-cases>` that organises this case study.\n",
    "> - The *Probabilistic Databases* synthesis lecture for the theory of safe queries and the dichotomy."
   ]
  }
 ]
}