Data-Modification Tracking ========================== Beyond tracking the provenance of *query results*, ProvSQL can also track the provenance of the rows themselves -- recording every ``INSERT``, ``UPDATE``, and ``DELETE`` so that one can later answer "which DML statements is this row's current state due to?", roll back individual operations with :sqlfunc:`undo`, or query the database as it looked at a past point in time. See :doc:`../user/data-modification` and :doc:`../user/temporal` for the user-facing description. This chapter covers how the feature is *implemented*: the GUC, the trigger machinery, the new ``update`` gate type, the ``update_provenance`` table, and how undo / time-travel are wired on top of the same provenance circuits used for query results. Data-modification tracking is **PostgreSQL 14+ only** because the temporal-validity machinery (see *Time-Travel Queries* below) relies on the ``tstzmultirange`` type, which was introduced as part of the multirange family in PostgreSQL 14. The temporal validity of a row is computed by evaluating its provenance circuit over an m-semiring whose carrier is a set of timestamp intervals (``plus`` = union, ``times`` = intersection, ``monus`` = difference); the result of those operations is in general *non-contiguous* -- two disjoint unioned intervals, a difference that punches a hole in the middle of an interval, and so on -- so the carrier cannot be a plain ``tstzrange``. ``tstzmultirange`` is exactly the right type for this semiring, and without it the whole construction does not close. All the SQL pieces live in ``sql/provsql.14.sql`` and are not loaded on older PostgreSQL versions (see :doc:`build-system`). The GUC and the Big Picture --------------------------- The whole subsystem is gated on the ``provsql.update_provenance`` GUC, declared in :cfile:`provsql.c` and registered in :cfunc:`_PG_init`. When it is ``off`` (the default), the ``AFTER`` triggers installed by ``add_provenance`` short-circuit immediately and DML statements behave normally. When it is ``on``, every committed ``INSERT`` / ``UPDATE`` / ``DELETE`` on a provenance-tracked table: 1. Allocates a new ``update`` gate representing the operation itself, with a fresh UUID. 2. Records the operation in the ``update_provenance`` housekeeping table (statement text, user, timestamp, validity range). 3. Combines that gate with the ``provsql`` token of every affected row using either ``provenance_times`` (for inserts) or ``provenance_monus`` (for deletes). An update is modelled as a delete followed by an insert. The end result is that each row's ``provsql`` token is a circuit whose leaves include not only the original ``input`` gates from ``add_provenance`` but also one ``update`` leaf per DML statement that ever touched the row. The ``gate_update`` Gate Type ----------------------------- A new value, ``gate_update``, is added to the :cfunc:`gate_type` enum in :cfile:`provsql_utils.h`. It is treated exactly like ``gate_input`` everywhere it matters: - :cfunc:`MMappedCircuit` and :cfunc:`GenericCircuit` allow probabilities to be attached to it (via :sqlfunc:`set_prob`), because semantically a DML operation is "another input" to the circuit. - :cfunc:`GenericCircuit::evaluate` registers ``update`` gates as members of its ``inputs`` set, so semiring evaluators see them as leaves the same way they see ``input`` gates. - :cfunc:`getBooleanCircuit` translates ``gate_update`` to a Boolean variable like ``gate_input``, so probability evaluation and Shapley computation handle them transparently. The only thing that distinguishes an ``update`` gate from an ``input`` gate is its semantic role -- "this leaf was created by DML, not by the original ``add_provenance``" -- and the fact that the SQL housekeeping records (``update_provenance``) tie its UUID back to a concrete statement. The Triggers ------------ ``add_provenance`` (in ``sql/provsql.14.sql``) installs three ``AFTER`` statement-level triggers on the target table: .. code-block:: sql AFTER INSERT REFERENCING NEW TABLE AS NEW_TABLE EXECUTE PROCEDURE provsql.insert_statement_trigger(); AFTER UPDATE REFERENCING OLD TABLE AS OLD_TABLE NEW TABLE AS NEW_TABLE EXECUTE PROCEDURE provsql.update_statement_trigger(); AFTER DELETE REFERENCING OLD TABLE AS OLD_TABLE EXECUTE PROCEDURE provsql.delete_statement_trigger(); Each trigger: 1. Returns immediately if ``provsql.update_provenance`` is off. 2. Allocates a fresh UUID and creates a corresponding ``update`` gate via :sqlfunc:`create_gate`. 3. Reads the current statement text from ``pg_stat_activity`` and inserts a row into ``update_provenance``. 4. Sets ``provsql.update_provenance = off`` on a *local* basis to suppress the recursive triggering caused by the next steps. 5. Walks the affected rows (``OLD_TABLE`` / ``NEW_TABLE``) and rewrites each row's ``provsql`` token: ``insert`` multiplies the row token by the new operation gate, ``delete`` applies ``provenance_monus`` to remove it, and ``update`` does both (a monus on the old row plus a times on the new one). 6. Re-enables tracking before returning. The temporary disabling in step 4 is essential: rewriting the ``provsql`` column of a tracked table is itself an ``UPDATE``, which would re-trigger the same machinery and recurse forever otherwise. Because deletes need to keep the old row around for time-travel queries (otherwise there would be nothing left to attach the ``monus`` token to), the delete trigger does **not** physically delete the row -- it re-inserts a copy under a fresh ``provsql`` token whose circuit ends in a ``provenance_monus`` against the delete gate. The ``update_provenance`` Table ------------------------------- ``update_provenance`` is a regular PostgreSQL table created by ``sql/provsql.14.sql``. Its schema: .. code-block:: sql CREATE TABLE provsql.update_provenance ( provsql uuid, -- the update gate's UUID query text, -- the SQL text query_type query_type_enum, -- INSERT/UPDATE/DELETE/UNDO username text, -- session_user ts timestamp DEFAULT CURRENT_TIMESTAMP, valid_time tstzmultirange DEFAULT tstzmultirange(tstzrange(CURRENT_TIMESTAMP, NULL)) ); The ``provsql`` column is the *primary key* in spirit: it links each row to the corresponding ``update`` gate in the circuit store. Once you have that token you can hand it back to :sqlfunc:`undo` to roll the operation back, or query it via the temporal functions to find out which rows were affected. The ``valid_time`` column is a ``tstzmultirange``. At the moment an ``update`` gate is created, it is set to the half-infinite interval :math:`[\mathit{ts}, +\infty)`: from the operation's timestamp onward, and not before. Crucially, ``valid_time`` is never modified after this initial insertion -- not even by :sqlfunc:`undo`. Trimming the validity of a superseded operation is *not* a property of the row in this table; it emerges from evaluating the row's provenance circuit over the union-of-intervals m-semiring described in *Time-Travel Queries* below. Undo ---- :sqlfunc:`undo` takes a token (typically read out of ``update_provenance.provsql``) and rolls the corresponding operation back, *without* removing anything from the history. The implementation is pure PL/pgSQL, in ``sql/provsql.14.sql``. The trick is that the row tokens already contain the operation's gate as a leaf; we just need to *cancel* that leaf inside every circuit that references it. ``undo``: 1. Allocates a new ``update`` gate ``u`` and records a brand-new ``UNDO`` row in ``update_provenance`` whose ``valid_time`` is again :math:`[\mathit{now}, +\infty)`. 2. Walks every provenance-tracked table in every schema, and for each row in each table calls ``replace_the_circuit(row_token, op_token, u)`` to rewrite the row's circuit. :sqlfunc:`replace_the_circuit` (also in ``sql/provsql.14.sql``) is a recursive PL/pgSQL function that walks a circuit and rebuilds it with one substitution: every reference to ``op_token`` is replaced by ``provenance_monus(op_token, u)``. Because the circuit is a DAG, the rewrite is structural -- it preserves sharing within a single rewrite call. The undo is *itself* an operation: the new ``update`` gate ``u`` becomes a leaf of every rewritten row, and undoing the undo (yes, you can do that) follows exactly the same path. Time-Travel Queries ------------------- All temporal queries reduce to evaluating a row's provenance circuit over the **union-of-intervals m-semiring**, whose carrier is ``tstzmultirange``: - plus (⊕) is the union of intervals, with neutral element :math:`\emptyset`; - times (⊗) is the pointwise intersection, with neutral element the universal singleton :math:`\{(-\infty, +\infty)\}`; - monus (⊖) is the pointwise difference (A minus B is the union, over :math:`a \in A` and :math:`b \in B`, of :math:`a \cap \overline{b}`). This is exactly the m-semiring the PW'25 paper (:cite:`DBLP:conf/pw/WidiaatmajaDDS25`) describes. It is implemented in ``sql/provsql.14.sql`` by three aggregate functions: - ``union_tstzintervals_plus`` -- ⊕ (union); - ``union_tstzintervals_times`` -- ⊗ (intersection); - ``union_tstzintervals_monus`` -- ⊖ (difference). The interpretation of a gate in this semiring is simple: - an ``input`` gate -- a plain base-table leaf -- has no associated time and is mapped to the ⊗-neutral :math:`\{(-\infty, +\infty)\}`, meaning "valid at any time"; - an ``update`` gate is mapped to its ``valid_time`` from ``update_provenance``, i.e. :math:`[\mathit{ts}, +\infty)`. The mapping is provided by the auto-generated provenance mapping view ``time_validity_view`` over the ``update_provenance.valid_time`` column; :sqlfunc:`get_valid_time` calls ``union_tstzintervals(provenance(), 'time_validity_view')`` on the target row and returns the resulting multirange. The reason the temporal result "just works" after an undo is that the semiring's ⊖ does the trimming arithmetically: after :sqlfunc:`replace_the_circuit` has rewritten ``c`` to ``c ⊖ u``, the union-of-intervals evaluation of that subtree produces :math:`[\mathit{ts}_c, +\infty) \ominus [\mathit{ts}_u, +\infty) = [\mathit{ts}_c, \mathit{ts}_u)` -- exactly the finite interval during which the undone operation was in effect. No record in ``update_provenance`` is ever modified. The remaining user-facing temporal functions are thin wrappers on top of :sqlfunc:`get_valid_time` and ordinary ``tstzmultirange`` operators: - :sqlfunc:`timetravel` -- rows of a table valid at a given timestamp (tests ``@>`` containment); - :sqlfunc:`timeslice` -- rows valid anywhere in a given interval (tests ``&&`` overlap); - :sqlfunc:`history` -- full history of rows matching a key. All four functions live entirely in PL/pgSQL. They contain no C code: every operation reduces to a query against the target table plus a call to :sqlfunc:`provenance_evaluate` with the union-of-intervals semiring.