# Changelog

All notable changes to [ProvSQL](https://provsql.org/) are documented
in this file.  It mirrors the release-notes section of the website
([provsql.org/releases](https://provsql.org/releases/)) and is kept in
sync by the `release.sh` release-automation script.

## [1.3.0] - 2026-05-04

### Breaking change: per-database circuit storage

Prior to 1.3.0, the provenance circuit was stored in four flat files at
the root of the PostgreSQL data directory (`$PGDATA/provsql_gates.mmap`,
`provsql_wires.mmap`, `provsql_mapping.mmap`, `provsql_extra.mmap`),
shared across all databases in the cluster. Starting with 1.3.0, each
database gets its own isolated set of files under
`$PGDATA/base/<db_oid>/`.

**Users upgrading from 1.2.x must migrate their circuit data as part of
the upgrade, before restarting PostgreSQL with the new binaries.** The
new `provsql_migrate_mmap` tool handles this. If the migration is
skipped, existing circuit data becomes inaccessible (new provenance
queries still work, but provenance computed under the old version is
lost). The upgrade script detects old flat files and raises a WARNING
with recovery instructions if they are still present.
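
As a manual pre-flight check, the four legacy file names listed above can be looked up at the root of the data directory. The following is a minimal sketch, not the detection logic of the upgrade script itself; the function name `legacy_circuit_files` is hypothetical.

```sh
#!/bin/sh
# Illustrative pre-flight check: print any legacy ProvSQL circuit files
# still present at the root of a PostgreSQL data directory.
# The four file names come from the notes above; the function name is
# a hypothetical helper, not part of ProvSQL.

legacy_circuit_files() {
    datadir=$1
    for name in provsql_gates.mmap provsql_wires.mmap \
                provsql_mapping.mmap provsql_extra.mmap; do
        if [ -f "$datadir/$name" ]; then
            printf '%s\n' "$datadir/$name"
        fi
    done
}

# Example: legacy_circuit_files "$PGDATA"
```

Since the migration tool deletes the old flat files on success, any output suggests the migration has not yet completed.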

#### Upgrade procedure

1. Install the new ProvSQL binaries:
   ```
   make install
   ```

2. Run the migration tool as the `postgres` user:
   ```
   provsql_migrate_mmap -D "$PGDATA" -c <connstr>
   ```
   The tool reads the old flat files, collects root UUIDs from each
   database's provenance-tracked tables, writes per-database files under
   `$PGDATA/base/<db_oid>/`, and deletes the old flat files on success.

3. Restart PostgreSQL.

4. In each database that uses ProvSQL:
   ```sql
   ALTER EXTENSION provsql UPDATE;
   ```
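
When several databases use ProvSQL, step 4 can be scripted. The sketch below assumes `psql` is on the `PATH` and that the database names are passed explicitly; `update_provsql_in` and the `PSQL` override are hypothetical conveniences, not part of ProvSQL.

```sh
#!/bin/sh
# Illustrative helper for step 4: run ALTER EXTENSION in each database
# named on the command line. PSQL may be overridden, e.g. for a dry run.
PSQL=${PSQL:-psql}

update_provsql_in() {
    for db in "$@"; do
        # -X skips ~/.psqlrc; ON_ERROR_STOP makes psql exit non-zero
        # as soon as the SQL command fails.
        "$PSQL" -X -v ON_ERROR_STOP=1 -d "$db" \
            -c 'ALTER EXTENSION provsql UPDATE;' || return 1
    done
}

# Example: update_provsql_in sales analytics
```

The loop stops at the first failing database, so a partially upgraded cluster is easy to spot.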

#### If you forgot step 2

If PostgreSQL was restarted with the new binaries before the migration
was run, some empty per-database files may have been created. To
recover:

1. Delete the empty per-database files:
   ```
   rm -f "$PGDATA"/base/*/provsql_*.mmap
   ```
2. Restart PostgreSQL.
3. Immediately run `provsql_migrate_mmap` before executing any
   provenance query.
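
Before deleting anything in step 1, it can help to list what the glob would match. This is a small sketch that mirrors the pattern used in step 1; the function name `per_db_circuit_files` is hypothetical.

```sh
#!/bin/sh
# Illustrative: print the per-database ProvSQL files that the rm command
# in step 1 would match, so they can be reviewed first.
# The function name is a hypothetical helper, not part of ProvSQL.

per_db_circuit_files() {
    datadir=$1
    for f in "$datadir"/base/*/provsql_*.mmap; do
        # An unmatched glob is left as a literal pattern; skip it.
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Example: per_db_circuit_files "$PGDATA"
```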

### Lazy input gate creation

`add_provenance()` no longer eagerly writes an input gate to the circuit
for every row present in the table when it is called. Gates are now
created on first reference during a query, which adds a small overhead
to the first query that touches each row but significantly reduces the
cost of enabling provenance on large tables.

### Four case studies

Four worked examples have been added to the documentation and are
included as regression tests:

- **Case Study 1: The Intelligence Agency**: simple introductory
  example with Boolean and why-provenance.
- **Case Study 2: The Open Science Database**: comprehensive example
  covering why-provenance, where-provenance, custom semirings,
  probabilities, Shapley and Banzhaf values.
- **Case Study 3: Île-de-France Public Transit**: Boolean provenance
  and formula inspection over GTFS transit data.
- **Case Study 4: Government Ministers Over Time**: temporal provenance
  with `union_tstzintervals` and time-validity views.

### Bug fixes

- Fix GROUP BY provenance aggregation being silently dropped when
  ORDER BY referenced the semiring result column.
- Fix d-DNNF tree decomposition: deduplicate OR gate children to prevent
  double-counting in probability evaluation.
- Fix NULL dereference and out-of-bounds crashes in where-provenance on
  views.
- Fix temporal functions (`time_filter`, `time_range`, `in_interval`) to
  use schema-qualified `provsql.time_validity_view`, preventing failures
  when `search_path` does not include the `provsql` schema.
- Fix `sr_boolean` evaluation when the provenance mapping uses integer
  values.
- Fix where-provenance PROJECT gate positions for provenance tables that
  are not the first range-table entry (RTE) in a query; the wrong
  positions caused empty locator sets on some PostgreSQL versions.

## [1.2.3] - 2026-04-12

### PGXN improvements

- Prevent indexing of secondary documentation directories
  (`doc/source/`, `doc/tutorial/`, `doc/demo/`, `doc/aggregation/`,
  `doc/temporal_demo/`, `where_panel/`) on the PGXN distribution page
  via `no_index` in `META.json`.

- Document PGXN as an installation channel in the user guide, with
  a note that `pgxn install` does not configure
  `shared_preload_libraries`.

- Add a GitHub Actions workflow (`.github/workflows/pgxn.yml`) that
  automatically publishes releases to PGXN on version-tag pushes.
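
Because `pgxn install` leaves `shared_preload_libraries` untouched, a quick check of the configuration file can confirm whether the setting mentions `provsql`. The sketch below is a rough heuristic that inspects a single file only; the setting may also come from `postgresql.auto.conf` or included files, and `SHOW shared_preload_libraries;` in `psql` remains the authoritative answer. The function name is hypothetical.

```sh
#!/bin/sh
# Illustrative heuristic: succeed if an uncommented
# shared_preload_libraries line in the given file mentions provsql.
# The function name is a hypothetical helper, not part of ProvSQL.

preload_mentions_provsql() {
    conf=$1
    grep -Eq '^[[:space:]]*shared_preload_libraries[[:space:]]*=.*provsql' "$conf"
}

# Example:
#   preload_mentions_provsql "$PGDATA/postgresql.conf" \
#       || echo "provsql is not listed in shared_preload_libraries"
```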

### Documentation and repository housekeeping

- Add `CODE_OF_CONDUCT.md`.

- Add architecture dataflow diagram to the website overview page.

- Replace `sudo` with generic "as a user with write access to the
  PostgreSQL directories" wording across installation and contribution
  documentation.

## [1.2.2] - 2026-04-11

### In-place extension upgrades

`ALTER EXTENSION provsql UPDATE` is now supported, starting with this
release.  A committed chain of upgrade scripts under `sql/upgrades/`
covers every previous release (1.0.0 → 1.1.0 → 1.2.0 → 1.2.1 → 1.2.2),
so users on any historical version can upgrade in place without
dropping and recreating the extension.  The persistent provenance
circuit (memory-mapped files) is preserved across the upgrade: the
on-disk format has been binary-stable since 1.0.0, and the relevant
headers (`src/MMappedCircuit.h`, `src/provsql_utils.h`) now carry
explicit warnings so future contributors don't break that guarantee
by accident.
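
PostgreSQL locates update scripts by the naming convention `extension--oldversion--newversion.sql`. The sketch below derives the script names implied by the chain above; it assumes the files under `sql/upgrades/` follow that same convention, and the function name `upgrade_script_names` is hypothetical.

```sh
#!/bin/sh
# Illustrative: print the update-script names PostgreSQL expects for a
# chain of versions, one script per consecutive pair.
# The function name is a hypothetical helper, not part of ProvSQL.

upgrade_script_names() {
    ext=$1
    shift
    prev=
    for ver in "$@"; do
        if [ -n "$prev" ]; then
            printf '%s--%s--%s.sql\n' "$ext" "$prev" "$ver"
        fi
        prev=$ver
    done
}

# Example: upgrade_script_names provsql 1.0.0 1.1.0 1.2.0 1.2.1 1.2.2
```

For the five versions in the chain, this yields four script names, from `provsql--1.0.0--1.1.0.sql` to `provsql--1.2.1--1.2.2.sql`.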

A pg_regress regression test (`test/sql/extension_upgrade.sql`)
exercises the full chain end-to-end on every PostgreSQL version in
the CI matrix, installing the extension at 1.0.0 from a frozen
install-script fixture and walking it up to the current
`default_version`.  See the new "Extension Upgrades" section of the
developer guide for the workflow contributors should follow when
making SQL changes.

### Repository housekeeping and discoverability

- **`CHANGELOG.md`** at the repository root, mirroring the release
  notes published at
  [provsql.org/releases](https://provsql.org/releases/).  It is
  automatically kept in sync by `release.sh`.

- **GitHub issue and pull-request templates** under `.github/`.
  The bug-report form prompts for PostgreSQL version, ProvSQL
  version, OS, a minimal SQL reproducer, and optional verbose-mode
  output; the PR template carries a contributor checklist and links
  to the developer guide.

- **DockerHub** image-version badge added to the README
  ([`inriavalda/provsql`](https://hub.docker.com/r/inriavalda/provsql))
  and a prose pointer on the website overview page.

- **PGXN `META.json`** at the repository root, making ProvSQL ready
  for submission to the [PostgreSQL Extension Network](https://pgxn.org).
  Submission will happen once upstream approval lands; no change to
  the build or install flow in the meantime.

- **`CITATION.cff`** now carries the Zenodo concept DOI
  ([`10.5281/zenodo.19512786`](https://doi.org/10.5281/zenodo.19512786))
  and a Software Heritage archive URL in its `identifiers` block.

### Infrastructure

- `release.sh` learned to update `CITATION.cff`, `CHANGELOG.md`, and
  `META.json` in sync with `provsql.common.control` and
  `website/_data/releases.yml`, and to enforce the presence of an
  upgrade script (auto-generating a no-op when no SQL sources have
  changed since the previous tag).
- CI workflows now fetch git tags so `git describe` works inside
  the pgxn-tools containers, which unblocks the Makefile's
  dev-cycle upgrade-script generation.
- The `paths-ignore` lists of the four build / docs workflows now cover
  `META.json`, `.github/ISSUE_TEMPLATE/**`, and
  `.github/pull_request_template.md`, so metadata-only edits no longer
  trigger the full CI matrix.

### No SQL-level changes

There are **no changes to `sql/provsql.common.sql` or
`sql/provsql.14.sql`** in this release.  The SQL API, query
rewriter, semiring evaluators, and probability machinery are
unchanged from 1.2.1.  The upgrade script `1.2.1 → 1.2.2` is
accordingly an empty placeholder.

## [1.2.1] - 2026-04-11

Maintenance release headlining the new **developer guide** and laying
the groundwork for long-term archival and citation.

### Highlights

- **Developer guide** (14 chapters, ~3500 lines): PostgreSQL extension
  primer, architecture, query rewriting pipeline, memory management,
  where-provenance, data-modification tracking, aggregation semantics,
  semiring and probability evaluation (including the block-independent
  database model and the expected Shapley / Banzhaf algorithm from
  Karmakar et al., PODS 2024), coding conventions, testing, debugging,
  and the build system.  Cross-references to Lean 4 machine-checked
  proofs of the positive-fragment rewriting rules and the m-semiring
  axioms.  See the new Developer Guide tab in the documentation.

- **User guide updates**: expanded coverage of `expected()`, the
  `choose` aggregate, custom semiring evaluation, and diagnostic
  functions.  "Formula semiring" has been renamed to "symbolic
  representation" throughout.

- **`CITATION.cff`**: standard citation metadata at the repo root.
  GitHub now shows a **Cite this repository** button that emits
  BibTeX and APA for the ICDE 2026 paper.

- **Software Heritage** archival is active: the full repository
  history is continuously preserved at archive.softwareheritage.org.

- **Zenodo integration** enabled: starting with this release, every
  tagged version receives a persistent DOI
  ([10.5281/zenodo.19512786](https://doi.org/10.5281/zenodo.19512786)).

### Fixes

- `create_provenance_mapping_view` is now available on all supported
  PostgreSQL versions, not only PG 14+.

- External-tool tests (`c2d`, `d4`, `dsharp`, `minic2d`, `weightmc`,
  `view_circuit_multiple`) now skip cleanly when the tool is not
  installed, instead of being removed from the test schedule.

### Infrastructure

- Automated documentation coherence check runs in CI (validates every
  `:sqlfunc:` / `:cfunc:` / `:cfile:` / `:sqlfile:` reference resolves
  to a live Doxygen anchor).
- Mobile-friendly Doxygen and Sphinx output.
- CI speedups: concurrency groups, skip-on-tags, macOS `pg_isready`
  race fix.
- In-place extension upgrades via `ALTER EXTENSION provsql UPDATE` are
  supported starting with this release; upgrade scripts live under
  `sql/upgrades/` and the path is exercised by an automated CI test.

## [1.2.0] - 2026-04-10

This release focuses on providing broader and more consistent support
for SQL language features within provenance tracking.  Systematic
testing across a wide range of query patterns led to numerous bug
fixes, new feature support, and clearer error messages for unsupported
constructs.

### New Features

- **CTE support**: Non-recursive `WITH` clauses now fully track
  provenance.  Nested CTEs (CTE referencing another CTE) and CTEs
  inside `UNION`/`EXCEPT` branches are supported.  Recursive CTEs
  produce a clear error message.

- **`INSERT ... SELECT` provenance propagation**: When both source
  and target tables are provenance-tracked, `INSERT ... SELECT` now
  propagates source provenance to the inserted rows instead of
  assigning fresh tokens.  A warning is emitted when the target table
  lacks a `provsql` column.

- **Correct arithmetic and expressions on aggregate results from
  subqueries**: Explicit casts (`cnt::numeric`), arithmetic
  (`cnt + 1`), window functions (`SUM(cnt) OVER()`), and expressions
  (`COALESCE`, `GREATEST`, etc.) on aggregate results from subqueries
  now produce correct values with a warning, using the original
  aggregate return type from the provenance circuit.

- **UNION ALL with aggregate columns**: `UNION ALL` of queries
  returning aggregate results now works correctly.

### Bug Fixes

- Fixed crash when mixing `COUNT(DISTINCT ...)` with `provenance()` or
  `sr_formula(provenance(), ...)` in the same query.

- Fixed `COUNT(*)` returning NULL instead of `0 (*)` on empty results
  without `GROUP BY`.

- Fixed `provenance_cmp` function failing with "function
  uuid_ns_provsql() does not exist" when `provsql` was not in
  `search_path`.

### Improved Error Messages

- `provenance_evaluate` on unsupported gate types now reports the
  specific gate type and suggests using compiled semirings.

- Subquery errors now read "Subqueries (EXISTS, IN, scalar subquery)
  not supported" instead of the misleading "Subqueries in WHERE
  clause".

- Clear error messages for unsupported operations on aggregate
  results: `DISTINCT` on aggregates, `UNION`/`EXCEPT` (non-ALL) with
  aggregates, `ORDER BY`/`GROUP BY` on aggregate results from
  subqueries.

- Dropped redundant "by provsql" suffix from all error messages (the
  "ProvSQL:" prefix is already present).

### Documentation

- Updated supported/unsupported SQL features list with accurate
  coverage based on systematic testing.

- Added documentation for `INSERT ... SELECT` provenance propagation.

- Expanded aggregation documentation with examples of casts, window
  functions, `COALESCE`, and `GREATEST` on aggregate results.

- Added workaround guidance for unsupported features (use `LATERAL`
  for correlated subqueries, explicit cast for comparison on
  aggregates).

## [1.1.0] - 2026-04-09

### Support for arithmetic on aggregate results

Queries performing arithmetic on aggregate results (e.g.,
`SELECT COUNT(*)+1` or `SUM(id)*10`) are now supported.  Previously,
these queries produced incorrect results because the planner hook
replaced aggregate references with `agg_token` values without
adjusting surrounding operator type expectations.  This is handled by
adding implicit and assignment casts from `agg_token` to standard SQL
types (`numeric`, `double precision`, `integer`, `bigint`, `text`),
and by inserting appropriate type casts during query rewriting when
aggregate results are used inside operators or functions.  A warning
is emitted when provenance information is lost during such
conversions.

### Infrastructure improvements

- Versioned Docker image tagging (images are now tagged with the
  release version in addition to `latest`).
- Improved release process: post-release version bump is now
  automated, and release tarballs exclude non-essential files (CI
  workflows, release script, branding, Docker, and website assets).
- CI fixes for macOS and documentation builds.

## [1.0.0] - 2026-04-05

Initial official release of ProvSQL after 10 years of development.
ProvSQL is now fully documented and usable in production.