# Changelog All notable changes to `pgmnemo` are documented here. Format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). --- ## [0.4.0] β€” 2026-05-15 ### Theme Hybrid retrieval promoted to default β€” significant lift on conversational memory workloads (LoCoMo), neutral on dense multi-doc retrieval (LongMemEval). ### Bench verdict Real-DB benchmarks via the new router (`benchmarks/gate/v0.4.0.json`) vs v0.3.0 baseline: | Bench / scope | Metric | v0.3.0 | v0.4.0 | Ξ”pp | p_corr | Verdict | |---|---|---|---|---|---|---| | LoCoMo session OVERALL | recall@5 | 0.6623 | 0.7230 | **+6.07** | 0.0010 | 🟒 IMPROVED | | LoCoMo session OVERALL | recall@10 | 0.7951 | 0.8409 | **+4.15** | 0.0156 | 🟒 IMPROVED | | LoCoMo session OVERALL | MRR | 0.5569 | 0.6365 | **+7.96** | <0.0001 | 🟒 IMPROVED | | LoCoMo session open_domain | recall@5 | 0.7176 | 0.7907 | **+7.31** | 0.0148 | 🟒 IMPROVED | | LoCoMo session open_domain | MRR | 0.5688 | 0.6667 | **+9.79** | 0.0009 | 🟒 IMPROVED | | LongMemEval OVERALL | recall@10 | 0.9334 | 0.9334 | +0.00 | 1.0000 | neutral | | LongMemEval OVERALL | MRR | 0.8472 | 0.8521 | +0.49 | 1.0000 | neutral | | LoCoMo segment | (all) | (unchanged) | (unchanged) | 0.00 | 1.0000 | neutral | 5 significant improvements, 0 regressions across 24 LoCoMo session cells. LongMemEval and LoCoMo segment hold steady β€” hybrid doesn't trigger when query_text is NULL or when dense retrieval is already saturated (bge-m3 on LME). Honest scope ([COMPETITIVE_REALITY.md](docs/COMPETITIVE_REALITY.md) updated): - βœ… Significant lift on conversational dialog retrieval (LoCoMo paper-canonical) - βœ… No regression on any benchmark - ❌ Does NOT close the BM25 gap on LongMemEval (BM25=0.982, pgmnemo=0.9334) - ⚠️ v0.2.2 simulation predicted +12.7pp lift; real-DB measured +4.15pp on LoCoMo and +0.00pp on LongMemEval β€” sim overstated by 3-100x. New `docs/WORKFLOW.md Β§2.2` "PROVE BEFORE ADD" rule caught this before promotion. ### Added - **Hybrid retrieval as default for `recall_lessons()`** β€” when `query_text` is non-empty AND `query_embedding` is non-NULL AND `pgmnemo.disable_hybrid` GUC is FALSE/unset, internally routes to `recall_hybrid()` with default weights (vec_weight=0.4, bm25_weight=0.4, rrf_k=60). Signature unchanged; output shape unchanged (12 columns); diagnostic `vec_score`/`bm25_score`/ `rrf_score` columns exposed only via direct `recall_hybrid()` call. - **`pgmnemo.disable_hybrid` GUC** β€” `SET pgmnemo.disable_hybrid = 'true'` restores strict v0.3.0 vector-only behaviour. Default FALSE. - **`lesson_tsv` column + GIN index + auto-populating trigger** β€” moved from v0.2.2 EXPERIMENTAL opt-in to default extension install. - **`recall_hybrid()` function** β€” moved from v0.2.2 opt-in to default install. Signature unchanged from v0.2.2. - **`scripts/smoke_recall_hybrid.py`** β€” CI signature-stability smoke test (catches output-column rename bugs in ~10s vs ~5min bench script failure). - **`benchmarks/scripts/bench_embed_cache.py`** β€” embedding cache (deterministic for `(text, model, max_seq)`). Reduces LongMemEval bench from ~52 min cold to ~3 min cached, LoCoMo from ~10 min to 14 seconds. Unlocks practical weight-tuning grid searches. ### Changed - `docs/SQL_REFERENCE.md Β§2.5` β€” fixed incorrect documentation of `recall_hybrid()` output schema (was `hybrid_score`, actually `score`). Added `vec_score`, `bm25_score`, `rrf_k` parameter, and explicit "sort by score" guidance. - `docs/COMPETITIVE_REALITY.md Β§1.2` updated β€” BM25 gap on LongMemEval remains (0.982 vs 0.9334); v0.4.0 does NOT close it. Future work via Stella V5 embedder (H-02) or workload-aware routing (H-future). - `docs/COMPETITIVE_REALITY.md Β§5` updated β€” graph-feature deprecation deferred to v0.4.1 (separate cycle). ### Upgrade ```sql ALTER EXTENSION pgmnemo UPDATE TO '0.4.0'; ``` Idempotent. Adopters who need strict v0.3.0 retrieval behaviour can opt out: ```sql SET pgmnemo.disable_hybrid = 'true'; -- or persist: ALTER SYSTEM SET pgmnemo.disable_hybrid = 'true'; SELECT pg_reload_conf(); ``` ### CI / release-gate verdict `scripts/significance_test_extended.py` exit 3 (NEAR_THRESHOLD) due to 14 near-threshold cells on LoCoMo session (all positive direction). Release notes include monitor watchlist for v0.4.1 follow-up: - `multi_hop/MRR`: +9.55pp (p_corr=0.2949, n=321, may reach significance with v0.4.1 data) - `multi_hop/recall@5`: +7.79pp (p_corr=0.6874) - `adversarial/MRR`: +7.14pp (p_corr=0.7715) - `single_hop/recall@10`: +3.62pp (positive but small n) - `temporal/*`: small positive trends across 3 metrics (historically weakest category, watch) --- ## [0.3.1] β€” 2026-05-13 ### Theme Hygiene foundation β€” no recall-algorithm change. The release that closes the documentation/process gaps surfaced by the v0.2.x β†’ v0.3.0 audit (see `docs/WORKFLOW.md Β§1` for the post-mortem). ### Bench verdict Per `scripts/significance_test_extended.py` against the v0.3.0 baseline (`benchmarks/gate/v0.3.0.json`): **neutral on all 3 benches** β€” no SQL change, no recall delta possible. The release is hygiene-only. ### Added - **`docs/WORKFLOW.md`** β€” canonical development discipline document. Defines customer-first hypothesis declaration, per-cell bench gate, deprecation by absence of evidence, and 2–4 week release cycle. - **`docs/BENCHMARK_PROTOCOL.md`** β€” two-phase architecture (corpus snapshot reuse + per-version retrieval test), frozen parameters table, gate decision matrix, CI integration plan. - **`docs/SQL_REFERENCE.md`** β€” every public SQL function (version, ingest, recall_lessons, recall_lessons_pooled, recall_hybrid, traverse_causal_chain, traverse_temporal_window), all GUCs, RLS behaviour, deprecation log. - **`docs/MIGRATION.md` Part B** β€” in-place version-to-version upgrade paths v0.1.x β†’ v0.3.0, per-version backfill requirements, generic dump+restore rollback policy. - **`benchmarks/METRICS_BY_VERSION.md`** β€” single source of truth for "which version produced which number." Per-(dataset Γ— embedder Γ— mode) tables, append-only at every release. - **`benchmarks/gate/`** β€” release pre-push snapshot files (`v.json`) that consolidate every real-DB metrics.json for a release; CI uses these for the mechanical gate decision. - **`scripts/significance_test_extended.py`** β€” per-category z-test with Holm-Bonferroni correction; exit codes 0/1/2/3 drive the release gate. - **`scripts/render_progression.py`** β€” pure-SVG per-bench small-multiples line charts with CI95 bands. - **`scripts/render_full_history.py`** β€” Tufte-style sparkline table with all metrics Γ— all versions. - **`scripts/render_executive_scorecard.py`** β€” single-page PASS/WATCH/FAIL scorecard for non-technical readers. - **`ROADMAP.md` v2** β€” customer-driven per-version plan to v1.0. Old spec-driven roadmap archived. - **CI bench-gate** β€” `.github/workflows/release.yml` blocks tag push when `benchmarks/gate/v.json` is missing or significance test exits 2. Soft check in `ci.yml` warns PRs that touch SQL but don't update the gate. ### Fixed - GitHub Issues #12 (release coherence), #13 (docs/API coherence), #14 (install/upgrade contract), #15 (migration guide), #16 (benchmark protocol) β€” all closed; see commits in the v0.3.1 cycle. ### Upgrade ```sql ALTER EXTENSION pgmnemo UPDATE TO '0.3.1'; ``` Empty upgrade script (no SQL changes); the version bump tracks documentation/process improvements only. --- ## [0.3.0] β€” 2026-05-10 ### Fixed - **P0: `edge_type` β†’ `relation_type` in migration S3 backfill** β€” the `UPDATE pgmnemo.mem_edge SET edge_kind = ...` backfill in the 0.2.1β†’0.3.0 migration script referenced a non-existent column `edge_type` instead of the correct `relation_type` column. On any database with existing `mem_edge` rows this caused `ERROR: column "edge_type" does not exist`, preventing the migration from completing. Fixed: all `edge_type` references in S3 replaced with `relation_type`. - **P0: `edge_type` β†’ `relation_type` in `traverse_causal_chain()` S8** β€” the recreated `traverse_causal_chain()` function in S8 of the migration also referenced `me.edge_type` in the WHERE clause (both forward and backward BFS branches). Fixed: all `edge_type` references replaced with `relation_type` in S8. ### Upgrade ```sql ALTER EXTENSION pgmnemo UPDATE TO '0.3.0'; ``` --- ## [0.2.2] β€” 2026-05-10 (candidate) ### Added - **`pgmnemo.recall_hybrid()` β€” vector + BM25 weighted fusion** (**EXPERIMENTAL β€” opt-in only, NOT default**) β€” new function combining dense cosine retrieval with BM25-class sparse retrieval (`ts_rank_cd` on `lesson_tsv`). Formula: `0.4Γ—cosine + 0.4Γ—ts_rank_cd(lesson_tsv, q, 32)` (plus minor importance/recency/provenance components). Union retrieval: candidates matched by **either** embedding cosine **or** BM25 text match. Returns `rrf_score` diagnostic column (`1/(rrf_k+vec_rank) + 1/(rrf_k+bm25_rank)`). `recall_lessons()` is unchanged and remains the default. **Bench results (simulation, 2026-05-10):** LoCoMo recall@10 +12.7pp vs vector-only (all 5 question types positive, statistically significant, CIs disjoint); LongMemEval MRR +5.8pp (p=0.005, significant), recall@10 +1.5pp (p=0.308, not significant β€” within noise at high baseline 0.93). Try `recall_hybrid()` if your task is MRR-sensitive or your corpus has keyword-matchable queries alongside semantic ones. WG decision: `spec/v2/pgmnemo/HYBRID_DECISION_2026-05-10.md`. - **Migration script** `extension/pgmnemo--0.2.1--0.2.2-hybrid.sql` β€” idempotent `CREATE OR REPLACE FUNCTION`, backward-compatible (existing `recall_lessons()` unchanged). - **Benchmark script** `benchmarks/scripts/run_longmemeval_hybrid.py` β€” LongMemEval evaluation harness for `recall_hybrid()` with gap-analysis reporting vs. vector-only and BM25 baselines. ### Upgrade ```sql \i extension/pgmnemo--0.2.1--0.2.2-hybrid.sql ``` --- ## [0.2.1] β€” 2026-05-09 ### Added - **`traverse_causal_chain(direction)` parameter** (W2.2 / F5) β€” adds `direction TEXT DEFAULT 'forward'` parameter. `'forward'` follows sourceβ†’target edges (existing behaviour, backward-compatible). `'backward'` follows targetβ†’source edges for reverse traversal. `'both'` traverses all edges. Input validation raises `EXCEPTION` on invalid values. Cycle guard via path array applies to all directions. - **`pgmnemo.ef_search` GUC** (F2) β€” `SET LOCAL pgvector.hnsw.ef_search` applied at `recall_lessons()` entry from `pgmnemo.ef_search` GUC (default 100, clamped 10–500). - **Graph-proximity mixin in standard upgrade path** (F3) β€” `pgmnemo--0.2.0-step4-recall-mixin.sql` content folded into the v0.2.0.1β†’0.2.1 upgrade script (was supplemental-only). - **Row-Level Security multi-tenant isolation** (W2.3 / Q5) β€” `pgmnemo.tenant_id` GUC gates `agent_lesson` by `project_id` and `mem_edge` by endpoint ownership. Empty/unset = service-account bypass. Policies are idempotent (DROP IF EXISTS before CREATE). ### Fixed - **`recall_lessons()` IN-param/RETURNS TABLE collision on `project_id`** (INS-029 v2) β€” IN-param `project_id INT` collided with the `RETURNS TABLE` column of the same name. Fix: IN-param renamed to `project_id_filter`; all internal `recall_lessons.project_id` references updated accordingly. Backport of the same pattern as the `role`β†’`role_filter` fix in v0.1.4.1/v0.2.0.1. ### Changed - **`pgmnemo.recency_weight` default lowered** (F1) β€” from `0.20` to `0.08` (pending REC-1 ablation confirmation). Operator can override via `ALTER SYSTEM SET pgmnemo.recency_weight = ''; SELECT pg_reload_conf();`. ### Upgrade ```sql ALTER EXTENSION pgmnemo UPDATE TO '0.2.1'; ``` --- ## [0.2.0.1] β€” 2026-05-04 ### Fixed - **`traverse_temporal_window()` numeric β†’ double precision cast** (INS-030) β€” comparison between a `NUMERIC` intermediate value and `DOUBLE PRECISION` caused a type-mismatch error at runtime on PostgreSQL 14/15. Cast now explicit throughout the function body. - **`recall_lessons()` IN-param/RETURNS TABLE collision** (INS-029) β€” IN-param `role` renamed to `role_filter` (backport of v0.1.4.1 fix; see below). - **Idempotent upgrade DDL** (INS-031) β€” `ADD COLUMN IF NOT EXISTS` and `CREATE INDEX IF NOT EXISTS` guards added across all `0.1.4β†’0.2.0` upgrade scripts (backport of v0.1.4.1 fix). - **`recall_lessons_pooled()` post-collision smoke** (Action #7) β€” confirmed pooled wrapper correctly delegates to the renamed `role_filter` parameter after the collision fix. ### Upgrade ```sql ALTER EXTENSION pgmnemo UPDATE TO '0.2.0.1'; ``` --- ## [0.1.4.1] β€” 2026-05-04 (maintenance branch) ### Fixed - **`recall_lessons()` IN-param/RETURNS TABLE collision** (INS-029, P0) β€” PL/pgSQL raised `ERROR: parameter name "role" used more than once` because the IN-param `role TEXT` collided with the `RETURNS TABLE` column of the same name. The flagship function never compiled on a fresh install. Fix: IN-param renamed to `role_filter`; all callers updated. - **Idempotent upgrade DDL** (INS-031) β€” `ADD COLUMN IF NOT EXISTS` and `CREATE INDEX IF NOT EXISTS` guards applied across all upgrade scripts so re-running a patch on an already-upgraded database no longer raises duplicate-object errors. ### Upgrade ```sql ALTER EXTENSION pgmnemo UPDATE TO '0.1.4.1'; ``` --- ## [0.2.0] β€” 2026-05-04 ### Added - **`pgmnemo.mem_edge` DDL** (closes RFC Β§3) β€” directed typed edge table between `agent_lesson` rows. - Columns: `source_id`, `target_id`, `relation_type` (CAUSED_BY, SUPERSEDES, CO_OCCURRED, DERIVED_FROM, or user-defined), `weight REAL` [0.0–1.0], bitemporality (`valid_from`/`valid_until`), `commit_sha`, `metadata JSONB`. - Three covering indexes: forward/reverse traversal on `(source_id, relation_type)` and `(target_id, relation_type)` with `WHERE valid_until IS NULL`; temporal range index on `(valid_from, valid_until)`. - `CONSTRAINT ck_no_self_loop`: prevents `source_id = target_id`. - **`pgmnemo.traverse_causal_chain(start_id, max_depth, relation_types, only_active)`** (closes RFC Β§4) β€” recursive CTE walk of the causal edge graph. - Returns `(lesson_id, depth, path BIGINT[], path_weight, role, topic, lesson_text, importance, created_at, commit_sha, verified_at)`. - Cycle-safe via accumulated path array. Fail-safe: returns zero rows if `start_id` missing. - `max_depth` default 5; `relation_types` default `ARRAY['CAUSED_BY']`; `only_active` default `TRUE`. - **`pgmnemo.traverse_temporal_window(start_id, window_interval, include_unlinked, role_filter, project_id_filter, k)`** (closes RFC Β§5) β€” co-temporal episode discovery. - Returns lessons whose `created_at` falls within `Β±window_interval` of the anchor lesson. Window hard-capped at 30 days. - `linked=TRUE` when a `mem_edge` (either direction) exists between the row and `start_id`. - Ghost-lesson exclusion controlled by `pgmnemo.include_unverified` GUC (default off). - **Graph-proximity mixin for `recall_lessons()`** β€” integrates BFS graph traversal into scoring. - Updated scoring formula: `0.4Γ—cosine + 0.2Γ—importance + Ξ³Γ—recency + 0.1Γ—prov_strength + δ×graph_proximity`. - `graph_proximity = MAX(1 - depth/max_depth)` via BFS through `CAUSED_BY`, `CO_OCCURRED`, `DERIVED_FROM` edges from top-5 cosine anchors (max_depth=5). - New GUC `pgmnemo.graph_proximity_weight` (default `0.2`, clamped to `[0.0, 0.5]`). ### Upgrade ```sql ALTER EXTENSION pgmnemo UPDATE TO '0.2.0'; ``` Or from a fresh install: ```sql CREATE EXTENSION pgmnemo CASCADE; -- installs 0.2.0 directly ``` --- ## [0.1.4] β€” 2026-05-04 ### Added - **State machine for `agent_lesson`** (closes [#3](https://github.com/pgmnemo/pgmnemo/issues/3)) - New `state TEXT` column (default `'draft'`), constrained to 9 lifecycle values: `draft`, `candidate`, `validated`, `canonical`, `deprecated`, `superseded`, `archived`, `rejected`, `conflicted`. - `state_changed_at TIMESTAMPTZ` β€” auto-set on every state change. - `pgmnemo.agent_lesson_state_transition` table β€” explicit allowed-transition pairs. - `pgmnemo.transition_lesson(lesson_id BIGINT, new_state TEXT)` β€” enforces the DAG; raises on invalid transition. - **Provenance FK columns** (closes [#4](https://github.com/pgmnemo/pgmnemo/issues/4)) - `source_run_id BIGINT NULL` β€” soft FK to the orchestrator `agent_run` row that produced this lesson. - `source_task_id BIGINT NULL` β€” soft FK to the orchestrator `tasks` row. - Partial indexes `ix_pgmnemo_lesson_source_run` and `ix_pgmnemo_lesson_source_task` (WHERE NOT NULL). - Columns are intentionally not hard `REFERENCES`-constrained so the extension remains portable across host schemas. - **TTL / `expires_at`** (closes [#5](https://github.com/pgmnemo/pgmnemo/issues/5)) - `expires_at TIMESTAMPTZ NULL` β€” optional hard expiry; `NULL` = never expires. - `pgmnemo.evict_expired_lessons()` β€” deletes rows where `expires_at < NOW()`; returns eviction count. Safe to call on a schedule. - Partial index `ix_pgmnemo_agent_lesson_expires` keeps eviction scans cheap. ### Fixed - **`pgmnemo.version()` dynamic lookup** (closes [#1](https://github.com/pgmnemo/pgmnemo/issues/1)) - `version()` previously returned a hard-coded string baked at build time. After `ALTER EXTENSION pgmnemo UPDATE` the reported version was stale. - Now reads `extversion` from `pg_catalog.pg_extension` at call time β€” always accurate. ### Upgrade ```sql ALTER EXTENSION pgmnemo UPDATE TO '0.1.4'; ``` Or from a fresh install: ```sql CREATE EXTENSION pgmnemo CASCADE; -- installs 0.1.4 directly ``` --- ## [0.1.3] β€” 2026-04-29 ### Added - `verifier_role TEXT` column on `agent_lesson` β€” records which agent role validated the lesson. --- ## [0.1.2] β€” 2026-04-28 ### Added - Tri-state `prov_strength` (`hard` / `soft` / `none`) on `agent_lesson`. - `recall_lessons_pooled()` wrapper β€” cross-project recall for shared-context queries. --- ## [0.1.1] β€” 2026-04-27 ### Added - `recency_weight` GUC β€” tune the time-decay component of the hybrid recall score without restarting the server. --- ## [0.1.0] β€” 2026-04-26 ### Added - HNSW vector index via `pgvector` β€” fast approximate nearest-neighbour recall. - `pgmnemo.ingest()` β€” provenance-gated write API; requires `commit_sha` or `artifact_hash`. - `pgmnemo.recall_lessons()` β€” hybrid scoring: cosine similarity + BM25 full-text + recency decay. - Role + `project_id` composite scoping. - `recall_lessons_pooled()` (cross-project variant). --- ## [0.0.1] β€” 2026-04-20 Initial schema: `pgmnemo.agent_lesson` table + basic HNSW index.