# Production Readiness — pgmnemo v0.2.1 Beta

**Version:** v0.2.1  
**Status:** Public Beta  
**Last updated:** 2026-05-09

This page answers four questions directly. No marketing language.

---

## 1. What does "beta" mean here?

**Beta means:** the core retrieval API (`recall_lessons`, `store_lesson`, `traverse_causal_chain`) is stable enough for production evaluation, but we have not yet run a sustained load campaign in a multi-tenant production environment and cannot guarantee forward API stability across minor versions.

Specifically:
- SQL function signatures may change between 0.x minor versions (breaking changes will appear in CHANGELOG with migration SQL).
- GUC names (`pgmnemo.ef_search`, `pgmnemo.recency_weight`, `pgmnemo.tenant_id`) are considered stable for 0.2.x but may be renamed in 0.3.x.
- The upgrade path (`ALTER EXTENSION pgmnemo UPDATE TO '...'`) is tested for sequential upgrades only; skip-version upgrades are not validated.

Beta does **not** mean experimental or unreliable for single-tenant deployments on PG17.

---

## 2. What is tested?

| Area | What we test | Evidence |
|---|---|---|
| **Retrieval accuracy** | LoCoMo (1 982 Q&A pairs, 10 conversations): recall@10 = **0.795**, MRR = 0.548 | [`benchmarks/locomo/results/v0.2.1_session_20260509/report.md`](../benchmarks/locomo/results/v0.2.1_session_20260509/report.md) |
| **Retrieval accuracy** | LongMemEval (500 questions, bge-m3 embedder): recall@10 = **0.933**, MRR = 0.855 | [`benchmarks/longmemeval/results/v0.2.1_pgmnemo_20260509/report.md`](../benchmarks/longmemeval/results/v0.2.1_pgmnemo_20260509/report.md) |
| **Schema correctness** | `make installcheck` on vanilla PG17 Docker (amd64) | CI on every PR |
| **Upgrade path** | Sequential upgrade scripts from 0.1.4 → 0.2.0 → 0.2.0.1 → 0.2.1, idempotent DDL guards | CHANGELOG §Upgrade sections |
| **RLS isolation** | `pgmnemo.tenant_id` GUC policies: tenant A cannot read tenant B rows | Manual verification per INS-032 |
| **Bug regression** | Named regressions for every INS-* fix: IN-param collision (INS-029), numeric cast (INS-030), idempotent DDL (INS-031) | CHANGELOG v0.2.0.1, v0.2.1 |
| **Cycle guard** | `traverse_causal_chain` cycle detection via path array, all three direction modes | Unit test in `extension/sql/test_traverse.sql` |
| **EF search GUC** | `pgmnemo.ef_search` applied at `recall_lessons()` entry, clamped 10–500 | CHANGELOG v0.2.1 |

**Embedder note:** All benchmark numbers use retrieval-only mode. No LLM-as-judge downstream evaluation has been run yet (see §3).

---

## 3. What is not yet guaranteed?

| Gap | Detail |
|---|---|
| **LLM-as-judge / end-to-end QA accuracy** | We report retrieval recall@K only. Downstream answer quality (the metric competitors report as "LLM-judge accuracy") is not yet measured for pgmnemo. |
| **PG14–16 compatibility** | Install and upgrade scripts work on PG14–16 in informal testing; numeric cast fix (INS-030) was the only known PG14 regression. Formal `installcheck` CI does not run on PG14–16. |
| **Sustained load / p99 latency at scale** | No stress-test or sustained load campaign has been run. The `US-A2` acceptance criterion (≤40 ms p95 on 10K entries) is a design target, not a validated result. |
| **`arm64` prebuilt binary** | Source build works on arm64; prebuilt `.so` for arm64 is not yet distributed. |
| **Skip-version upgrades** | Upgrading from 0.1.x directly to 0.2.1 (skipping intermediate versions) is untested. |
| **Multi-tenant RLS under adversarial load** | RLS policies have been reviewed for correctness but not fuzz-tested or audited by a third party. |
| **Recency weight calibration** | `pgmnemo.recency_weight` default lowered from 0.20 → 0.08 in v0.2.1 pending REC-1 ablation study. The ablation has not been published; the current default is a provisional best estimate. |

---

## 4. What must a production adopter verify on their side?

Before running pgmnemo in a workload that matters, verify the following:

1. **Run `make installcheck` against your target PG version.**  
   If your PG version is not 17, run the test suite explicitly. PG14–16 deviations will surface here.

2. **Smoke-test the upgrade path from your current version.**  
   Run each `ALTER EXTENSION pgmnemo UPDATE TO '...'` step sequentially in a staging environment before applying to production.

3. **Validate RLS with your tenant ID scheme.**  
   Set `pgmnemo.tenant_id` and confirm cross-tenant queries return empty results. Do not rely on application-layer filtering alone.

4. **Measure your own p95 latency on your corpus size.**  
   Index your `agent_lesson` table with HNSW before load. Tune `pgmnemo.ef_search` (default 100) for your recall/latency tradeoff. The ≤40 ms p95 target was not benchmarked on real hardware.

5. **Pin the extension version in your migration scripts.**  
   Use `ALTER EXTENSION pgmnemo UPDATE TO '0.2.1'` explicitly, not `UPDATE` (latest). Minor version API changes are documented but will not be held back for you.

6. **Do not rely on LLM-as-judge accuracy numbers from competitor papers.**  
   pgmnemo v0.2.1 publishes retrieval recall only. If your application needs QA accuracy guarantees, you must run your own end-to-end evaluation.

---

*Honest assessment, not a sales page. If you find a gap not listed here, open an issue.*