# pgmnemo Usage Guide

## Writing lessons — `pgmnemo.ingest()`

`pgmnemo.ingest()` is the validated write path. Use it instead of raw `INSERT` to get:

- embedding dimension validation (1024 required)
- automatic `verified_at` stamp when provenance fields are present
- provenance gate enforcement (controlled by `pgmnemo.gate_strict`)

### Signature

```sql
pgmnemo.ingest(
    p_role          TEXT,
    p_project_id    INT,
    p_topic         TEXT,
    p_lesson_text   TEXT,
    p_importance    SMALLINT  DEFAULT 3,       -- 1 (low) to 5 (critical)
    p_embedding     vector(1024) DEFAULT NULL,
    p_commit_sha    TEXT      DEFAULT NULL,
    p_artifact_hash TEXT      DEFAULT NULL,
    p_metadata      JSONB     DEFAULT '{}'
) RETURNS BIGINT  -- new lesson id
```

### Examples

```sql
-- Minimal: text-only lesson with provenance via commit SHA
SELECT pgmnemo.ingest(
    'developer', 1, 'security',
    'Rotate JWT secrets within 24 hours of any key-compromise indicator.',
    4,
    NULL,       -- no embedding; text-only recall still works
    'a3f9b12'   -- commit SHA from the agent run that produced this lesson
);

-- Full: with embedding + artifact hash (e.g. signed test report)
SELECT pgmnemo.ingest(
    p_role          := 'qa-agent',
    p_project_id    := 7,
    p_topic         := 'flaky-tests',
    p_lesson_text   := 'Test suite_login is flaky under high concurrency; add retry=3.',
    p_importance    := 3,
    p_embedding     := <your_vector_1024>,
    p_artifact_hash := 'sha256:e3b0c44298fc1c149afb...',
    p_metadata      := '{"model": "claude-sonnet-4-6", "run_id": "r-42"}'
);
```

**Provenance gate behaviour** (set by `pgmnemo.gate_strict`):

- `enforce` (default) — call fails if both `p_commit_sha` and `p_artifact_hash` are NULL
- `warn` — call succeeds; client receives a `WARNING`; `verified_at` remains NULL
- `off` — no check; use only for development

```sql
-- Temporarily relax for bulk backfill:
SET pgmnemo.gate_strict = 'warn';
```

---

## Reading lessons — `pgmnemo.recall_lessons()`

### Signature

```sql
pgmnemo.recall_lessons(
    query_embedding  vector(1024),  -- pass NULL for text-only recall
    k                INT     DEFAULT 10,
    role             TEXT    DEFAULT NULL,
    project_id       INT     DEFAULT NULL,
    query_text       TEXT    DEFAULT NULL
) RETURNS TABLE (
    lesson_id     BIGINT,
    score         DOUBLE PRECISION,
    role          TEXT,
    project_id    INT,
    topic         TEXT,
    lesson_text   TEXT,
    importance    SMALLINT,
    metadata      JSONB,
    commit_sha    TEXT,
    artifact_hash TEXT,
    verified_at   TIMESTAMPTZ,
    created_at    TIMESTAMPTZ
)
```

### Scoring formula (paper §6.4, locked)

```
score = 0.5 × cosine_similarity
      + 0.2 × (importance / 5)
      + 0.2 × recency_decay(90 days)
      + 0.1 × provenance_strength
```

Where `provenance_strength` = 1.0 (commit + verified), 0.5 (commit only), 0.0 (none).

By default, rows with `verified_at IS NULL` (ghost lessons) are excluded. Enable with:

```sql
SET pgmnemo.include_unverified = 'true';
```

### Examples

```sql
-- Semantic recall with role filter
SELECT topic, lesson_text, score
FROM pgmnemo.recall_lessons(
    <your_vector_1024>,
    10,
    'developer'    -- role filter; NULL = all roles
);

-- Text-only recall (no embedding)
SELECT topic, lesson_text, score
FROM pgmnemo.recall_lessons(
    NULL::vector(1024),
    5,
    NULL,          -- all roles
    42,            -- project_id filter
    'JWT rotation' -- full-text query
);

-- Hybrid: embedding + text + project scope
SELECT topic, lesson_text, score
FROM pgmnemo.recall_lessons(
    <your_vector_1024>,
    20,
    'security-agent',
    42,
    'key rotation best practices'
);
```

---

## Edge taxonomy — `edge_kind` ENUM (v0.3.0)

v0.3.0 introduces a typed edge taxonomy as part of MAGMA §3. Each `mem_edge` row now carries
a mandatory `edge_kind` column drawn from the ENUM `pgmnemo.edge_kind`.

### `edge_kind` values

| Value | Meaning |
|-------|---------|
| `semantic` | Conceptually related lessons (shared topic or entity) |
| `temporal` | Lessons from overlapping or adjacent time windows |
| `causal` | Lesson A is a cause or precondition for lesson B |
| `entity` | Lessons share a named entity (agent, project, artifact) |

### Migration note (upgrading from v0.2.1)

The v0.2.1→v0.3.0 migration (`pgmnemo--0.2.1--0.3.0.sql`) backfills `edge_kind` from the
existing `relation_type` TEXT column using the mapping:

```
CAUSED_BY / caused_by / causal / derives_from / DERIVED_FROM / contradicts  → causal
CO_OCCURRED / co_occurred / temporal                                          → temporal
DERIVED_FROM / derived_from                                                   → semantic (fallback)
(all others)                                                                  → semantic
```

After migration, `edge_kind` is `NOT NULL` on all rows. The original `relation_type` column
is preserved as a freeform annotation column.

### Per-kind partial indexes

Four partial B-tree indexes are created automatically:

```sql
pgmnemo_mem_edge_semantic_idx   ON mem_edge (lesson_a_id, lesson_b_id)  WHERE edge_kind = 'semantic'
pgmnemo_mem_edge_temporal_idx   ON mem_edge (lesson_a_id, lesson_b_id)  WHERE edge_kind = 'temporal'
pgmnemo_mem_edge_causal_idx     ON mem_edge (lesson_a_id, lesson_b_id)  WHERE edge_kind = 'causal'
pgmnemo_mem_edge_entity_idx     ON mem_edge (lesson_a_id, lesson_b_id)  WHERE edge_kind = 'entity'
```

Queries that filter by `edge_kind` (e.g. causal-chain traversal) benefit from index-only scans.

### BFS fix in `recall_lessons()`

v0.3.0 fixes a bug where the BFS step inside `recall_lessons()` referenced the deprecated
`edge_type` column. The BFS now correctly uses `edge_kind` for graph traversal. This change
is transparent — the `recall_lessons()` signature is unchanged.

### Writing edges

```sql
-- Add a causal edge between two lessons
INSERT INTO pgmnemo.mem_edge (lesson_a_id, lesson_b_id, edge_kind, relation_type)
VALUES (1001, 1002, 'causal', 'CAUSED_BY');

-- Add a temporal co-occurrence edge
INSERT INTO pgmnemo.mem_edge (lesson_a_id, lesson_b_id, edge_kind, relation_type)
VALUES (1003, 1004, 'temporal', 'CO_OCCURRED');
```

---

## Hybrid retrieval — `pgmnemo.recall_hybrid()` ⚠ EXPERIMENTAL

> **EXPERIMENTAL — opt-in only.** `recall_hybrid()` is NOT the default retrieval path.
> Call it directly when you need it. `recall_lessons()` is unchanged.
>
> Bench status (2026-05-10, simulation): LoCoMo recall@10 +12.7pp vs vector-only (all
> question types positive, statistically significant). LongMemEval MRR +5.8pp (p=0.005,
> significant); recall@10 +1.5pp (p=0.308, not significant). Numbers are simulation
> (TF-IDF proxy for dense retrieval); real-DB confirmation pending.

Combines dense cosine retrieval with BM25-class sparse matching. Best suited for tasks
where the correct memory is lower in the top-K ranking (MRR improvement) or where
keyword-match queries appear alongside semantic queries (LoCoMo-style mixed corpus).

### Signature

```sql
pgmnemo.recall_hybrid(
    query_embedding  vector(1024),
    query_text       TEXT,
    k                INT     DEFAULT 10,
    role_filter      TEXT    DEFAULT NULL,
    project_id_filter INT    DEFAULT NULL,
    vec_weight       FLOAT   DEFAULT 0.4,
    bm25_weight      FLOAT   DEFAULT 0.4
) RETURNS TABLE (
    lesson_id     BIGINT,
    hybrid_score  DOUBLE PRECISION,
    rrf_score     DOUBLE PRECISION,   -- diagnostic: 1/(k+vec_rank) + 1/(k+bm25_rank)
    role          TEXT,
    project_id    INT,
    topic         TEXT,
    lesson_text   TEXT,
    importance    SMALLINT,
    metadata      JSONB,
    commit_sha    TEXT,
    artifact_hash TEXT,
    verified_at   TIMESTAMPTZ,
    created_at    TIMESTAMPTZ
)
```

Formula: `hybrid_score = vec_weight×cosine + bm25_weight×ts_rank_cd(lesson_tsv, q, 32)`  
Union retrieval: candidates matched by **either** embedding cosine **or** BM25.

### Example

```sql
-- Opt-in: call recall_hybrid() directly
SELECT topic, lesson_text, hybrid_score, rrf_score
FROM pgmnemo.recall_hybrid(
    <your_vector_1024>,
    'JWT rotation key compromise',
    10,
    'security-agent',  -- role filter
    42                 -- project_id filter
);
```

### When to use

- Task requires ranking the correct result higher in top-K (MRR-sensitive)
- Your memory corpus has both keyword-matchable and semantic queries (LoCoMo profile)
- You have confirmed the recall@10 signal on your own data before relying on it

### Install

```sql
-- Run once after upgrading to v0.2.2:
\i extension/pgmnemo--0.2.1--0.2.2-hybrid.sql
```

---

## Tuning

### HNSW ef_search

Higher `ef_search` improves recall accuracy at the cost of query latency. Default is 40.

```sql
-- Per-session (no restart required)
SET hnsw.ef_search = 100;
```

Rule of thumb: start at 40 for latency-sensitive paths, raise to 100–200 when recall
accuracy matters more than p99 latency.

### HNSW index parameters

The index is built with `m=16, ef_construction=64` (extension defaults). To rebuild with
higher-quality construction (slower build, better recall at low ef_search):

```sql
-- Requires pgvector 0.7+
REINDEX INDEX pgmnemo.pgmnemo_agent_lesson_embedding_idx;
-- Or drop/recreate with custom params:
DROP INDEX pgmnemo.pgmnemo_agent_lesson_embedding_idx;
CREATE INDEX pgmnemo_agent_lesson_embedding_idx
  ON pgmnemo.agent_lesson
  USING hnsw (embedding vector_cosine_ops)
  WITH (m=32, ef_construction=128)
  WHERE is_active AND embedding IS NOT NULL;
```

### Scoring weight overrides

The §6.4 formula weights are hardcoded in v0.1.0. Custom scoring is supported by calling
`recall_lessons()` and applying your own reranker in application code, or by wrapping the
returned columns in a custom SQL query.

### Limiting ghost lessons

Keep `pgmnemo.gate_strict = 'enforce'` (default) in production. Ghost lessons (unverified
rows) dilute recall quality because they score 0 on the provenance component and may contain
hallucinated observations.