# Memory Model
The engine is designed around cache-friendly owned arrays during build and
read-only mmap-backed fixed-width arrays after persistence. PostgreSQL backends
do not share Rust heap memory. After auto-load, they can share the immutable
fixed-width persisted base graph pages through the operating system page cache,
while derived and variable-size structures remain per-backend heap allocations.
This mapping is for rebuildable graph artifacts only; PostgreSQL still owns
authoritative table storage, WAL, MVCC, durability, and crash recovery.
## Backend Ownership
Mmap-backed, with physical pages shareable through the OS page cache:
- `node_store`: mmap pointers
- `edge_store`: mmap pointers
- `resolution_store`: mmap section
- `_mmap` handle
- OS page cache pages

Per-backend heap:
- `FilterIndex`: bincode section deserialized per backend
- `edge_type_registry`: bincode section deserialized per backend
- `reverse_edge_store`: derived owned CSR for inbound traversal
- `resolution_delta`: indexed post-load sync inserts
- `edge_buffer`: post-load sync edge overlays
- `tenant_membership`: backend-local state

The forward base graph is not copied wholesale into each connection's private
heap after mmap load. Node arrays, forward CSR arrays, primary-key bytes, and
resolution bytes are mmap-backed. The reverse CSR and bincode sections are
currently per-backend heap structures.
## NodeStore
`NodeStore` uses a struct-of-arrays layout:
| Field | Type | Description |
|---|---|---|
| `table_oids` | `Vec<u32>` or `[u32]` | source table OID by node index |
| `primary_keys` | `Vec` or offsets+bytes | source primary key by node index |
Owned mode supports mutation. Mmap mode is read-only and uses validated raw
pointers into `Engine._mmap`. In the persisted format, active bits, table OIDs,
primary-key offsets, and primary-key bytes are all mmap-backed.
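
The owned versus read-only split can be illustrated with a small sketch. It is not the engine's implementation: the text above says mmap mode uses validated raw pointers into `Engine._mmap`, whereas this example swaps in the standard library's safe `Cow` type to show the same idea of a borrowed view that becomes owned on first write. `TableOidColumn` and its methods are hypothetical names.

```rust
use std::borrow::Cow;

/// Illustrative column that is either a borrowed read-only view or owned data.
/// Hypothetical type; the real `NodeStore` layout is not shown in this document.
pub struct TableOidColumn<'a> {
    values: Cow<'a, [u32]>,
}

impl<'a> TableOidColumn<'a> {
    /// Wraps a read-only view, standing in for an mmap-backed array.
    pub fn from_view(view: &'a [u32]) -> Self {
        Self { values: Cow::Borrowed(view) }
    }

    /// True once the column has been copied into owned memory.
    pub fn is_owned(&self) -> bool {
        matches!(self.values, Cow::Owned(_))
    }

    /// Reads work the same way in both modes.
    pub fn get(&self, node_idx: usize) -> Option<u32> {
        self.values.get(node_idx).copied()
    }

    /// Writes a value, copying the view into an owned `Vec` on first mutation.
    /// Returns false (and does not copy) when `node_idx` is out of range.
    pub fn set(&mut self, node_idx: usize, table_oid: u32) -> bool {
        if node_idx >= self.values.len() {
            return false;
        }
        self.values.to_mut()[node_idx] = table_oid;
        true
    }
}
```

The backing view is never modified; only the private owned copy changes after the first write.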
## EdgeStore
`EdgeStore` uses compressed sparse row:
| Field | Type | Description |
|---|---|---|
| `edge_offsets` | `Vec<u32>` or `[u32]` | length `node_count + 1`; offsets into target arrays |
| `targets` | `Vec<u32>` or `[u32]` | neighbor node indices |
| `type_ids` | `Vec<u8>` or `[u8]` | parallel edge label IDs |
| `weights` | `Vec<u32>` or `[u32]` | optional parallel edge weights |
CSR neighbor lookup:
```text
node i neighbors = targets[edge_offsets[i]..edge_offsets[i + 1]]
node i labels = type_ids[edge_offsets[i]..edge_offsets[i + 1]]
node i weights = weights[edge_offsets[i]..edge_offsets[i + 1]]
```
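
The lookup above translates almost directly into slice indexing. The following is a minimal sketch, assuming `u32` offsets and targets and `u8` type IDs as in the `EdgeStore` description; the `Csr` struct and its `neighbors` method are illustrative names, not the engine's API.

```rust
/// Minimal read-only CSR view. Hypothetical type for illustration.
pub struct Csr<'a> {
    pub edge_offsets: &'a [u32],
    pub targets: &'a [u32],
    pub type_ids: &'a [u8],
}

impl<'a> Csr<'a> {
    /// Returns the parallel (targets, type_ids) slices for `node`,
    /// or `None` if the node index or the stored offsets are out of range.
    pub fn neighbors(&self, node: usize) -> Option<(&'a [u32], &'a [u8])> {
        let start = *self.edge_offsets.get(node)? as usize;
        let end = *self.edge_offsets.get(node + 1)? as usize;
        // `get(start..end)` also rejects start > end, so no panic is possible.
        Some((self.targets.get(start..end)?, self.type_ids.get(start..end)?))
    }
}
```

A node with no outbound edges has equal adjacent offsets, so it yields two empty slices rather than `None`.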
CSR invariants:
| Invariant | Enforced by |
|---|---|
| `edge_offsets.len() == node_count + 1` | Builders and loader validation |
| `edge_offsets[0] == 0` | Loader validation |
| Offsets are monotonic | Loader validation |
| Final offset equals `edge_count` | Loader validation |
| Targets are less than `node_count` | Builders and loader validation |
| `type_ids.len() == targets.len()` | Builders and section sizes |
| `weights` empty or length `edge_count` | Builders and loader validation |
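
As a rough illustration of what checking these invariants involves, the sketch below validates them over plain slices. It is an assumption-laden example rather than the loader's code: `validate_csr` and `CsrError` are hypothetical names, and `edge_count` is taken to be `targets.len()`.

```rust
/// One variant per invariant in the table. Hypothetical error type.
#[derive(Debug, PartialEq)]
pub enum CsrError {
    OffsetLen,
    FirstOffset,
    NonMonotonic,
    FinalOffset,
    TargetOutOfRange,
    TypeIdLen,
    WeightLen,
}

/// Checks the CSR invariants in table order; `weights` may be empty.
pub fn validate_csr(
    node_count: usize,
    edge_offsets: &[u32],
    targets: &[u32],
    type_ids: &[u8],
    weights: &[u32],
) -> Result<(), CsrError> {
    let edge_count = targets.len();
    if edge_offsets.len() != node_count + 1 {
        return Err(CsrError::OffsetLen);
    }
    // Safe to index: the length check above guarantees at least one offset.
    if edge_offsets[0] != 0 {
        return Err(CsrError::FirstOffset);
    }
    if edge_offsets.windows(2).any(|w| w[0] > w[1]) {
        return Err(CsrError::NonMonotonic);
    }
    if edge_offsets[node_count] as usize != edge_count {
        return Err(CsrError::FinalOffset);
    }
    if targets.iter().any(|&t| t as usize >= node_count) {
        return Err(CsrError::TargetOutOfRange);
    }
    if type_ids.len() != edge_count {
        return Err(CsrError::TypeIdLen);
    }
    if !weights.is_empty() && weights.len() != edge_count {
        return Err(CsrError::WeightLen);
    }
    Ok(())
}
```

Checking monotonic offsets plus the final offset is what makes the slice arithmetic in the neighbor lookup safe for every node.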
## Loaded Artifact Memory Split
| Structure | After `load_graph_file()` |
|---|---|
| `NodeStore.is_active` | mmap-backed |
| `NodeStore.table_oids` | mmap-backed |
| `NodeStore.primary_key_offsets` and bytes | mmap-backed |
| Forward `EdgeStore.edge_offsets` | mmap-backed |
| Forward `EdgeStore.targets` | mmap-backed |
| Forward `EdgeStore.type_ids` | mmap-backed |
| Forward `EdgeStore.weights` | mmap-backed when present |
| `ResolutionIndex` | mmap-backed section |
| `FilterIndex` | bincode payload deserialized into backend heap |
| `edge_type_registry` | bincode payload deserialized into backend heap |
| `reverse_edge_store` | built as owned heap from forward edges |
| Sync overlays | backend-local heap |
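
Because `reverse_edge_store` is derived from the forward edges, each backend pays for that build in its own heap. One common way to derive an inbound CSR is a counting sort over targets, sketched below. This is a generic technique shown under assumptions, not necessarily how the engine builds its reverse store; `build_reverse_csr` is a hypothetical name, and type IDs and weights are omitted for brevity.

```rust
/// Builds (reverse_offsets, reverse_sources) from a forward CSR.
/// Assumes the forward arrays already satisfy the CSR invariants.
pub fn build_reverse_csr(
    node_count: usize,
    edge_offsets: &[u32],
    targets: &[u32],
) -> (Vec<u32>, Vec<u32>) {
    // Pass 1: count inbound edges per node, stored one slot to the right.
    let mut rev_offsets = vec![0u32; node_count + 1];
    for &dst in targets {
        rev_offsets[dst as usize + 1] += 1;
    }
    // Prefix sum turns counts into start offsets.
    for i in 0..node_count {
        rev_offsets[i + 1] += rev_offsets[i];
    }
    // Pass 2: scatter each source into the next free slot of its target.
    let mut cursor = rev_offsets.clone();
    let mut rev_sources = vec![0u32; targets.len()];
    for src in 0..node_count {
        let start = edge_offsets[src] as usize;
        let end = edge_offsets[src + 1] as usize;
        for &dst in &targets[start..end] {
            let slot = cursor[dst as usize] as usize;
            rev_sources[slot] = src as u32;
            cursor[dst as usize] += 1;
        }
    }
    (rev_offsets, rev_sources)
}
```

The result is two freshly allocated vectors, which is consistent with the reverse store being owned heap memory even when the forward arrays are mmap-backed.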
## FilterIndex Storage
`FilterIndex` stores registered traversal filter columns by internal `node_idx`.
It chooses dense or sparse storage based on build-time populated count.
```text
FilterIndex
columns[]
storage[]
Dense values + present bitmap
SparseBool true/false/present bitmaps
SparseLookup value -> bitmap
SparseOrdered sorted (node_idx, value)
text dictionaries[]
```
Sparse threshold:
```text
populated_count * 100 < node_count * 15
```
That is, a column populated for fewer than 15 percent of nodes uses sparse storage.
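
The threshold is an integer comparison, which avoids floating-point rounding at the boundary. A minimal sketch follows, with `use_sparse_storage` as a hypothetical function name; widening to `u128` is an added precaution against overflow, not something the source specifies.

```rust
/// Returns true when a column should use sparse storage, following
/// `populated_count * 100 < node_count * 15`.
pub fn use_sparse_storage(populated_count: u64, node_count: u64) -> bool {
    // Widen before multiplying so very large counts cannot overflow.
    (populated_count as u128) * 100 < (node_count as u128) * 15
}
```

For a graph of 1,000,000 nodes, a column with 149,999 populated values is sparse, while one with exactly 150,000 is dense because the comparison is strict.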
## ResolutionIndex
The resolution index maps:
```text
(table_oid, primary_key) -> node_idx
```
Build mode accumulates compact entries. Finalization serializes a sorted array.
Mmap mode performs binary search directly over the persisted bytes.
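
The sort-then-binary-search pattern can be sketched over an in-memory array. The persisted byte layout is not described here, so this example uses a plain `Vec` of entries instead of raw bytes; `Entry`, `finalize`, and `resolve` are hypothetical names, and ordering by `(table_oid, primary_key)` with lexicographic key bytes is an assumption.

```rust
/// One resolution entry. Hypothetical in-memory shape for illustration.
pub struct Entry {
    pub table_oid: u32,
    pub primary_key: Vec<u8>,
    pub node_idx: u32,
}

/// Sorts accumulated entries by (table_oid, primary_key), as finalization would.
pub fn finalize(mut entries: Vec<Entry>) -> Vec<Entry> {
    entries.sort_by(|a, b| {
        (a.table_oid, &a.primary_key).cmp(&(b.table_oid, &b.primary_key))
    });
    entries
}

/// Binary-searches the sorted entries for an exact key match.
pub fn resolve(sorted: &[Entry], table_oid: u32, primary_key: &[u8]) -> Option<u32> {
    sorted
        .binary_search_by(|e| {
            (e.table_oid, e.primary_key.as_slice()).cmp(&(table_oid, primary_key))
        })
        .ok()
        .map(|i| sorted[i].node_idx)
}
```

The lookup only reads the sorted data, which is why the same search can run against an immutable mapping without any per-backend index structure.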
## Memory Estimate
`Engine::estimated_memory_used_mb()` estimates:
```text
nodes * (active bit + table_oid + average primary key)
+ forward CSR arrays
+ reverse CSR arrays
+ resolution index
+ FilterIndex heap
+ edge overlay buffer
```
`graph.estimate()` and build preflight use a separate conservative estimate
from PostgreSQL row estimates before allocating the engine.
When the engine is loaded from a `.pggraph` file, the value returned by
`Engine::estimated_memory_used_mb()` is a logical size estimate, not a
per-backend private RSS formula. Mmap-backed forward arrays and resolution
bytes are shared physically by the OS page cache across backends; reverse CSR,
filter index, registry, and overlay structures remain per-backend heap.
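
To make the node and CSR terms of the estimate concrete, here is a rough arithmetic sketch. Every width in it is an assumption for illustration: one packed bit per node for the active flag, 4-byte table OIDs, offsets, targets, and weights, 1-byte type IDs, and a reverse CSR the same size as the forward one. `GraphCounts` and `estimate_base_bytes` are hypothetical names, and the resolution index, `FilterIndex` heap, and overlay buffer terms are left out.

```rust
/// Inputs for the sketch. Hypothetical type, not the engine's API.
pub struct GraphCounts {
    pub nodes: u64,
    pub edges: u64,
    pub avg_primary_key_bytes: u64,
    pub has_weights: bool,
}

/// Estimates bytes for node arrays plus forward and reverse CSR arrays.
pub fn estimate_base_bytes(c: &GraphCounts) -> u64 {
    // nodes * (active bit + table_oid + average primary key)
    let active_bytes = (c.nodes + 7) / 8; // packed bits, rounded up
    let node_bytes = active_bytes + c.nodes * 4 + c.nodes * c.avg_primary_key_bytes;

    // One CSR: (nodes + 1) offsets, then per-edge target, type ID, optional weight.
    let per_edge = 4 + 1 + if c.has_weights { 4 } else { 0 };
    let csr_bytes = (c.nodes + 1) * 4 + c.edges * per_edge;

    // Forward and reverse CSR are assumed to be the same size.
    node_bytes + 2 * csr_bytes
}
```

With 8 nodes, 10 unweighted edges, and 8-byte keys this gives 97 bytes of node data plus 2 x 86 bytes of CSR data, 269 bytes in total. After an mmap load, only the reverse CSR share of that figure would count toward a backend's private heap.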
## Mmap Materialization For Sync
Mmap-backed stores are immutable. When sync needs to mutate nodes, the engine
materializes the mmap node store into owned arrays.
Edge mutations do not rewrite CSR. They live in `edge_buffer` overlays until a
maintenance rebuild.
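
The overlay idea can be sketched as a read path that merges the immutable base CSR with a small mutable side buffer. The actual `edge_buffer` layout is not described in this document, so the `HashMap` keyed by source node is an assumption, as are the `OverlayGraph` type and its methods; edge removals and type IDs are omitted.

```rust
use std::collections::HashMap;

/// Immutable base CSR plus a mutable overlay of inserted edges.
/// Hypothetical type for illustration.
pub struct OverlayGraph<'a> {
    edge_offsets: &'a [u32],
    targets: &'a [u32],
    edge_buffer: HashMap<u32, Vec<u32>>,
}

impl<'a> OverlayGraph<'a> {
    pub fn new(edge_offsets: &'a [u32], targets: &'a [u32]) -> Self {
        Self { edge_offsets, targets, edge_buffer: HashMap::new() }
    }

    /// Records a new edge in the overlay; the base arrays are never touched.
    pub fn insert_edge(&mut self, src: u32, dst: u32) {
        self.edge_buffer.entry(src).or_default().push(dst);
    }

    /// Returns base neighbors followed by overlay neighbors for `node`.
    pub fn neighbors(&self, node: u32) -> Vec<u32> {
        let i = node as usize;
        let mut out = Vec::new();
        if let (Some(&start), Some(&end)) =
            (self.edge_offsets.get(i), self.edge_offsets.get(i + 1))
        {
            out.extend_from_slice(&self.targets[start as usize..end as usize]);
        }
        if let Some(extra) = self.edge_buffer.get(&node) {
            out.extend_from_slice(extra);
        }
        out
    }
}
```

Inserting into the middle of a CSR would shift every later offset, so deferring that cost to a rebuild keeps individual sync writes cheap at the price of a slightly slower read path.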
## Unsafe Boundary
Raw mmap pointer metadata is only constructed by validated constructors:
| Type | Validation |
|---|---|
| `MmapNodeArrays::new` | pointer presence, active byte count, `u32`/`u64` alignment |
| `MmapEdgeArrays::new` | pointer presence, optional weights pointer, `u32` alignment |
| `validate_section_layout` | section ordering, bounds, sizes, alignment, CRC, CSR content, PK offsets |
Every unsafe raw slice or pointer dereference has a local `// SAFETY:` comment
explaining the proof at the call site.
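
The pattern of validating once in a constructor and documenting the proof at each unsafe site can be sketched as follows. This is an illustrative example only: `MappedU32s` is a hypothetical type, not `MmapNodeArrays` or `MmapEdgeArrays`, and it checks just alignment and length, whereas the real validators are described above as also covering bounds, CRC, and CSR content.

```rust
use std::marker::PhantomData;
use std::mem::{align_of, size_of};

/// Read-only `[u32]` view over validated bytes. Hypothetical type.
pub struct MappedU32s<'a> {
    ptr: *const u32,
    len: usize,
    // Ties the raw pointer to the lifetime of the backing bytes.
    _backing: PhantomData<&'a [u8]>,
}

impl<'a> MappedU32s<'a> {
    /// The only way to construct the view; rejects misaligned or ragged input.
    pub fn new(bytes: &'a [u8]) -> Result<Self, &'static str> {
        if (bytes.as_ptr() as usize) % align_of::<u32>() != 0 {
            return Err("section is not u32-aligned");
        }
        if bytes.len() % size_of::<u32>() != 0 {
            return Err("section length is not a multiple of 4");
        }
        Ok(Self {
            ptr: bytes.as_ptr().cast::<u32>(),
            len: bytes.len() / size_of::<u32>(),
            _backing: PhantomData,
        })
    }

    pub fn as_slice(&self) -> &'a [u32] {
        // SAFETY: `new` verified that `ptr` is aligned for `u32` and that
        // `len` values fit inside the borrowed byte range; the `'a` borrow
        // keeps those bytes alive and immutable; every bit pattern is a
        // valid `u32`.
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}
```

Keeping the fields private means no caller can build the struct without passing through the checks that the `// SAFETY:` comment relies on.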