# Architecture
This page is the contributor-level map of the `graph` crate. It explains the
crate graph, data flow, ownership model, lifetimes, and the design choices that
shape the implementation.
For SQL behavior, start with the [User Guide](/user_guide). This page assumes
you are changing Rust code or reviewing how the extension works inside a
PostgreSQL backend.
## System Shape
## Runtime Ownership
Each PostgreSQL backend owns its own `Engine` in thread-local storage:
```rust
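use std::cell::RefCell;
thread_local! {
    // One engine per backend; `RefCell` gives interior mutability within that backend.
    static ENGINE: RefCell<Engine> = RefCell::new(Engine::new());
}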
```
That means there is no shared Rust heap between connections. Sharing happens
only through PostgreSQL storage, the filesystem, and OS page-cache backed mmap
pages.
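The access pattern this implies can be sketched as follows. The `Engine` stand-in, its `node_count` field, and the `with_engine` / `with_engine_mut` helpers are hypothetical names used only for illustration; the crate's real accessors may differ.

```rust
use std::cell::RefCell;

/// Stand-in for the real engine; only the ownership pattern matters here.
#[derive(Default)]
struct Engine {
    node_count: usize,
}

thread_local! {
    // One engine per thread. A PostgreSQL backend runs its SQL on a single
    // thread, so this behaves as one engine per backend process.
    static ENGINE: RefCell<Engine> = RefCell::new(Engine::default());
}

/// Hypothetical helper: run a closure with shared access to the engine.
fn with_engine<R>(f: impl FnOnce(&Engine) -> R) -> R {
    ENGINE.with(|cell| f(&cell.borrow()))
}

/// Hypothetical helper: run a closure with exclusive access to the engine.
fn with_engine_mut<R>(f: impl FnOnce(&mut Engine) -> R) -> R {
    ENGINE.with(|cell| f(&mut cell.borrow_mut()))
}
```

Keeping every access inside a closure means no reference to the engine can outlive the borrow, which is one way to avoid holding a `RefCell` borrow across a re-entrant call.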
Each backend also tracks one loaded graph slot beside the engine. The slot is
tagged with the graph id/name and residency that produced the active engine, so
runtime calls can reject or clear a stale engine before serving a different
selected graph. `graph.select_graph()` changes the session selection,
`graph.load_graph()` loads one persisted artifact into the backend-local slot,
and `graph.loaded_graphs()` reports zero or one row for the current backend.
`graph.graph_runtime_status()` combines the slot with named-graph metadata and
artifact checks for operator-facing residency/status inspection.
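A minimal sketch of the stale-slot check described above, assuming a slot that carries only an id and a name (residency is omitted). `LoadedGraphSlot`, `SlotCheck`, and `check_slot` are hypothetical names, and `u128` stands in for the UUID type.

```rust
/// Hypothetical tag recording which graph produced the active engine.
#[derive(Debug, Clone, PartialEq, Eq)]
struct LoadedGraphSlot {
    graph_id: u128, // stand-in for a UUID
    graph_name: String,
}

/// Outcome of comparing the loaded slot with the session's selected graph.
#[derive(Debug, PartialEq, Eq)]
enum SlotCheck {
    /// The engine already holds the selected graph.
    Current,
    /// Nothing is loaded in this backend yet.
    Empty,
    /// The engine holds a different graph and must be cleared or rejected.
    Stale,
}

/// Compare the backend-local slot with the selected graph id.
fn check_slot(slot: Option<&LoadedGraphSlot>, selected_graph_id: u128) -> SlotCheck {
    match slot {
        None => SlotCheck::Empty,
        Some(s) if s.graph_id == selected_graph_id => SlotCheck::Current,
        Some(_) => SlotCheck::Stale,
    }
}
```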
Compatibility SQL calls such as `graph.add_table()`, `graph.build()`,
`graph.status()`, `graph.traverse()`, and `graph.reset()` resolve the selected
graph, or the default graph when no graph has been selected. The default graph
behavior remains available while named graph identity is explicit in catalogs,
jobs, artifacts, runtime loading, sync replay, and projection metadata.
The default graph identity is intentionally explicit in the Rust policy layer:
the compatibility graph is named `default`, uses the `public` graph namespace,
and has a reserved canonical UUID. Future catalog rows use PostgreSQL `uuid`
values, while Rust passes graph identity as a typed value instead of deriving it
from paths, backend globals, or session state.
| Object | Owner | Lifetime |
|---|---|---|
| `Engine` | One PostgreSQL backend process | Until backend exit, graph selection change, explicit unload, or `graph.reset()` |
| Loaded graph slot | One PostgreSQL backend process | Tracks the graph id/name for the active `Engine` |
| `NodeStore` owned mode | `Engine` heap | Mutable build/sync lifetime |
| `NodeStore` mmap mode | Raw pointers into `Engine._mmap` | Valid while `_mmap` remains owned by the engine |
| Forward `EdgeStore` mmap mode | Raw pointers into `Engine._mmap` | Valid while `_mmap` remains owned by the engine |
| `reverse_edge_store` | `Engine` heap | Rebuilt per backend from forward CSR |
| `FilterIndex` after load | `Engine` heap | Deserialized per backend from bincode |
| `edge_type_registry` after load | `Engine` heap | Deserialized per backend from bincode |
| `ResolutionIndex` mmap mode | Byte slice inside `Engine._mmap` | Valid while `_mmap` remains owned by the engine |
| Durable projection manifests, segments, and chunks | Files beside each graph's `main.pggraph` | Shared through the filesystem and retained by graph-scoped active-generation metadata |
| `edge_buffer` and `resolution_delta` | `Engine` heap | Backend-local compatibility sync overlay state |
| `TxGraphDelta` | Current backend transaction | Transaction-local read-your-own-writes state only |
## Multi-Graph Architecture
The system supports multiple isolated named graphs within the same database. Graph identity is carried explicitly through every layer:
| Area | Current behavior |
|---|---|
| Registration catalogs | `graph._registered_tables`, `graph._registered_edges`, and `graph._registered_filter_columns` are keyed by `graph_id`; compatibility SQL resolves the selected or default graph before reading or writing them. |
| Runtime | `ENGINE` stores one backend-local graph engine tagged by a loaded graph slot. `ensure_current_graph()` validates that the loaded slot matches the selected graph before query execution. |
| Build and hosted jobs | `graph._build_jobs`, `graph._maintenance_jobs`, `graph._jobs`, `graph._job_runs`, and `graph._sync_policies` are keyed by `graph_id`; workers and hosted schedulers restore database, user, and graph context before running. |
| Persistence | The graph artifact is stored as `$PGDATA///main.pggraph`. |
| Sync | `graph._sync_log` and `graph._sync_buffer` are global source-table streams; replay, status, and query freshness filter `_sync_log` through the selected graph's registered table OIDs. |
| Projection generations | `graph._projection_generations` is keyed by `graph_id` so active-generation heartbeats and GC protection are graph-local. |
| Discovery | `graph.auto_discover()` and `graph.auto_discover_tables()` target a named graph or the selected/default graph. |
| Reset | `graph.reset()` clears the backend-local engine and removes files for the selected graph's artifact root. |
Named-graph changes moved each of these components from implicit global state to explicit graph-scoped inputs while preserving the default-graph SQL workflow for backward compatibility.
## Data Flow: Build
A build produces:
- NodeStore rows
- ResolutionIndexBuilder entries
- FilterIndex values
- tenant membership bitmaps
- forward CSR EdgeStore
- reverse CSR EdgeStore
The build path is allowed to allocate and sort. Query paths are not.
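The forward CSR step can be sketched as a counting sort over an edge list, which is the kind of allocate-and-sort work the build path may do. The `Csr` struct, `build_csr`, and the `(source, target)` edge representation are illustrative assumptions, not the crate's `EdgeStore` API.

```rust
/// Minimal CSR: the neighbors of node `n` are
/// `targets[offsets[n]..offsets[n + 1]]`.
#[derive(Debug, PartialEq, Eq)]
struct Csr {
    offsets: Vec<usize>,
    targets: Vec<u32>,
}

impl Csr {
    /// Outbound neighbors of `node` as a contiguous slice.
    fn neighbors(&self, node: u32) -> &[u32] {
        let n = node as usize;
        &self.targets[self.offsets[n]..self.offsets[n + 1]]
    }
}

/// Build a CSR from `(source, target)` pairs.
/// Assumes every source index is `< node_count`; panics otherwise.
fn build_csr(node_count: usize, edges: &[(u32, u32)]) -> Csr {
    // Pass 1: count the out-degree of each source node.
    let mut offsets = vec![0usize; node_count + 1];
    for &(src, _) in edges {
        offsets[src as usize + 1] += 1;
    }
    // Prefix sum turns the counts into start offsets.
    for i in 0..node_count {
        offsets[i + 1] += offsets[i];
    }
    // Pass 2: scatter each target into its source's slice.
    let mut cursor = offsets.clone();
    let mut targets = vec![0u32; edges.len()];
    for &(src, dst) in edges {
        let slot = &mut cursor[src as usize];
        targets[*slot] = dst;
        *slot += 1;
    }
    Csr { offsets, targets }
}
```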
## Data Flow: Load
`load_graph_file()` validates the file before constructing any mmap-backed
store. A successful load produces:
- mmap-backed NodeStore arrays
- mmap-backed forward EdgeStore arrays
- mmap-backed ResolutionIndex section
- heap FilterIndex from bincode
- heap edge_type_registry from bincode
- heap reverse_edge_store from forward CSR
The forward graph arrays and resolution section are mmap-backed. The reverse
CSR and bincode sections are backend-local heap allocations today.
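Rebuilding the reverse CSR from the forward CSR amounts to a transpose. The sketch below assumes plain `offsets` / `targets` slices with `offsets.len() == node_count + 1` and in-range targets; `transpose_csr` is a hypothetical name, not the crate's function.

```rust
/// Transpose a forward CSR into a reverse CSR.
/// Assumes `offsets` has `node_count + 1` entries and every target is
/// `< node_count`; panics otherwise.
fn transpose_csr(offsets: &[usize], targets: &[u32]) -> (Vec<usize>, Vec<u32>) {
    let node_count = offsets.len() - 1;

    // Count the in-degree of every node.
    let mut rev_offsets = vec![0usize; node_count + 1];
    for &dst in targets {
        rev_offsets[dst as usize + 1] += 1;
    }
    // Prefix sum turns the counts into start offsets.
    for i in 0..node_count {
        rev_offsets[i + 1] += rev_offsets[i];
    }

    // Walk the forward edges and record each source under its target.
    let mut cursor = rev_offsets.clone();
    let mut rev_targets = vec![0u32; targets.len()];
    for src in 0..node_count {
        for &dst in &targets[offsets[src]..offsets[src + 1]] {
            let slot = &mut cursor[dst as usize];
            rev_targets[*slot] = src as u32;
            *slot += 1;
        }
    }
    (rev_offsets, rev_targets)
}
```

Because the result lives on the heap, each backend pays this cost once per load, while the forward arrays stay shared through the page cache.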
## Data Flow: Query
A query call passes through these stages:
- check graph.enabled
- auto-load persisted graph if needed
- validate call options
- ACL/admin check where required
- resolve seed table+PK to node_idx
- select forward or reverse CSR
- run bounded traversal/path/search algorithm
- apply filters, tenants, overlays, pagination
Graph algorithms operate on compact node indexes. SQL-facing functions translate
between PostgreSQL coordinates and those internal indexes.
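A bounded traversal over compact node indexes might look like the sketch below. `bounded_bfs`, `max_depth`, and `max_nodes` are hypothetical names. The function allocates its own scratch space for readability, which the real query path may not do.

```rust
use std::collections::VecDeque;

/// Breadth-first traversal over a CSR, bounded by depth and result count.
/// Returns `(node_idx, depth)` pairs in visit order.
fn bounded_bfs(
    offsets: &[usize],
    targets: &[u32],
    seed: u32,
    max_depth: u32,
    max_nodes: usize,
) -> Vec<(u32, u32)> {
    let node_count = offsets.len().saturating_sub(1);
    let mut out = Vec::new();
    if (seed as usize) >= node_count || max_nodes == 0 {
        return out;
    }

    let mut seen = vec![false; node_count];
    let mut queue = VecDeque::new();
    seen[seed as usize] = true;
    queue.push_back((seed, 0u32));

    while let Some((node, depth)) = queue.pop_front() {
        out.push((node, depth));
        // Circuit breaker: stop once the result budget is spent.
        if out.len() >= max_nodes {
            break;
        }
        // Do not expand nodes that sit at the depth limit.
        if depth == max_depth {
            continue;
        }
        let n = node as usize;
        for &next in &targets[offsets[n]..offsets[n + 1]] {
            let idx = next as usize;
            if !seen[idx] {
                seen[idx] = true;
                queue.push_back((next, depth + 1));
            }
        }
    }
    out
}
```

Passing the reverse CSR arrays instead of the forward ones turns the same loop into an inbound traversal.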
## Sync And Maintenance Flow
Trigger sync records source-table changes in `graph._sync_log`. Topology query
functions apply pending trigger-sync rows for the selected graph by default up
to a captured high-water mark for that graph's registered table OIDs; operators
can set `graph.query_freshness = 'off'` for compatibility/manual catch-up or
`graph.query_freshness = 'error_on_pending'` to fail instead of reading stale topology.
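The freshness decision can be sketched as a small planning function. `Freshness::ApplyPending` is a hypothetical name for the default mode (the source names only `off` and `error_on_pending`), and modeling pending `_sync_log` rows as monotonically increasing `u64` ids is an assumption.

```rust
/// Freshness modes. `ApplyPending` is a hypothetical name for the default.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Freshness {
    ApplyPending,
    Off,
    ErrorOnPending,
}

/// What a topology query should do before reading.
#[derive(Debug, PartialEq, Eq)]
enum Plan {
    /// Read the topology without replaying anything.
    ReadAsIs,
    /// Replay pending rows up to and including this captured high-water mark.
    ReplayUpTo(u64),
    /// Refuse to read while rows are pending.
    Fail { pending: usize },
}

/// `pending_ids` holds the ids of pending sync rows that belong to the
/// selected graph's registered tables (assumed already filtered).
fn plan_read(mode: Freshness, pending_ids: &[u64]) -> Plan {
    match (mode, pending_ids.iter().max()) {
        // Nothing pending, or freshness disabled: read as-is.
        (_, None) | (Freshness::Off, _) => Plan::ReadAsIs,
        // Capture the high-water mark once so later rows are not chased.
        (Freshness::ApplyPending, Some(&hwm)) => Plan::ReplayUpTo(hwm),
        (Freshness::ErrorOnPending, Some(_)) => Plan::Fail {
            pending: pending_ids.len(),
        },
    }
}
```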
Persisted `mutable_overlay` graphs publish committed edge changes as durable L0
projection segments and reload the latest projection manifest, so other
backends can observe committed topology changes without a full rebuild.
Node inserts and tombstones can update backend-local state. Layered reads merge
the durable projection segments with the base CSR. Segment metadata records
the dirty source-node range when sync replay knows it, and future compaction can
replace those ranges with copy-on-write base chunks without mutating the mmap'd
CSR file. Non-persisted or compatibility paths may still use backend-local
overlays until maintenance or vacuum rebuilds the base CSR. Transaction-local
deltas remain backend-local and are applied last for read-your-own-writes
behavior.
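The layering order can be illustrated with a small merge over one node's neighbor list. `EdgeChange` and `layered_neighbors` are hypothetical names, and representing each layer as an ordered list of adds and removes is an assumption about shape, not the on-disk segment format.

```rust
/// One edge mutation relative to a single source node.
#[derive(Debug, Clone, Copy)]
enum EdgeChange {
    Add(u32),
    Remove(u32),
}

/// Merge a node's base CSR neighbors with overlay layers.
/// Layers are applied in slice order, so the caller passes durable
/// projection segments first and the transaction-local delta last.
fn layered_neighbors(base: &[u32], layers: &[&[EdgeChange]]) -> Vec<u32> {
    let mut out: Vec<u32> = base.to_vec();
    for layer in layers {
        for change in layer.iter() {
            match *change {
                EdgeChange::Add(target) => {
                    // Keep the result duplicate-free.
                    if !out.contains(&target) {
                        out.push(target);
                    }
                }
                EdgeChange::Remove(target) => out.retain(|&t| t != target),
            }
        }
    }
    out
}
```

The base slice is never mutated, which matches the requirement that the mmap'd CSR file stays read-only.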
## Design Decisions
| Decision | Why it exists |
|---|---|
| SQL is the public API | Keeps application integration inside PostgreSQL and avoids a new query language. |
| Source tables stay authoritative | pgGraph is an acceleration layer, not a second source of truth. |
| Backend-local `Engine` | Matches PostgreSQL process isolation and avoids shared mutable Rust state. |
| CSR for topology | Compact adjacency slices make traversal cache-friendly and predictable. |
| Reverse CSR is materialized | Inbound traversal stays O(degree) instead of scanning all forward edges. |
| Read-only mmap for persisted forward arrays | Later backends can start quickly and share immutable derived artifact pages through the OS page cache without replacing PostgreSQL's buffer pool. |
| Bincode metadata is heap-loaded | Filter and registry structures are variable-size Rust data that are easier to validate and use as owned values. |
| Explicit maintenance | Expensive rebuild work is visible and controllable from SQL. |
| Durable projection generations | Committed mutable-overlay topology changes become cross-backend-visible without turning pgGraph into a second source of truth. |
| Circuit breakers everywhere | The extension runs in PostgreSQL backends and must bound memory and traversal work. |
## Safety Boundaries
Unsafe code exists for performance and PostgreSQL integration, not as a general
escape hatch. The core unsafe boundary is mmap-backed store construction:
| Boundary | Required invariant |
|---|---|
| `MmapNodeArrays` | Active bytes, OID array, PK offsets, and PK byte ranges are present, aligned, and bounded by the mmap. |
| `MmapEdgeArrays` | CSR offsets, targets, type IDs, and optional weights are present, aligned, and bounded by the mmap. |
| `Engine._mmap` | The mmap outlives every NodeStore, forward EdgeStore, and ResolutionIndex lookup that borrows from it. |
| `raise_graph_error()` | PostgreSQL error FFI is called with stable strings and is treated as non-returning at the SQL boundary. |
Detailed rules live in [Safety And Security](./safety-security). Keep rustdoc
`# Safety` sections and local `// SAFETY:` comments current when touching any
unsafe area.
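The "present, aligned, and bounded" invariant can be sketched as a checked view over a byte buffer. `u32_section` and `SectionError` are hypothetical names; the real `MmapNodeArrays` and `MmapEdgeArrays` constructors may validate more than this.

```rust
use std::mem::{align_of, size_of};

#[derive(Debug, PartialEq, Eq)]
enum SectionError {
    OutOfBounds,
    Misaligned,
    Overflow,
}

/// View `count` `u32` values starting at byte `offset` inside `bytes`.
/// The returned slice borrows from `bytes`, so it cannot outlive the buffer.
fn u32_section(bytes: &[u8], offset: usize, count: usize) -> Result<&[u32], SectionError> {
    // Reject sizes that would wrap around before touching any memory.
    let len = count
        .checked_mul(size_of::<u32>())
        .ok_or(SectionError::Overflow)?;
    let end = offset.checked_add(len).ok_or(SectionError::Overflow)?;

    // Bounded: the whole byte range must sit inside the buffer.
    let section = bytes.get(offset..end).ok_or(SectionError::OutOfBounds)?;

    // Aligned: the start address must satisfy `u32` alignment.
    if section.as_ptr() as usize % align_of::<u32>() != 0 {
        return Err(SectionError::Misaligned);
    }

    // SAFETY: `section` is in bounds, aligned for `u32`, and exactly
    // `count * 4` bytes long. Every bit pattern is a valid `u32`, and the
    // lifetime of the result is tied to `bytes`.
    Ok(unsafe { std::slice::from_raw_parts(section.as_ptr().cast::<u32>(), count) })
}
```

Tying the returned lifetime to the input buffer is the safe-Rust analogue of the `Engine._mmap` rule in the table; stores that hold raw pointers must uphold that rule manually.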
## Where To Make Changes
| Change | Start here |
|---|---|
| SQL function shape or return columns | `src/sql_facade/*`, then `docs/user_guide/api-reference.mdx` |
| Registration validation | `src/catalog/validate.rs` |
| Build ingestion | `src/builder.rs` |
| Traversal behavior | `src/bfs.rs`, `src/engine.rs`, `src/sql_facade/traversal.rs` |
| Shortest path behavior | `src/path_finder.rs`, `src/engine.rs` |
| Persistence format | `src/persistence.rs`, `docs/contributor_guide/persistence-format.mdx` |
| Mmap-backed stores | `src/node_store.rs`, `src/edge_store.rs`, `src/persistence.rs` |
| Sync behavior | `src/sync.rs`, `src/sql_sync.rs`, `src/sql_facade/admin.rs` |
| SQLSTATE or error semantics | `src/safety.rs`, `docs/user_guide/troubleshooting.mdx` |