# Build And Persistence
`graph.build()` constructs a backend-local active engine from registered tables,
edges, and filter columns. When `graph.persist_on_build = true`, the base graph
is written atomically to a `.pggraph` file and later backends can load it through
read-only mmap.
When persistence is requested, build, maintenance, and vacuum paths fail closed:
an artifact write error or immediate mmap reload error aborts the operation
instead of installing an unpersisted in-memory fallback.
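The fail-closed rule above can be sketched in a few lines. This is only an illustration, not pgGraph's implementation: `build`, `write_artifact`, and `reload_artifact` are hypothetical callables standing in for the build, artifact write, and mmap reload steps, and `PersistError` is an invented exception type.

```python
class PersistError(Exception):
    """Hypothetical error: the artifact could not be written or reloaded."""


def build_fail_closed(build, write_artifact, reload_artifact):
    """Return the engine to install, or raise without installing anything.

    All three parameters are hypothetical stand-ins for illustration.
    """
    graph = build()
    try:
        write_artifact(graph)
        reloaded = reload_artifact()
    except OSError as exc:
        # Fail closed: the freshly built in-memory graph is discarded
        # instead of being installed as an unpersisted fallback.
        raise PersistError(f"persistence failed: {exc}") from exc
    # Only the successfully reloaded artifact is handed back for install.
    return reloaded
```

The point of the shape is that the caller never receives `graph` directly when persistence is requested; it receives either the reloaded artifact or an error.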
The `.pggraph` file is derived state. It is not PostgreSQL table storage and it
does not replace PostgreSQL's buffer pool, WAL, MVCC, or crash-recovery
machinery. If an artifact is missing, incompatible, or corrupt, rebuild it from
the source tables.
Background workers run asynchronous build and maintenance jobs. They rebuild,
apply, and persist graph state, but no always-running worker updates every
backend's already-loaded `Engine` in place.
## Build Commands
Synchronous build:
```sql
SELECT * FROM graph.build();
```
Compatibility overload (runs synchronously and returns a zero-UUID `build_id`):
```sql
SELECT * FROM graph.build(concurrently := false);
```
Background build:
```sql
SELECT * FROM graph.build(concurrently := true);
SELECT * FROM graph.build_status('');
```
The background path creates a row in `graph._build_jobs` and launches a dynamic
background worker. If worker launch fails, the job row is marked failed and the
SQL call reports the error.
If the worker starts and the build later fails, pgGraph records the failed
status and error detail from a fresh worker transaction so
`graph.build_status('')` remains the operator source of truth.
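As a rough model of the launch-failure behavior described above, the sketch below records a job before trying to start a worker and marks it failed if the launch raises. The `jobs` dictionary is a stand-in for `graph._build_jobs`, and `BuildJob`, `start_background_build`, and `launch_worker` are hypothetical names, not pgGraph APIs.

```python
from dataclasses import dataclass


@dataclass
class BuildJob:
    """Toy job record; the real table has more columns."""
    status: str = "queued"
    progress_message: str = ""


def start_background_build(jobs, build_id, launch_worker):
    """Record the job first, then try to launch a worker for it."""
    job = BuildJob()
    jobs[build_id] = job  # stand-in for inserting a row into graph._build_jobs
    try:
        launch_worker(build_id)
    except RuntimeError as exc:
        # Launch failed: mark the job failed, then surface the error.
        job.status = "failed"
        job.progress_message = str(exc)
        raise
    return job
```

Because the record exists before the launch attempt, a status lookup still finds the failed job after the call has reported its error.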
## Return Columns
`graph.build()` returns:
| Column | Type | Meaning |
|---|---|---|
| `nodes_loaded` | `BIGINT` | Nodes loaded from registered source tables |
| `edges_loaded` | `BIGINT` | Directed edges loaded into the forward CSR store |
| `build_time_ms` | `DOUBLE PRECISION` | Wall-clock build duration |
| `memory_used_mb` | `DOUBLE PRECISION` | Engine memory estimate after build |
| `sync_mode` | `TEXT` | Parsed sync mode at build time |
`graph.build(concurrently := ...)` returns:
| Column | Meaning |
|---|---|
| `build_id` | Durable job ID, or the zero UUID for the synchronous compatibility path |
| `status` | `queued`, `running`, `completed`, or `failed` |
| `nodes_loaded`, `edges_loaded`, `build_time_ms`, `memory_used_mb` | Populated after completion |
| `sync_mode` | Sync mode captured for the job |
| `progress_phase` | Coarse operator phase such as `queued`, `building`, `persisting`, `validating_persistence`, `completed`, or `failed` |
| `progress_message` | Short operator-readable progress or failure detail |
## Build Memory Guard
Before loading rows, the builder estimates memory from the `pg_class.reltuples`
row estimates of the registered tables and edges. Each source table's estimate
is read once and cached for the preflight, so a table that contributes both
nodes and multiple registered edges does not require repeated catalog reads.
If the estimate exceeds `graph.memory_limit_mb`, `graph.oom_action` decides
what happens:
| `graph.oom_action` | Behavior |
|---|---|
| `error` | Raise `PG001` and stop before allocating the engine |
| `readonly` | Log a warning, build anyway, and mark the engine read-only |
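The decision table above amounts to a small branch. The following sketch mirrors it under stated assumptions: `memory_guard` and `GraphMemoryError` are hypothetical names, and the comparison uses "greater than" because the text says the guard fires when the estimate exceeds the limit.

```python
import warnings


class GraphMemoryError(Exception):
    """Hypothetical exception carrying the PG001 code from the table."""
    code = "PG001"


def memory_guard(estimate_mb, limit_mb, oom_action):
    """Return True if the engine should be marked read-only after build."""
    if estimate_mb <= limit_mb:
        return False  # the estimate fits, build normally
    if oom_action == "error":
        # Stop before any engine allocation happens.
        raise GraphMemoryError(
            f"estimated {estimate_mb} MB exceeds limit of {limit_mb} MB"
        )
    if oom_action == "readonly":
        warnings.warn("graph exceeds memory limit; building read-only")
        return True
    raise ValueError(f"unknown oom_action: {oom_action!r}")
```

For example, `memory_guard(200, 100, "readonly")` warns and returns `True`, while the same call with `"error"` raises before anything is built.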
Use `graph.estimate()` to see estimated nodes, edges, memory, the configured
limit, and whether the graph fits the current limit:
```sql
SELECT * FROM graph.estimate();
```
## Build Locking
Build and vacuum/maintenance rebuild paths coordinate through a single build
lock. A build or vacuum attempted while another session holds the lock reports
`PG006`.
```text
session A: graph.build() holds build/vacuum lock
session B: graph.build() -> PG006
session C: graph.vacuum() -> PG006
```
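The session diagram above behaves like a try-lock. The sketch below uses an in-process `threading.Lock` purely as a stand-in; how pgGraph actually coordinates across backends is not shown here, and `with_build_lock` and `BuildInProgress` are hypothetical names.

```python
import threading


class BuildInProgress(Exception):
    """Hypothetical exception carrying the PG006 code."""
    code = "PG006"


_build_lock = threading.Lock()  # stand-in for the shared build/vacuum lock


def with_build_lock(operation):
    """Run `operation` under the lock, or fail at once if it is held."""
    if not _build_lock.acquire(blocking=False):
        raise BuildInProgress("another build or vacuum is running")
    try:
        return operation()
    finally:
        _build_lock.release()
```

A second caller does not wait for the first to finish; it gets the error immediately, which matches sessions B and C in the diagram.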
## Persistence Files
Default paths:
```text
$PGDATA/graph/main.pggraph
$PGDATA/graph/main.pggraph.sync
```
`graph.data_dir` changes the subdirectory under `$PGDATA`.
```sql
ALTER SYSTEM SET graph.data_dir = 'graph';
ALTER SYSTEM SET graph.persist_on_build = true;
ALTER SYSTEM SET graph.auto_load = true;
SELECT pg_reload_conf();
```
## Atomic Write Contract
The current code writes the artifact as:
```text
main.pggraph.tmp
  stream graph sections
  backpatch header and CRC
  fsync file
  rename to main.pggraph
main.pggraph.sync.tmp
  write applied_sync_id
  fsync file
  rename to main.pggraph.sync
```
Each file is fsynced under its temporary name before the rename, so a backend
should see either the old complete graph file or the new complete graph file,
never a partial file under the final name.
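The write, fsync, and rename sequence is a general pattern, and a minimal version looks like the sketch below. It is not pgGraph's writer: `atomic_write` and `persist` are hypothetical helpers, the header backpatch and CRC steps are omitted, and storing `applied_sync_id` as ASCII text is an assumption made only for the example.

```python
import os


def atomic_write(final_path, payload):
    """Write `payload` to a temporary file, fsync it, then rename it."""
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # make the bytes durable before the rename
    # os.replace swaps the name in one step, so readers of final_path see
    # either the previous complete file or this complete file.
    os.replace(tmp_path, final_path)


def persist(graph_path, graph_bytes, applied_sync_id):
    """Persist the graph file first, then its companion .sync file."""
    atomic_write(graph_path, graph_bytes)
    atomic_write(graph_path + ".sync", str(applied_sync_id).encode("ascii"))
```

Calling `persist("main.pggraph", data, 7)` leaves `main.pggraph` and `main.pggraph.sync` in place with no `.tmp` files remaining.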
Background build and maintenance jobs expose persistence phases through
`progress_phase` and `progress_message`: `persisting` covers artifact write and
fsync, and `validating_persistence` covers immediate mmap reload validation.
## Auto-Load
When a query needs a graph and the backend-local engine is empty:
1. `ensure_current_graph()` calls `maybe_auto_load()`.
2. If `graph.auto_load = false`, it does nothing.
3. If the artifact does not exist, it does nothing.
4. If the artifact exists, `load_graph_file()` validates it and maps immutable
sections read-only.
5. The engine is marked built and can answer queries.
Corrupt or incompatible artifacts are rejected. A corrupt file raises or logs a
`PG009`-class loader failure depending on the call site; an incompatible file
uses `PG011`. Rebuild from source tables with `SELECT * FROM graph.build();`.
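The auto-load steps can be modeled with a toy artifact format. The function names `maybe_auto_load` and `load_graph_file` come from the list above, but the bodies are illustrative only: the 4-byte magic, the little-endian CRC32 field, and the `CorruptArtifact` exception are all assumptions and do not describe the real `.pggraph` layout.

```python
import mmap
import os
import struct
import zlib

MAGIC = b"PGGR"  # hypothetical magic bytes, not the real format


class CorruptArtifact(Exception):
    """Hypothetical exception standing in for a PG009-class failure."""
    code = "PG009"


def load_graph_file(path):
    """Validate a toy artifact and return a read-only mapping of it."""
    if os.path.getsize(path) < 8:
        raise CorruptArtifact("file too short for header")
    with open(path, "rb") as f:
        mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    if mapped[:4] != MAGIC:
        mapped.close()
        raise CorruptArtifact("bad magic")
    (stored_crc,) = struct.unpack("<I", mapped[4:8])
    if zlib.crc32(mapped[8:]) != stored_crc:
        mapped.close()
        raise CorruptArtifact("CRC mismatch")
    return mapped


def maybe_auto_load(engine, auto_load, path):
    """Load the artifact only when the engine is empty and loading is on."""
    if engine is not None or not auto_load or not os.path.exists(path):
        return engine  # nothing to do: steps 2 and 3 of the list
    return load_graph_file(path)  # step 4: validate, then map read-only
```

In this model a returned mapping plays the role of the built engine, and a rejected file leaves the caller with an error rather than a partially loaded graph.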
## Status Checks
```sql
SELECT * FROM graph.status();
```
Important build and persistence columns:
| Column | Meaning |
|---|---|
| `node_count` | Node slots in the active backend engine |
| `edge_count` | Directed edge rows in the forward CSR store |
| `memory_used_mb` | Engine memory estimate |
| `last_build` | Last build timestamp for this backend engine |
| `last_vacuum` | Last vacuum timestamp for this backend engine |
| `schema_status` | `current`, `stale`, or `invalid` |
| `needs_rebuild` | Catalog/schema drift requires rebuild |
| `invalid_reason` | Human-readable reason for invalid schema state |
| `read_only` | Engine has entered read-only mode |
## Rebuild Triggers
Rebuild with `graph.build()` when:
| Situation | Why |
|---|---|
| Registered tables or edges changed | Catalog fingerprint no longer matches the built base graph |
| Source topology changed without sync maintenance | CSR edges are immutable in the base graph |
| Table schema changed | Validation may mark schema state invalid |
| `.pggraph` format version changed | Old artifact cannot be loaded |
| Corruption is detected | Loader rejects bad magic, CRC, offset, bounds, or section content |
## Reset
```sql
SELECT graph.reset();
```
`graph.reset()` clears the backend-local engine and removes the persisted
`main.pggraph` and `main.pggraph.sync` files. It requires graph-admin
privileges. The source tables remain unchanged, but graph queries need a
rebuild or a new persisted artifact before they can run again.