# Build And Persistence
`graph.build()` constructs a backend-local active engine from registered tables,
edges, and filter columns. When `graph.persist_on_build = true`, the base graph
is written atomically under the selected graph's UUID directory and later
backends can load it through read-only mmap.
When persistence is requested, build, maintenance, and vacuum paths fail closed:
an artifact write error or immediate mmap reload error aborts the operation
instead of installing an unpersisted in-memory fallback.
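The fail-closed rule can be pictured with a small model. This is a minimal sketch, not pgGraph's code: `BackendSlot`, `PersistenceError`, and the `build`, `persist`, and `reload` callables are hypothetical stand-ins for the in-memory build, the artifact write, and the immediate mmap reload. Leaving a previously installed engine untouched on failure is an assumption of the sketch.

```python
class PersistenceError(Exception):
    """Hypothetical error: the artifact could not be written or reloaded."""


class BackendSlot:
    """Hypothetical stand-in for one backend's engine slot."""

    def __init__(self):
        self.engine = None

    def build_persisted(self, build, persist, reload):
        candidate = build()              # in-memory build result
        try:
            persist(candidate)           # artifact write
            loaded = reload()            # immediate reload validation
        except OSError as exc:
            # Fail closed: the unpersisted candidate is never installed.
            raise PersistenceError(f"persistence failed: {exc}") from exc
        self.engine = loaded             # install only the reloaded engine
        return loaded
```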
The `.pggraph` file is derived state. It is not PostgreSQL table storage and it
does not replace PostgreSQL's buffer pool, WAL, MVCC, or crash-recovery
machinery. If an artifact is missing, incompatible, or corrupt, rebuild it from
the source tables.
Background workers run asynchronous build and maintenance jobs. They rebuild,
apply, and persist graph state, but no always-running worker updates every
backend's already-loaded `Engine` in place.
## Build Commands
Synchronous build:
```sql
SELECT * FROM graph.build();
SELECT * FROM graph.build(mode := 'csr_readonly');
```
Compatibility overload:
```sql
SELECT * FROM graph.build(concurrently := false);
```
Background build:
```sql
SELECT * FROM graph.build(concurrently := true);
SELECT * FROM graph.build_status('');
```
Named graph build:
```sql
SELECT * FROM graph.build_graph('customer_360', graph_namespace := 'analytics');
SELECT * FROM graph.build_async_graph('customer_360', graph_namespace := 'analytics');
SELECT * FROM graph.build_status_for_graph('customer_360', graph_namespace := 'analytics');
```
The background path creates a row in `graph._build_jobs`, stores the selected
graph, stores the selected projection mode, and launches a dynamic background
worker. If worker launch fails, the job row is marked failed and the SQL call
reports the error.
If the worker starts and the build later fails, pgGraph records the failed
status and error detail from a fresh worker transaction so
`graph.build_status('')` remains the operator source of truth.
Use `graph.build_status_for_graph(...)` to list recent jobs for one named graph.
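The job lifecycle described above can be modeled as a small state machine. This is an illustrative sketch only: the `jobs` dict stands in for rows in `graph._build_jobs`, the `error` field name is an assumption, and `launch_worker` and `do_build` are hypothetical callables for the dynamic worker launch and the build itself.

```python
import uuid

jobs = {}  # stand-in for rows in graph._build_jobs


def submit_build(graph_name, projection_mode, launch_worker):
    """Record a job row first, then try to launch a worker for it."""
    build_id = str(uuid.uuid4())
    jobs[build_id] = {
        "graph_name": graph_name,
        "projection_mode": projection_mode,
        "status": "queued",
        "error": None,  # hypothetical column name
    }
    try:
        launch_worker(build_id)
    except RuntimeError as exc:
        # Launch failure: mark the row failed and report the error to the caller.
        jobs[build_id].update(status="failed", error=str(exc))
        raise
    return build_id


def run_build(build_id, do_build):
    """Worker body: record running, then completed or failed."""
    job = jobs[build_id]
    job["status"] = "running"
    try:
        do_build()
    except Exception as exc:
        # A later build failure is recorded on the job row, not lost.
        job.update(status="failed", error=str(exc))
    else:
        job["status"] = "completed"
```

In this model the job row is the only place status lives, which mirrors why the status functions are described as the operator source of truth.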
## Return Columns
`graph.build()` returns:
| Column | Type | Meaning |
|---|---|---|
| `nodes_loaded` | `BIGINT` | Nodes loaded from registered source tables |
| `edges_loaded` | `BIGINT` | Directed edges loaded into the forward CSR store |
| `build_time_ms` | `DOUBLE PRECISION` | Wall-clock build duration |
| `memory_used_mb` | `DOUBLE PRECISION` | Engine memory estimate after build |
| `sync_mode` | `TEXT` | Parsed sync mode at build time |
| `projection_mode` | `TEXT` | Built projection mode: `csr_readonly` or `mutable_overlay` |
`graph.build(concurrently := ...)` returns:
| Column | Meaning |
|---|---|
| `build_id` | Durable job ID, or the zero UUID for the synchronous compatibility path |
| `status` | `queued`, `running`, `completed`, or `failed` |
| `nodes_loaded`, `edges_loaded`, `build_time_ms`, `memory_used_mb` | Populated after completion |
| `sync_mode` | Sync mode captured for the job |
| `projection_mode` | Projection mode captured for the job |
Named graph job helpers also return `graph_id` and `graph_name` so operators
can attribute queued, running, completed, and failed work to a graph without
joining internal catalog tables.
## Build Pipeline
## Build Memory Guard
Before loading rows, the builder estimates memory with registered table and
edge row estimates from `pg_class.reltuples`. Reused source tables are counted
from one cached estimate during the preflight, so a table that contributes both
nodes and multiple registered edges does not require repeated catalog estimate
reads. If the estimate exceeds `graph.memory_limit_mb`, `graph.oom_action`
determines what happens:
| `graph.oom_action` | Behavior |
|---|---|
| `error` | Raise `PG001` and stop before allocating the engine |
| `readonly` | Log a warning, build anyway, and mark the engine read-only |
Use:
```sql
SELECT * FROM graph.estimate();
```
to see estimated nodes, edges, memory, configured limit, and whether the graph
fits the current limit.
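The preflight can be sketched as two steps: gather row estimates with one catalog read per distinct table, then apply the `graph.oom_action` rule. This is a simplified model under stated assumptions: `GraphError`, `cached_row_estimates`, `memory_guard`, and the `read_reltuples` callable are hypothetical names, and the conversion from row counts to megabytes is deliberately left out because the source does not describe it.

```python
class GraphError(Exception):
    """Hypothetical error type carrying a pgGraph-style code."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def cached_row_estimates(registrations, read_reltuples):
    """Sum node and edge row estimates, reading each table's estimate once.

    registrations is a list of (kind, table) pairs, kind being "node" or "edge".
    """
    cache = {}
    totals = {"node": 0.0, "edge": 0.0}
    for kind, table in registrations:
        if table not in cache:
            cache[table] = read_reltuples(table)  # one catalog read per table
        totals[kind] += cache[table]
    return totals


def memory_guard(estimate_mb, limit_mb, oom_action):
    """Return True when the engine should be built but marked read-only."""
    if estimate_mb <= limit_mb:
        return False
    if oom_action == "error":
        raise GraphError("PG001", "estimate exceeds graph.memory_limit_mb")
    if oom_action == "readonly":
        return True  # caller logs a warning and builds anyway
    raise ValueError(f"unknown oom_action: {oom_action!r}")
```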
## Build Locking
Build and vacuum/maintenance rebuild paths coordinate through a build lock. A
build or vacuum that starts while another session holds the lock reports
`PG006`.
```text
session A: graph.build() holds build/vacuum lock
session B: graph.build() -> PG006
session C: graph.vacuum() -> PG006
```
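The session diagram above behaves like a non-blocking try-lock. The sketch below is a model of that behavior, not the extension's locking code: it uses a process-local `threading.Lock` where the real lock must span backends, and `GraphError` and `build_lock` are hypothetical names. That the second caller fails at once rather than waiting is inferred from the diagram.

```python
import threading
from contextlib import contextmanager

_build_lock = threading.Lock()  # stand-in for the cross-backend build lock


class GraphError(Exception):
    """Hypothetical error type carrying a pgGraph-style code."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


@contextmanager
def build_lock(operation):
    """Hold the build/vacuum lock for the duration of one operation."""
    # Try once without waiting; a held lock means another build or vacuum runs.
    if not _build_lock.acquire(blocking=False):
        raise GraphError("PG006", f"{operation}: build or vacuum already running")
    try:
        yield
    finally:
        _build_lock.release()
```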
## Persistence Files
Default paths:
```text
$PGDATA/graph//main.pggraph
$PGDATA/graph//main.pggraph.sync
```
`graph.data_dir` changes the top-level subdirectory under `$PGDATA`. The graph
directory name is always the catalog `graph_id`, not `graph_name`.
```sql
ALTER SYSTEM SET graph.data_dir = 'graph';
ALTER SYSTEM SET graph.persist_on_build = true;
ALTER SYSTEM SET graph.auto_load = true;
SELECT pg_reload_conf();
```
## Atomic Write Contract
The current code writes the artifact as:
```text
main.pggraph.tmp
  stream graph sections
  backpatch header and CRC
  fsync file
  rename to main.pggraph
main.pggraph.sync.tmp
  write applied_sync_id
  fsync file
  rename to main.pggraph.sync
```
This means a backend should either see the old complete graph file or the new
complete graph file, not a partial file under the final name.
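The write-then-rename sequence can be sketched with standard file calls. This is a minimal illustration of the pattern, not the actual `.pggraph` writer: the 8-byte header layout, the `PGGR` magic bytes, the text encoding of the sync marker, and the function names are all assumptions made for the example.

```python
import os
import struct
import zlib

HEADER = struct.Struct("<4sI")  # hypothetical layout: magic, CRC32 of the body
MAGIC = b"PGGR"                 # hypothetical magic bytes


def write_atomically(final_path, sections):
    """Write sections to final_path + '.tmp', then rename into place."""
    tmp_path = final_path + ".tmp"
    crc = 0
    with open(tmp_path, "wb") as f:
        f.write(HEADER.pack(MAGIC, 0))       # placeholder header
        for section in sections:             # stream graph sections
            f.write(section)
            crc = zlib.crc32(section, crc)   # running CRC over the body
        f.seek(0)
        f.write(HEADER.pack(MAGIC, crc))     # backpatch header and CRC
        f.flush()
        os.fsync(f.fileno())                 # fsync file
    os.replace(tmp_path, final_path)         # rename to the final name
    return crc


def write_sync_marker(graph_path, applied_sync_id):
    """Write the sidecar marker with the same tmp, fsync, rename steps."""
    final_path = graph_path + ".sync"
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(str(applied_sync_id))        # write applied_sync_id
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, final_path)
```

Because readers only ever open the final name, and the rename swaps the whole file in one step, a reader never observes the placeholder header or a half-written body.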
Background build and maintenance jobs expose persistence phases through
`progress_phase` and `progress_message`: `persisting` covers artifact write and
fsync, and `validating_persistence` covers immediate mmap reload validation.
## Auto-Load
When a query needs the selected graph and the backend-local engine is empty:
1. `ensure_current_graph()` calls `maybe_auto_load()`.
2. If `graph.auto_load = false`, it does nothing.
3. If the selected graph has `cold` residency, it does nothing.
4. If the artifact does not exist, it does nothing.
5. If the artifact exists, `load_graph_file()` validates it and maps immutable
sections read-only.
6. The engine is marked built and can answer queries.
Each backend has one loaded graph slot. If a different graph is selected, the
previous engine is cleared from that backend before another graph is loaded.
Use `graph.loaded_graphs()` to inspect the graph currently loaded in the active
backend, and use `graph.graph_runtime_status()` to inspect residency, artifact
presence, and loaded state for visible graphs.
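The auto-load steps and the single loaded-graph slot can be modeled as one decision function. This is a sketch under assumptions: `BackendState` and `auto_load_decision` are hypothetical names, the inputs are passed in as plain values rather than read from settings or the catalog, and `"warm"` in the usage lines is only a placeholder for any non-`cold` residency.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BackendState:
    """Hypothetical model of a backend: one loaded graph slot."""

    loaded_graph: Optional[str] = None


def auto_load_decision(
    backend: BackendState,
    graph_id: str,
    *,
    auto_load: bool,
    residency: str,
    artifact_exists: bool,
    load_graph_file: Callable[[str], None],
) -> bool:
    """Return True if graph_id is loaded in the backend after this call."""
    if backend.loaded_graph == graph_id:
        return True                      # engine already holds this graph
    backend.loaded_graph = None          # a different graph clears the slot
    if not auto_load:
        return False                     # graph.auto_load = false: do nothing
    if residency == "cold":
        return False                     # cold graphs are not auto-loaded
    if not artifact_exists:
        return False                     # nothing persisted to load
    load_graph_file(graph_id)            # validate and map read-only; may raise
    backend.loaded_graph = graph_id      # engine is marked built
    return True
```

For example, `auto_load_decision(BackendState(), "g1", auto_load=True, residency="warm", artifact_exists=True, load_graph_file=print)` loads the graph, while the same call with `residency="cold"` leaves the slot empty.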
For explicit operator-controlled loading, use:
```sql
SELECT * FROM graph.load_graph('customer_360', namespace := 'analytics');
SELECT * FROM graph.unload_graph('customer_360', namespace := 'analytics');
```
`graph.load_graph()` loads the persisted artifact immediately and does not
depend on `graph.auto_load`. It can load a `cold` graph explicitly. Missing,
corrupt, incompatible, or quota-blocked artifacts raise an error so they can be
rebuilt from the PostgreSQL source tables or handled by an operator.
Corrupt or incompatible artifacts are never loaded. A corrupt file raises or
logs a `PG009`-class loader failure, depending on the call site; an incompatible
file reports `PG011`. In either case, rebuild from the source tables with
`SELECT * FROM graph.build();`.
## Status Checks
```sql
SELECT * FROM graph.status();
```
Important build and persistence columns:
| Column | Meaning |
|---|---|
| `node_count` | Node slots in the active backend engine |
| `edge_count` | Directed edge rows in the forward CSR store |
| `memory_used_mb` | Engine memory estimate |
| `last_build` | Last build timestamp for this backend engine |
| `last_vacuum` | Last vacuum timestamp for this backend engine |
| `schema_status` | `current`, `stale`, or `invalid` |
| `needs_rebuild` | Catalog/schema drift requires rebuild |
| `invalid_reason` | Human-readable reason for invalid schema state |
| `read_only` | Engine has entered read-only mode |
## Rebuild Triggers
Rebuild with `graph.build()` when:
| Situation | Why |
|---|---|
| Registered tables or edges changed | Catalog fingerprint no longer matches the built base graph |
| Source topology changed without sync maintenance | CSR edges are immutable in the base graph |
| Table schema changed | Validation may mark schema state invalid |
| `.pggraph` format version changed | Old artifact cannot be loaded |
| Corruption is detected | Loader rejects bad magic, CRC, offset, bounds, or section content |
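
A monitoring script can turn one `graph.status()` row into rebuild reasons. This is a hedged sketch: it assumes the row has already been fetched into a dict by some client, and `rebuild_reasons` is a hypothetical helper. It uses only the `needs_rebuild`, `schema_status`, and `invalid_reason` columns listed under Status Checks and does not interpret `stale`, whose handling the text does not specify.

```python
def rebuild_reasons(status):
    """Return the documented reasons in one status row that call for a rebuild."""
    reasons = []
    if status.get("needs_rebuild"):
        reasons.append("catalog/schema drift requires rebuild")
    if status.get("schema_status") == "invalid":
        reasons.append(f"schema invalid: {status.get('invalid_reason')}")
    return reasons
```

An empty list means none of these documented flags is set; it says nothing about artifact corruption or format changes, which surface only when a load is attempted.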
## Reset
```sql
SELECT graph.reset();
```
`graph.reset()` clears the backend-local engine and removes the selected
graph's persisted artifact root, that is, the files under its UUID directory.
It requires graph-admin privileges. The source tables remain unchanged, but
graph queries for that graph need a rebuild or a new persisted artifact before
they can run again.