# Build And Persistence

`graph.build()` constructs a backend-local active engine from registered tables, edges, and filter columns. When `graph.persist_on_build = true`, the base graph is written atomically under the selected graph's UUID directory and later backends can load it through read-only mmap. When persistence is requested, build, maintenance, and vacuum paths fail closed: an artifact write error or immediate mmap reload error aborts the operation instead of installing an unpersisted in-memory fallback.

The `.pggraph` file is derived state. It is not PostgreSQL table storage and it does not replace PostgreSQL's buffer pool, WAL, MVCC, or crash-recovery machinery. If an artifact is missing, incompatible, or corrupt, rebuild it from the source tables.

Background workers are used for asynchronous build and maintenance jobs. They rebuild/apply/persist graph state, but there is no always-running worker that updates every backend's already-loaded `Engine` in place.

## Build Commands

Synchronous build:

```sql
SELECT * FROM graph.build();
SELECT * FROM graph.build(mode := 'csr_readonly');
```

Compatibility overload:

```sql
SELECT * FROM graph.build(concurrently := false);
```

Background build:

```sql
SELECT * FROM graph.build(concurrently := true);
SELECT * FROM graph.build_status('');
```

Named graph build:

```sql
SELECT * FROM graph.build_graph('customer_360', graph_namespace := 'analytics');
SELECT * FROM graph.build_async_graph('customer_360', graph_namespace := 'analytics');
SELECT * FROM graph.build_status_for_graph('customer_360', graph_namespace := 'analytics');
```

The background path creates a row in `graph._build_jobs`, stores the selected graph, stores the selected projection mode, and launches a dynamic background worker. If worker launch fails, the job row is marked failed and the SQL call reports the error.
If the worker starts and the build later fails, pgGraph records the failed status and error detail from a fresh worker transaction so `graph.build_status('')` remains the operator source of truth. Use `graph.build_status_for_graph(...)` to list recent jobs for one named graph.

## Return Columns

`graph.build()` returns:

| Column | Type | Meaning |
|---|---|---|
| `nodes_loaded` | `BIGINT` | Nodes loaded from registered source tables |
| `edges_loaded` | `BIGINT` | Directed edges loaded into the forward CSR store |
| `build_time_ms` | `DOUBLE PRECISION` | Wall-clock build duration |
| `memory_used_mb` | `DOUBLE PRECISION` | Engine memory estimate after build |
| `sync_mode` | `TEXT` | Parsed sync mode at build time |
| `projection_mode` | `TEXT` | Built projection mode: `csr_readonly` or `mutable_overlay` |

`graph.build(concurrently := ...)` returns:

| Column | Meaning |
|---|---|
| `build_id` | Durable job ID, or zero UUID for synchronous compatibility path |
| `status` | `queued`, `running`, `completed`, or `failed` |
| `nodes_loaded`, `edges_loaded`, `build_time_ms`, `memory_used_mb` | Populated after completion |
| `sync_mode` | Sync mode captured for the job |
| `projection_mode` | Projection mode captured for the job |

Named graph job helpers also return `graph_id` and `graph_name` so operators can attribute queued, running, completed, and failed work to a graph without joining internal catalog tables.

## Build Pipeline

## Build Memory Guard

Before loading rows, the builder estimates memory with registered table and edge row estimates from `pg_class.reltuples`. Reused source tables are counted from one cached estimate during the preflight, so a table that contributes both nodes and multiple registered edges does not require repeated catalog estimate reads.
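The cached-estimate idea can be illustrated with a small model. This is only a sketch under stated assumptions, not pgGraph's estimator: the per-node and per-edge byte costs, the function name `estimate_build_mb`, and the `read_reltuples` callback are all hypothetical, and the real preflight may weigh tables and edges differently.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical per-item costs in bytes. The document does not describe
# pgGraph's real cost model, so these constants are placeholders.
BYTES_PER_NODE = 64
BYTES_PER_EDGE = 16


def estimate_build_mb(
    node_tables: Iterable[str],
    edge_source_tables: Iterable[str],
    read_reltuples: Callable[[str], float],
) -> Tuple[float, int]:
    """Return (estimated megabytes, number of catalog reads performed)."""
    cache: Dict[str, float] = {}
    reads = 0

    def rows(table: str) -> float:
        nonlocal reads
        if table not in cache:
            # pg_class.reltuples is -1 when a table has never been vacuumed
            # or analyzed (PostgreSQL 14+); clamp that to zero rows here.
            cache[table] = max(read_reltuples(table), 0.0)
            reads += 1
        return cache[table]

    # A table used for nodes and for several edges is read only once.
    nodes = sum(rows(t) for t in node_tables)
    edges = sum(rows(t) for t in edge_source_tables)
    total_bytes = nodes * BYTES_PER_NODE + edges * BYTES_PER_EDGE
    return total_bytes / (1024 * 1024), reads


if __name__ == "__main__":
    stats = {"users": 1000.0, "follows": 5000.0}
    mb, reads = estimate_build_mb(
        ["users"], ["follows", "users", "follows"], stats.__getitem__
    )
    print(f"estimated {mb:.3f} MB using {reads} catalog reads")
```

In this model, `users` supplies nodes and one edge registration while `follows` backs two edge registrations, yet each table's estimate is fetched once and reused.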
If the estimate exceeds `graph.memory_limit_mb`:

| `graph.oom_action` | Behavior |
|---|---|
| `error` | Raise `PG001` and stop before allocating the engine |
| `readonly` | Log a warning, build anyway, and mark the engine read-only |

Use:

```sql
SELECT * FROM graph.estimate();
```

to see estimated nodes, edges, memory, configured limit, and whether the graph fits the current limit.

## Build Locking

Build and vacuum/maintenance rebuild paths coordinate through a build lock. A concurrent build or vacuum reports `PG006`.

```text
session A: graph.build() holds build/vacuum lock
session B: graph.build() -> PG006
session C: graph.vacuum() -> PG006
```

## Persistence Files

Default paths:

```text
$PGDATA/graph//main.pggraph
$PGDATA/graph//main.pggraph.sync
```

`graph.data_dir` changes the top-level subdirectory under `$PGDATA`. The graph directory name is always the catalog `graph_id`, not `graph_name`.

```sql
ALTER SYSTEM SET graph.data_dir = 'graph';
ALTER SYSTEM SET graph.persist_on_build = true;
ALTER SYSTEM SET graph.auto_load = true;
SELECT pg_reload_conf();
```

## Atomic Write Contract

The current code writes the artifact as:

```text
main.pggraph.tmp
  stream graph sections
  backpatch header and CRC
  fsync file
  rename to main.pggraph

main.pggraph.sync.tmp
  write applied_sync_id
  fsync file
  rename to main.pggraph.sync
```

This means a backend should either see the old complete graph file or the new complete graph file, not a partial file under the final name.

Background build and maintenance jobs expose persistence phases through `progress_phase` and `progress_message`: `persisting` covers artifact write and fsync, and `validating_persistence` covers immediate mmap reload validation.

## Auto-Load

When a query needs the selected graph and the backend-local engine is empty:

1. `ensure_current_graph()` calls `maybe_auto_load()`.
2. If `graph.auto_load = false`, it does nothing.
3. If the selected graph has `cold` residency, it does nothing.
4. If the artifact does not exist, it does nothing.
5. If the artifact exists, `load_graph_file()` validates it and maps immutable sections read-only.
6. The engine is marked built and can answer queries.

Each backend has one loaded graph slot. If a different graph is selected, the previous engine is cleared from that backend before another graph is loaded. Use `graph.loaded_graphs()` to inspect the graph currently loaded in the active backend, and use `graph.graph_runtime_status()` to inspect residency, artifact presence, and loaded state for visible graphs.

For explicit operator-controlled loading, use:

```sql
SELECT * FROM graph.load_graph('customer_360', namespace := 'analytics');
SELECT * FROM graph.unload_graph('customer_360', namespace := 'analytics');
```

`graph.load_graph()` loads the persisted artifact immediately and does not depend on `graph.auto_load`. It can load a `cold` graph explicitly. Missing, corrupt, incompatible, or quota-blocked artifacts raise an error so they can be rebuilt from the PostgreSQL source tables or handled by an operator.

Corrupt or incompatible artifacts are rejected. A corrupt file raises or logs a `PG009`-class loader failure depending on the call site; an incompatible file uses `PG011`. Rebuild from source tables with `SELECT * FROM graph.build();`.
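The numbered auto-load steps in this section can be read as a short chain of guards. The following is a minimal model of that decision order, not the extension's code: `auto_load_decision`, `AutoLoadOutcome`, `BackendState`, and the `"hot"` residency value are hypothetical names chosen for illustration, and loader errors simply propagate here, whereas the real loader may raise or log depending on the call site.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class AutoLoadOutcome(Enum):
    ALREADY_LOADED = "already_loaded"
    SKIPPED_DISABLED = "skipped_auto_load_off"
    SKIPPED_COLD = "skipped_cold_residency"
    SKIPPED_NO_ARTIFACT = "skipped_no_artifact"
    LOADED = "loaded"


@dataclass
class BackendState:
    # Models the single loaded-graph slot of one backend.
    engine_built: bool = False


def auto_load_decision(
    state: BackendState,
    *,
    auto_load: bool,
    residency: str,
    artifact_exists: bool,
    load_graph_file: Callable[[], None],
) -> AutoLoadOutcome:
    """Apply the documented guards in order and report what happened."""
    if state.engine_built:
        return AutoLoadOutcome.ALREADY_LOADED
    if not auto_load:
        return AutoLoadOutcome.SKIPPED_DISABLED
    if residency == "cold":
        return AutoLoadOutcome.SKIPPED_COLD
    if not artifact_exists:
        return AutoLoadOutcome.SKIPPED_NO_ARTIFACT
    # Stand-in for validation plus read-only mapping; an exception raised
    # here leaves the engine unbuilt.
    load_graph_file()
    state.engine_built = True
    return AutoLoadOutcome.LOADED


if __name__ == "__main__":
    backend = BackendState()
    outcome = auto_load_decision(
        backend,
        auto_load=True,
        residency="hot",  # hypothetical non-cold residency value
        artifact_exists=True,
        load_graph_file=lambda: None,
    )
    print(outcome.value, backend.engine_built)
```

The ordering matters in this model: the loader callback is never invoked unless every earlier guard passes, which mirrors the "it does nothing" outcomes of steps 2 through 4.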
## Status Checks

```sql
SELECT * FROM graph.status();
```

Important build and persistence columns:

| Column | Meaning |
|---|---|
| `node_count` | Node slots in the active backend engine |
| `edge_count` | Directed edge rows in the forward CSR store |
| `memory_used_mb` | Engine memory estimate |
| `last_build` | Last build timestamp for this backend engine |
| `last_vacuum` | Last vacuum timestamp for this backend engine |
| `schema_status` | `current`, `stale`, or `invalid` |
| `needs_rebuild` | Catalog/schema drift requires rebuild |
| `invalid_reason` | Human-readable reason for invalid schema state |
| `read_only` | Engine has entered read-only mode |

## Rebuild Triggers

Rebuild with `graph.build()` when:

| Situation | Why |
|---|---|
| Registered tables or edges changed | Catalog fingerprint no longer matches the built base graph |
| Source topology changed without sync maintenance | CSR edges are immutable in the base graph |
| Table schema changed | Validation may mark schema state invalid |
| `.pggraph` format version changed | Old artifact cannot be loaded |
| Corruption is detected | Loader rejects bad magic, CRC, offset, bounds, or section content |

## Reset

```sql
SELECT graph.reset();
```

`graph.reset()` clears the backend-local engine and removes persisted files for the selected graph's UUID directory. It requires graph-admin privileges.

The source tables remain unchanged, but graph queries for that graph need a rebuild or a new persisted artifact before they can run again.