# Build And Persistence

`graph.build()` constructs a backend-local active engine from registered tables, edges, and filter columns. When `graph.persist_on_build = true`, the base graph is written atomically to a `.pggraph` file and later backends can load it through read-only mmap.

When persistence is requested, build, maintenance, and vacuum paths fail closed: an artifact write error or immediate mmap reload error aborts the operation instead of installing an unpersisted in-memory fallback.

The `.pggraph` file is derived state. It is not PostgreSQL table storage and it does not replace PostgreSQL's buffer pool, WAL, MVCC, or crash-recovery machinery. If an artifact is missing, incompatible, or corrupt, rebuild it from the source tables.

Background workers are used for asynchronous build and maintenance jobs. They rebuild/apply/persist graph state, but there is no always-running worker that updates every backend's already-loaded `Engine` in place.

## Build Commands

Synchronous build:

```sql
SELECT * FROM graph.build();
```

Compatibility overload:

```sql
SELECT * FROM graph.build(concurrently := false);
```

Background build:

```sql
SELECT * FROM graph.build(concurrently := true);
SELECT * FROM graph.build_status('');
```

The background path creates a row in `graph._build_jobs` and launches a dynamic background worker. If worker launch fails, the job row is marked failed and the SQL call reports the error. If the worker starts and the build later fails, pgGraph records the failed status and error detail from a fresh worker transaction so `graph.build_status('')` remains the operator source of truth.

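
A script that starts a background build usually has to wait for a terminal status before continuing. The sketch below shows one client-side way to do that; it is not part of pgGraph. `wait_for_build` and `fetch_status` are hypothetical names, and the sketch assumes `fetch_status` runs `SELECT * FROM graph.build_status('')` and returns the row as a mapping whose `status` field is `queued`, `running`, `completed`, or `failed`.

```python
import time
from typing import Any, Callable, Mapping

# Statuses after which the job will not change again.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_build(
    fetch_status: Callable[[], Mapping[str, Any]],
    poll_seconds: float = 1.0,
    max_polls: int = 600,
    sleep: Callable[[float], None] = time.sleep,
) -> Mapping[str, Any]:
    """Poll until the job reports a terminal status, then return that row.

    fetch_status is a caller-supplied (hypothetical) function that reads
    the current job row; this sketch does not talk to PostgreSQL itself.
    """
    for _ in range(max_polls):
        row = fetch_status()
        if row["status"] in TERMINAL_STATUSES:
            return row
        sleep(poll_seconds)
    raise TimeoutError("build did not reach a terminal status")
```

Taking `sleep` as a parameter keeps the loop easy to test. When the returned row says `failed`, the caller would read the error detail that pgGraph recorded for the job rather than retrying blindly.
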
## Return Columns

`graph.build()` returns:

| Column | Type | Meaning |
|---|---|---|
| `nodes_loaded` | `BIGINT` | Nodes loaded from registered source tables |
| `edges_loaded` | `BIGINT` | Directed edges loaded into the forward CSR store |
| `build_time_ms` | `DOUBLE PRECISION` | Wall-clock build duration |
| `memory_used_mb` | `DOUBLE PRECISION` | Engine memory estimate after build |
| `sync_mode` | `TEXT` | Parsed sync mode at build time |

`graph.build(concurrently := ...)` returns:

| Column | Meaning |
|---|---|
| `build_id` | Durable job ID, or zero UUID for synchronous compatibility path |
| `status` | `queued`, `running`, `completed`, or `failed` |
| `nodes_loaded`, `edges_loaded`, `build_time_ms`, `memory_used_mb` | Populated after completion |
| `sync_mode` | Sync mode captured for the job |
| `progress_phase` | Coarse operator phase such as `queued`, `building`, `persisting`, `validating_persistence`, `completed`, or `failed` |
| `progress_message` | Short operator-readable progress or failure detail |

## Build Pipeline

## Build Memory Guard

Before loading rows, the builder estimates memory with registered table and edge row estimates from `pg_class.reltuples`. Reused source tables are counted from one cached estimate during the preflight, so a table that contributes both nodes and multiple registered edges does not require repeated catalog estimate reads.

If the estimate exceeds `graph.memory_limit_mb`:

| `graph.oom_action` | Behavior |
|---|---|
| `error` | Raise `PG001` and stop before allocating the engine |
| `readonly` | Log a warning, build anyway, and mark the engine read-only |

Use:

```sql
SELECT * FROM graph.estimate();
```

to see estimated nodes, edges, memory, configured limit, and whether the graph fits the current limit.

## Build Locking

Build and vacuum/maintenance rebuild paths coordinate through a build lock. A concurrent build or vacuum reports `PG006`.

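
The locking rule can be modelled in a few lines. This is a minimal sketch, assuming the second caller fails immediately instead of waiting; `GraphError`, `run_exclusive`, and the in-process `threading.Lock` are illustrative stand-ins and say nothing about how pgGraph actually implements its lock.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


class GraphError(Exception):
    """Illustrative error type that carries a pgGraph-style code."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


# Stand-in for the shared build/vacuum lock (assumption: one lock for both).
_build_lock = threading.Lock()


def run_exclusive(operation: Callable[[], T]) -> T:
    """Run operation under the lock, or raise PG006 if it is already held."""
    if not _build_lock.acquire(blocking=False):
        raise GraphError("PG006", "another build or vacuum is in progress")
    try:
        return operation()
    finally:
        # Always release, even when the build or vacuum itself fails.
        _build_lock.release()
```

Because the acquire is non-blocking, a competing build or vacuum gets an error right away and can decide whether to retry later. With three sessions the outcome looks like this:
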
```text
session A: graph.build() holds build/vacuum lock
session B: graph.build() -> PG006
session C: graph.vacuum() -> PG006
```

## Persistence Files

Default paths:

```text
$PGDATA/graph/main.pggraph
$PGDATA/graph/main.pggraph.sync
```

`graph.data_dir` changes the subdirectory under `$PGDATA`.

```sql
ALTER SYSTEM SET graph.data_dir = 'graph';
ALTER SYSTEM SET graph.persist_on_build = true;
ALTER SYSTEM SET graph.auto_load = true;
SELECT pg_reload_conf();
```

## Atomic Write Contract

The current code writes the artifact as:

```text
main.pggraph.tmp
  stream graph sections
  backpatch header and CRC
  fsync file
  rename to main.pggraph

main.pggraph.sync.tmp
  write applied_sync_id
  fsync file
  rename to main.pggraph.sync
```

This means a backend should either see the old complete graph file or the new complete graph file, not a partial file under the final name.

Background build and maintenance jobs expose persistence phases through `progress_phase` and `progress_message`: `persisting` covers artifact write and fsync, and `validating_persistence` covers immediate mmap reload validation.

## Auto-Load

When a query needs a graph and the backend-local engine is empty:

1. `ensure_current_graph()` calls `maybe_auto_load()`.
2. If `graph.auto_load = false`, it does nothing.
3. If the artifact does not exist, it does nothing.
4. If the artifact exists, `load_graph_file()` validates it and maps immutable sections read-only.
5. The engine is marked built and can answer queries.

Corrupt or incompatible artifacts are rejected. A corrupt file raises or logs a `PG009`-class loader failure depending on the call site; an incompatible file uses `PG011`. Rebuild from source tables with `SELECT * FROM graph.build();`.

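
The write-then-rename contract and the auto-load checks fit together, and a small model makes the pairing easier to see. The sketch below is illustrative only: `atomic_write`, `load_if_present`, the 4-byte `DEMO` magic, and the 8-byte header are assumptions made for the example and do not describe the real `.pggraph` layout. The real loader also maps sections read-only with mmap and checks offsets and bounds, which this sketch skips.

```python
import os
import struct
import zlib
from pathlib import Path
from typing import Iterable, Optional

MAGIC = b"DEMO"   # hypothetical magic, not the real .pggraph header
HEADER_SIZE = 8   # 4-byte magic + 4-byte CRC32 (demo layout only)


def atomic_write(final_path: Path, sections: Iterable[bytes]) -> None:
    """Stream sections to <name>.tmp, backpatch the header, fsync, rename."""
    tmp_path = final_path.with_name(final_path.name + ".tmp")
    with open(tmp_path, "wb") as f:
        f.write(b"\0" * HEADER_SIZE)          # placeholder header
        crc = 0
        for section in sections:
            f.write(section)
            crc = zlib.crc32(section, crc)    # running CRC over the body
        f.seek(0)
        f.write(MAGIC + struct.pack("<I", crc))  # backpatch header and CRC
        f.flush()
        os.fsync(f.fileno())                  # data is durable before rename
    os.replace(tmp_path, final_path)          # readers see old or new, never partial


def load_if_present(final_path: Path, auto_load: bool) -> Optional[bytes]:
    """Return the validated body, or None when auto-load has nothing to do."""
    if not auto_load or not final_path.exists():
        return None
    data = final_path.read_bytes()
    if len(data) < HEADER_SIZE or data[:4] != MAGIC:
        raise ValueError("bad magic or truncated header")
    (expected_crc,) = struct.unpack("<I", data[4:HEADER_SIZE])
    body = data[HEADER_SIZE:]
    if zlib.crc32(body) != expected_crc:
        raise ValueError("CRC mismatch")
    return body
```

The key design point is that validation data (here, the CRC) is written before the rename, so any file visible under the final name can be checked in full by the loader.
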
## Status Checks

```sql
SELECT * FROM graph.status();
```

Important build and persistence columns:

| Column | Meaning |
|---|---|
| `node_count` | Node slots in the active backend engine |
| `edge_count` | Directed edge rows in the forward CSR store |
| `memory_used_mb` | Engine memory estimate |
| `last_build` | Last build timestamp for this backend engine |
| `last_vacuum` | Last vacuum timestamp for this backend engine |
| `schema_status` | `current`, `stale`, or `invalid` |
| `needs_rebuild` | Catalog/schema drift requires rebuild |
| `invalid_reason` | Human-readable reason for invalid schema state |
| `read_only` | Engine has entered read-only mode |

## Rebuild Triggers

Rebuild with `graph.build()` when:

| Situation | Why |
|---|---|
| Registered tables or edges changed | Catalog fingerprint no longer matches the built base graph |
| Source topology changed without sync maintenance | CSR edges are immutable in the base graph |
| Table schema changed | Validation may mark schema state invalid |
| `.pggraph` format version changed | Old artifact cannot be loaded |
| Corruption is detected | Loader rejects bad magic, CRC, offset, bounds, or section content |

## Reset

```sql
SELECT graph.reset();
```

`graph.reset()` clears the backend-local engine and removes persisted `main.pggraph` and `main.pggraph.sync` files. It requires graph-admin privileges.

`graph.reset()` removes the persisted artifact. The source tables remain unchanged, but graph queries need a rebuild or a new persisted artifact before they can run again.
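
For an operator script that decides when to run `graph.build()` again, the `graph.status()` columns can be folded into a single check. The helper below is a sketch under stated assumptions: `rebuild_reason` is a hypothetical name, the status row is assumed to arrive as a mapping keyed by column name, and treating `stale` as a reason to rebuild is a policy chosen for the example, not something pgGraph prescribes.

```python
from typing import Any, Mapping, Optional


def rebuild_reason(status: Mapping[str, Any]) -> Optional[str]:
    """Return a short reason to rebuild, or None if the row looks current.

    status is one row of graph.status() as a mapping (assumed shape).
    """
    if status.get("needs_rebuild"):
        return "needs_rebuild is set"
    schema_status = status.get("schema_status")
    if schema_status == "invalid":
        # invalid_reason is the human-readable explanation from the same row.
        return f"schema invalid: {status.get('invalid_reason')}"
    if schema_status == "stale":
        # Assumption for this sketch: stale is treated as rebuild-worthy.
        return "schema_status is stale"
    return None
```

Returning a reason string instead of a bare boolean lets the script log why it triggered a rebuild.
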