# Architecture

This page is the contributor-level map of the `graph` crate. It explains the crate graph, data flow, ownership model, lifetimes, and the design choices that shape the implementation.

For SQL behavior, start with the [User Guide](/user_guide). This page assumes you are changing Rust code or reviewing how the extension works inside a PostgreSQL backend.

## System Shape

## Runtime Ownership

Each PostgreSQL backend owns its own `Engine` in thread-local storage:

```rust
use std::cell::RefCell;

thread_local! {
    // One Engine per backend process; never shared across connections.
    static ENGINE: RefCell<Engine> = RefCell::new(Engine::new());
}
```

That means there is no shared Rust heap between connections. Sharing happens only through PostgreSQL storage, the filesystem, and OS page-cache backed mmap pages.

| Object | Owner | Lifetime |
|---|---|---|
| `Engine` | One PostgreSQL backend process | Backend lifetime or until `graph.reset()` |
| `NodeStore` owned mode | `Engine` heap | Mutable build/sync lifetime |
| `NodeStore` mmap mode | Raw pointers into `Engine._mmap` | Valid while `_mmap` remains owned by the engine |
| Forward `EdgeStore` mmap mode | Raw pointers into `Engine._mmap` | Valid while `_mmap` remains owned by the engine |
| `reverse_edge_store` | `Engine` heap | Rebuilt per backend from forward CSR |
| `FilterIndex` after load | `Engine` heap | Deserialized per backend from bincode |
| `edge_type_registry` after load | `Engine` heap | Deserialized per backend from bincode |
| `ResolutionIndex` mmap mode | Byte slice inside `Engine._mmap` | Valid while `_mmap` remains owned by the engine |
| `edge_buffer` and `resolution_delta` | `Engine` heap | Backend-local sync overlay state |

## Data Flow: Build

- NodeStore rows
- ResolutionIndexBuilder entries
- FilterIndex values
- tenant membership bitmaps
- forward CSR EdgeStore
- reverse CSR EdgeStore

The build path is allowed to allocate and sort. Query paths are not.

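
As a rough illustration of why build-time work can allocate and sort while query-time work stays a plain slice lookup, here is a minimal sketch that builds a forward CSR from an edge list and derives the reverse CSR from it. The `Csr` type and the `build_csr` and `reverse_csr` functions are hypothetical names invented for this example, not the crate's actual `EdgeStore` API. The sketch also omits edge type IDs and weights, and assumes every edge endpoint is smaller than `node_count`.

```rust
/// Hypothetical CSR container (illustration only, not the crate's EdgeStore).
/// `offsets[n]..offsets[n + 1]` indexes the neighbors of node `n` in `targets`.
#[derive(Debug, PartialEq)]
pub struct Csr {
    pub offsets: Vec<usize>,
    pub targets: Vec<u32>,
}

impl Csr {
    /// Neighbor slice for `node`: O(1) to locate, O(degree) to scan,
    /// and no allocation on the query path.
    pub fn neighbors(&self, node: u32) -> &[u32] {
        let n = node as usize;
        &self.targets[self.offsets[n]..self.offsets[n + 1]]
    }
}

/// Build a CSR from `(source, target)` pairs with a counting sort.
/// All allocation happens here, at build time.
/// Assumes every endpoint is `< node_count`; out-of-range input panics.
pub fn build_csr(node_count: usize, edges: &[(u32, u32)]) -> Csr {
    // Count the out-degree of each source node.
    let mut offsets = vec![0usize; node_count + 1];
    for &(src, _) in edges {
        offsets[src as usize + 1] += 1;
    }
    // Prefix-sum the counts into slice boundaries.
    for i in 0..node_count {
        offsets[i + 1] += offsets[i];
    }
    // Scatter targets into place using a per-node write cursor.
    let mut cursor = offsets.clone();
    let mut targets = vec![0u32; edges.len()];
    for &(src, dst) in edges {
        let slot = &mut cursor[src as usize];
        targets[*slot] = dst;
        *slot += 1;
    }
    Csr { offsets, targets }
}

/// Materialize the reverse CSR by flipping every forward edge.
pub fn reverse_csr(forward: &Csr) -> Csr {
    let node_count = forward.offsets.len() - 1;
    let mut flipped = Vec::with_capacity(forward.targets.len());
    for src in 0..node_count as u32 {
        for &dst in forward.neighbors(src) {
            flipped.push((dst, src));
        }
    }
    build_csr(node_count, &flipped)
}
```

With this layout, `reverse_csr(&forward).neighbors(n)` returns the inbound neighbors of `n` as one contiguous slice, so an inbound lookup costs O(degree) rather than a scan over every forward edge.
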
## Data Flow: Load

`load_graph_file()` validates the file before constructing any mmap-backed store:

- mmap-backed NodeStore arrays
- mmap-backed forward EdgeStore arrays
- mmap-backed ResolutionIndex section
- heap FilterIndex from bincode
- heap edge_type_registry from bincode
- heap reverse_edge_store from forward CSR

The forward graph arrays and resolution section are mmap-backed. The reverse CSR and bincode sections are backend-local heap allocations today.

## Data Flow: Query

- check `graph.enabled`
- auto-load persisted graph if needed
- validate call options
- ACL/admin check where required
- resolve seed table+PK to `node_idx`
- select forward or reverse CSR
- run bounded traversal/path/search algorithm
- apply filters, tenants, overlays, pagination

Graph algorithms operate on compact node indexes. SQL-facing functions translate between PostgreSQL coordinates and those internal indexes.

## Sync And Maintenance Flow

Trigger sync is deliberately explicit. Query functions do not hide sync catch-up work. Node inserts and tombstones can update backend-local state. Edge mutations use overlay buffers until maintenance or vacuum rebuilds the base CSR.

## Design Decisions

| Decision | Why it exists |
|---|---|
| SQL is the public API | Keeps application integration inside PostgreSQL and avoids a new query language. |
| Source tables stay authoritative | pgGraph is an acceleration layer, not a second source of truth. |
| Backend-local `Engine` | Matches PostgreSQL process isolation and avoids shared mutable Rust state. |
| CSR for topology | Compact adjacency slices make traversal cache-friendly and predictable. |
| Reverse CSR is materialized | Inbound traversal stays O(degree) instead of scanning all forward edges. |
| Read-only mmap for persisted forward arrays | Later backends can start quickly and share immutable derived artifact pages through the OS page cache without replacing PostgreSQL's buffer pool. |
| Bincode metadata is heap-loaded | Filter and registry structures are variable-size Rust data that are easier to validate and use as owned values. |
| Explicit maintenance | Expensive rebuild work is visible and controllable from SQL. |
| Circuit breakers everywhere | The extension runs in PostgreSQL backends and must bound memory and traversal work. |

## Safety Boundaries

Unsafe code exists for performance and PostgreSQL integration, not as a general escape hatch. The core unsafe boundary is mmap-backed store construction:

| Boundary | Required invariant |
|---|---|
| `MmapNodeArrays` | Active bytes, OID array, PK offsets, and PK byte ranges are present, aligned, and bounded by the mmap. |
| `MmapEdgeArrays` | CSR offsets, targets, type IDs, and optional weights are present, aligned, and bounded by the mmap. |
| `Engine._mmap` | The mmap outlives every NodeStore, forward EdgeStore, and ResolutionIndex lookup that borrows from it. |
| `raise_graph_error()` | PostgreSQL error FFI is called with stable strings and is treated as non-returning at the SQL boundary. |

Detailed rules live in [Safety And Security](./safety-security). Keep rustdoc `# Safety` sections and local `// SAFETY:` comments current when touching any unsafe area.

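
To make the "present, aligned, and bounded" invariant concrete, the sketch below checks one `u32` section of a byte buffer before reinterpreting it as a typed slice. It is an illustration under stated assumptions, not the crate's actual constructor: `u32_section` and `SectionError` are hypothetical names, a plain `&[u8]` stands in for the mmap, and values are read in native byte order.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum SectionError {
    OutOfBounds,
    Misaligned,
}

/// Borrow `count` u32 values starting at byte `offset` inside `map`.
/// `map` stands in for the mmap; the returned slice borrows from it,
/// so the borrow checker keeps the section from outliving the mapping.
pub fn u32_section(map: &[u8], offset: usize, count: usize) -> Result<&[u32], SectionError> {
    // Bounded: compute the byte range with overflow checks, then make
    // sure the whole range lies inside the mapping.
    let len = count
        .checked_mul(size_of::<u32>())
        .ok_or(SectionError::OutOfBounds)?;
    let end = offset.checked_add(len).ok_or(SectionError::OutOfBounds)?;
    let bytes = map.get(offset..end).ok_or(SectionError::OutOfBounds)?;

    // Aligned: the section must start at a u32-aligned address.
    if bytes.as_ptr() as usize % align_of::<u32>() != 0 {
        return Err(SectionError::Misaligned);
    }

    // SAFETY: `bytes` lies inside `map`, is exactly `count * 4` bytes long,
    // and starts at a u32-aligned address. Every bit pattern is a valid u32,
    // and the returned lifetime is tied to `map`.
    Ok(unsafe { std::slice::from_raw_parts(bytes.as_ptr().cast::<u32>(), count) })
}
```

In this sketch the lifetime on the return type enforces the "mmap outlives the borrow" rule. Stores that hold raw pointers into `Engine._mmap` give up that compiler check, which is why the invariant has to be documented and upheld by hand.
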
## Where To Make Changes

| Change | Start here |
|---|---|
| SQL function shape or return columns | `src/sql_facade/*`, then `docs/user_guide/api-reference.mdx` |
| Registration validation | `src/catalog/validate.rs` |
| Build ingestion | `src/builder.rs` |
| Traversal behavior | `src/bfs.rs`, `src/engine.rs`, `src/sql_facade/traversal.rs` |
| Shortest path behavior | `src/path_finder.rs`, `src/engine.rs` |
| Persistence format | `src/persistence.rs`, `docs/contributor_guide/persistence-format.mdx` |
| Mmap-backed stores | `src/node_store.rs`, `src/edge_store.rs`, `src/persistence.rs` |
| Sync behavior | `src/sync.rs`, `src/sql_sync.rs`, `src/sql_facade/admin.rs` |
| SQLSTATE or error semantics | `src/safety.rs`, `docs/user_guide/troubleshooting.mdx` |