# Build Pipeline

The builder turns registered PostgreSQL tables into an `Engine`. It is designed to avoid retaining all raw edges in Rust memory by using PostgreSQL temporary spool tables and bounded SPI cursor batches.

## Inputs

The build receives:

| Input | Rust type |
|---|---|
| Registered tables | `Vec` |
| Registered edges | `Vec` |
| Registered filter columns | `Vec` |

The SQL facade and catalog modules validate registrations before they are stored. The builder still keeps defensive checks for malformed or drifting inputs.

## Pipeline

- open SPI cursor
- fetch `graph.build_batch_size` rows
- assign `node_idx`
- append `NodeStore` row
- write node lookup batch
- insert resolution entry
- collect filter and tenant values
- open SPI cursor
- fetch endpoint rows
- resolve endpoints through node lookup spool
- write directed edge rows to edge spool

## Build Scan Mode

`graph.build_scan_mode` supports:

| Value | Current behavior |
|---|---|
| `select` | SPI cursor batches; implemented |
| `copy` | Reserved; current code returns a clear error |

Do not document or implement caller-facing COPY behavior until the code owns a safe, tested server-side COPY reader path.

## Node Loading

For each registered table:

1. Resolve table OID.
2. Build a primary-key SQL expression.
3. Include registered filter columns and tenant column in the SELECT list.
4. Open an SPI cursor.
5. Fetch up to `graph.build_batch_size` rows per batch.
6. Append each row to `NodeStore`.
7. Buffer `(table_oid, primary_key, node_idx)` rows into `pg_temp.graph_build_nodes`.
8. Add resolution entry.
9. Collect filter values for later `FilterIndex` initialization.
10. Add tenant membership if the table is tenanted.

Composite primary keys use:

```text
jsonb_build_array(col1::text, col2::text, ...)::text
```

## Filter Loading

Filter columns are registered after all nodes are loaded because storage choice depends on `node_count` and populated count.
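
To illustrate why both counts must be known first, here is a minimal sketch of a storage decision driven by `node_count` and the populated count. The `StorageKind` variants, the `choose_storage` function, and the 25% cutoff are all hypothetical and chosen only for illustration; the real `FilterIndex` may use different layouts and thresholds.

```rust
/// Hypothetical storage layouts; the real `FilterIndex` variants may differ.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum StorageKind {
    /// One slot per node, indexed directly by `node_idx`.
    Dense,
    /// Only populated nodes are stored, for example as `(node_idx, value)` pairs.
    Sparse,
}

/// Assumed cutoff: prefer dense storage once 25% of nodes carry a value.
pub const DENSE_FILL_PERCENT: u64 = 25;

/// Picks a layout from the two counts that are only known after node loading.
pub fn choose_storage(node_count: u64, populated: u64) -> StorageKind {
    if node_count == 0 {
        // Nothing to index; the sparse layout costs nothing when empty.
        return StorageKind::Sparse;
    }
    // Integer arithmetic avoids floating-point rounding at the threshold.
    if populated.saturating_mul(100) >= node_count.saturating_mul(DENSE_FILL_PERCENT) {
        StorageKind::Dense
    } else {
        StorageKind::Sparse
    }
}
```
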
Values collected during node loading are then encoded and written to `FilterIndex`.

Encoding examples:

| Column type | Build expression or conversion |
|---|---|
| `numeric` | `column::bigint` |
| `boolean` | `column::boolean` |
| `text` | `column::text`, then intern dictionary token |
| `date` | `(column::date - DATE '2000-01-01')::bigint` |
| `timestamptz` | `EXTRACT(EPOCH FROM column::timestamptz) * 1000000` |
| `uuid` | text to canonical `u128` |

## Edge Loading

For each registered edge:

1. Register static label or prepare dynamic `label_column`.
2. Determine endpoint expression style.
3. Read endpoint keys, optional weight, optional dynamic label.
4. Resolve endpoints through `pg_temp.graph_build_nodes`.
5. Add one directed edge, and a reverse directed edge when `bidirectional`.
6. Flush to `pg_temp.graph_build_edges` in bounded batches.

Then `load_edge_store_from_spool()` streams:

```sql
SELECT source, target, type_id, weight
FROM pg_temp.graph_build_edges
ORDER BY source, target, type_id
```

into `SortedEdgeStoreBuilder`.

## Deduplication

Both `EdgeStore::from_edges()` and the sorted builders drop repeated `(source, target, type_id)` edges. This keeps CSR slices canonical even when source tables contain repeated relationship rows.

## Build Completion

At the end of `build_graph()`:

| Field | Set to |
|---|---|
| `engine.edge_store` | Forward CSR from edge spool |
| `engine.reverse_edge_store` | Reverse CSR built from forward CSR |
| `engine.built` | `true` |
| `engine.is_read_only` | Based on OOM policy outcome |
| `engine.last_build` | Current PostgreSQL timestamp |

Persistence is orchestrated outside `builder.rs` by `sql_build.rs`.
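
The deduplication and the forward and reverse CSR stores described above can be sketched roughly as follows. This is a simplified illustration, not the actual `SortedEdgeStoreBuilder`: the `Csr` struct and both function names are hypothetical, weights are left out, and every `source` and `target` is assumed to be smaller than `node_count`.

```rust
/// A directed edge as streamed from the spool: `(source, target, type_id)`.
pub type Edge = (u32, u32, u16);

/// Minimal CSR: `offsets[n]..offsets[n + 1]` is the slice of `targets`
/// and `type_ids` that belongs to node `n`.
#[derive(Debug, PartialEq)]
pub struct Csr {
    pub offsets: Vec<usize>,
    pub targets: Vec<u32>,
    pub type_ids: Vec<u16>,
}

/// Builds a CSR from edges already sorted by `(source, target, type_id)`.
/// Because the input is sorted, duplicates are adjacent and can be dropped
/// by comparing each edge with the previous one.
pub fn csr_from_sorted(node_count: usize, sorted: impl IntoIterator<Item = Edge>) -> Csr {
    let mut offsets = vec![0usize; node_count + 1];
    let mut targets = Vec::new();
    let mut type_ids = Vec::new();
    let mut prev: Option<Edge> = None;
    for edge in sorted {
        if prev == Some(edge) {
            continue; // repeated relationship row
        }
        prev = Some(edge);
        let (source, target, type_id) = edge;
        offsets[source as usize + 1] += 1; // per-node out-degree for now
        targets.push(target);
        type_ids.push(type_id);
    }
    // A prefix sum turns the per-node counts into slice boundaries.
    for n in 0..node_count {
        offsets[n + 1] += offsets[n];
    }
    Csr { offsets, targets, type_ids }
}

/// Builds the reverse CSR by flipping every forward edge and re-sorting.
pub fn reverse_csr(forward: &Csr) -> Csr {
    let node_count = forward.offsets.len() - 1;
    let mut flipped: Vec<Edge> = Vec::with_capacity(forward.targets.len());
    for source in 0..node_count {
        for i in forward.offsets[source]..forward.offsets[source + 1] {
            flipped.push((forward.targets[i], source as u32, forward.type_ids[i]));
        }
    }
    flipped.sort_unstable();
    csr_from_sorted(node_count, flipped)
}
```

Relying on the `ORDER BY source, target, type_id` of the spool query means the builder only needs to remember the previous edge, so deduplication adds no extra memory beyond the CSR arrays themselves.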
## Failure Modes

| Failure | Error |
|---|---|
| Estimated memory exceeds limit and OOM policy is `error` | `GraphError::Oom` |
| COPY scan mode selected | `GraphError::Internal` with reserved-mode message |
| SPI cursor/read failure | `GraphError::Internal` |
| Edge endpoint cannot be resolved | Edge row is skipped |
| Edge label count exceeds limit | `GraphError::EdgeTypeLimit` |
| Invalid UUID filter value | `GraphError::InvalidFilter` |

Skipping unresolved edge rows is intentional: source relationships that point to unregistered or missing nodes cannot become graph edges.
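
The skip-on-unresolved behavior can be sketched as below. This is an illustration under stated assumptions: an in-memory `HashMap` stands in for the `pg_temp.graph_build_nodes` lookup spool, and `EndpointRow`, `ResolvedBatch`, `resolve_batch`, and the `skipped` counter are hypothetical names that do not come from the codebase.

```rust
use std::collections::HashMap;

/// Lookup key mirroring the spool columns `(table_oid, primary_key)`.
pub type NodeKey = (u32, String);

/// One endpoint row read from a registered edge source (hypothetical shape).
pub struct EndpointRow {
    pub source: NodeKey,
    pub target: NodeKey,
    pub type_id: u16,
}

/// Outcome of resolving one batch of endpoint rows.
#[derive(Debug, Default, PartialEq)]
pub struct ResolvedBatch {
    /// Directed `(source_idx, target_idx, type_id)` edges ready for the edge spool.
    pub edges: Vec<(u32, u32, u16)>,
    /// Rows dropped because at least one endpoint had no `node_idx`.
    pub skipped: usize,
}

/// Resolves endpoint keys to `node_idx` values. A row whose source or target
/// is missing from the lookup is skipped rather than reported as an error.
pub fn resolve_batch(
    lookup: &HashMap<NodeKey, u32>,
    rows: &[EndpointRow],
    bidirectional: bool,
) -> ResolvedBatch {
    let mut out = ResolvedBatch::default();
    for row in rows {
        match (lookup.get(&row.source), lookup.get(&row.target)) {
            (Some(&source_idx), Some(&target_idx)) => {
                out.edges.push((source_idx, target_idx, row.type_id));
                if bidirectional {
                    // Bidirectional registrations also emit the reverse directed edge.
                    out.edges.push((target_idx, source_idx, row.type_id));
                }
            }
            // Unregistered or missing node: the row cannot become a graph edge.
            _ => out.skipped += 1,
        }
    }
    out
}
```

Counting skipped rows is an addition made for this sketch; it shows one way a build could surface how many relationship rows were dropped without turning them into a failure.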