# Traversal And Paths Traversal is split between SQL request normalization, engine dispatch, and low-level BFS/DFS execution over CSR arrays. ## Hot Loop Data `bfs::execute()` receives: | Input | Purpose | |---|---| | `NodeStore` | active bit and table lookup for candidate filtering | | `EdgeStore` | CSR neighbor slices | | `FilterIndex` | typed filter checks | | `BfsConfig` | seed, limits, edge filter, tenant bitmaps, sync overlays | `BfsConfig` owns resolved, hot-loop-friendly values. SQL strings and labels are resolved before entering the traversal loop. Traversal result metadata is adaptive. Dense parent/depth arrays are retained for normal high-coverage traversals, while low-visit-budget traversals on large graphs use sparse metadata to avoid full-graph scratch allocation. ## BFS Flow ```text allocate visited bitmap allocate dense or sparse depth metadata allocate dense or sparse parent metadata allocate dense or sparse parent_edge_type metadata seed frontier while frontier has nodes: pop current stop expanding at max_depth stream neighbors from CSR plus overlay inserts skip overlay-deleted edges candidate filters: already visited? edge type allowed? active node? tenant membership allowed? FilterIndex predicates pass? mark visited store depth, parent, parent edge type enforce max_nodes enqueue if depth allows enforce max_frontier ``` The BFS neighbor iterator streams base CSR edges and overlay inserts without materializing all neighbors into a vector. DFS pushes overlay and base neighbors in reverse stack order without materializing a full per-node neighbor vector. ## Circuit Breakers | Breaker | Enforced by | Result | |---|---|---| | `max_depth` | Current depth check | Node is included but not expanded at depth limit | | `max_nodes` | Visited counter | Traversal returns current partial result | | `max_frontier` | Frontier/stack length | Traversal returns current partial result | Circuit breakers stop traversal and return partial rows; they are not SQL errors. ## Candidate Filtering Order The hot loop filters in this order: 1. Already visited. 2. Edge type not allowed. 3. Node is inactive/tombstoned. 4. Tenant membership excludes node. 5. `FilterIndex` predicates fail. This order keeps cheap checks before typed filter lookups. ## Path Reconstruction BFS/DFS stores: | Vector | Meaning | |---|---| | `parent[node_idx]` | Node that discovered this node | | `parent_edge_type[node_idx]` | Edge label ID from parent to this node | | `depth[node_idx]` | Hop depth; `-1` for unvisited | After traversal, `to_traversal_results()` reconstructs each path by walking parent links backward to the seed, reversing them, and converting node indices to `(table_oid, primary_key)` coordinates. ## Shortest Path `Engine::shortest_path()`: 1. Verifies engine built. 2. Resolves source and target coordinates. 3. Calls `path_finder::shortest_path()`. 4. Returns path steps with edge labels. The path finder can use bidirectional search when safe. When the graph contains any unidirectional edges, correctness requires a forward search path that does not assume reverse reachability represents the original directed graph. Shortest-path parent metadata is sparse: unweighted path search records only visited parent links, and weighted path search records parent edge metadata only for relaxed nodes. ## Weighted Shortest Path Weighted paths use registered edge weights and Dijkstra-style cost expansion. The path layer accumulates costs as `u64` values with checked addition so valid large path costs do not collide with the unreachable sentinel. The SQL wrapper returns one row per path step and exposes `step_cost` and `total_cost` as `bigint`; costs outside the SQL `bigint` range fail instead of being capped. Weighted search still uses a dense distance vector for fast relaxation checks, but parent edge type and weight metadata are stored sparsely and only for nodes that receive a better distance. The SQL wrapper returns no rows when: | Condition | Reason | |---|---| | No path exists | Nothing to return | | No weight data exists | Weighted path cannot be evaluated | Missing per-edge weights in a weighted store default to cost `1`. ## Source-Table Search Search is intentionally not a graph-artifact lookup. `sql_search.rs` builds SQL over registered source tables and verifies matches against source rows. The Rust recheck prepares query match state once and reuses it across candidate rows so SQL/Rust predicate parity does not require per-row query normalization. Search modes: | Mode | Meaning | |---|---| | `contains` | substring match | | `exact` | whole-value match | | `prefix` | prefix match | | `token` | token match | This keeps search correctness tied to source rows and lets PostgreSQL indexes accelerate search predicates. ## Hydration Hydration happens after graph coordinates are known. It reads source rows by table OID and primary key and returns JSONB. Keep hydration out of internal traversal loops. ## Aggregation Internals `sql_aggregation.rs` parses strict traversal JSON. It rejects unexpected keys so callers cannot silently rely on misspelled or ignored options. The aggregation path can operate on: | Scope | Internal behavior | |---|---| | returned nodes | Traverse and aggregate reached coordinates | | chosen parent path | Expand traversal rows to selected parent path | | all possible paths | Enumerate paths with a hard count cap | Path enumeration must stay capped. Do not add an unbounded aggregate mode. All-path aggregation hydrates each unique coordinate once, then groups hydrated rows by table so repeated path occurrences can use borrowed node-id lookups. Chosen-parent-path aggregation keeps a borrowed cache over traversal rows and only clones hydrated JSON for rows it emits.