# Traversal And Paths

Traversal is split between SQL request normalization, engine dispatch, and
low-level BFS/DFS execution over CSR arrays.

## Hot Loop Data

`bfs::execute()` receives:

| Input | Purpose |
|---|---|
| `NodeStore` | active bit and table lookup for candidate filtering |
| `EdgeStore` | CSR neighbor slices |
| `FilterIndex` | typed filter checks |
| `BfsConfig` | seed, limits, edge filter, tenant bitmaps, sync overlays |

`BfsConfig` owns resolved, hot-loop-friendly values. SQL strings and labels are
resolved before entering the traversal loop.

Traversal result metadata is adaptive. Dense parent/depth arrays are retained
for normal high-coverage traversals, while low-visit-budget traversals on large
graphs use sparse metadata to avoid full-graph scratch allocation.

## BFS Flow

```text
allocate visited bitmap
allocate dense or sparse depth metadata
allocate dense or sparse parent metadata
allocate dense or sparse parent_edge_type metadata
seed frontier

while frontier has nodes:
  pop current
  stop expanding at max_depth
  stream neighbors from CSR plus overlay inserts
  skip overlay-deleted edges
  candidate filters:
    already visited?
    edge type allowed?
    active node?
    tenant membership allowed?
    FilterIndex predicates pass?
  mark visited
  store depth, parent, parent edge type
  enforce max_nodes
  enqueue if depth allows
  enforce max_frontier
```

The BFS neighbor iterator streams base CSR edges and overlay inserts without
materializing all neighbors into a vector. DFS pushes overlay and base
neighbors in reverse stack order without materializing a full per-node neighbor
vector.

## Circuit Breakers

| Breaker | Enforced by | Result |
|---|---|---|
| `max_depth` | Current depth check | Node is included but not expanded at depth limit |
| `max_nodes` | Visited counter | Traversal returns current partial result |
| `max_frontier` | Frontier/stack length | Traversal returns current partial result |

Circuit breakers stop traversal and return partial rows; they are not SQL
errors.

## Candidate Filtering Order

The hot loop filters in this order:

1. Already visited.
2. Edge type not allowed.
3. Node is inactive/tombstoned.
4. Tenant membership excludes node.
5. `FilterIndex` predicates fail.

This order keeps cheap checks before typed filter lookups.

## Path Reconstruction

BFS/DFS stores:

| Vector | Meaning |
|---|---|
| `parent[node_idx]` | Node that discovered this node |
| `parent_edge_type[node_idx]` | Edge label ID from parent to this node |
| `depth[node_idx]` | Hop depth; `-1` for unvisited |

After traversal, `to_traversal_results()` reconstructs each path by walking
parent links backward to the seed, reversing them, and converting node indices
to `(table_oid, primary_key)` coordinates.

## Shortest Path

`Engine::shortest_path()`:

1. Verifies engine built.
2. Resolves source and target coordinates.
3. Calls `path_finder::shortest_path()`.
4. Returns path steps with edge labels.

The path finder can use bidirectional search when safe. When the graph contains
any unidirectional edges, correctness requires a forward search path that does
not assume reverse reachability represents the original directed graph.
Shortest-path parent metadata is sparse: unweighted path search records only
visited parent links, and weighted path search records parent edge metadata only
for relaxed nodes.

## Weighted Shortest Path

Weighted paths use registered edge weights and Dijkstra-style cost expansion.
The path layer accumulates costs as `u64` values with checked addition so valid
large path costs do not collide with the unreachable sentinel. The SQL wrapper
returns one row per path step and exposes `step_cost` and `total_cost` as
`bigint`; costs outside the SQL `bigint` range fail instead of being capped.
Weighted search still uses a dense distance vector for fast relaxation checks,
but parent edge type and weight metadata are stored sparsely and only for nodes
that receive a better distance.
The SQL wrapper returns no rows when:

| Condition | Reason |
|---|---|
| No path exists | Nothing to return |
| No weight data exists | Weighted path cannot be evaluated |

Missing per-edge weights in a weighted store default to cost `1`.

## Source-Table Search

Search is intentionally not a graph-artifact lookup. `sql_search.rs` builds SQL
over registered source tables and verifies matches against source rows. The
Rust recheck prepares query match state once and reuses it across candidate
rows so SQL/Rust predicate parity does not require per-row query normalization.

Search modes:

| Mode | Meaning |
|---|---|
| `contains` | substring match |
| `exact` | whole-value match |
| `prefix` | prefix match |
| `token` | token match |

This keeps search correctness tied to source rows and lets PostgreSQL indexes
accelerate search predicates.

## Hydration

Hydration happens after graph coordinates are known. It reads source rows by
table OID and primary key and returns JSONB. Keep hydration out of internal
traversal loops.

## Aggregation Internals

`sql_aggregation.rs` parses strict traversal JSON. It rejects unexpected keys so
callers cannot silently rely on misspelled or ignored options.

The aggregation path can operate on:

| Scope | Internal behavior |
|---|---|
| returned nodes | Traverse and aggregate reached coordinates |
| chosen parent path | Expand traversal rows to selected parent path |
| all possible paths | Enumerate paths with a hard count cap |

Path enumeration must stay capped. Do not add an unbounded aggregate mode.
All-path aggregation hydrates each unique coordinate once, then groups hydrated
rows by table so repeated path occurrences can use borrowed node-id lookups.
Chosen-parent-path aggregation keeps a borrowed cache over traversal rows and
only clones hydrated JSON for rows it emits.