# Querying Graph queries are normal SQL calls. The API has primitive functions for exact control, workflow wrappers for application-friendly search, expansion, paths, connections, and summaries, and a GQL-compatible subset for graph pattern queries, an openCypher compatibility surface for the same supported graph patterns, and narrow mapped writes over registered tables and edges. ## Search `graph.search()` searches registered source-table columns. It does not search the persisted graph artifact; it builds SQL predicates against source rows and returns verified graph coordinates. ```sql SELECT * FROM graph.search( property_key := 'name', property_value := 'alice', table_filter := 'public.users'::regclass, mode := 'contains', case_sensitive := false, max_rows := 100, row_offset := 0, tenant := NULL, hydrate := true ); ``` Return columns: | Column | Meaning | |---|---| | `node_table` | Source table OID | | `node_id` | Source primary key as text | | `match_type` | `contains`, `exact`, `prefix`, or `token` | | `score` | Current score value, currently `1.0` for verified matches | | `verified` | Whether the source row verified against search criteria | | `node` | Hydrated source row as JSONB when `hydrate = true` | | `node_table_name` | Regclass text name | `graph.search_nodes()` returns the same coordinate data without `node`. Search results are verified against source rows, deduplicated by `(node_table, node_id)`, sorted by table OID then node ID, and then paginated. Each source-table scan is bounded to `row_offset + max_rows` candidates before the final global merge. ## GQL `graph.gql()` executes the documented GQL-compatible subset and returns one JSONB object per row. Node labels resolve to registered source-table names, and relationship types resolve to registered edge labels. ```sql SELECT row FROM graph.gql( 'MATCH (o:orders)-[:placed_by]->(c:customers) WHERE o.status = $status RETURN o.id AS order_id, o.status AS status, c.name AS customer ORDER BY customer LIMIT 20', params := '{"status":"paid"}'::jsonb, hydrate := true ); ``` Supported read coverage: | Area | Supported behavior | |---|---| | Pattern | One node-only `MATCH` pattern, one graph pattern with two node variables and one relationship pattern, or top-level single-relationship `OPTIONAL MATCH` | | Labels and types | Node labels map to registered table names; relationship types map to registered edge labels | | Direction | Outbound, inbound, and undirected relationship patterns | | Variable length | Bounded variable-length relationships with an explicit maximum and minimum of at least one hop | | Predicates | Property comparisons, null checks, membership, boolean combinations, JSONB `$parameters`, and dotted paths into registered JSONB property columns | | Return values | Node values, scalar properties, JSONB list/map properties, coordinate-only single-hop relationship identities, bounded relationship path values, aggregate values, and JSON `null` for unmatched optional targets or relationships | | Projection stages | `WITH` can project node variables and scalar properties into the downstream `RETURN` scope, including aliases, shadowing, and `WITH DISTINCT` row-stream deduplication | | Deduplication | `RETURN DISTINCT`, `WITH DISTINCT`, and aggregate `DISTINCT` are supported with a bounded in-memory key set | | Aggregates | `count`, `sum`, `avg`, `min`, `max`, and `collect` in `RETURN`; non-aggregate return items are implicit grouping keys | | Path functions | `nodes(path)`, `relationships(path)`, and `length(path)` over named relationship patterns | | Ordering and pagination | `ORDER BY`, `SKIP`, and `LIMIT`, with the hard GQL row cap still enforced; `RETURN DISTINCT` must order by returned scalar expressions | ### GQL JSONB result format `graph.gql()` and `graph.cypher()` return graph values as ordinary PostgreSQL `jsonb`, not as a custom graph scalar type. Each SQL result row has one column named `row`. The value is a JSON object whose keys are the `RETURN` aliases, or the return expression text when no alias is provided: ```json { "person": { "_id": { "table": "users", "id": "u1" }, "_labels": ["users"], "id": "u1", "name": "Alice" }, "company_name": "Acme" } ``` When a query returns a node value, pgGraph serializes it as a JSON object with stable reserved keys: ```json { "_id": { "table": "users", "id": "u1" }, "_labels": ["users"], "id": "u1", "name": "Alice" } ``` `_id.table` is the registered source-table label and `_id.id` is the source primary-key value serialized as text. `_labels` currently contains the single registered table label. With `hydrate := true`, the readable source row is included as JSON alongside the reserved keys. If a required graph coordinate no longer has a readable backing source row, the query raises a GQL execution error instead of fabricating an empty node object. With `hydrate := false`, the node object contains only `_id` and `_labels`. Relationship values are coordinate identity objects in this release: ```json { "_type": "works_at", "_start": { "table": "users", "id": "u1" }, "_end": { "table": "companies", "id": "c1" } } ``` `_type` is the registered edge label. `_start` and `_end` are endpoint graph coordinates. Relationship source-row hydration, relationship properties, and a separate relationship row identity are not part of the current JSONB contract. Path values are JSON objects with a `_path` member: ```json { "_path": { "nodes": [ { "_id": { "table": "users", "id": "u1" }, "_labels": ["users"] }, { "_id": { "table": "companies", "id": "c1" }, "_labels": ["companies"] } ], "relationships": [ { "_type": "works_at", "_start": { "table": "users", "id": "u1" }, "_end": { "table": "companies", "id": "c1" } } ] } } ``` The `nodes(path)` function returns the `_path.nodes` array directly, `relationships(path)` returns the `_path.relationships` array directly, and `length(path)` returns the relationship count as a JSON number. Scalar properties are returned as their JSON representation: strings, numbers, booleans, arrays, objects, or JSON `null`. Unmatched `OPTIONAL MATCH` targets and relationships also project as JSON `null`. A missing key inside a registered JSONB property path projects as JSON `null`, but it does not satisfy `IS NULL`; explicit JSON `null` does. For SDKs and application code, graph values can be identified by their reserved-key shape: objects with `_path` are paths, objects with `_type`, `_start`, and `_end` are relationships, and objects with `_id` and `_labels` are nodes. Source columns that would collide with reserved graph keys are rejected during binding. By default, GQL returns hydrated node objects when a query returns a node value such as `RETURN o` or `RETURN c`; PostgreSQL ACL and RLS still decide whether the source row can be read. To return only identifiers, project the identifier property directly, for example `RETURN o.id AS order_id`, instead of returning the whole node. The optional `hydrate := false` SQL argument is a pgGraph escape hatch for coordinate-only node objects when callers intentionally want the compact `{table, id}` shape. Required hydration is fail-closed: a stale graph coordinate whose source row cannot be hydrated raises `PG017`. Other read semantics: - Relationship values are coordinate identity objects in this release. - `OPTIONAL MATCH` null-extends unmatched source rows for single-relationship patterns. - Aggregate functions ignore JSON `null` inputs for `count(expr)`, `sum`, `avg`, `min`, `max`, and `collect`; `count(*)` counts every input row. - `collect` preserves non-null values in row order. - Aggregate-only empty inputs return one row with `0`, `null`, or `[]` according to the aggregate. - `DISTINCT` query forms abort if their unique-key working set exceeds the GQL result cap. - Bounded relationship variables return path objects with `_path.nodes` and `_path.relationships`. - `nodes`, `relationships`, and `length` expose path components directly. Registered JSONB source columns can be addressed with dotted property paths such as `u.profile.plan`; returned JSONB arrays and objects keep their JSON shape. A missing key inside a JSONB property path projects as JSON `null`, but it does not satisfy `IS NULL`; explicit JSON `null` does. Planned outside this subset: - Relationship source-row hydration. - Path-function `WITH` projections. - Aggregate `WITH` projections. - Full path variable syntax. - Multi-pattern optional joins. Supported write coverage is intentionally narrow. `CREATE` can insert one node row into a registered source table, `SET` can update one mapped property on one matched node, `REMOVE` can clear one mapped property on one matched node, and directed `DELETE` can remove one matched relationship backed by a registered edge row table. `DETACH DELETE` can remove one matched node and its incident registered edge-row relationships. `MERGE` can insert one mapped node or match one mapped node by registered primary-key identity and run one branch update. These writes require that the active graph was built with `mode := 'mutable_overlay'` and `graph.mutable_enabled = on`: ```sql SELECT row FROM graph.gql( 'CREATE (c:customers {id: $id, name: $name}) RETURN c', params := '{"id":"c-42","name":"Ada"}'::jsonb ); ``` ```sql SELECT row FROM graph.gql( 'MATCH (c:customers {id: $id}) SET c.status = $status RETURN c.status AS status', params := '{"id":"c-42","status":"active"}'::jsonb ); ``` ```sql SELECT row FROM graph.gql( 'MATCH (c:customers {id: $id}) REMOVE c.status RETURN c.status AS status', params := '{"id":"c-42"}'::jsonb ); ``` ```sql SELECT row FROM graph.gql( 'MATCH (a:customers {id: $from})-[r:purchased]->(b:products {id: $to}) DELETE r RETURN a.id AS customer_id, b.id AS product_id', params := '{"from":"c-42","to":"p-9"}'::jsonb ); ``` ```sql SELECT row FROM graph.gql( 'MATCH (c:customers {id: $id}) DETACH DELETE c RETURN c.id AS customer_id', params := '{"id":"c-42"}'::jsonb ); ``` ```sql SELECT row FROM graph.gql( 'MERGE (c:customers {id: $id, name: $name}) ON CREATE SET c.status = ''new'' ON MATCH SET c.last_seen_at = $seen_at RETURN c.id AS customer_id, c.status AS status', params := '{"id":"c-42","name":"Ava","seen_at":"2026-05-31T12:00:00Z"}'::jsonb ); ``` The write is PostgreSQL-first: pgGraph inserts, updates, or deletes the mapped source row through SPI, PostgreSQL enforces table privileges, constraints, RLS, triggers, and MVCC, and the GQL result is built from the affected row or matched graph coordinates. Within the same transaction, node-only `MATCH` queries can read newly created nodes through the backend-local transaction delta. `SET` updates registered non-key, non-tenant property columns and refreshes any registered typed filter index value for that backend. `REMOVE` sets a mapped scalar column to SQL `NULL` or removes a registered dotted key from a JSONB property column; repeated `REMOVE` calls are idempotent. Labels map to registered source tables, so label removal is not part of the supported write model. `DELETE` tombstones the matching graph edge in the current backend after PostgreSQL deletes exactly one edge row; endpoint nodes are not deleted. `DETACH DELETE` enumerates registered incident edge-row tables, deletes matching forward and reverse edge rows first, then deletes the matched node row and tombstones the node and deleted graph edges in the current backend. If any step fails, PostgreSQL aborts the statement and the source rows remain authoritative. `DETACH DELETE` rejects incident relationships that are not backed by registered edge row tables because pgGraph has no safe generic policy for rewriting arbitrary endpoint-table foreign keys. `MERGE` is supported for one mapped node. The registered primary key columns are the match identity, tenant scope participates when configured, `ON CREATE` values are evaluated only for inserted rows, and `ON MATCH` values are evaluated only for matched rows. pgGraph locks an existing matched row with `FOR UPDATE` only when `ON MATCH SET` can update it; insert races use PostgreSQL `ON CONFLICT DO NOTHING` on the registered identity, so concurrent sessions racing the same key converge on one PostgreSQL source row. `MERGE` requires `SELECT` and `INSERT` on the mapped source table; `UPDATE` is required only when the statement includes `ON MATCH SET`. `CREATE`, `SET`, `REMOVE`, `DELETE`, `DETACH DELETE`, and `MERGE` are rejected on `csr_readonly` projections. Relationship creation, multi-row writes, dynamic labels, and creating new PostgreSQL schema objects from GQL are future work. Use `graph.gql_explain(query)` to inspect parse, bind, logical, and physical planning stages without executing the query. ## openCypher Compatibility `graph.cypher()` is a separate compatibility surface for openCypher-shaped text that maps into the same pgGraph execution model as `graph.gql()`. It is not a full openCypher-compatible database API. Supported overlapping reads and mapped writes lower into the shared GQL IR, so labels still resolve to registered PostgreSQL source tables, relationship types still resolve to registered edge labels, and PostgreSQL remains authoritative for ACLs, RLS, constraints, triggers, and MVCC. ```sql SELECT row FROM graph.cypher( 'MATCH (o:orders)-[:placed_by]->(c:customers) RETURN o.id AS order_id, c.name AS customer ORDER BY customer LIMIT 20', hydrate := true ); ``` Use `graph.cypher_explain(query)` to inspect the shared plan selected for a Cypher-compatible query. Use `graph.cypher_compatibility()` to inspect the current compatibility matrix. Cypher-only constructs such as `CALL`, `YIELD`, `UNWIND`, `FOREACH`, `LOAD CSV`, `START`, `UNION`, and Cypher DDL are rejected because they do not map to pgGraph's PostgreSQL-first model. ## Search Modes | Mode | SQL predicate shape | |---|---| | `contains` | Match substring with LIKE/ILIKE escaping | | `exact` | Match normalized whole value | | `prefix` | Match values beginning with the query string | | `token` | Match tokenized value boundaries | All modes validate that `property_key` is registered for the target table set. Case-insensitive search normalizes the source expression with `lower(::text)`. Add matching expression indexes on high-volume search columns so PostgreSQL can use the same expression shape as pgGraph's predicate: ```sql CREATE INDEX users_name_lower_idx ON public.users (lower(name::text)); ``` For substring-heavy `contains` search, use PostgreSQL trigram indexing on the same normalized expression: ```sql CREATE EXTENSION IF NOT EXISTS pg_trgm; CREATE INDEX users_name_lower_trgm_idx ON public.users USING gin (lower(name::text) gin_trgm_ops); ``` Create these indexes on the registered property columns and tables that are actually searched. Case-sensitive `exact` and `prefix` searches can use normal indexes when their predicate shape matches the source column and table filter. ## Traversal ```sql SELECT * FROM graph.traverse( seed_table := 'public.users'::regclass, seed_id := 'u1', max_depth := 2, edge_types := ARRAY['friend'], direction := 'any', node_tables := ARRAY['public.users'::regclass], filter := NULL, tenant := NULL, strategy := 'bfs', uniqueness := 'node_global', include_start := true, hydrate := true, max_rows := 1000, row_offset := 0, max_nodes := current_setting('graph.max_nodes')::int, max_frontier := current_setting('graph.max_frontier')::int ); ``` Return columns: | Column | Meaning | |---|---| | `root_table`, `root_id` | Start node coordinate | | `node_table`, `node_id` | Reached node coordinate | | `depth` | Hop count from the start node | | `path` | JSONB array of `{table, id}` path coordinates | | `edge_path` | JSONB array of edge labels along the path | | `node` | Hydrated source row when requested | | `root_table_name`, `node_table_name` | Regclass text names | Traversal applies graph traversal limits first, then table filters and hydration filters, then sorts by depth, node table, and node ID for a single root before applying `row_offset` and `max_rows`. Multi-start traversal sorts by root table, root ID, depth, node table, and node ID before pagination. `uniqueness := 'node_global'` returns each reached node at most once across the merged multi-start result set. When more than one root reaches the same node, the row that sorts first by the multi-start ordering is kept. Use `uniqueness := 'node_per_root'` when the same reached node should be returned once for each root that found it. For a single start node, both modes produce the same node-level result set. Traversal filters are structured JSON predicates built with helper functions such as `graph.eq()`, `graph.not_in()`, `graph.between()`, and `graph.is_null()`. Text filters can also use `graph.contains_text()` and `graph.prefix_text()` for substring and prefix checks. Registered filter columns are pushed down into the graph filter index; predicates on other registered node columns are checked against hydrated source rows. ```sql SELECT * FROM graph.traverse( seed_table := 'public.users'::regclass, seed_id := 'u1', filter := graph.all(ARRAY[ graph.in('status', '["active", "trial"]'::jsonb), graph.is_not_null('created_at') ]) ); ``` ### Direction | Direction | Behavior | |---|---| | `any` | Uses the forward CSR store; bidirectional registrations create both edge directions | | `out` | Uses the forward CSR store | | `in` | Uses the reverse CSR store | For unidirectional edges, `direction := 'out'` follows the registered source to target direction and `direction := 'in'` follows the reverse store. ### Strategy | Strategy | Behavior | |---|---| | `bfs` | Breadth-first traversal; default | | `dfs` | Depth-first traversal | | `weighted` | Rejected for traversal; use `graph.weighted_shortest_path()` | ### Multi-Start Traversal Parallel-array overload: ```sql SELECT * FROM graph.traverse( ARRAY['public.users'::regclass, 'public.users'::regclass], ARRAY['u1'::text, 'u2'::text], max_depth := 1, hydrate := false ); ``` Composite overload: ```sql SELECT * FROM graph.traverse( ARRAY[ graph.node_ref('public.users'::regclass, 'u1'), graph.node_ref('public.users'::regclass, 'u2') ]::graph.node_ref[], max_depth := 1, hydrate := false ); ``` ## Workflow API Workflow functions wrap the primitives with smaller call shapes and stable application-friendly output. ### `graph.find()` ```sql SELECT * FROM graph.find( 'name', 'Alice', table_name := 'public.users'::regclass, mode := 'exact', max_rows := 20 ); ``` Adds `rank` and returns hydrated rows by default. ### `graph.expand()` Expand from a known node: ```sql SELECT * FROM graph.expand( 'public.users'::regclass, 'u1', max_depth := 2, target_table := 'public.users'::regclass, where_node := graph.gt('age', 40), include_start := false ); ``` Adds `rank`, `readable_path`, and `truncated`. ### `graph.find_related()` Search for a seed, traverse from each verified match, filter, hydrate, and page: ```sql SELECT * FROM graph.find_related( property_key := 'name', property_value := 'Alice', source_table := 'public.users'::regclass, max_depth := 2, target_table := 'public.users'::regclass, where_node := graph.gte('age', 40), include_counts := true ); ``` Use `candidate_limit` to bound work before final pagination. ### `graph.path()` Readable shortest path wrapper: ```sql SELECT * FROM graph.path( 'public.users'::regclass, 'u1', 'public.users'::regclass, 'u2', max_depth := 20 ); ``` ### `graph.connection()` Search both endpoints and return the first reachable path: ```sql SELECT * FROM graph.connection( source_key := 'name', source_value := 'Alice', target_key := 'name', target_value := 'Bob', source_table := 'public.users'::regclass, target_table := 'public.users'::regclass, source_k := 3, target_k := 3, max_depth := 6 ); ``` ### `graph.neighborhood()` Summarize reachable nodes by depth and table before hydration: ```sql SELECT * FROM graph.neighborhood( 'name', 'Alice', source_table := 'public.users'::regclass, max_depth := 4, sample_k := 5, node_limit := 10000 ); ``` ## Path Functions Unweighted shortest path: ```sql SELECT * FROM graph.shortest_path( 'public.users'::regclass, 'u1', 'public.users'::regclass, 'u9', max_depth := 20, hydrate := true ); ``` Weighted shortest path: ```sql SELECT * FROM graph.weighted_shortest_path( 'public.places'::regclass, 'a', 'public.places'::regclass, 'd' ); ``` Unweighted shortest path includes pending trigger-sync edge overlays after `graph.apply_sync()`. Weighted shortest path requires merged weighted CSR data; when edge overlays are pending, it fails with `PG018` until `graph.vacuum()` or `graph.maintenance()` rebuilds the weighted graph. Weighted path return columns: | Column | Type | Meaning | |---|---|---| | `step` | `INTEGER` | Zero-based path step | | `node_table` | `OID` | Node table OID for this step | | `node_table_name` | `TEXT` | Readable table name for this step | | `node_id` | `TEXT` | Node primary key for this step | | `edge_label` | `TEXT` | Incoming edge label, NULL for step `0` | | `edge_weight` | `BIGINT` | Incoming edge weight, NULL for step `0` | | `step_cost` | `BIGINT` | Cumulative cost at this step | | `total_cost` | `BIGINT` | Total path cost repeated on each row | ## Components Connected components are global O(V+E) algorithms and require graph-admin privileges. ```sql SELECT * FROM graph.component_stats(); SELECT * FROM graph.components(max_rows := 20); SELECT * FROM graph.largest_component(max_rows := 100, hydrate := false); SELECT * FROM graph.component(123, max_rows := 100, hydrate := true); SELECT * FROM graph.isolated_nodes(max_rows := 100); ``` Component functions touch the whole graph. Run them on a read replica or during controlled windows for large production graphs. ## Hydration Hydration converts graph coordinates back to source rows as JSONB. It is useful for application responses but costs extra SQL work. Use `hydrate := false` when: | Situation | Reason | |---|---| | Counting rows | Avoid unnecessary source-row reads | | Joining manually | Coordinates are enough | | Large traversals | Keep query memory and SPI work bounded | | Internal composition | Later steps decide what to hydrate | ## Tenant Scope If any registered table has `tenant_column`, tenant-scoped queries use the tenant value to filter graph membership. Pass `tenant` explicitly: ```sql SELECT * FROM graph.traverse( 'public.users'::regclass, 'u1', tenant := 'acct_123' ); ``` Or configure a session GUC fallback: ```sql SET graph.tenant_setting = 'app.tenant_id'; SET app.tenant_id = 'acct_123'; ``` When `graph.enforce_tenant_scope = true`, tenanted graph queries without explicit or session tenant fail.