---
layout: default
title: GraphRAG Segmentation Plan
nav_order: 10
---

# GraphRAG segmentation plan

This document scopes the next large-scale GraphRAG branch after the `0.13` fact-shaped stable release.

The immediate problem is not correctness of the narrow GraphRAG contract. The immediate problem is scale on constrained-memory hosts. At `10M x 64D` on the current AWS ARM64 box (`4 vCPU`, `8 GiB RAM`, `4 GiB swap`), the monolithic `sorted_hnsw` build is still the practical frontier even after the retained build improvements:

- streamed load survives
- `sorted_hnsw.build_sq8 = on` materially reduces build-vector memory
- the build now stays alive deep into `CREATE INDEX`
- but the operating model is still one large ANN graph on one small host

That is the wrong long-term shape for hundreds of millions or billions of facts.

## Current verified constraint

Current GraphRAG helpers and wrappers operate on a concrete `sorted_heap` relation. They do **not** currently dispatch across a partitioned-table parent.

So the first scalable segmentation step is:

1. split facts into multiple concrete `sorted_heap` shards
2. build one `sorted_hnsw` index per shard
3. route each query to:
   - one shard when pruning is available, or
   - a bounded shard subset when pruning is partial
4. merge shard-local top-k rows globally and keep the final exact/path-aware rerank contract unchanged

## First benchmark result

The first segmented benchmark lives in `scripts/bench_graph_rag_multidepth_segmented.py`. It is a harness-side benchmark, not a released SQL API.
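Step 4 of the segmentation contract, the global merge of shard-local top-k rows, reduces to a standard sorted-stream merge. A minimal sketch in harness-style Python (`merge_shard_topk` is illustrative, not a released API):

```python
import heapq
from itertools import islice

def merge_shard_topk(shard_results, top_k):
    """Merge per-shard top-k candidate lists into a global top-k.

    shard_results: one list per shard of (distance, row_id) pairs,
    each list already sorted ascending by distance.
    heapq.merge streams the pre-sorted lists, so a wide fanout does
    not require concatenating and re-sorting all candidates.
    """
    return list(islice(heapq.merge(*shard_results), top_k))

# Example: three shards, global top-3.
shards = [
    [(0.10, "a1"), (0.40, "a2")],
    [(0.05, "b1"), (0.90, "b2")],
    [(0.20, "c1"), (0.30, "c2")],
]
print(merge_shard_topk(shards, 3))  # [(0.05, 'b1'), (0.1, 'a1'), (0.2, 'c1')]
```

The merge itself is cheap; as the numbers below show, the real cost of wide fanout is running ANN seed retrieval on every shard, not combining the results.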
The harness measures the first two routing extremes:

- `route=all`
  - query every shard
  - merge all shard-local top-k rows
- `route=exact`
  - synthetic lower bound
  - route to the known owning shard only

Local `1M x 64D` lower-hop point (`8` shards, `ann_k=256`, `top_k=32`, `ef_search=128`, `m=16`, `ef_construction=64`, `build_sq8=on`):

- monolith unified GraphRAG:
  - depth 1: `50.104 ms`, `100.0% / 100.0%`
  - depth 5: `121.524 ms`, `81.2% / 100.0%`
- segmented, `route=all`:
  - depth 1: `87.677 ms`, `100.0% / 100.0%`
  - depth 5: `142.472 ms`, `81.2% / 100.0%`
- segmented, `route=exact`:
  - depth 1: `10.574 ms`, `100.0% / 100.0%`
  - depth 5: `16.822 ms`, `100.0% / 100.0%`

This is the key lesson:

- segmentation alone is **not** a free latency win
- all-shard fanout preserves quality but pays a clear fanout tax
- the real gain comes only when the query path can prune most shards

So the right future contract is not just "partitioned indexes". It is **segmentation + pruning/routing**.

## Recommended segmentation model

The recommended shape is:

- shard facts by a stable routing key
- keep each shard as a concrete `sorted_heap` table
- keep one `sorted_hnsw` index per shard
- query a bounded shard subset
- merge and rerank globally

Good routing keys depend on the workload:

- `tenant_id` - strongest default for multi-tenant knowledge bases
- `knowledge_base_id` - if the system stores separate corpora per KB
- relation family - if relation sets are naturally disjoint
- time window / sealed segment - for append-heavy pipelines with freshness constraints

Avoid relying on entity-range sharding alone as the product story. It is useful for synthetic benchmarking, but real deployments need routing keys that are available from query context or cheap metadata.
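A routing key like `tenant_id` makes shard selection trivial whenever the key is available from query context. A toy sketch of exact-key routing with a bounded fallback for the partial-pruning case (all names here are illustrative, not extension API):

```python
def route_exact(shard_by_key, routing_key, fallback_shards):
    """Pick the bounded shard subset for one query.

    Exact routing: if the routing key (e.g. a tenant_id) maps to a
    known owning shard, query only that shard. Otherwise fall back
    to a bounded fanout over fallback_shards, which models the
    "bounded shard subset when pruning is partial" case.
    """
    shard = shard_by_key.get(routing_key)
    return [shard] if shard is not None else list(fallback_shards)

shard_map = {"tenant_a": "facts_shard_0", "tenant_b": "facts_shard_1"}
print(route_exact(shard_map, "tenant_b", shard_map.values()))  # ['facts_shard_1']
print(route_exact(shard_map, "tenant_z", shard_map.values()))  # ['facts_shard_0', 'facts_shard_1']
```

This is exactly why `route=exact` wins in the benchmark above: the owning shard is known up front, so only one ANN graph is ever touched.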
## Rollout phases

### Phase 1: harness and operational benchmarks

Goal:

- prove that segmented builds fit constrained hosts better than monoliths
- quantify the difference between:
  - all-shard fanout
  - bounded fanout
  - exact routing

Deliverables:

- the current local segmented harness
- AWS segmented runner:
  - `scripts/bench_graph_rag_multidepth_segmented_aws.sh`
- retained-temp build/query measurements on the same host that currently struggles with the monolithic `10M x 64D` point

### Phase 2: SQL-level segmented reference path

Goal:

- move beyond harness-only fanout

Reference design:

- the first step now exists as a beta wrapper:
  - `sorted_heap_graph_rag_segmented(regclass[], ...)`
  - executes `sorted_heap_graph_rag(...)` per shard
  - merges candidate rows in SQL
- the next step also exists in narrow form:
  - `sorted_heap_graph_segment_register(...)`
  - `sorted_heap_graph_segment_resolve(...)`
  - `sorted_heap_graph_rag_routed(...)`
  - this is a metadata-driven `int8` range router layered on top of the segmented wrapper
- one more practical routing surface now exists:
  - `sorted_heap_graph_exact_register(...)`
  - `sorted_heap_graph_exact_resolve(...)`
  - `sorted_heap_graph_rag_routed_exact(...)`
  - this is the exact-key router for tenant / KB style shard selection
- the first richer metadata filter now exists on top of both routed paths:
  - optional `segment_group` labels at registration time
  - optional `segment_groups text[]` filters at resolve/query time
  - the filter array order is also a bounded-fanout preference order
  - this is the first beta surface for hot/sealed or relation-family pruning
- the first registry-backed reuse layer now exists on top of that:
  - `sorted_heap_graph_route_policy_register(...)`
  - `sorted_heap_graph_route_policy_groups(...)`
  - `sorted_heap_graph_rag_routed_policy(...)`
  - `sorted_heap_graph_rag_routed_exact_policy(...)`
  - this keeps hot/sealed preference out of ad hoc query literals
- a second routing dimension now exists too:
  - optional `relation_family text` on both range-routed and exact-key shard registry rows
  - optional `relation_family := ...` filtering in config/resolve functions and in both raw and policy-backed routed GraphRAG wrappers
  - this is still narrow beta metadata, not a finished general router
- the first multi-valued shard-label filter now exists too:
  - optional shared `segment_labels text[]` in `sorted_heap_graph_segment_meta_registry`
  - optional `segment_labels := ARRAY[...]` filtering in range/exact config/resolve functions and in raw/policy/profile/default routed wrappers
  - route profiles can now bundle `segment_labels` alongside `policy_name` or `segment_groups + relation_family + fanout_limit`
  - this is the first richer metadata dimension beyond `segment_group + relation_family`
- the first reusable route-profile layer now exists on top of that:
  - `sorted_heap_graph_route_profile_register(...)`
  - `sorted_heap_graph_route_profile_resolve(...)`
  - `sorted_heap_graph_rag_routed_profile(...)`
  - `sorted_heap_graph_rag_routed_exact_profile(...)`
  - a profile bundles either:
    - `policy_name + relation_family + fanout_limit + segment_labels`, or
    - inline `segment_groups + relation_family + fanout_limit + segment_labels`
  - so the operator no longer needs a separate policy row just to save one shard-group ordering
- the next operator shortcut now exists on top of profiles:
  - `sorted_heap_graph_route_default_register(...)`
  - `sorted_heap_graph_route_default_resolve(...)`
  - `sorted_heap_graph_rag_routed_default(...)`
  - `sorted_heap_graph_rag_routed_exact_default(...)`
  - this lets one route bind a default profile once instead of passing `profile_name` in every query
- the next registry cleanup now exists under the routed path:
  - `sorted_heap_graph_segment_meta_register(...)`
  - `sorted_heap_graph_segment_meta_config(...)`
  - `sorted_heap_graph_segment_meta_unregister(...)`
  - range-routed and exact-key routed rows can now leave `segment_group` / `relation_family` as `NULL` and inherit them from shard-local metadata instead
  - when both are present, row-local routed metadata still overrides the shared shard metadata
- the next operator-facing introspection layer now exists on top of that:
  - `sorted_heap_graph_segment_catalog(...)`
  - `sorted_heap_graph_exact_catalog(...)`
  - these expose route-local metadata, shared shard metadata, effective resolved metadata, and per-column source markers (`route|shared|unset`)
  - this does not change routing behavior; it makes the current registry model easier to inspect and debug
- the next operator-facing profile/default catalog now exists too:
  - `sorted_heap_graph_route_profile_catalog(...)`
  - this exposes profile-local `policy_name`, inline `segment_groups`, policy-backed `segment_groups`, the effective group order, optional profile-level `segment_labels`, the source marker (`inline|policy|unset`), and whether the profile is currently the route default
  - this also does not change routing behavior; it makes the profile/default layer easier to inspect and debug
- the next route-level operator summary now exists on top of that:
  - `sorted_heap_graph_route_catalog(...)`
  - this gives one row per route with range-shard count, exact-binding count, policy/profile counts, and the effective default-profile contract, including default `segment_labels`
  - this also does not change routing behavior; it makes the whole routed control plane easier to inspect at a glance
- the unified operator-facing dispatcher now exists on top of all of the above:
  - `sorted_heap_graph_route(...)` — the single query entry point that dispatches to the appropriate routed path (exact-key or range, with optional profile/policy/default resolution)
  - `sorted_heap_graph_route_plan(...)` — explains the routing resolution without executing GraphRAG
  - see the "Routed GraphRAG: operator recipe" section in `docs/api.md` for the recommended app-facing setup/inspect/query flow
- what is still missing:
  - richer metadata than one shared `text[]` label dimension
  - a product-quality shard router contract for tenant / KB / relation-family pruning without hand-managed registration tables

The Phase 2 reference path is now usable as an operator-facing beta surface through `sorted_heap_graph_route(...)`. The lower-level routed wrappers remain available as building blocks.

### Phase 3: productized router

Goal:

- make shard pruning cheap and stable

Possible router inputs:

- an exact tenant / KB key from the application
- relation-path-level narrowing
- segment metadata tables
- a cheap centroid/sketch layer that picks a bounded shard subset before ANN

The router should not change the GraphRAG scoring contract. Its job is only to narrow which shards need ANN seed retrieval.

### Phase 4: append-friendly large-scale operating model

For very large fact corpora, the likely long-term model is:

- sealed read-optimized segments
- one or more mutable hot segments
- background merge/compaction into larger sealed segments
- bounded query fanout across:
  - the current hot segments
  - a pruned subset of sealed segments

This is a better fit for:

- hundreds of millions / billions of facts
- constrained-memory hosts
- fast insert + fast query requirements

## Current recommendation

The first comparison is now complete:

1. the low-memory monolithic AWS `10M x 64D` run completed
2. the same point completed through the streamed segmented AWS harness
3. the result was decisive:
   - `route=all` looked like the monolith
   - `route=exact` was much faster at the same quality

So the current recommendation is narrower and stronger:

1. keep monolithic low-memory work only as a survival path
2. treat `segmentation + routing` as the primary scale direction
3. spend the next engineering dollar on:
   - productizing routing/pruning
   - turning harness-side shard fanout/merge into a real API/runtime path
   - preserving append-friendly segmented operation

The current evidence now points clearly toward segmented routing as the more durable large-scale GraphRAG model.
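The preference-ordered, bounded fanout that Phases 2 and 4 rely on (hot groups queried before sealed ones, under a hard fanout cap) can be sketched in illustrative Python; `plan_fanout` and the shard names are hypothetical, not extension API:

```python
def plan_fanout(shards, group_order, fanout_limit):
    """Select a bounded, preference-ordered shard subset.

    shards: list of (shard_name, segment_group) pairs.
    group_order: preferred group order, e.g. ["hot", "sealed"];
    this mirrors the rule that the filter array order is also the
    bounded-fanout preference order. Groups not listed are pruned.
    fanout_limit: hard cap on how many shards one query touches.
    """
    rank = {group: i for i, group in enumerate(group_order)}
    eligible = [(name, group) for name, group in shards if group in rank]
    # Stable sort: preferred groups first, registration order within a group.
    eligible.sort(key=lambda pair: rank[pair[1]])
    return [name for name, _ in eligible[:fanout_limit]]

segments = [
    ("facts_sealed_0", "sealed"),
    ("facts_hot_0", "hot"),
    ("facts_sealed_1", "sealed"),
    ("facts_archive_0", "archive"),
]
print(plan_fanout(segments, ["hot", "sealed"], 2))  # ['facts_hot_0', 'facts_sealed_0']
```

In this toy model the `archive` group is pruned outright, the hot segment is queried first, and the fanout cap keeps the per-query ANN cost bounded regardless of how many sealed segments accumulate.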
The newest bounded step makes that path slightly less hand-wired: routed and exact-key routed GraphRAG can now combine:

- a route range or exact route key
- a stored shard-group policy order
- one optional `relation_family` filter

And the newest ergonomic layer removes one more repeated query burden:

- a named route profile can now store either a policy-backed or an inline-group family/fanout combination
- profile-backed wrappers reuse the existing routed paths instead of adding a new scoring contract

And the newest operator shortcut removes one more argument from the query side:

- a route can now bind one default profile once
- default-backed wrappers resolve that profile implicitly at query time

And the newest narrow cleanup removes one more source of repeated registry state:

- shard-local metadata can now be registered once per concrete shard relation
- both range-routed and exact-key routed rows can inherit that metadata when their own `segment_group` / `relation_family` values are `NULL`
- this reduces duplicated registry data, but it still does not replace the current hand-managed routing model

And the newest operator-facing layer makes that model more inspectable:

- range-routed and exact-key routed catalogs now show both raw and effective metadata
- each effective metadata column also reports whether it came from the route row, the shared shard metadata row, or remained unset
- this is deliberately introspection-only; it does not widen the routing contract

That is still beta, but it is the first real multi-dimensional routing surface inside the extension. The next honest step is no longer "can we add metadata?" but "which metadata should become first-class beyond shard group + one family label, and how much routing can move from ad hoc registries into a cleaner operator model?"
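The inherit-and-override rule behind those catalogs can be sketched per metadata column; this is illustrative Python, not extension code:

```python
def resolve_metadata(route_value, shared_value):
    """Resolve one effective metadata column plus its source marker.

    Mirrors the documented rule: a non-NULL route-local value
    overrides shared shard metadata; shared shard metadata fills
    the gap when the route row left the column NULL; otherwise the
    column is unset. Source markers follow the catalogs'
    `route|shared|unset` convention.
    """
    if route_value is not None:
        return route_value, "route"
    if shared_value is not None:
        return shared_value, "shared"
    return None, "unset"

print(resolve_metadata("sealed", "hot"))  # ('sealed', 'route')
print(resolve_metadata(None, "hot"))      # ('hot', 'shared')
print(resolve_metadata(None, None))       # (None, 'unset')
```

The catalogs expose exactly this resolution per column, which is why they can stay introspection-only: the routing contract itself is unchanged.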