---
layout: default
title: GraphRAG Segmentation Plan
nav_order: 10
---

# GraphRAG segmentation plan

This document scopes the next large-scale GraphRAG branch after the `0.13` fact-shaped stable release.

The immediate problem is not correctness of the narrow GraphRAG contract. The immediate problem is scale on constrained-memory hosts. At `10M x 64D` on the current AWS ARM64 box (`4 vCPU`, `8 GiB RAM`, `4 GiB swap`), the monolithic `sorted_hnsw` build is still the practical frontier even after the retained build improvements:

- streamed load survives
- `sorted_hnsw.build_sq8 = on` materially reduces build-vector memory
- the build now stays alive deep into `CREATE INDEX`
- but the operating model is still one large ANN graph on one small host

That is the wrong long-term shape for hundreds of millions or billions of facts.

## Current verified constraint

Current GraphRAG helpers and wrappers operate on a concrete `sorted_heap` relation. They do **not** currently dispatch across a partitioned-table parent.

So the first scalable segmentation step is:

1. split facts into multiple concrete `sorted_heap` shards
2. build one `sorted_hnsw` index per shard
3. route each query to:
   - one shard when pruning is available, or
   - a bounded shard subset when pruning is partial
4. merge shard-local top-k rows globally and keep the final exact/path-aware rerank contract unchanged

## First benchmark result

The first segmented benchmark lives in `scripts/bench_graph_rag_multidepth_segmented.py`. It is a harness-side benchmark, not a released SQL API.
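Step 4 of the segmentation contract, the global merge of shard-local top-k rows, reduces to a standard sorted-stream merge. A minimal sketch in harness-style Python (`merge_shard_topk` is illustrative, not a released API):

```python
import heapq
from itertools import islice

def merge_shard_topk(shard_results, top_k):
    """Merge per-shard top-k candidate lists into a global top-k.

    shard_results: one list per shard of (distance, row_id) pairs,
    each list already sorted ascending by distance.
    heapq.merge streams the pre-sorted lists, so a wide fanout does
    not require concatenating and re-sorting all candidates.
    """
    return list(islice(heapq.merge(*shard_results), top_k))

# Example: three shards, global top-3.
shards = [
    [(0.10, "a1"), (0.40, "a2")],
    [(0.05, "b1"), (0.90, "b2")],
    [(0.20, "c1"), (0.30, "c2")],
]
print(merge_shard_topk(shards, 3))  # [(0.05, 'b1'), (0.1, 'a1'), (0.2, 'c1')]
```

The merge itself is cheap; as the numbers below show, the real cost of wide fanout is running ANN seed retrieval on every shard, not combining the results.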
The harness measures the first two routing extremes:

- `route=all`
  - query every shard
  - merge all shard-local top-k rows
- `route=exact`
  - synthetic lower bound
  - route to the known owning shard only

Local `1M x 64D` lower-hop point (`8` shards, `ann_k=256`, `top_k=32`, `ef_search=128`, `m=16`, `ef_construction=64`, `build_sq8=on`):

- monolith unified GraphRAG:
  - depth 1: `50.104 ms`, `100.0% / 100.0%`
  - depth 5: `121.524 ms`, `81.2% / 100.0%`
- segmented, `route=all`:
  - depth 1: `87.677 ms`, `100.0% / 100.0%`
  - depth 5: `142.472 ms`, `81.2% / 100.0%`
- segmented, `route=exact`:
  - depth 1: `10.574 ms`, `100.0% / 100.0%`
  - depth 5: `16.822 ms`, `100.0% / 100.0%`

This is the key lesson:

- segmentation alone is **not** a free latency win
- all-shard fanout preserves quality but pays a clear fanout tax
- the real gain comes only when the query path can prune most shards

So the right future contract is not just "partitioned indexes". It is **segmentation + pruning/routing**.

## Recommended segmentation model

The recommended shape is:

- shard facts by a stable routing key
- keep each shard as a concrete `sorted_heap` table
- keep one `sorted_hnsw` index per shard
- query a bounded shard subset
- merge and rerank globally

Good routing keys depend on the workload:

- `tenant_id` - strongest default for multi-tenant knowledge bases
- `knowledge_base_id` - if the system stores separate corpora per KB
- relation family - if relation sets are naturally disjoint
- time window / sealed segment - for append-heavy pipelines with freshness constraints

Avoid relying on entity-range sharding alone as the product story. It is useful for synthetic benchmarking, but real deployments need routing keys that are available from query context or cheap metadata.
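A routing key like `tenant_id` makes shard selection trivial whenever the key is available from query context. A toy sketch of exact-key routing with a bounded fallback for the partial-pruning case (all names here are illustrative, not extension API):

```python
def route_exact(shard_by_key, routing_key, fallback_shards):
    """Pick the bounded shard subset for one query.

    Exact routing: if the routing key (e.g. a tenant_id) maps to a
    known owning shard, query only that shard. Otherwise fall back
    to a bounded fanout over fallback_shards, which models the
    "bounded shard subset when pruning is partial" case.
    """
    shard = shard_by_key.get(routing_key)
    return [shard] if shard is not None else list(fallback_shards)

shard_map = {"tenant_a": "facts_shard_0", "tenant_b": "facts_shard_1"}
print(route_exact(shard_map, "tenant_b", shard_map.values()))  # ['facts_shard_1']
print(route_exact(shard_map, "tenant_z", shard_map.values()))  # ['facts_shard_0', 'facts_shard_1']
```

This is exactly why `route=exact` wins in the benchmark above: the owning shard is known up front, so only one ANN graph is ever touched.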
## Rollout phases

### Phase 1: harness and operational benchmarks

Goal:

- prove that segmented builds fit constrained hosts better than monoliths
- quantify the difference between:
  - all-shard fanout
  - bounded fanout
  - exact routing

Deliverables:

- the current local segmented harness
- AWS segmented runner:
  - `scripts/bench_graph_rag_multidepth_segmented_aws.sh`
- retained-temp build/query measurements on the same host that currently struggles with the monolithic `10M x 64D` point

### Phase 2: SQL-level segmented reference path

Goal:

- move beyond harness-only fanout

Reference design:

- the first step now exists as a beta wrapper:
  - `sorted_heap_graph_rag_segmented(regclass[], ...)`
  - executes `sorted_heap_graph_rag(...)` per shard
  - merges candidate rows in SQL
- the next step also exists in narrow form:
  - `sorted_heap_graph_segment_register(...)`
  - `sorted_heap_graph_segment_resolve(...)`
  - `sorted_heap_graph_rag_routed(...)`
  - this is a metadata-driven `int8` range router layered on top of the segmented wrapper
- one more practical routing surface now exists:
  - `sorted_heap_graph_exact_register(...)`
  - `sorted_heap_graph_exact_resolve(...)`
  - `sorted_heap_graph_rag_routed_exact(...)`
  - this is the exact-key router for tenant / KB style shard selection
- the first richer metadata filter now exists on top of both routed paths:
  - optional `segment_group` labels at registration time
  - optional `segment_groups text[]` filters at resolve/query time
  - the filter array order is also a bounded-fanout preference order
  - this is the first beta surface for hot/sealed or relation-family pruning
- the first registry-backed reuse layer now exists on top of that:
  - `sorted_heap_graph_route_policy_register(...)`
  - `sorted_heap_graph_route_policy_groups(...)`
  - `sorted_heap_graph_rag_routed_policy(...)`
  - `sorted_heap_graph_rag_routed_exact_policy(...)`
  - this keeps hot/sealed preference out of ad hoc query literals
- a second routing dimension now exists too:
  - optional `relation_family text` on both range-routed and exact-key shard registry rows
  - optional `relation_family := ...` filtering in config/resolve functions and in both raw and policy-backed routed GraphRAG wrappers
  - this is still narrow beta metadata, not a finished general router
- the first multi-valued shard-label filter now exists too:
  - optional shared `segment_labels text[]` in `sorted_heap_graph_segment_meta_registry`
  - optional `segment_labels := ARRAY[...]` filtering in range/exact config/resolve functions and in raw/policy/profile/default routed wrappers
  - route profiles can now bundle `segment_labels` alongside `policy_name` or `segment_groups + relation_family + fanout_limit`
  - this is the first richer metadata dimension beyond `segment_group + relation_family`
- the first reusable route-profile layer now exists on top of that:
  - `sorted_heap_graph_route_profile_register(...)`
  - `sorted_heap_graph_route_profile_resolve(...)`
  - `sorted_heap_graph_rag_routed_profile(...)`
  - `sorted_heap_graph_rag_routed_exact_profile(...)`
  - a profile bundles either:
    - `policy_name + relation_family + fanout_limit + segment_labels`, or
    - inline `segment_groups + relation_family + fanout_limit + segment_labels`
  - so the operator no longer needs a separate policy row just to save one shard-group ordering
- the next operator shortcut now exists on top of profiles:
  - `sorted_heap_graph_route_default_register(...)`
  - `sorted_heap_graph_route_default_resolve(...)`
  - `sorted_heap_graph_rag_routed_default(...)`
  - `sorted_heap_graph_rag_routed_exact_default(...)`
  - this lets one route bind a default profile once instead of passing `profile_name` in every query
- the next registry cleanup now exists under the routed path:
  - `sorted_heap_graph_segment_meta_register(...)`
  - `sorted_heap_graph_segment_meta_config(...)`
  - `sorted_heap_graph_segment_meta_unregister(...)`
  - range-routed and exact-key routed rows can now leave `segment_group` / `relation_family` as `NULL` and inherit them from shard-local metadata instead
  - when both are present, row-local routed metadata still overrides the shared shard metadata
- the next operator-facing introspection layer now exists on top of that:
  - `sorted_heap_graph_segment_catalog(...)`
  - `sorted_heap_graph_exact_catalog(...)`
  - these expose route-local metadata, shared shard metadata, effective resolved metadata, and per-column source markers (`route|shared|unset`)
  - this does not change routing behavior; it makes the current registry model easier to inspect and debug
- the next operator-facing profile/default catalog now exists too:
  - `sorted_heap_graph_route_profile_catalog(...)`
  - this exposes profile-local `policy_name`, inline `segment_groups`, policy-backed `segment_groups`, the effective group order, optional profile-level `segment_labels`, the source marker (`inline|policy|unset`), and whether the profile is currently the route default
  - this also does not change routing behavior; it makes the profile/default layer easier to inspect and debug
- the next route-level operator summary now exists on top of that:
  - `sorted_heap_graph_route_catalog(...)`
  - this gives one row per route with range-shard count, exact-binding count, policy/profile counts, and the effective default-profile contract, including default `segment_labels`
  - this also does not change routing behavior; it makes the whole routed control plane easier to inspect at a glance
- the unified operator-facing dispatcher now exists on top of all of the above:
  - `sorted_heap_graph_route(...)` — the single query entry point that dispatches to the appropriate routed path (exact-key or range, with optional profile/policy/default resolution)
  - `sorted_heap_graph_route_plan(...)` — explains the routing resolution without executing GraphRAG
  - see the "Routed GraphRAG: operator recipe" section in `docs/api.md` for the recommended app-facing setup/inspect/query flow
- what is still missing:
  - richer metadata than one shared `text[]` label dimension
  - a product-quality shard router contract for tenant / KB / relation-family pruning without hand-managed registration tables

The Phase 2 reference path is now usable as an operator-facing beta surface through `sorted_heap_graph_route(...)`. The lower-level routed wrappers remain available as building blocks.

### Phase 3: productized router

Goal:

- make shard pruning cheap and stable

Possible router inputs:

- an exact tenant / KB key from the application
- relation-path-level narrowing
- segment metadata tables
- a cheap centroid/sketch layer that picks a bounded shard subset before ANN

The router should not change the GraphRAG scoring contract. Its job is only to narrow which shards need ANN seed retrieval.

### Phase 4: append-friendly large-scale operating model

For very large fact corpora, the likely long-term model is:

- sealed read-optimized segments
- one or more mutable hot segments
- background merge/compaction into larger sealed segments
- bounded query fanout across:
  - the current hot segments
  - a pruned subset of sealed segments

This is a better fit for:

- hundreds of millions / billions of facts
- constrained-memory hosts
- fast insert + fast query requirements

## Current recommendation

The first comparison is now complete:

1. the low-memory monolithic AWS `10M x 64D` run completed
2. the same point completed through the streamed segmented AWS harness
3. the result was decisive:
   - `route=all` looked like the monolith
   - `route=exact` was much faster at the same quality

So the current recommendation is narrower and stronger:

1. keep monolithic low-memory work only as a survival path
2. treat `segmentation + routing` as the primary scale direction
3. spend the next engineering dollar on:
   - productizing routing/pruning
   - turning harness-side shard fanout/merge into a real API/runtime path
   - preserving append-friendly segmented operation

The current evidence now points clearly toward segmented routing as the more durable large-scale GraphRAG model.
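The preference-ordered, bounded fanout that Phases 2 and 4 rely on (hot groups queried before sealed ones, under a hard fanout cap) can be sketched in illustrative Python; `plan_fanout` and the shard names are hypothetical, not extension API:

```python
def plan_fanout(shards, group_order, fanout_limit):
    """Select a bounded, preference-ordered shard subset.

    shards: list of (shard_name, segment_group) pairs.
    group_order: preferred group order, e.g. ["hot", "sealed"];
    this mirrors the rule that the filter array order is also the
    bounded-fanout preference order. Groups not listed are pruned.
    fanout_limit: hard cap on how many shards one query touches.
    """
    rank = {group: i for i, group in enumerate(group_order)}
    eligible = [(name, group) for name, group in shards if group in rank]
    # Stable sort: preferred groups first, registration order within a group.
    eligible.sort(key=lambda pair: rank[pair[1]])
    return [name for name, _ in eligible[:fanout_limit]]

segments = [
    ("facts_sealed_0", "sealed"),
    ("facts_hot_0", "hot"),
    ("facts_sealed_1", "sealed"),
    ("facts_archive_0", "archive"),
]
print(plan_fanout(segments, ["hot", "sealed"], 2))  # ['facts_hot_0', 'facts_sealed_0']
```

In this toy model the `archive` group is pruned outright, the hot segment is queried first, and the fanout cap keeps the per-query ANN cost bounded regardless of how many sealed segments accumulate.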
The newest bounded step makes that path slightly less hand-wired: routed and exact-key routed GraphRAG can now combine:

- a route range or exact route key
- a stored shard-group policy order
- one optional `relation_family` filter

And the newest ergonomic layer removes one more repeated query burden:

- a named route profile can now store either a policy-backed or an inline-group family/fanout combination
- profile-backed wrappers reuse the existing routed paths instead of adding a new scoring contract

And the newest operator shortcut removes one more argument from the query side:

- a route can now bind one default profile once
- default-backed wrappers resolve that profile implicitly at query time

And the newest narrow cleanup removes one more source of repeated registry state:

- shard-local metadata can now be registered once per concrete shard relation
- both range-routed and exact-key routed rows can inherit that metadata when their own `segment_group` / `relation_family` values are `NULL`
- this reduces duplicated registry data, but it still does not replace the current hand-managed routing model

And the newest operator-facing layer makes that model more inspectable:

- range-routed and exact-key routed catalogs now show both raw and effective metadata
- each effective metadata column also reports whether it came from the route row, the shared shard metadata row, or remained unset
- this is deliberately introspection-only; it does not widen the routing contract

That is still beta, but it is the first real multi-dimensional routing surface inside the extension. The next honest step is no longer "can we add metadata?" but "which metadata should become first-class beyond shard group + one family label, and how much routing can move from ad hoc registries into a cleaner operator model?"
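The inherit-and-override rule behind those catalogs can be sketched per metadata column; this is illustrative Python, not extension code:

```python
def resolve_metadata(route_value, shared_value):
    """Resolve one effective metadata column plus its source marker.

    Mirrors the documented rule: a non-NULL route-local value
    overrides shared shard metadata; shared shard metadata fills
    the gap when the route row left the column NULL; otherwise the
    column is unset. Source markers follow the catalogs'
    `route|shared|unset` convention.
    """
    if route_value is not None:
        return route_value, "route"
    if shared_value is not None:
        return shared_value, "shared"
    return None, "unset"

print(resolve_metadata("sealed", "hot"))  # ('sealed', 'route')
print(resolve_metadata(None, "hot"))      # ('hot', 'shared')
print(resolve_metadata(None, None))       # (None, 'unset')
```

The catalogs expose exactly this resolution per column, which is why they can stay introspection-only: the routing contract itself is unchanged.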