# Changelog

## 0.13.0 (in development)

### GraphRAG release status

- The narrow fact-shaped GraphRAG contract is now the intended stable `0.13` release surface:
  - `sorted_heap_graph_rag(...)`
  - `sorted_heap_graph_register(...)`
  - `sorted_heap_graph_config(...)`
  - `sorted_heap_graph_unregister(...)`
  - `sorted_heap_graph_rag_stats()`
  - `sorted_heap_graph_rag_reset_stats()`
- Lower-level helper/wrapper building blocks remain beta:
  - `sorted_heap_expand_ids(...)`
  - `sorted_heap_expand_rerank(...)`
  - `sorted_heap_expand_twohop_rerank(...)`
  - `sorted_heap_expand_twohop_path_rerank(...)`
  - `sorted_heap_graph_rag_scan(...)`
  - `sorted_heap_graph_rag_twohop_scan(...)`
  - `sorted_heap_graph_rag_twohop_path_scan(...)`
- Code-corpus snippet/symbol/lexical retrieval contracts remain benchmark/reference logic, not part of the stable SQL surface.
- Added `make test-graphrag-release` to run the full GraphRAG release-candidate bundle:
  - SQL regression
  - lifecycle
  - crash recovery
  - concurrent online-operation coverage
- Added `make test-release` to run the broader `0.13` extension release bundle:
  - core regression smoke
  - policy/doc contract selftests
  - dump/restore, TOAST, DDL, crash recovery, concurrent online ops
  - `pg_upgrade`
  - the narrower `make test-graphrag-release` bundle
- Clarified the GraphRAG docs around:
  - `limit_rows` as a work cap rather than a final result-count override
  - one-hop `score_mode := 'path'` being intentionally equivalent to `endpoint`

### FlashHadamard experimental status

- Added `sql/flashhadamard_experimental.sql` as the explicit experimental SQL surface for the FlashHadamard retrieval branch.
- The current canonical experimental point at `103K x 2880D` is the exhaustive parallel engine scan via mmap-backed store, with `5-8 ms` local p50 and a documented benchmark-reference helper path at `8.7 ms`.
- Added `make test-flashhadamard` and `make bench-flashhadamard` as the canonical experiment validation/benchmark entrypoints.
- Documented the current execution-model caveat explicitly: `pthread` inside a PostgreSQL backend remains experimental and is not part of the stable `0.13` release contract.
- Documented `FH_INT16=1` as an Apple/NEON-only experimental optimization with a partially validated local end-to-end win; it remains opt-in and is not the release default.

### Segmented GraphRAG scale verification

- Consolidated the monolithic vs segmented `10M x 64D` comparison in `docs/benchmarks.md` into a single side-by-side table.
- On the constrained-memory AWS point (`4 vCPU`, `8 GiB RAM`), segmented exact routing is 8.1x faster at depth 5 with better quality (100%/100% vs 75%/100%) compared to the monolith.
- All-shard fanout offers no latency benefit — the win comes entirely from shard pruning.
- Added bounded-fanout mode (`--route bounded --fanout K`) to the segmented benchmark harness. On the `1M x 64D` local point:
  - bounded(2/8) is 3.4x faster than monolithic with 96.9% hit@1
  - bounded(4/8) is 1.6x faster with 93.8% hit@1
  - latency scales roughly linearly with shards hit
  - the win is not exact-or-nothing — imperfect routing still helps
- Verified bounded fanout transfers to AWS `10M x 64D`:
  - bounded(2/8) is 4.0x faster than monolithic at depth 5
  - bounded(4/8) is 2.0x faster
  - the gradient is smooth and linear with shards hit
- Added routing-miss tolerance mode (`--route bounded_recall --recall-pct N`). Quality tracks router recall linearly; there is no sharp cliff. A router with 90% recall keeps 87.5% hit@1 while remaining 2-3x faster than monolithic. Routing quality determines answer quality, not latency.
- Verified routing-miss tolerance at AWS `10M x 64D`: at 90% recall, bounded(2/8) matches monolithic hit@1 (75%) while staying 4x faster. Finer crossover resolution remains limited by the small 4-query point.
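The bounded-fanout pattern above — visit at most `fanout` shards in router-preference order, then merge shard-local top-k hits globally — can be sketched in harness-style Python. This is an illustrative sketch only: the function and parameter names are hypothetical, and per-shard search is stubbed out.

```python
import heapq

def bounded_fanout_search(query, shards, route_order, fanout, top_k, search_shard):
    """Visit at most `fanout` shards in router-preference order and merge
    their shard-local top-k hits into one global top-k result list.

    search_shard(shard, query, k) is assumed to return (score, row_id)
    pairs with higher score meaning a better match.
    """
    merged = []  # min-heap of the current global top-k by score
    for shard_id in route_order[:fanout]:  # shard pruning: skip the rest
        for score, row_id in search_shard(shards[shard_id], query, top_k):
            heapq.heappush(merged, (score, row_id))
            if len(merged) > top_k:
                heapq.heappop(merged)  # drop the current global worst
    return sorted(merged, reverse=True)  # best-first final ordering
```

Latency scales with the number of shards actually visited, and a routing miss only costs quality for the affected query — which matches the "imperfect routing still helps" observation above.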
### Unified routed GraphRAG wrapper

- Added `sorted_heap_graph_route(...)` as a thin operator-facing dispatcher over the existing exact/range + profile/policy/default routed GraphRAG wrappers.
- Added `sorted_heap_graph_route_plan(...)` to explain which routing path, effective registry contract, and candidate shards the unified dispatcher would use.
- Explicit call-site routing overrides now take precedence over route defaults; profile and policy paths remain mutually exclusive.

### sorted_hnsw shared cache fix

- Fixed a multi-index shared-cache corruption bug where `shnsw_shared_scan_cache_attach()` held bare pointers into shared memory. A subsequent publish for a different index overwrote the shared region, silently corrupting the first index's cached HNSW graph.
- The attach path now deep-copies all bulk data (L0 neighbors, SQ8 vectors, upper-level neighbor slabs) into local palloc'd buffers.
- Added a multi-index overwrite regression phase (B5) to `scripts/test_hnsw_chunked_cache.sh`.
- Verified: `shared_cache=on` and `off` produce identical retrieval quality on the 5K x 384D and 10K x 384D multihop benchmarks. `shared_cache=off` is no longer needed as a correctness workaround.

### GraphRAG syntax unification

- Added `sorted_heap_graph_rag(...)` as the new unified fact-shaped GraphRAG entry point.
- The new syntax accepts:
  - `relation_path := ARRAY[hop]` for one-hop retrieval
  - `relation_path := ARRAY[hop1, hop2]` for two-hop retrieval
  - `relation_path := ARRAY[hop1, hop2, ...]` for explicit multi-hop retrieval
  - `score_mode := 'endpoint' | 'path'`
- One-hop semantics are now aligned with the fact-graph contract: ANN seed selection is on `entity_id`, not `target_id`.
- Added regression coverage for:
  - one-hop unified syntax
  - two-hop endpoint-scored unified syntax
  - two-hop path-aware unified syntax
  - generic path-aware multihop syntax
- Added `docs/graphrag-0.13-plan.md` to separate the narrow stable `0.13` target from the broader experimental code-GraphRAG surface.

### GraphRAG schema registration

- Added:
  - `sorted_heap_graph_register(...)`
  - `sorted_heap_graph_config(...)`
  - `sorted_heap_graph_unregister(...)`
- GraphRAG helpers and wrappers can now run against non-canonical fact-table schemas as long as the mapped columns still satisfy the fact contract: `int4 / int2 / int4 / svec / text`.
- Added regression coverage for an alias schema using:
  - `src_id`
  - `edge_type`
  - `dst_id`
  - `vec`
  - `body`

### GraphRAG lifecycle hardening

- Added `pg_extension_config_dump(...)` coverage for `sorted_heap_graph_registry`, so registered GraphRAG mappings survive `pg_dump` / `pg_restore`.
- Added `scripts/test_graph_rag_lifecycle.sh` and `make test-graphrag-lifecycle` to verify:
  - `0.12.0 -> 0.13.0` extension upgrade
  - alias-schema registration
  - registry persistence across dump/restore
  - persistence of segmented/routed GraphRAG registries across dump/restore:
    - shared shard metadata
    - shared `segment_labels`
    - range routing
    - exact-key routing
    - route policies
    - route profiles
    - route defaults
    - effective default `segment_labels`
  - post-restore GraphRAG query correctness on registered alias schemas
  - post-restore GraphRAG query correctness on routed/default-backed segmented GraphRAG queries
- Added `scripts/test_graph_rag_crash_recovery.sh` and `make test-graphrag-crash` to verify crash recovery for:
  - committed registered GraphRAG tables
  - crash during insert into a registered/indexed graph table
  - crash during compact on a registered graph table

### GraphRAG observability

- Added:
  - `sorted_heap_graph_rag_stats()`
  - `sorted_heap_graph_rag_reset_stats()`
- GraphRAG now exposes backend-local last-call stats for:
  - seed count
  - expanded row count
  - reranked row count
  - returned row count
  - ANN / expand / rerank / total timing
- Added regression coverage for:
  - direct helper observability via `sorted_heap_expand_rerank(...)`
  - unified wrapper observability via `sorted_heap_graph_rag(...)`
- The reported `api` field reflects the concrete top-level GraphRAG execution path, so unified wrapper calls report the underlying C path they dispatched to.

### GraphRAG scale harnesses

- Added `--hop-weight` to `scripts/bench_graph_rag_multidepth.py` so large synthetic multihop runs can vary the relative hop contribution without changing the SQL/GraphRAG contract.
- Added `sorted_hnsw.build_sq8` for constrained-memory index builds:
  - builds the graph from SQ8-compressed build vectors instead of a full float32 build slab
  - costs an extra heap scan during `CREATE INDEX`
  - first bounded local result on `1M x 64D`:
    - `build_indexes`: `48.606 s -> 46.541 s`
    - depth-5 unified GraphRAG stayed `87.5% / 100.0%`
- Added `scripts/bench_graph_rag_multidepth_aws.sh` to run the synthetic multi-hop depth benchmark on a remote AWS host using the same sync/install pattern as the existing multihop AWS runners.
- Added `scripts/bench_graph_rag_multidepth_segmented.py` to benchmark the first partitioning/segmentation path using multiple concrete `sorted_heap` shards plus harness-side fanout and global rerank.
- Added `scripts/bench_graph_rag_multidepth_segmented_aws.sh` so the same segmented benchmark can be run on a constrained remote host without manual repo sync/install steps.
- Added `docs/graphrag-segmentation-plan.md` to separate the post-`0.13` large-scale architecture from the `0.13` release surface.
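The `--hop-weight` knob above varies how much intermediate hops contribute relative to the endpoint. One plausible reading of that contract — endpoint similarity plus a weighted sum of intermediate-hop similarities — can be sketched as follows. This is an assumption for illustration only, not the extension's actual scoring code; the function name is hypothetical.

```python
def path_score(hop_sims, hop_weight):
    """Score a multi-hop path under an assumed hop-weight contract:
    the endpoint similarity dominates, and each intermediate hop
    contributes a hop_weight-scaled share.

    hop_sims: per-hop similarities along the path; the last entry is
    the endpoint similarity.
    """
    *intermediate, endpoint = hop_sims
    return endpoint + hop_weight * sum(intermediate)
```

Under this reading, a low setting such as `hop_weight = 0.05` keeps ranking dominated by endpoint similarity, while larger values let path context reorder near-tied endpoints.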
- Added larger-scale benchmark notes for:
  - local `1M`-row measured query latency on the synthetic multidepth graph
  - local and AWS `10M`-row build-bound envelopes, where generation/load now survive but the first practical frontier remains `sorted_hnsw` build time
  - retained-temp query-only sweeps on the AWS `10M x 32D` cheap-build point, which showed that raising `ef_search`/`ann_k` as high as `256/256` still does not recover depth-5 quality on the same weak graph
  - a follow-up `1M x 32D` calibration showing that wider query budgets (`ann_k=256`, `top_k=32`) can recover `96.9% hit@k` on smaller graphs even with a cheaper build
  - a stronger AWS `10M x 32D` falsifier showing that even exact heap seeds still return `0.0% / 0.0%` at `ann_k=256`, `top_k=32`, so the remaining problem there is the low-dimensional scale contract, not just HNSW build quality
  - a local `1M x 64D` calibration showing the same widened contract reaches `65.6% hit@1 / 96.9% hit@k`, with ANN matching exact seeds there
  - a follow-up `10M x 64D` allocator diagnosis showing the failure came from the old contiguous local L0 scan-cache slabs, not from the HNSW graph build itself
  - a chunked local scan-cache fix that replaces giant local `l0_neighbors` and `sq8_data` allocations with page-backed storage for build seeding and `shnsw_load_cache()`
  - a constrained-memory AWS `10M x 64D` monolithic rerun on the same `4 vCPU / 8 GiB` host with `sorted_hnsw.build_sq8 = on` and `hop_weight = 0.05` that now completes:
    - `load_data`: `787.809 s`
    - `build_indexes`: `846.795 s`
  - the first retained query-only pass on that exact built graph, showing that the monolithic path is now viable but still not the final speed story:
    - depth 1 unified GraphRAG: `840.607 ms`, `100.0% / 100.0%`
    - depth 5 unified GraphRAG: `2084.155 ms`, `75.0% / 100.0%`
    - depth-2+ quality stayed aligned with the SQL baseline, but latency remained about `2x` slower than the SQL path baseline
    - so the current `10M x 64D` frontier is no longer build survival or quality drift; it is monolithic query cost, which pushes the next scale branch toward segmentation + pruning
  - a loader fast path for `sorted_heap_only` multidepth runs that copies directly into `facts_sh` before `sorted_heap_compact(...)` instead of staging through `facts_heap`; bounded local checks held the same compacted depth-5 quality/latency while improving ingest by about `10%` (`6.321 s` -> `5.638 s` at `200K` rows, `31.392 s` -> `28.231 s` at `1M` rows)
  - a new multidepth harness knob `--post-load-op compact|merge|none` to compare post-load maintenance strategies on the same synthetic graph
  - bounded local evidence that keeps `compact` as the default: `none` is much slower at query time, while `merge` is viable but does not materially beat `compact` on the larger `1M` load point (`28.142 s` versus `28.108 s`)
  - an opt-in stage-breakdown path `--report-stage-stats` for the multidepth harness, backed by `sorted_heap_graph_rag_stats()`
  - a local `1M x 64D` lower-hop stage diagnosis showing that the widened multihop path is ANN-bound, not expansion-bound: at depth 5 the unified path took `110.507 ms` end-to-end, of which about `109.178 ms` was ANN, `0.691 ms` expansion, and `0.011 ms` rerank
  - the first local segmented `1M x 64D` GraphRAG point (`8` shards, `build_sq8=on`) showing:
    - all-shard fanout preserves quality but is slower than the monolith (`87.677 ms` vs `50.104 ms` at depth 1, `142.472 ms` vs `121.524 ms` at depth 5)
    - exact routing to the owning shard is the real partitioning win (`10.574 ms` at depth 1, `16.822 ms` at depth 5, stable `100.0% / 100.0%`)
    - so the next scale contract must be "segmentation + pruning", not just "more shards"
  - the first full AWS segmented `10M x 64D` rerun on the same constrained `4 vCPU / 8 GiB` host using streamed shard load and `build_sq8=on`:
    - `generate_csv`: `0.000 s`
    - `load_data`: `500.474 s`
    - `build_indexes`: `784.778 s`
    - `route=all` matched the old monolithic query envelope almost exactly (`898.440 ms` at depth 1, `2093.652 ms` at depth 5)
    - `route=exact` was the real scale win (`126.057 ms` at depth 1, `258.766 ms` at depth 5, stable `100.0% / 100.0%` at depth 5)
    - so the constrained-memory large-scale direction is now much clearer: productize segmented routing, not broad all-shard fanout
  - the first SQL-level segmented reference path:
    - added `sorted_heap_graph_rag_segmented(regclass[], ...)`
    - it executes `sorted_heap_graph_rag(...)` across a caller-supplied shard set and merges shard-local top-k rows in SQL
    - local segmented smoke confirmed the SQL merge path matches the older Python merge path on quality/row counts with similar latency
    - routing/pruning still stays outside the extension for now; this wrapper only productizes fanout/merge
  - the first metadata-driven routed GraphRAG reference path:
    - added `sorted_heap_graph_segment_register(...)`, `sorted_heap_graph_segment_config(...)`, `sorted_heap_graph_segment_resolve(...)`, and `sorted_heap_graph_segment_unregister(...)`
    - added `sorted_heap_graph_rag_routed(...)` on top of the segmented wrapper
    - this beta surface lets callers register shard ranges once and then route by a supplied `int8` key before segmented GraphRAG fanout/merge
    - local routed smoke showed the routed path matches exact-route segmented SQL quality/row counts with only small extra lookup overhead
  - the exact-key routed companion for tenant / KB style routing:
    - added `sorted_heap_graph_exact_register(...)`, `sorted_heap_graph_exact_config(...)`, `sorted_heap_graph_exact_resolve(...)`, and `sorted_heap_graph_exact_unregister(...)`
    - added `sorted_heap_graph_rag_routed_exact(...)`
    - local exact-key smoke stayed aligned with the exact-route segmented SQL merge path (`0.202 ms` vs `0.183 ms` at depth 5, both `100.0% / 100.0%`)
  - the first richer shard-group filter on top of routed segmentation:
    - both range-routed and exact-key routed registries now accept an optional `segment_group` label
    - both config/resolve functions now accept optional `segment_groups text[]` filters
    - both routed wrappers now accept optional `segment_groups := ARRAY[...]` to narrow candidate shards before segmented GraphRAG fanout/merge
    - when `segment_groups` is present, its array order now becomes the shard preference order before bounded fanout is applied
    - this is the first beta surface for hot/sealed or relation-family shard pruning without changing the GraphRAG scoring contract
  - the first registry-backed policy layer for shard-group preference:
    - added `sorted_heap_graph_route_policy_register(...)`, `sorted_heap_graph_route_policy_config(...)`, `sorted_heap_graph_route_policy_groups(...)`, and `sorted_heap_graph_route_policy_unregister(...)`
    - added `sorted_heap_graph_rag_routed_policy(...)` and `sorted_heap_graph_rag_routed_exact_policy(...)`
    - this keeps hot/sealed preference in route metadata instead of repeating raw `segment_groups := ARRAY[...]` literals in every query
  - a second routing dimension on top of that policy layer:
    - both range-routed and exact-key shard registries now accept an optional `relation_family` label
    - both config/resolve functions now accept optional `relation_family := ...` filtering
    - both raw and policy-backed routed wrappers now accept optional `relation_family := ...` to narrow candidate shards after route resolution but before segmented GraphRAG fanout/merge
    - regression coverage now proves route+family and route+policy+family filtering for both range and exact-key routing
  - the first route-profile convenience layer on top of that:
    - added `sorted_heap_graph_route_profile_register(...)`, `sorted_heap_graph_route_profile_config(...)`, `sorted_heap_graph_route_profile_resolve(...)`, and `sorted_heap_graph_route_profile_unregister(...)`
    - added `sorted_heap_graph_rag_routed_profile(...)` and `sorted_heap_graph_rag_routed_exact_profile(...)`
    - a profile now stores either:
      - `policy_name + relation_family + fanout_limit`, or
      - inline `segment_groups + relation_family + fanout_limit`
    - so one route profile can be self-contained without a separate `sorted_heap_graph_route_policy_registry` row
    - regression coverage now proves both profile-backed wrappers match the existing sealed/right routed baselines
  - the first default-profile operator layer on top of routed profiles:
    - added `sorted_heap_graph_route_default_register(...)`, `sorted_heap_graph_route_default_config(...)`, `sorted_heap_graph_route_default_resolve(...)`, and `sorted_heap_graph_route_default_unregister(...)`
    - added `sorted_heap_graph_rag_routed_default(...)` and `sorted_heap_graph_rag_routed_exact_default(...)`
    - this binds one default profile per route so callers no longer need to pass `profile_name` on every query
    - regression coverage now proves both default-backed wrappers match the same sealed/right routed baselines as the explicit profile paths
  - a shared shard-metadata cleanup under the routed beta surface:
    - added `sorted_heap_graph_segment_meta_register(...)`, `sorted_heap_graph_segment_meta_config(...)`, and `sorted_heap_graph_segment_meta_unregister(...)`
    - range-routed and exact-key routed config/resolve paths now fall back to shared per-shard `segment_group` / `relation_family` metadata when the route row leaves those labels `NULL`
    - row-local routed metadata still overrides shared shard metadata when both are present
    - regression coverage now proves both routed wrappers work when those labels live only in the shared shard-metadata registry
  - the first multi-valued shard-label filter on top of that:
    - shared shard metadata now also accepts optional `segment_labels text[]`
    - range/exact config, catalog, and resolve functions now accept optional `segment_labels := ARRAY[...]` filters
    - raw, policy-backed, profile-backed, and default-backed routed wrappers now propagate that filter without changing GraphRAG scoring semantics
    - route profiles and route/default catalogs now expose profile-level and default effective `segment_labels`
    - regression coverage now proves label-based pruning through the shared shard-metadata path for both range and exact-key routing
  - a first operator-facing shard catalog on top of that:
    - added `sorted_heap_graph_segment_catalog(...)` and `sorted_heap_graph_exact_catalog(...)`
    - both show route-local metadata, shared shard metadata, effective resolved metadata, optional shared/effective `segment_labels`, and per-column source markers (`route|shared|unset`)
    - this is introspection-only and does not change the routed GraphRAG execution contract
  - a first operator-facing route-profile catalog on top of that:
    - added `sorted_heap_graph_route_profile_catalog(...)`
    - it shows profile-local `policy_name`, inline `segment_groups`, policy-backed `segment_groups`, effective group order, `segment_groups_source` (`inline|policy|unset`), `relation_family`, `fanout_limit`, optional profile-level `segment_labels`, and whether the profile is the current route default
    - this is introspection-only and does not change the routed GraphRAG execution contract
  - a first route-level operator summary on top of that:
    - added `sorted_heap_graph_route_catalog(...)`
    - it shows one row per route with range-shard count, exact-binding count, policy/profile counts, and the effective default-profile contract, including default `segment_labels`
    - this is introspection-only and does not change the routed GraphRAG execution contract

### sorted_hnsw build optimization

- Fixed a real build-time memory-safety bug in `src/hnsw_build.c`: reverse link insertion intentionally overflows a neighbor list by one entry before `shrink_connections()` prunes it, but the in-memory neighbor arrays were previously allocated with only `max_nbrs` slots.
- The build now allocates `max_nbrs + 1` slots for those transient reverse inserts, removing the out-of-bounds write.
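The fixed invariant can be illustrated with a small sketch of the reverse-link pattern (a Python stand-in for the C arrays; the function name is hypothetical): the neighbor buffer must hold one extra transient slot, because the over-full list exists briefly between insert and prune.

```python
def reverse_insert_then_shrink(neighbors, new_id, distance_to, max_nbrs):
    """Insert a reverse link, temporarily overflowing the list to
    max_nbrs + 1 entries, then prune back to the max_nbrs closest.

    The buffer backing `neighbors` must therefore be sized for
    max_nbrs + 1 slots; sizing it for max_nbrs is the out-of-bounds
    write the fix above removes.
    """
    neighbors.append(new_id)            # transient overflow slot
    assert len(neighbors) <= max_nbrs + 1
    neighbors.sort(key=distance_to)     # analogue of shrink_connections()
    del neighbors[max_nbrs:]            # keep only the max_nbrs closest
    return neighbors
```

In Python the list grows safely; in C the same append lands one element past a `max_nbrs`-sized array unless the extra slot is allocated up front.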
- Local reproducer after the fix:
  - `40K` pairs / `200K` rows / `64D` / `m=16` / `ef_construction=64`
  - default contract `hop_weight=0.15`: `5/5` build-only passes
  - lowered-hop contract `hop_weight=0.05`: `3/3` build-only passes
- Removed the per-search `visited[]` allocation/zeroing from the hot HNSW build loop in `src/hnsw_build.c` and replaced it with a reusable visit-mark array.
- On the local `500K x 32D` diagnostic point (`m=8`, `ef_construction=8`), that reduced total `CREATE INDEX` time from about `18.1-18.7 s` to `2.8-3.0 s`, with the isolated graph-construction phase dropping from about `18.27 s` to `2.42-2.59 s`.
- With that optimization in place, the AWS `10M x 32D` cheap-build scale run progressed from "stuck in CREATE INDEX" to the first real `10M` query pass.

### GraphRAG concurrent online-operation hardening

- Added `scripts/test_graph_rag_concurrent.sh` and `make test-graphrag-concurrent` to verify registered alias-schema fact graphs under:
  - concurrent `INSERT` / `UPDATE` / `DELETE`
  - concurrent GraphRAG queries
  - concurrent `sorted_hnsw` KNN queries
  - `sorted_heap_compact_online(...)`
  - `sorted_heap_merge_online(...)`
- The new harness verifies that:
  - GraphRAG alias mappings remain registered
  - the deterministic helper signature remains stable across online operations
  - the unified GraphRAG wrapper remains callable and non-empty
  - `sorted_hnsw` indexes stay valid and usable
  - backend-local GraphRAG observability still reports non-empty stage stats

### GraphRAG larger real-corpus verification

- Refreshed the `0.13` GraphRAG plan and benchmark docs with a larger in-repo `cogniformerus` transfer gate.
- Verified that the old tiny-budget code-corpus point (`top_k=4`) drifts on the full `183`-file `cogniformerus` repository to about `87%` keyword coverage and `66.7%` full hits.
- Verified that increasing only the final result budget to `top_k=8` restores repeated-build stable `100.0% / 100.0%` on that larger in-repo Crystal corpus for both:
  - generic `prompt_summary_snippet_py`
  - code-aware `prompt_symbol_summary_snippet_py`
- Added mixed-language code-corpus support to the benchmark harness via:
  - JSON question fixtures
  - configurable source extensions
  - quoted C/C++ include-edge extraction
- Added the first real `~/Projects/C` adversary gate on `pycdc` using `scripts/fixtures/graph_rag_pycdc_questions.json`.
- Verified on `pycdc` that:
  - the fast generic point is repeated-build stable but partial (`90.0% / 60.0%`)
  - the code-aware helper-backed compact include rescue is repeated-build stable at `100.0% / 100.0%`
- Added the first archive-side adversary gate on `~/SrcArchives/apple/ninja/src` using `scripts/fixtures/graph_rag_ninja_questions.json`.
- Verified on `ninja/src` that:
  - the plain generic `prompt_summary_snippet_py` path is repeated-build stable at `100.0% / 100.0%` once the final result budget is raised to `top_k=12`
  - the code-aware `prompt_symbol_summary_snippet_py` path remains partial (`85.0% / 80.0%`) on the same corpus
- The scoped `0.13` larger real-corpus gate now spans:
  - `~/Projects/Crystal`
  - `~/Projects/C`
  - `~/SrcArchives`

## 0.12.0 (2026-03-26)

### Release documentation pass

- Public docs now split the surface into:
  - **stable**: `sorted_heap` table AM and `sorted_hnsw` index AM
  - **beta**: GraphRAG helper/wrapper API
  - **legacy/manual**: IVF-PQ and sidecar HNSW paths
- The README performance summary was narrowed to representative rows and had the stale narrow-range comparison removed.
- README, `docs/index.md`, `docs/vector-search.md`, and `docs/limitations.md` now document the current `sorted_hnsw` ordered-scan contract: base-relation `ORDER BY embedding <=> query LIMIT k`, with explicit notes about `LIMIT`, `ef_search`, and filtered-query caveats.
- `docs/api.md` now includes:
  - `sorted_hnsw.ef_search`
  - `sorted_hnsw.sq8`
  - stable `sorted_hnsw` usage examples
  - beta GraphRAG function reference and usage examples
- `docs/benchmarks.md` now labels GraphRAG benchmark sections as beta-facing.

## 0.10.0 (2026-03-14)

Documentation release: comprehensive rewrite of README with use-case examples, updated benchmarks, and usage guides. No SQL or C changes from 0.9.15.

## 0.9.15 (2026-03-13)

### Scan planner fixes

- **Prepared-mode OLTP cliff fix**: the Path B cost estimator now includes uncovered tail pages (pages beyond zone map entries created by UPDATEs). Previously the generic plan estimated 1 block but scanned 90+, keeping Custom Scan over Index Scan. Mixed OLTP: 56 → 28K tps.
- **DML planning skip**: the `set_rel_pathlist` hook bails out immediately for UPDATE/DELETE on the result relation, avoiding bounds extraction and range computation overhead. UPDATE +10%; DELETE+INSERT reaches heap parity.
- **SCAN-1 regression test**: verifies the prepared-statement generic plan uses Index Scan (not Custom Scan) after UPDATEs create uncovered tail pages.

### UPDATE path optimization

- **Remove `slot_getallattrs` from the UPDATE/INSERT zone map path**: `zonemap_update_entry` only reads PK columns via `slot_getattr` (lazy deform). The prior `slot_getallattrs` call unnecessarily materialized all columns, including wide svec vectors, on every UPDATE/INSERT. Removing it eliminates ~530 bytes of needless deform per tuple for svec(128) tables. UPDATE vec col: 74% → 102% (parity). Mixed OLTP: 42% → 83%.
- **Lazy update maintenance** (`sorted_heap.lazy_update = on`): opt-in mode that skips per-UPDATE zone map maintenance. The first UPDATE on a covered page clears `SHM_FLAG_ZONEMAP_VALID` on disk; the planner falls back to Index Scan. INSERT keeps eager maintenance. Compact/merge restores zone map pruning. UPDATE non-vec: 46% → 100%. Mixed OLTP: 83% → 97%.
- **SCAN-2 regression test**: verifies lazy-mode invalidation → Index Scan fallback → correct data → compact restores Custom Scan pruning.

### CRUD performance contract (500K rows, svec(128), prepared mode)

| Operation | eager / heap | lazy / heap | Notes |
|-----------|--------------|-------------|-------|
| SELECT PK | 85% | 85% | Index Scan via btree |
| SELECT range 1000 | 97% | — | Custom Scan pruning (eager only) |
| Bulk INSERT | 100% | 100% | Always eager |
| DELETE + INSERT | 63% | 63% | INSERT always eager |
| UPDATE non-vec | 46% | **100%** | Lazy skips zone map flush |
| UPDATE vec col | 102% | **100%** | Parity both modes |
| Mixed OLTP | 83% | **97%** | Near-parity with lazy |

Eager mode (default) maintains zone maps on every UPDATE for scan pruning. Lazy mode (`sorted_heap.lazy_update = on`) trades scan pruning for UPDATE parity with heap; compact/merge restores pruning. Recommended for write-heavy workloads where point lookups use Index Scan anyway.

### HNSW sidecar search (`svec_hnsw_scan`)

- **7-arg interface**: added a `rerank1_topk` parameter for an optional dense r1 pre-filter via the `{prefix}_r1` sidecar table (`nid int4` PK, `rerank_vec hsvec`). The 6-arg form continues to work via `PG_NARGS()`.
- **Session-local L0 cache** (`sorted_heap.hnsw_cache_l0 = on`): seqscans L0 once per session (~95ms build, ~100MB for 103K nodes). Upper levels (L1–L4) are cached separately (~6MB). OID-based invalidation on DDL.
- **ARM NEON SIMD** for `cosine_distance_f32_f16`: vectorized mixed-precision distance with `vld1q_f16` → `vcvt_f32_f16` → `vfmaq_f32`, processing 8 f16 elements per iteration. A precomputed query self-norm (`cosine_distance_f32_f16_prenorm`) eliminates redundant norm_a accumulation in beam search (4 FMAs/iter vs 6). Compile-time guard: `__aarch64__ && __ARM_NEON && HSVEC_NATIVE_FP16`; scalar fallback on x86. 27–36% speedup across all operating points.
- **Beam search micro-optimizations** (22–29% additional speedup on cache paths):
  - Visited bitset: replaced `bool[]` (1 byte/node, 103KB) with a `uint64[]` bitset (1 bit/node, 12.9KB). Fits in L1 cache; faster membership tests.
  - Neighbor prefetch: `__builtin_prefetch` on the next unvisited neighbor's cache node before computing the distance on the current one. Hides L2/L3 latency for the 908-byte cache nodes scattered across the 96MB cache array.
  - Cache-only upper search: `hnsw_search_cached()` bypasses `hnsw_open_level`/`hnsw_close_level` for warm upper-level caches, eliminating `table_open` + `index_open` + `index_beginscan` overhead per level. Upper traversal: 0.17ms → 0.04ms (−74%).
- **Recommended operating points** (103K x 2880-dim, hsvec(384) sketch):
  - Balanced: ef=96 rk=48 → 0.70ms p50, 96.8% recall@10
  - Quality: ef=96 rk=0 → 1.15ms p50, 98.4% recall@10
  - Latency: ef=64 rk=32 → 0.50ms p50, 92.8% recall@10

  Measured with `shared_buffers=512MB` (pod 2Gi), isolated per-config protocol (warmup pass + measure pass, no cross-config TOAST sharing). pgvector HNSW under the same conditions: 1.70ms p50 (ef=64). Cold first-call latency is 2–3× higher due to TOAST page faults; `shared_buffers` must be sized to hold the rerank working set (~27MB for 50 queries at rk=48).
- **r1 verdict**: marginal on warm pools. At ef>=96 the btree overhead exceeds the TOAST savings. Useful only in cold-TOAST scenarios.
- **Tests**: ANN-15 (basic HNSW), ANN-15b (bit-identical distances across cache states), ANN-16 (r1 sidecar), ANN-17 (r1 absent graceful skip), ANN-18 (relcache invalidation).
- **Adaptive ef** (`sorted_heap.hnsw_ef_patience`): patience-based early termination for the L0 beam search. When set to N > 0, the search stops after N consecutive node expansions that don't improve the result set. `ef` becomes the maximum budget; easy queries converge sooner. Not beneficial for rk=0 (quality mode), where TOAST reads scale with ef.
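The patience mechanic above can be sketched as follows. This is illustrative pseudologic of patience-based early stopping under an ef budget, not the extension's C search loop; all names are hypothetical.

```python
def beam_search_with_patience(candidates, expand, improves, ef, patience):
    """Expand candidate nodes up to an `ef` budget, but stop early after
    `patience` consecutive expansions that fail to improve the result set.
    A patience of 0 disables early termination. Returns the number of
    expansions performed."""
    stale = 0
    expanded = 0
    for node in candidates:
        if expanded >= ef:
            break                        # ef is now the *maximum* budget
        expand(node)                     # visit the node's neighbors
        expanded += 1
        if improves(node):
            stale = 0                    # progress: reset the patience counter
        else:
            stale += 1
            if patience > 0 and stale >= patience:
                break                    # early convergence for easy queries
    return expanded
```

Easy queries that stop improving terminate well before `ef`; hard queries still get the full budget, which matches the description of `ef` becoming a cap rather than a fixed cost.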
- **Sketch dimension sweep** (103K x 2880-dim Nomic, hsvec 384/512/768): recall@10 is identical (98.2% at ef=96/rk=48, 93.6% at ef=64/rk=32) across all three dimensions; latency is within noise (~1.2ms/q). The first 384 dims via MRL prefix truncation already capture all the discriminative power for this model. The recall ceiling is in navigation (graph quality / ef budget), not sketch fidelity. **384-dim is the correct sketch size.**
- **Builder**: `build_hnsw_graph.py` now accepts `--sketch-dim` (infers from data by default). Make target: `make build-hnsw-bench-nomic`.

### Storage: persisted sorted prefix (S2)

- **Meta page v7**: `shm_sorted_prefix_pages` is persisted in the meta page header. Split from `shm_padding` (struct size unchanged, backward compatible: a v7 reader treats v6 as prefix=0).
- **O(1) prefix detection**: `detect_sorted_prefix` returns the persisted value when available, falling back to an O(n) zone map scan for pre-v7 tables.
- **Conservative shrink paths**: the `tuple_update` override shrinks the prefix when an update lands on a prefix page. `zonemap_update_entry` detects both min-decrease and max-increase-with-overlap on prefix pages.
- **Merge restore**: `rebuild_zonemap_internal` and the merge early-exit both persist the recomputed prefix.
- **Characterization** (`scripts/bench_s2_prefix.sql`):
  - Append-only: the prefix survives 100% at all fillfactors
  - ff=100 + updates: 98% survival (non-HOT goes to tail)
  - ff<100 + updates/recycle: the prefix collapses (accepted tradeoff)
  - Merge always restores
- **Tests**: SH22-1 through SH22-5, covering compact, append, recycle insert, merge restore, and UPDATE-induced prefix shrink.

## 0.9.14 (2026-03-12)

- `svec_hnsw_scan`: 6-arg hierarchical HNSW search via PG sidecar tables. Top-down descent through the upper levels, beam search at L0 with hsvec sketches, exact rerank via main-table TOAST vectors.

## 0.9.13

- `svec_graph_scan`: flat NSW graph search with a btree-backed sidecar.
- IVF-PQ three-stage rerank with sketch sidecar (`svec_ann_scan`).