---
layout: default
title: Functional Gap Specs
nav_order: 13
---

# Functional Gap Specs

This document tracks the next functional gaps after the `0.14.0` release surface. It is intentionally contract-first: each gap should get a small spec, acceptance tests, and only then implementation.

## Gap Inventory

### G0. Composite-PK pruning quality

Current state:

- The zone map stores the first two PK columns.
- Initial direct leaf probe on a compacted `(tenant_id, id)` sorted_heap table produced `Custom Scan (SortedHeapScan)`, but scanned `308 of 310` blocks for `tenant_id = 1 AND id BETWEEN 100 AND 110`.
- The zone map entries contained second-column ranges, so this is not a storage-layout absence. The likely gap is in the sorted-prefix range search: it uses the first column as the main binary-search key and does not yet make the lexicographic `(col1, col2)` bound tight enough.
- Implemented first-pass fix: when column 1 is fixed by inclusive equality and column 2 has a range/equality bound, the sorted-prefix candidate span is refined with existing second-column zone-map overlap checks.
- Regression `SH21B` now guards `tenant_id = 1 AND id BETWEEN 100 AND 110` with a tight scanned-block threshold and verifies count correctness.
- Regression `SH21B-3` now guards that an unsorted tail remains conservative for fixed-column-1 and bounded-column-2 predicates, preserving correctness after tail inserts.
- Regression `SH21B` now has a dedicated first-column-only range guard, so the first-pass composite refinement does not regress the original col1 pruning path.
- Regression `SH21B-3` now guards generic prepared runtime composite bounds under `force_generic_plan`, so executor-startup parameter resolution keeps the same tight pruning shape as constant `(tenant_id, id)` predicates.
- `docs/spec-composite-pk-pruning.md` documents the first-pass contract and remaining lexicographic follow-ups.
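The fixed-column-1 refinement described above can be sketched in miniature. The following Python is an illustrative model only (the extension's zone map and range search live in C, and every name here is hypothetical): it first narrows the sorted-prefix page span by binary search on column 1, then drops pages whose second-column zone-map range provably cannot overlap the `id` bound, keeping mixed boundary pages conservatively.

```python
from bisect import bisect_left, bisect_right

def candidate_pages(zonemap, col1_eq, col2_lo, col2_hi):
    """Pages that can contain rows with col1 = col1_eq and col2 in [lo, hi].

    zonemap: one (min1, max1, min2, max2) entry per sorted-prefix page,
    ordered lexicographically, so min1/max1 are non-decreasing across pages.
    """
    min1s = [e[0] for e in zonemap]
    max1s = [e[1] for e in zonemap]
    # Step 1 (pre-fix behavior): first-column span via binary search.
    start = bisect_left(max1s, col1_eq)   # first page whose max1 >= col1_eq
    stop = bisect_right(min1s, col1_eq)   # past last page whose min1 <= col1_eq
    # Step 2 (first-pass fix): refine with second-column zone-map overlap.
    pages = []
    for p in range(start, stop):
        min1, max1, min2, max2 = zonemap[p]
        if min1 == max1 == col1_eq:
            # Page holds only the fixed col1 value, so its col2 range is
            # meaningful; skip it when there is provably no overlap.
            if max2 < col2_lo or min2 > col2_hi:
                continue
        pages.append(p)  # boundary/mixed-col1 pages are kept conservatively
    return pages
```

In this toy model, a `tenant_id = 1 AND id BETWEEN 100 AND 110` shape collapses from the whole tenant-1 span to the single page whose second-column range overlaps `[100, 110]`, which mirrors the scanned-block drop that `SH21B` guards.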
Risk:

- Partitioning and fact-shaped GraphRAG commonly use composite keys such as `(tenant_id, id)` or `(entity_id, relation_id, target_id)`. If second-column pruning is weak, we keep correctness but give back much of the locality win.

Target direction:

- Continue from the first-pass refinement toward true lexicographic binary search for the sorted prefix path only if planning-time refinement over a large tenant span becomes measurable overhead and the zone-map metadata can prove lexicographic endpoint ordering.
- Keep linear/tail pruning conservative, but make compacted prefix queries with equality on column 1 and range/equality on column 2 map to tight block ranges.

### G1. Declarative partitioning support

Current state:

- `sorted_heap` works as a concrete table access method on physical tables.
- Each physical `sorted_heap` relation owns its own block-0 meta page, zone map, sorted-prefix state, compact/merge lifecycle, and optional `sorted_hnsw` indexes.
- PostgreSQL declarative partitioned parents now have first-pass explicit status and maintenance helpers: `sorted_heap_partition_status(...)`, `sorted_heap_compact_partitions(...)`, `sorted_heap_merge_partitions(...)`, and `sorted_heap_rebuild_zonemap_partitions(...)`.
- Verified probe: sorted_heap leaves under a declarative partitioned parent can be created and compacted, and a direct leaf query can use `SortedHeapScan`.
- First-pass planner fix: the same covered predicate shape issued through the partitioned parent now reaches `SortedHeapScan` on the pruned leaf.
- Regression `SH23` covers parent status, leaf-by-leaf compact, fail-closed unsupported heap/no-PK leaf behavior, explicit skip mode, empty partition parents, nested partition trees, and parent-query `SortedHeapScan` for single-leaf and multi-leaf range-partition shapes, including a generic prepared runtime-bound parent query.
- GraphRAG routing already supports multiple concrete shard relations, but it does not automatically dispatch across a PostgreSQL partitioned-table parent.

Risk:

- Treating partitioning as one broad feature would mix storage maintenance, planner pruning, index fanout, and GraphRAG global merge semantics.

Target direction:

- Make partition support explicit and incremental:
  1. leaf tables first: done;
  2. parent-level maintenance helpers: first pass done;
  3. parent/leaf observability: first pass done;
  4. parent-query `SortedHeapScan` planner support: covered single-leaf and multi-leaf range shapes plus generic prepared runtime-bound shape done;
  5. optional GraphRAG/ANN parent dispatch later.

### G2. Huge-table compaction operating model

Current state:

- `sorted_heap_compact(regclass)` is a rewrite-and-swap operation similar in operational shape to `CLUSTER` or `VACUUM FULL`.
- `sorted_heap_merge(regclass)` avoids re-sorting an already sorted prefix, but still rewrites into a new relation.
- Online compact/merge reduce blocking time, not temporary disk-space requirements.
- `docs/spec-huge-table-compaction.md` now documents the first-pass operating model: concrete operations rewrite one relation; partition helpers bound the rewrite unit to one leaf.

Risk:

- A user may expect compaction to be an in-place defragmentation operation with no second-copy space requirement. That is not the current contract.

Target direction:

- For very large logical tables, keep partition-scoped compaction as the default operational story.
- Longer term: explore segment-level compaction inside a relation, but do not promise this until crash-safety and index semantics are specified.

### G3. Filtered ANN with `sorted_hnsw`

Current state:

- `sorted_hnsw` is intentionally narrow: base-relation `ORDER BY embedding <=> query LIMIT k`.
- It avoids planning for unbounded KNN and for extra base-table quals that can under-return candidates.
- GraphRAG wrappers handle some filtered retrieval patterns outside the generic Index AM path.
- `docs/spec-filtered-ann.md` now separates filtered retrieval into pre-filter/materialize, ANN-overfetch/exact-rerank, and partition/routing-aware global-merge modes.
- `sorted_hnsw` regression now has explicit planner-safety guards for unbounded KNN, `LIMIT > sorted_hnsw.ef_search`, and filtered KNN shapes.
- `sorted_hnsw_partition_search_status(...)` now exposes underfill metadata for routed partition search without changing the row-returning API.
- `sorted_hnsw_partition_search(..., exact_fallback := true)` now lets callers explicitly replace an underfilled selected-leaf ANN pool with exact rerank over the same selected leaves.
- `sorted_hnsw_partition_search(...)` now uses a C SRF over the leaf-local `sorted_hnsw` Index AM rather than PL/pgSQL dynamic fanout, preserving the public SQL contract while removing selected-leaf orchestration overhead.

Risk:

- Users will expect pgvector-like filtered ANN behavior from a normal-looking index.

Target direction:

- Keep the transparent `sorted_hnsw` planner guard intact until helper-level filtered contracts prove underfill handling and global merge semantics.
- Implement helper-level modes from `docs/spec-filtered-ann.md` before promoting any filtered ANN path into the planner.
- Treat exact fallback as an explicit selected-leaf helper mode, not a generic `WHERE`-qual planner mode.
- Keep transparent filtered ANN planner support gated behind explicit underfill/global-merge semantics. The C partition helper closes the route-first helper overhead gap, but it does not by itself make arbitrary `WHERE`-qualified KNN safe.

### G4. Parent-level observability

Current state:

- `sorted_heap_zonemap_stats(regclass)` reports one concrete relation.
- `sorted_heap_partition_status(parent)` is the first row-returning observability surface. It accepts a partitioned parent or a concrete sorted_heap table and reports leaf AM, PK presence, zone-map validity, sorted-prefix pages, zone-map entries, overflow pages, and relation size.
- `sorted_heap_partition_index_status(parent)` now reports leaf index health: index AM, valid/ready/live flags, primary/unique flags, and convenience booleans for btree and `sorted_hnsw`.
- `sorted_heap_scan_stats_by_relation()` now reports backend-local relation-aware scan counters for same-backend diagnostics, and shared relation-aware counters when the extension is preloaded.
- `sorted_heap_partition_scan_stats(parent)` now rolls those counters up to sorted_heap leaves under a partitioned parent or concrete table.
- Aggregate scan stats are global/per-backend counters, not per partition.
- GraphRAG aggregate stats are backend-local last-call stats.
- Routed/segmented GraphRAG now also exposes backend-local per-shard last-call stats with `source_rel` identity.

Risk:

- On partitioned deployments, `SortedHeapScan` counters now have parent leaf rollups and routed GraphRAG calls have backend-local `source_rel` execution rows. Broader persistent telemetry is still deliberately out of scope.

Target direction:

- Treat `sorted_heap_partition_status(...)` as the storage-state baseline.
- Treat `sorted_heap_partition_index_status(...)` as the index-health baseline.
- Add broader row-returning parent observability only when it can reuse relation-aware scan stats or `sorted_heap_graph_route_last_stats()` without relabeling backend-local diagnostics as persistent telemetry. See `docs/spec-parent-runtime-observability.md`.

### G5. Online compact/merge restrictions for lossy PKs

Current state:

- Online compact/merge rejects UUID/text/varchar PKs because the replay key is currently lossy `int64`.
- SH13 regression covers the fail-closed online compact/merge behavior for UUID, text, and varchar PKs, including the concrete operation/type rejection message.
- `docs/spec-online-lossy-pk.md` documents the required future design: pruning keys may stay lossy, but online replay identity must be lossless.

Risk:

- The offline path works, so the online limitation may surprise users.

Target direction:

- Replace the replay map key with a lossless Datum/composite key representation or a serialized binary key, with full-key equality after hash lookup.
- Keep current fail-closed behavior until then.

### G6. `pg_dump` / `pg_restore` zone-map ergonomics

Current state:

- Data restores correctly.
- Zone-map pruning needs compact/merge after restore.
- Existing docs now call this out for concrete relations and partitioned parents; HNSW sidecars still need rebuild because restored heap TIDs change.
- `sorted_heap_restore_plan(parent default NULL)` now reports concrete sorted_heap tables/leaves that need compact/merge after restore and flags `sorted_hnsw` indexes that should be rebuilt after `pg_restore`.

Risk:

- Users may see correct but slower queries after restore and not understand why.

Target direction:

- Keep the explicit restore checklist in operator docs.
- Keep `sorted_heap_restore_plan(...)` read-only; correctness does not depend on it, but it reduces post-restore operator guesswork.

### G7. Zone-map-only / index-only-like fast paths

Current state:

- `SortedHeapScan` uses zone maps to choose heap block ranges, then returns heap tuples via the normal table scan path.
- Zone maps store page-level min/max for the first two PK columns. They do not store tuple values, TIDs, or MVCC visibility information.
- `docs/spec-zone-map-only-fast-paths.md` now defines the boundary: current zone maps can skip pages or prove empty overlap, but cannot implement a true PostgreSQL `Index Only Scan`.

Risk:

- Calling this "index-only" overstates the feature and invites an unsafe implementation that bypasses heap visibility or returns rows from lossy page metadata.

Target direction:

- Keep `0.13` as heap-backed `SortedHeapScan`.
- If needed later, choose explicitly between metadata-only fast paths and a covering value-bearing sidecar / Index AM contract.

### G8. Large-vector sublinear search revival

Current state:

- `sorted_hnsw` is the stable planner-integrated vector default.
- IVF-PQ remains available as a legacy/manual `svec_ann_scan(...)` path.
- FlashHadamard is a documented experimental packed exhaustive lane.
- Existing FlashHadamard notes refute IVF/pruning as the next default win at `103K x 2880D`, but do not refute sublinear routing at `500K+` / `1M+`.
- `docs/spec-large-vector-sublinear.md` now defines the benchmark gate for any IVF-PQ, SQ8, or SQ4 revival.
- `make bench-large-vector-synthetic` now provides the reproducible synthetic baseline entry point. Its defaults are smoke-sized; large-scale runs override `BENCH_LARGE_VECTOR_ROWS`, `BENCH_LARGE_VECTOR_QUERIES`, and `BENCH_LARGE_VECTOR_DIM`.
- `scripts/bench_ann_real_dataset.py --enable-ivfpq` adds an opt-in residual IVF-PQ row for real-corpus runs. It remains off by default because training, generated-code insertion, and compaction are part of the measured cost.
- `scripts/bench_ann_real_dataset.py --enable-flashhadamard` adds an opt-in offline FlashHadamard packed exhaustive row on the same vectors, queries, and exact PostgreSQL ground truth. It is a comparator, not PostgreSQL storage lifecycle integration.
- The same harness accepts local `.npy` / `.npz` vector inputs so smoke tests can exercise the full matrix without downloading external ANN-Benchmarks files.
- `make bench-ann-matrix-offline-smoke` wraps that local-input path and checks exact heap, `sorted_hnsw`, pgvector HNSW, residual IVF-PQ, and FlashHadamard packed exhaustive in one deterministic smoke run.

Risk:

- Promoting IVF-PQ or SQ4 without a scale gate would weaken the product story: it would add complexity while competing with stronger current defaults on the validated `103K` operating point.
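The benchmark gate above is ultimately a recall-versus-exact-ground-truth comparison on identical queries. As a minimal illustration of the core metric (not harness code; the helper name is hypothetical), recall@k over per-query, best-first result lists:

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k that the approximate top-k recovered.

    approx_ids / exact_ids: per-query lists of result ids, best first.
    exact_ids is the exact (ground-truth) ranking on the same queries.
    """
    hits = 0
    for approx, exact in zip(approx_ids, exact_ids):
        truth = set(exact[:k])
        hits += len(truth & set(approx[:k]))
    return hits / (k * len(exact_ids))
```

A lane only earns a "winning region" claim when this number, measured on the real-dataset rows, holds up at the target scale alongside latency and footprint.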
Target direction:

- Keep IVF-PQ as explicit/manual until a large-scale harness proves a winning region against `sorted_hnsw` and FlashHadamard.
- Use the synthetic harness first to establish exact/sorted_hnsw/pgvector and optional DiskANN baselines before adding IVF-PQ or FlashHadamard rows.
- Use the real-dataset `--enable-ivfpq` row to gate any claim that residual IVF-PQ has a winning region; do not infer it from synthetic data alone.
- Use the real-dataset `--enable-flashhadamard` row as the packed exhaustive quality/footprint comparator before promoting any sublinear lane.
- Prefer SQ8 before SQ4 for productized candidate scanning unless a 4-bit FlashHadamard-derived path proves equal quality at lower footprint.

### G9. SIMD ADC and pgvectorscale DiskANN comparison

Current state:

- PQ ADC has a scalar lookup path and a C-level `svec_ann_scan(...)` path that avoids per-row fmgr overhead.
- FlashHadamard has an experimental NEON int16 path and a kernel lab with AVX2 candidates; existing notes refute AVX2 int16 and keep AVX2 gather as microbench-only.
- pgvectorscale StreamingDiskANN is an external graph-index baseline with SBQ, query rescoring, label filtering, and relaxed-order behavior.
- `docs/spec-simd-adc-diskann.md` now separates local ADC kernel optimization from product-level DiskANN benchmarking.
- `scripts/bench_sorted_hnsw_vs_pgvector.sh` now reports strict-order mode and optional pgvectorscale DiskANN rows. If `vectorscale` is absent, it emits a skip note and keeps the local benchmark runnable.

Risk:

- A microbench-only SIMD win can disappear in PostgreSQL executor overhead.
- A DiskANN comparison can be misleading if it omits strict-order behavior, query rescoring, build settings, or footprint.

Target direction:

- Use the optional pgvectorscale harness row for apples-to-apples DiskANN comparisons when the extension is installed.
- Only integrate SIMD kernels behind scalar parity gates and platform guards.
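For context on what the scalar-parity gate is protecting: scalar PQ ADC reduces each distance evaluation to `m` table lookups, one per subspace, with no vector decoding. A minimal pure-Python model of the general technique (not the extension's C path; all names are hypothetical):

```python
def adc_tables(query, codebooks):
    """Build per-subspace squared-L2 lookup tables for one query.

    codebooks: m sub-codebooks, each a list of centroid sub-vectors.
    Returns m lists, where tables[j][c] is the squared distance from the
    query's j-th sub-vector to centroid c of subspace j.
    """
    m = len(codebooks)
    dsub = len(query) // m
    tables = []
    for j, cb in enumerate(codebooks):
        sub = query[j * dsub:(j + 1) * dsub]
        tables.append([sum((a - b) ** 2 for a, b in zip(c, sub)) for c in cb])
    return tables

def adc_distance(code, tables):
    """Approximate squared distance for one PQ code: m table lookups."""
    return sum(t[c] for t, c in zip(tables, code))
```

The SIMD question in G9 is exactly how fast `adc_distance`'s gather-and-accumulate loop can run over millions of codes, which is why a kernel win must survive executor overhead before integration.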
## Prioritization

| Priority | Gap | Reason |
|----------|-----|--------|
| P0 | G0 composite-PK pruning quality | First-pass fixed-col1/bounded-col2 path landed; const, generic runtime, first-col-only, and tail correctness guarded; full lexicographic search remains benchmark-gated follow-up |
| P0 | G1 declarative partitioning support | Directly affects huge-table operating model and interview/product story |
| P0 | G2 huge-table compaction model | Needed to explain free-space requirements honestly |
| P1 | G4 parent-level observability | Storage, index-health, local/shared scan rollups, and GraphRAG source-rel route stats landed; broader persistent telemetry remains out of scope |
| P1 | G3 filtered ANN | Expected by vector-search users, but broader than storage |
| P2 | G7 zone-map-only / index-only-like fast paths | Metadata-only empty-result probe landed; real row-returning path requires a covering sidecar |
| P2 | G10 append-zone run metadata | Design guardrail landed; safe implementation requires row-order certification |
| P2 | G8 large-vector sublinear search revival | Benchmark-gated; do not reopen refuted 103K pruning as a default |
| P2 | G9 SIMD ADC and pgvectorscale DiskANN comparison | Benchmark-gated; external baseline needs versioned settings and strict-order note |
| P2 | G5 online lossy-PK support | Useful, but current fail-closed behavior is acceptable |
| P2 | G6 restore ergonomics | Checklist and read-only restore plan helper landed |

## Quadrumvirate Notes

Cassandra:

- Likely failure mode: collapsing partition support, GraphRAG routing, and ANN fanout into one over-broad implementation.
- Likely failure mode: making parent functions silently skip unsupported leaves, hiding operational drift.
- Likely failure mode: implementing parent maintenance while leaving composite-key pruning weak, so partitioned tenant tables work operationally but not fast enough.
Daedalus:

- Reframe from "support partitioned tables" to "define exact contracts for leaf storage, parent maintenance, and cross-leaf query merge".

Maieutic:

- Refuted assumption: PostgreSQL planner will automatically produce the same `SortedHeapScan` path through a partitioned parent that it produces on the leaf directly. Probe result: direct leaf query used `SortedHeapScan`; parent query did not.
- Assumption: parent-level compact can be operationally useful even if it locks one leaf at a time. Falsifier: lock behavior or transaction semantics require a different procedure-style API.
- Refuted then partially fixed assumption: current two-column zone map gives tight pruning for `(tenant_id, id)` compacted tables. Probe result before the fix: `id BETWEEN 100 AND 110` under `tenant_id = 1` scanned most leaf blocks. `SH21B` now guards the fixed-col1/bounded-col2 path.

Adversary:

- Parent APIs must not accidentally operate on foreign partitions or non-sorted_heap leaves without an explicit policy.
- Parent APIs must report partial failure clearly.
- Cross-leaf ANN/GraphRAG needs global top-k exact merge; local top-k concat is not a sufficient correctness contract.
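The global-merge contract in that last Adversary note can be stated concretely: per-leaf top-k pools must be merged by distance and truncated, not concatenated in leaf order. A minimal sketch under assumed shapes (each leaf returns an ascending `(distance, row_id)` list; the helper name is hypothetical):

```python
import heapq

def global_top_k(leaf_results, k):
    """Exact global top-k across per-leaf ANN result pools.

    leaf_results: per-leaf lists of (distance, row_id), each sorted
    ascending by distance. The result is correct only if every selected
    leaf contributed at least its local top-k; an underfilled leaf pool
    (fewer than k local results when more matches exist) breaks the
    guarantee, which is why underfill metadata matters in G3.
    """
    # heapq.merge streams the sorted pools in distance order, so taking
    # the first k merged entries is the exact global answer.
    return heapq.nsmallest(k, heapq.merge(*leaf_results))
```

Note how plain concatenation fails here: with two leaves `[(0.1, 'a'), (0.5, 'b')]` and `[(0.2, 'c'), (0.3, 'd')]`, concat-then-truncate at `k = 3` would return `a, b, c`, while the exact global top-3 is `a, c, d`.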