---
layout: default
title: Functional Gap Specs
nav_order: 13
---

# Functional Gap Specs

This document tracks the next functional gaps after the `0.14.0` release surface. It is intentionally contract-first: each gap should get a small spec, acceptance tests, and only then implementation.

## Gap Inventory

### G0. Composite-PK pruning quality

Current state:

- The zone map stores the first two PK columns.
- Initial direct leaf probe on a compacted `(tenant_id, id)` sorted_heap table produced `Custom Scan (SortedHeapScan)`, but scanned `308 of 310` blocks for `tenant_id = 1 AND id BETWEEN 100 AND 110`.
- The zone map entries contained second-column ranges, so this is not a storage-layout absence. The likely gap is in the sorted-prefix range search: it uses the first column as the main binary-search key and does not yet make the lexicographic `(col1, col2)` bound tight enough.
- Implemented first-pass fix: when column 1 is fixed by inclusive equality and column 2 has a range/equality bound, the sorted-prefix candidate span is refined with existing second-column zone-map overlap checks.
- Regression `SH21B` now guards `tenant_id = 1 AND id BETWEEN 100 AND 110` with a tight scanned-block threshold and verifies count correctness.
- Regression `SH21B-3` now guards that an unsorted tail remains conservative for fixed-column-1 and bounded-column-2 predicates, preserving correctness after tail inserts.
- Regression `SH21B` now has a dedicated first-column-only range guard, so the first-pass composite refinement does not regress the original col1 pruning path.
- Regression `SH21B-3` now guards generic prepared runtime composite bounds under `force_generic_plan`, so executor-startup parameter resolution keeps the same tight pruning shape as constant `(tenant_id, id)` predicates.
- `docs/spec-composite-pk-pruning.md` documents the first-pass contract and remaining lexicographic follow-ups.
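The fixed-column-1 refinement described above can be sketched in miniature. The following Python is an illustrative model only (the extension's zone map and range search live in C, and every name here is hypothetical): it first narrows the sorted-prefix page span by binary search on column 1, then drops pages whose second-column zone-map range provably cannot overlap the `id` bound, keeping mixed boundary pages conservatively.

```python
from bisect import bisect_left, bisect_right

def candidate_pages(zonemap, col1_eq, col2_lo, col2_hi):
    """Pages that can contain rows with col1 = col1_eq and col2 in [lo, hi].

    zonemap: one (min1, max1, min2, max2) entry per sorted-prefix page,
    ordered lexicographically, so min1/max1 are non-decreasing across pages.
    """
    min1s = [e[0] for e in zonemap]
    max1s = [e[1] for e in zonemap]
    # Step 1 (pre-fix behavior): first-column span via binary search.
    start = bisect_left(max1s, col1_eq)   # first page whose max1 >= col1_eq
    stop = bisect_right(min1s, col1_eq)   # past last page whose min1 <= col1_eq
    # Step 2 (first-pass fix): refine with second-column zone-map overlap.
    pages = []
    for p in range(start, stop):
        min1, max1, min2, max2 = zonemap[p]
        if min1 == max1 == col1_eq:
            # Page holds only the fixed col1 value, so its col2 range is
            # meaningful; skip it when there is provably no overlap.
            if max2 < col2_lo or min2 > col2_hi:
                continue
        pages.append(p)  # boundary/mixed-col1 pages are kept conservatively
    return pages
```

In this toy model, a `tenant_id = 1 AND id BETWEEN 100 AND 110` shape collapses from the whole tenant-1 span to the single page whose second-column range overlaps `[100, 110]`, which mirrors the scanned-block drop that `SH21B` guards.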
Risk:

- Partitioning and fact-shaped GraphRAG commonly use composite keys such as `(tenant_id, id)` or `(entity_id, relation_id, target_id)`. If second-column pruning is weak, we keep correctness but give back much of the locality win.

Target direction:

- Continue from the first-pass refinement toward true lexicographic binary search for the sorted prefix path only if planning-time refinement over a large tenant span becomes measurable overhead and the zone-map metadata can prove lexicographic endpoint ordering.
- Keep linear/tail pruning conservative, but make compacted prefix queries with equality on column 1 and range/equality on column 2 map to tight block ranges.

### G1. Declarative partitioning support

Current state:

- `sorted_heap` works as a concrete table access method on physical tables.
- Each physical `sorted_heap` relation owns its own block-0 meta page, zone map, sorted-prefix state, compact/merge lifecycle, and optional `sorted_hnsw` indexes.
- PostgreSQL declarative partitioned parents now have first-pass explicit status and maintenance helpers: `sorted_heap_partition_status(...)`, `sorted_heap_compact_partitions(...)`, `sorted_heap_merge_partitions(...)`, and `sorted_heap_rebuild_zonemap_partitions(...)`.
- Verified probe: sorted_heap leaves under a declarative partitioned parent can be created and compacted, and a direct leaf query can use `SortedHeapScan`.
- First-pass planner fix: the same covered predicate shape issued through the partitioned parent now reaches `SortedHeapScan` on the pruned leaf.
- Regression `SH23` covers parent status, leaf-by-leaf compact, fail-closed unsupported heap/no-PK leaf behavior, explicit skip mode, empty partition parents, nested partition trees, and parent-query `SortedHeapScan` for single-leaf and multi-leaf range-partition shapes, including a generic prepared runtime-bound parent query.
- GraphRAG routing already supports multiple concrete shard relations, but it does not automatically dispatch across a PostgreSQL partitioned-table parent.

Risk:

- Treating partitioning as one broad feature would mix storage maintenance, planner pruning, index fanout, and GraphRAG global merge semantics.

Target direction:

- Make partition support explicit and incremental:
  1. leaf tables first: done;
  2. parent-level maintenance helpers: first pass done;
  3. parent/leaf observability: first pass done;
  4. parent-query `SortedHeapScan` planner support: covered single-leaf and multi-leaf range shapes plus generic prepared runtime-bound shape done;
  5. optional GraphRAG/ANN parent dispatch later.

### G2. Huge-table compaction operating model

Current state:

- `sorted_heap_compact(regclass)` is a rewrite-and-swap operation similar in operational shape to `CLUSTER` or `VACUUM FULL`.
- `sorted_heap_merge(regclass)` avoids re-sorting an already sorted prefix, but still rewrites into a new relation.
- Online compact/merge reduce blocking time, not temporary disk-space requirements.
- `docs/spec-huge-table-compaction.md` now documents the first-pass operating model: concrete operations rewrite one relation; partition helpers bound the rewrite unit to one leaf.

Risk:

- A user may expect compaction to be an in-place defragmentation operation with no second-copy space requirement. That is not the current contract.

Target direction:

- For very large logical tables, keep partition-scoped compaction as the default operational story.
- Longer term: explore segment-level compaction inside a relation, but do not promise this until crash-safety and index semantics are specified.

### G3. Filtered ANN with `sorted_hnsw`

Current state:

- `sorted_hnsw` is intentionally narrow: base-relation `ORDER BY embedding <=> query LIMIT k`.
- It avoids planning for unbounded KNN and for extra base-table quals that can under-return candidates.
- GraphRAG wrappers handle some filtered retrieval patterns outside the generic Index AM path.
- `docs/spec-filtered-ann.md` now separates filtered retrieval into pre-filter/materialize, ANN-overfetch/exact-rerank, and partition/routing-aware global-merge modes.
- `sorted_hnsw` regression now has explicit planner-safety guards for unbounded KNN, `LIMIT > sorted_hnsw.ef_search`, and filtered KNN shapes.
- `sorted_hnsw_partition_search_status(...)` now exposes underfill metadata for routed partition search without changing the row-returning API.
- `sorted_hnsw_partition_search(..., exact_fallback := true)` now lets callers explicitly replace an underfilled selected-leaf ANN pool with exact rerank over the same selected leaves.
- `sorted_hnsw_partition_search(...)` now uses a C SRF over the leaf-local `sorted_hnsw` Index AM rather than PL/pgSQL dynamic fanout, preserving the public SQL contract while removing selected-leaf orchestration overhead.

Risk:

- Users will expect pgvector-like filtered ANN behavior from a normal-looking index.

Target direction:

- Keep the transparent `sorted_hnsw` planner guard intact until helper-level filtered contracts prove underfill handling and global merge semantics.
- Implement helper-level modes from `docs/spec-filtered-ann.md` before promoting any filtered ANN path into the planner.
- Treat exact fallback as an explicit selected-leaf helper mode, not a generic `WHERE`-qual planner mode.
- Keep transparent filtered ANN planner support gated behind explicit underfill/global-merge semantics. The C partition helper closes the route-first helper overhead gap, but it does not by itself make arbitrary `WHERE`-qualified KNN safe.

### G4. Parent-level observability

Current state:

- `sorted_heap_zonemap_stats(regclass)` reports one concrete relation.
- `sorted_heap_partition_status(parent)` is the first row-returning observability surface. It accepts a partitioned parent or a concrete sorted_heap table and reports leaf AM, PK presence, zone-map validity, sorted-prefix pages, zone-map entries, overflow pages, and relation size.
- `sorted_heap_partition_index_status(parent)` now reports leaf index health: index AM, valid/ready/live flags, primary/unique flags, and convenience booleans for btree and `sorted_hnsw`.
- `sorted_heap_scan_stats_by_relation()` now reports backend-local relation-aware scan counters for same-backend diagnostics, and shared relation-aware counters when the extension is preloaded.
- `sorted_heap_partition_scan_stats(parent)` now rolls those counters up to sorted_heap leaves under a partitioned parent or concrete table.
- Aggregate scan stats are global/per-backend counters, not per partition.
- GraphRAG aggregate stats are backend-local last-call stats.
- Routed/segmented GraphRAG now also exposes backend-local per-shard last-call stats with `source_rel` identity.

Risk:

- On partitioned deployments, `SortedHeapScan` counters now have parent leaf rollups and routed GraphRAG calls have backend-local `source_rel` execution rows. Broader persistent telemetry is still deliberately out of scope.

Target direction:

- Treat `sorted_heap_partition_status(...)` as the storage-state baseline.
- Treat `sorted_heap_partition_index_status(...)` as the index-health baseline.
- Add broader row-returning parent observability only when it can reuse relation-aware scan stats or `sorted_heap_graph_route_last_stats()` without relabeling backend-local diagnostics as persistent telemetry. See `docs/spec-parent-runtime-observability.md`.

### G5. Online compact/merge restrictions for lossy PKs

Current state:

- Online compact/merge rejects UUID/text/varchar PKs because the replay key is currently lossy `int64`.
- SH13 regression covers the fail-closed online compact/merge behavior for UUID, text, and varchar PKs, including the concrete operation/type rejection message.
- `docs/spec-online-lossy-pk.md` documents the required future design: pruning keys may stay lossy, but online replay identity must be lossless.

Risk:

- The offline path works, so the online limitation may surprise users.

Target direction:

- Replace the replay map key with a lossless Datum/composite key representation or a serialized binary key, with full-key equality after hash lookup.
- Keep current fail-closed behavior until then.

### G6. `pg_dump` / `pg_restore` zone-map ergonomics

Current state:

- Data restores correctly.
- Zone-map pruning needs compact/merge after restore.
- Existing docs now call this out for concrete relations and partitioned parents; HNSW sidecars still need rebuild because restored heap TIDs change.
- `sorted_heap_restore_plan(parent default NULL)` now reports concrete sorted_heap tables/leaves that need compact/merge after restore and flags `sorted_hnsw` indexes that should be rebuilt after `pg_restore`.

Risk:

- Users may see correct but slower queries after restore and not understand why.

Target direction:

- Keep the explicit restore checklist in operator docs.
- Keep `sorted_heap_restore_plan(...)` read-only; correctness does not depend on it, but it reduces post-restore operator guesswork.

### G7. Zone-map-only / index-only-like fast paths

Current state:

- `SortedHeapScan` uses zone maps to choose heap block ranges, then returns heap tuples via the normal table scan path.
- Zone maps store page-level min/max for the first two PK columns. They do not store tuple values, TIDs, or MVCC visibility information.
- `docs/spec-zone-map-only-fast-paths.md` now defines the boundary: current zone maps can skip pages or prove empty overlap, but cannot implement a true PostgreSQL `Index Only Scan`.

Risk:

- Calling this "index-only" overstates the feature and invites an unsafe implementation that bypasses heap visibility or returns rows from lossy page metadata.

Target direction:

- Keep `0.13` as heap-backed `SortedHeapScan`.
- If needed later, choose explicitly between metadata-only fast paths and a covering value-bearing sidecar / Index AM contract.

### G8. Large-vector sublinear search revival

Current state:

- `sorted_hnsw` is the stable planner-integrated vector default.
- IVF-PQ remains available as a legacy/manual `svec_ann_scan(...)` path.
- FlashHadamard is a documented experimental packed exhaustive lane.
- Existing FlashHadamard notes refute IVF/pruning as the next default win at `103K x 2880D`, but do not refute sublinear routing at `500K+` / `1M+`.
- `docs/spec-large-vector-sublinear.md` now defines the benchmark gate for any IVF-PQ, SQ8, or SQ4 revival.
- `make bench-large-vector-synthetic` now provides the reproducible synthetic baseline entry point. Its defaults are smoke-sized; large-scale runs override `BENCH_LARGE_VECTOR_ROWS`, `BENCH_LARGE_VECTOR_QUERIES`, and `BENCH_LARGE_VECTOR_DIM`.
- `scripts/bench_ann_real_dataset.py --enable-ivfpq` adds an opt-in residual IVF-PQ row for real-corpus runs. It remains off by default because training, generated-code insertion, and compaction are part of the measured cost.
- `scripts/bench_ann_real_dataset.py --enable-flashhadamard` adds an opt-in offline FlashHadamard packed exhaustive row on the same vectors, queries, and exact PostgreSQL ground truth. It is a comparator, not PostgreSQL storage lifecycle integration.
- The same harness accepts local `.npy` / `.npz` vector inputs so smoke tests can exercise the full matrix without downloading external ANN-Benchmarks files.
- `make bench-ann-matrix-offline-smoke` wraps that local-input path and checks exact heap, `sorted_hnsw`, pgvector HNSW, residual IVF-PQ, and FlashHadamard packed exhaustive in one deterministic smoke run.

Risk:

- Promoting IVF-PQ or SQ4 without a scale gate would weaken the product story: it would add complexity while competing with stronger current defaults on the validated `103K` operating point.
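The benchmark gate above is ultimately a recall-versus-exact-ground-truth comparison on identical queries. As a minimal illustration of the core metric (not harness code; the helper name is hypothetical), recall@k over per-query, best-first result lists:

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k that the approximate top-k recovered.

    approx_ids / exact_ids: per-query lists of result ids, best first.
    exact_ids is the exact (ground-truth) ranking on the same queries.
    """
    hits = 0
    for approx, exact in zip(approx_ids, exact_ids):
        truth = set(exact[:k])
        hits += len(truth & set(approx[:k]))
    return hits / (k * len(exact_ids))
```

A lane only earns a "winning region" claim when this number, measured on the real-dataset rows, holds up at the target scale alongside latency and footprint.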
Target direction:

- Keep IVF-PQ as explicit/manual until a large-scale harness proves a winning region against `sorted_hnsw` and FlashHadamard.
- Use the synthetic harness first to establish exact/sorted_hnsw/pgvector and optional DiskANN baselines before adding IVF-PQ or FlashHadamard rows.
- Use the real-dataset `--enable-ivfpq` row to gate any claim that residual IVF-PQ has a winning region; do not infer it from synthetic data alone.
- Use the real-dataset `--enable-flashhadamard` row as the packed exhaustive quality/footprint comparator before promoting any sublinear lane.
- Prefer SQ8 before SQ4 for productized candidate scanning unless a 4-bit FlashHadamard-derived path proves equal quality at lower footprint.

### G9. SIMD ADC and pgvectorscale DiskANN comparison

Current state:

- PQ ADC has a scalar lookup path and a C-level `svec_ann_scan(...)` path that avoids per-row fmgr overhead.
- FlashHadamard has an experimental NEON int16 path and a kernel lab with AVX2 candidates; existing notes refute AVX2 int16 and keep AVX2 gather as microbench-only.
- pgvectorscale StreamingDiskANN is an external graph-index baseline with SBQ, query rescoring, label filtering, and relaxed-order behavior.
- `docs/spec-simd-adc-diskann.md` now separates local ADC kernel optimization from product-level DiskANN benchmarking.
- `scripts/bench_sorted_hnsw_vs_pgvector.sh` now reports strict-order mode and optional pgvectorscale DiskANN rows. If `vectorscale` is absent, it emits a skip note and keeps the local benchmark runnable.

Risk:

- A microbench-only SIMD win can disappear in PostgreSQL executor overhead.
- A DiskANN comparison can be misleading if it omits strict-order behavior, query rescoring, build settings, or footprint.

Target direction:

- Use the optional pgvectorscale harness row for apples-to-apples DiskANN comparisons when the extension is installed.
- Only integrate SIMD kernels behind scalar parity gates and platform guards.
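For context on what the scalar-parity gate is protecting: scalar PQ ADC reduces each distance evaluation to `m` table lookups, one per subspace, with no vector decoding. A minimal pure-Python model of the general technique (not the extension's C path; all names are hypothetical):

```python
def adc_tables(query, codebooks):
    """Build per-subspace squared-L2 lookup tables for one query.

    codebooks: m sub-codebooks, each a list of centroid sub-vectors.
    Returns m lists, where tables[j][c] is the squared distance from the
    query's j-th sub-vector to centroid c of subspace j.
    """
    m = len(codebooks)
    dsub = len(query) // m
    tables = []
    for j, cb in enumerate(codebooks):
        sub = query[j * dsub:(j + 1) * dsub]
        tables.append([sum((a - b) ** 2 for a, b in zip(c, sub)) for c in cb])
    return tables

def adc_distance(code, tables):
    """Approximate squared distance for one PQ code: m table lookups."""
    return sum(t[c] for t, c in zip(tables, code))
```

The SIMD question in G9 is exactly how fast `adc_distance`'s gather-and-accumulate loop can run over millions of codes, which is why a kernel win must survive executor overhead before integration.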
## Prioritization

| Priority | Gap | Reason |
|----------|-----|--------|
| P0 | G0 composite-PK pruning quality | First-pass fixed-col1/bounded-col2 path landed; const, generic runtime, first-col-only, and tail correctness guarded; full lexicographic search remains benchmark-gated follow-up |
| P0 | G1 declarative partitioning support | Directly affects huge-table operating model and interview/product story |
| P0 | G2 huge-table compaction model | Needed to explain free-space requirements honestly |
| P1 | G4 parent-level observability | Storage, index-health, local/shared scan rollups, and GraphRAG source-rel route stats landed; broader persistent telemetry remains out of scope |
| P1 | G3 filtered ANN | Expected by vector-search users, but broader than storage |
| P2 | G7 zone-map-only / index-only-like fast paths | Metadata-only empty-result probe landed; real row-returning path requires a covering sidecar |
| P2 | G10 append-zone run metadata | Design guardrail landed; safe implementation requires row-order certification |
| P2 | G8 large-vector sublinear search revival | Benchmark-gated; do not reopen refuted 103K pruning as a default |
| P2 | G9 SIMD ADC and pgvectorscale DiskANN comparison | Benchmark-gated; external baseline needs versioned settings and strict-order note |
| P2 | G5 online lossy-PK support | Useful, but current fail-closed behavior is acceptable |
| P2 | G6 restore ergonomics | Checklist and read-only restore plan helper landed |

## Quadrumvirate Notes

Cassandra:

- Likely failure mode: collapsing partition support, GraphRAG routing, and ANN fanout into one over-broad implementation.
- Likely failure mode: making parent functions silently skip unsupported leaves, hiding operational drift.
- Likely failure mode: implementing parent maintenance while leaving composite-key pruning weak, so partitioned tenant tables work operationally but not fast enough.
Daedalus:

- Reframe from "support partitioned tables" to "define exact contracts for leaf storage, parent maintenance, and cross-leaf query merge".

Maieutic:

- Refuted assumption: PostgreSQL planner will automatically produce the same `SortedHeapScan` path through a partitioned parent that it produces on the leaf directly. Probe result: direct leaf query used `SortedHeapScan`; parent query did not.
- Assumption: parent-level compact can be operationally useful even if it locks one leaf at a time. Falsifier: lock behavior or transaction semantics require a different procedure-style API.
- Refuted then partially fixed assumption: current two-column zone map gives tight pruning for `(tenant_id, id)` compacted tables. Probe result before the fix: `id BETWEEN 100 AND 110` under `tenant_id = 1` scanned most leaf blocks. `SH21B` now guards the fixed-col1/bounded-col2 path.

Adversary:

- Parent APIs must not accidentally operate on foreign partitions or non-sorted_heap leaves without an explicit policy.
- Parent APIs must report partial failure clearly.
- Cross-leaf ANN/GraphRAG needs global top-k exact merge; local top-k concat is not a sufficient correctness contract.
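The global-merge contract in that last Adversary note can be stated concretely: per-leaf top-k pools must be merged by distance and truncated, not concatenated in leaf order. A minimal sketch under assumed shapes (each leaf returns an ascending `(distance, row_id)` list; the helper name is hypothetical):

```python
import heapq

def global_top_k(leaf_results, k):
    """Exact global top-k across per-leaf ANN result pools.

    leaf_results: per-leaf lists of (distance, row_id), each sorted
    ascending by distance. The result is correct only if every selected
    leaf contributed at least its local top-k; an underfilled leaf pool
    (fewer than k local results when more matches exist) breaks the
    guarantee, which is why underfill metadata matters in G3.
    """
    # heapq.merge streams the sorted pools in distance order, so taking
    # the first k merged entries is the exact global answer.
    return heapq.nsmallest(k, heapq.merge(*leaf_results))
```

Note how plain concatenation fails here: with two leaves `[(0.1, 'a'), (0.5, 'b')]` and `[(0.2, 'c'), (0.3, 'd')]`, concat-then-truncate at `k = 3` would return `a, b, c`, while the exact global top-3 is `a, c, d`.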