---
layout: default
title: SIMD ADC And DiskANN Comparison
nav_order: 21
---

# Spec: SIMD ADC And pgvectorscale DiskANN Comparison

Status: proposed
Risk tier: CAUTION
Primary goal: separate local SIMD ADC optimization from an apples-to-apples
comparison against pgvectorscale StreamingDiskANN.

## Problem

The TODO shorthand combines two different tracks:

- **SIMD ADC lookup:** make our compressed-vector scoring kernels faster.
- **pgvectorscale DiskANN comparison:** benchmark against a graph index with
  PostgreSQL integration, SBQ compression, rescoring, and filtered-search
  features.

Treating them as one task would blur algorithmic quality, storage layout,
execution model, and PostgreSQL integration overhead.

## Current Local Evidence

### PQ ADC

`svec_pq_adc_lookup(dist_table, code)` currently performs scalar lookup and
accumulation:

```text
for each subvector m:
    total += dist_table[m][code[m]]
```

This is simple and portable, but every candidate still costs `M` scalar table
loads and adds. The C-level `svec_ann_scan(...)` path avoids per-row fmgr
overhead and should be the measured baseline before micro-optimizing the
standalone SQL ADC function.
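
For parity testing, the scalar loop can be modeled in a few lines of Python.
This is a sketch, not the C implementation: `pq_adc_lookup` and the toy table
are illustrative names and values, and it assumes `dist_table` is indexed as
`[subvector][centroid]`, as in the pseudocode above.

```python
from typing import Sequence


def pq_adc_lookup(dist_table: Sequence[Sequence[float]],
                  code: Sequence[int]) -> float:
    """Scalar reference: add one distance-table entry per subvector."""
    if len(dist_table) != len(code):
        raise ValueError("code length must equal the number of subvectors M")
    total = 0.0
    for m, centroid in enumerate(code):
        total += dist_table[m][centroid]
    return total


# Toy sizes: M = 3 subvectors, 4 centroids per subvector.
dist_table = [
    [0.0, 1.0, 4.0, 9.0],
    [0.5, 0.25, 2.0, 3.0],
    [1.0, 0.0, 1.0, 4.0],
]
code = [2, 1, 0]
score = pq_adc_lookup(dist_table, code)  # 4.0 + 0.25 + 1.0
```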

### FlashHadamard ADC

FlashHadamard already has a packed byte-table scorer and a separate CPU kernel
lab:

- Apple/NEON int16 LUT is integrated behind `FH_INT16=1` and showed a narrow
  end-to-end win on the validated local path.
- Intel/AVX2 int16 LUT was refuted in the existing notes.
- Intel/AVX2 float gather is promising in the standalone kernel lab but is not
  integrated into the engine.

The safe conclusion is hardware-specific: SIMD ADC optimization is worthwhile,
but each kernel needs a platform-specific parity and latency gate.

## Current pgvectorscale Baseline

As of the upstream README checked on 2026-05-06, pgvectorscale provides:

- a `diskann` access method named StreamingDiskANN;
- Statistical Binary Quantization (SBQ) storage layout by default;
- label-based filtering through a `smallint[]` label column in the index;
- arbitrary `WHERE` post-filtering;
- query-time knobs such as `diskann.query_search_list_size` and
  `diskann.query_rescore`;
- relaxed ordering by default, with materialized CTE reordering recommended
  when strict final distance order is required;
- no UNLOGGED-table index support.

This is not a direct replacement for our ADC scorer. It is a PostgreSQL graph
index baseline that should be compared at the product level.

Upstream source: <https://github.com/timescale/pgvectorscale>

## Current Harness Status

`scripts/bench_sorted_hnsw_vs_pgvector.sh` now includes an optional
`pgvectorscale_diskann` row:

- if the `vectorscale` extension is unavailable, the script emits a
  `benchmark_note|method=pgvectorscale_diskann|status=skipped|...` line and
  still reports the exact, `sorted_hnsw`, and pgvector rows;
- the DiskANN row currently requires `pgv_storage=vector`; `halfvec` runs emit
  an explicit skip note;
- if `vectorscale` is available and registers the `diskann` access method, the
  script creates a `diskann` index on the same synthetic vector corpus;
- DiskANN result timing is reported with
  `strict_order=materialized_exact_reorder`;
- the index-size line includes `pgvectorscale_diskann` and
  `bench_diskann_total`, or `skipped` when the optional extension is absent.
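
A downstream report script could detect the skip line with a small parser. The
sketch below assumes the fields after `benchmark_note` are `key=value` pairs
separated by `|`. The `reason` field in the sample is hypothetical, since the
real tail of the line is elided above.

```python
def parse_note(line: str) -> dict:
    """Split 'benchmark_note|k=v|k=v' into a dict; return {} for other lines."""
    tag, _, rest = line.strip().partition("|")
    if tag != "benchmark_note":
        return {}
    fields = {}
    for part in rest.split("|"):
        key, sep, value = part.partition("=")
        if sep:  # ignore fragments that carry no '='
            fields[key] = value
    return fields


# 'reason' is a hypothetical field; the real script may emit different keys.
sample = ("benchmark_note|method=pgvectorscale_diskann"
          "|status=skipped|reason=extension_missing")
note = parse_note(sample)
skipped = note.get("status") == "skipped"
```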

## Track A: SIMD ADC Optimization

### A1. PQ ADC

Do not optimize `svec_pq_adc_lookup(...)` first if the measured path uses
`svec_ann_scan(...)`, because the standalone SQL function may not be the hot
path.

Required first measurement:

- `svec_ann_scan(...)` phase timing with `sorted_heap.ann_timing = on`;
- candidate count;
- `M`;
- ADC time vs SPI fetch vs rerank.

Candidate optimizations:

- unroll scalar PQ ADC for common `M`;
- accumulate in `float` and widen once at the end, if result quality is
  unchanged;
- add NEON/AVX gather variants only behind compile-time/runtime dispatch;
- keep scalar fallback as reference.

Parity gate:

- exact same top-k order or bounded score delta against scalar reference;
- run with at least two `M` values and two dimensions.
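
One way to phrase this gate as a test is sketched below. It emulates `float`
accumulation by rounding through a 32-bit float after every add, which is an
assumption about how the C kernel would behave. The helper names and the
`max_delta` threshold are illustrative, not existing project settings.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def adc_f64(table, code):
    """Reference: accumulate in double precision."""
    return sum(table[m][c] for m, c in enumerate(code))


def adc_f32(table, code):
    """Candidate: emulate accumulating in float, widening once at the end."""
    total = 0.0
    for m, c in enumerate(code):
        total = to_f32(total + to_f32(table[m][c]))
    return total


def top_k(scores, k):
    """Indices of the k smallest scores; ties are broken by index."""
    return sorted(range(len(scores)), key=lambda i: (scores[i], i))[:k]


def parity_ok(ref, cand, k, max_delta):
    """Pass on identical top-k order, or on a bounded per-candidate delta."""
    if top_k(ref, k) == top_k(cand, k):
        return True
    return max(abs(r - c) for r, c in zip(ref, cand)) <= max_delta


table = [[0.1, 0.7], [0.2, 0.9], [0.3, 0.4]]      # M = 3, 2 centroids each
codes = [[0, 0, 0], [1, 0, 1], [0, 1, 1], [1, 1, 0]]
ref = [adc_f64(table, c) for c in codes]
cand = [adc_f32(table, c) for c in codes]
passed = parity_ok(ref, cand, k=2, max_delta=1e-6)
```

A real gate would repeat this over at least two `M` values and two dimensions,
as listed above.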

### A2. FlashHadamard ADC

Continue the existing platform-specific path:

- keep NEON int16 behind `FH_INT16=1` until larger query/dataset validation;
- integrate AVX2 float gather only if it beats the current engine path end to
  end, not just in the standalone microbench;
- treat AVX2 int16 as refuted unless fresh evidence contradicts the existing
  notes.

Parity gate:

- compare top-k overlap, hit@1, recall@10, and score delta against the scalar
  packed scorer;
- benchmark end-to-end, not only kernel throughput.
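
The overlap and score-delta checks might look like the following sketch. The
result dictionaries are toy data, and the function names are illustrative
rather than part of the existing FlashHadamard code.

```python
def topk_overlap(ref_ids, cand_ids):
    """Fraction of reference top-k ids that also appear in the candidate top-k."""
    if not ref_ids:
        return 1.0
    return len(set(ref_ids) & set(cand_ids)) / len(ref_ids)


def max_score_delta(ref, cand):
    """Largest absolute score difference over ids present in both results."""
    shared = ref.keys() & cand.keys()
    return max((abs(ref[i] - cand[i]) for i in shared), default=0.0)


# Toy results: id -> score, listed best first (dicts keep insertion order).
scalar = {11: 0.10, 42: 0.20, 7: 0.35, 3: 0.50}
simd = {11: 0.10, 42: 0.21, 3: 0.49, 9: 0.51}

overlap = topk_overlap(list(scalar), list(simd))    # 3 of 4 ids are shared
hit_at_1 = next(iter(scalar)) == next(iter(simd))   # same best id in both
delta = max_score_delta(scalar, simd)
```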

## Track B: pgvectorscale DiskANN Benchmark

### Required Methods

Benchmark on the same dataset and query set:

- exact heap ground truth;
- `sorted_hnsw` on `svec` or `hsvec`;
- pgvector HNSW when dimension allows;
- pgvectorscale `diskann`;
- optional FlashHadamard packed exhaustive when a compatible store exists;
- optional IVF-PQ residual when codebooks are trained.

### Required pgvectorscale Settings

Record:

- pgvectorscale version;
- PostgreSQL version;
- `storage_layout`;
- `num_neighbors`;
- `search_list_size`;
- `num_dimensions`;
- `num_bits_per_dimension`;
- `diskann.query_search_list_size`;
- `diskann.query_rescore`;
- whether strict result reordering was applied.

### Required Metrics

- p50, p95, and average latency;
- recall@10 and hit@1 against exact ground truth;
- index size, table size, and total footprint;
- build time;
- memory-related build settings such as `maintenance_work_mem`;
- filter mode: none, label-based, or arbitrary post-filter.
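
A minimal sketch of the latency and quality metrics follows. It uses a
nearest-rank percentile, which is one of several common definitions, so the
harness should state which one it applies. The sample latencies and id lists
are made up for illustration.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile for p in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def recall_at_k(truth, result, k=10):
    """Share of the exact top-k ids that the method returned in its top-k."""
    want = set(truth[:k])
    return len(want & set(result[:k])) / len(want)


latencies_ms = [1.2, 1.1, 1.4, 1.3, 9.0, 1.2, 1.5, 1.1, 1.3, 1.2]
summary = {
    "p50": percentile(latencies_ms, 50),
    "p95": percentile(latencies_ms, 95),
    "avg": statistics.fmean(latencies_ms),
}

truth = list(range(10))                      # exact heap ground truth ids
result = [0, 1, 2, 3, 4, 5, 6, 7, 99, 98]    # ids returned by one method
recall = recall_at_k(truth, result)
hit1 = truth[0] == result[0]
```

Here the single 9.0 ms outlier shows up in p95 and the average but not in p50,
which is why all three are required.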

## Acceptance Tests

### D1. pgvectorscale harness is optional and fail-open

If `vectorscale` is not installed, the benchmark should skip the DiskANN row
and report why. It must not make local PostgreSQL tests depend on an external
extension.

### D2. Strict-order mode is explicit

Because pgvectorscale can use relaxed ordering, the harness must record whether
the final result set was reordered by exact distance before recall/latency
reporting.
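
As a harness-side model of that reordering step, the sketch below re-sorts a
relaxed-order candidate list by exact L2 distance and tags the row with the
`strict_order` value the script already reports. The vectors and the
`strict_reorder` helper are illustrative. In SQL the same effect would come
from the materialized CTE reorder.

```python
import math


def l2(a, b):
    """Exact Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def strict_reorder(query, candidates, vectors, k):
    """Re-sort candidate ids by exact distance and keep the first k."""
    ranked = sorted(candidates, key=lambda i: (l2(query, vectors[i]), i))
    return ranked[:k]


vectors = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (0.0, 2.0), 4: (3.0, 3.0)}
query = (0.1, 0.0)
relaxed = [2, 1, 3]  # toy relaxed-order output: right ids, wrong order

row = {
    "method": "pgvectorscale_diskann",
    "strict_order": "materialized_exact_reorder",
    "ids": strict_reorder(query, relaxed, vectors, k=3),
}
```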

### D3. SIMD kernels keep scalar fallback

Every SIMD ADC implementation must have:

- scalar reference path;
- platform guard;
- parity test;
- runtime or build-time disable switch.
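
The four requirements can be pictured as a small dispatcher. This is a Python
sketch of the control flow only: a real kernel would live in C, the
`ADC_FORCE_SCALAR` variable is a hypothetical disable switch, and
`adc_simd_stub` stands in for a NEON or AVX2 implementation.

```python
import os
import platform


def adc_scalar(table, code):
    """Scalar reference path; always available."""
    return sum(table[m][c] for m, c in enumerate(code))


def adc_simd_stub(table, code):
    """Stand-in for a SIMD kernel; must match the scalar path."""
    return sum(table[m][c] for m, c in enumerate(code))


def select_adc(env=os.environ, machine=None):
    """Return (name, kernel); scalar unless the platform matches and SIMD is on."""
    machine = (machine or platform.machine()).lower()
    if env.get("ADC_FORCE_SCALAR") == "1":      # hypothetical disable switch
        return "scalar", adc_scalar
    if machine in ("arm64", "aarch64", "x86_64", "amd64"):  # platform guard
        return "simd", adc_simd_stub
    return "scalar", adc_scalar


def parity(table, code):
    """Parity test: the SIMD path must reproduce the scalar reference."""
    return adc_simd_stub(table, code) == adc_scalar(table, code)


table = [[0.0, 1.0], [2.0, 3.0]]
name, kernel = select_adc(env={"ADC_FORCE_SCALAR": "1"}, machine="arm64")
```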

### D4. Product comparison includes footprint

No benchmark row is accepted without a storage footprint. DiskANN, HNSW,
FlashHadamard stores, PQ codebooks, and generated-code columns must all be
counted toward the method that needs them.
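
A report step could enforce this rule mechanically, as in the sketch below.
The field names (`index_bytes`, `table_bytes`, and so on) and the byte counts
are invented for illustration and do not reflect the harness output format.

```python
REQUIRED = ("index_bytes", "table_bytes")


def total_footprint(row):
    """Sum every *_bytes field; reject rows missing a required component."""
    missing = [name for name in REQUIRED if name not in row]
    if missing:
        raise ValueError(f"row {row.get('method')!r} lacks {missing}")
    return sum(v for k, v in row.items() if k.endswith("_bytes"))


# Toy rows: sizes are made up, not measured.
rows = [
    {"method": "pgvectorscale_diskann",
     "index_bytes": 40_000, "table_bytes": 120_000},
    {"method": "pq_adc", "index_bytes": 0, "table_bytes": 120_000,
     "codebook_bytes": 8_192, "code_column_bytes": 16_000},
]
totals = {r["method"]: total_footprint(r) for r in rows}
```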

## Adversary Notes

- Kernel microbench wins can disappear inside PostgreSQL due to SPI, tuple
  fetch, TOAST, cache warmup, or rerank overhead.
- Relaxed ordering can make a DiskANN result look faster while returning a
  slightly unsorted top-k; strict-order mode must be visible.
- Filtered DiskANN label support and sorted_heap partition routing solve
  different filtering problems. Compare both only on equivalent predicates.
- SBQ/quantized graph storage and our FlashHadamard/PQ storage have different
  quality/footprint curves. A single recall number is insufficient.
- pgvectorscale is an external moving target; benchmark docs must record the
  exact version or commit.

## Decision

For `0.13`, keep SIMD ADC and pgvectorscale DiskANN comparison as benchmark
tracks. Do not claim superiority until a shared harness reports quality,
latency, footprint, build cost, and strict-order behavior on the same workload.