# v0.39.0 — Hybrid Search & Sparse Vector Aggregates > **Full technical details:** [v0.39.0.md-full.md](v0.39.0.md-full.md) **Status: Planned** | **Scope: Medium** > Sparse and half-precision vector aggregates for storage-tiered embedding > pipelines, reactive neighbour-alert subscriptions, hybrid-search corpus > patterns, and differential-refresh performance benchmarks for vector > workloads. --- ## What is this? v0.38.0 completed embedding pipeline infrastructure; v0.39.0 completes the feature set for production AI/RAG workloads by adding sparse and half-precision vector aggregation (for tiered storage), reactive vector-similarity alerts, and hybrid-search cookbook patterns with performance benchmarks. --- ## Sparse and half-precision vector aggregates pgvector 0.7+ ships `halfvec` and `sparsevec` types for storage-efficient embeddings. v0.39.0 adds: - `sparsevec_avg` — element-wise mean over the union of sparse dimensions, treating absent entries as 0. Returns `sparsevec`. - `halfvec_avg` — element-wise mean with running state upcast to `vector(d)`; returns `halfvec(d)` on read. - `sparsevec_sum` and `halfvec_sum` — for pre-aggregation workflows. These are algebraically identical to `vector_avg` / `vector_sum` but respect the native type on output. A stream table can maintain tiered storage: raw `vector(1536)` → `halfvec(1536)` → `sparsevec(1536)` (for binary-quantised re-ranking), each layer incrementally maintained and independently indexed. --- ## Reactive vector-similarity subscriptions Extending reactive subscriptions (v0.35), add support for reactive queries over distance predicates: ```sql LISTEN fraud_alert; SELECT pgtrickle.create_reactive_subscription( 'fraud_alert', $$ SELECT new_txn.id FROM transactions_embedded new_txn JOIN known_fraud_vectors k ON new_txn.embedding <=> k.embedding < 0.05 $$ ); ``` The subscription fires whenever a *new* transaction embedding lands within ε of a known-fraud cluster, emitting the transaction ID via `NOTIFY`. Unlike a plain SELECT, the subscription is differential: only *newly matched* rows fire, no spurious re-emits on every refresh. --- ## Hybrid-search corpus pattern Combine vector similarity with BM25 full-text search and metadata filters on a single flat stream table: ```sql -- One-statement denormalised corpus maintenance SELECT pgtrickle.create_stream_table( 'hybrid_search_corpus', $$ SELECT doc.id, doc.title, doc.body, doc.embedding, array_agg(tag.name) AS tags, perm.tenant_id, perm.read_groups FROM documents doc JOIN doc_tags tag ON tag.doc_id = doc.id JOIN doc_permissions perm ON perm.doc_id = doc.id WHERE doc.active GROUP BY doc.id, perm.tenant_id, perm.read_groups $$, refresh_mode => 'DIFFERENTIAL', schedule => '10 seconds' ); CREATE INDEX ON hybrid_search_corpus USING hnsw (embedding vector_cosine_ops); CREATE INDEX ON hybrid_search_corpus USING gin (to_tsvector('english', body)); CREATE INDEX ON hybrid_search_corpus (tenant_id); ``` Cookbook: `docs/tutorials/HYBRID_SEARCH_PATTERNS.md`. --- ## Performance benchmarks Add benchmark suite `benches/pgvector_bench.rs`: - **Differential refresh + HNSW insert throughput:** measure rows/sec for incremental embedding corpus updates vs. bulk REINDEX. - **Vector aggregate reducer cost:** microseconds per element for `vector_avg` vs. plain `sum` reducer. - **Drift detection overhead:** per-refresh cost of tracking `rows_changed_since_last_reindex`. CI integration: publish results to the benchmark dashboard alongside refresh latency benchmarks (v0.31). --- ## Also in v0.39.0 - Any deferred items from v0.38.0 that did not meet the exit criteria - Integration tests for hybrid-search corpus under write load (Nexmark-style) with latency baseline established before the 50 ms assertion is written --- ## Scope v0.39.0 is a medium release — but sits at the upper end. The three new test files (tiered storage, reactive distance, Nexmark hybrid) are each substantive integration tests; VH-4 (benchmarks) adds CI infrastructure. If capacity is tight, VH-3 (cookbook) and VH-4 (benchmarks) may slip to v0.40.0; the VH-1 and VH-2 feature work must land in v0.39.0. > **Note:** VH-2 (reactive distance subscriptions) requires v0.35.0 reactive > subscription correctness to be fully confirmed before implementation begins. --- *Previous: [v0.38.0 — Embedding Pipeline Infrastructure & ANN Maintenance](v0.38.0.md)* *Next: [v0.40.0 — Embedding API & Advanced RAG Patterns (Research)](v0.40.0.md)*