# v0.42.0 — Hybrid Search & Sparse Vector Aggregates > **Full technical details:** [v0.42.0.md-full.md](v0.42.0.md-full.md) **Status: Planned** | **Scope: Medium** > Sparse and half-precision vector aggregates for storage-tiered embedding > pipelines, reactive neighbour-alert subscriptions, hybrid-search corpus > patterns, and differential-refresh performance benchmarks for vector > workloads. --- ## What is this? v0.41.0 completes embedding pipeline infrastructure; v0.42.0 completes the feature set for production AI/RAG workloads by adding sparse and half-precision vector aggregation (for tiered storage), reactive vector-similarity alerts, and hybrid-search cookbook patterns with performance benchmarks. --- ## Sparse and half-precision vector aggregates pgvector 0.7+ ships `halfvec` and `sparsevec` types for storage-efficient embeddings. v0.42.0 adds: - `sparsevec_avg` — element-wise mean over the union of sparse dimensions, treating absent entries as 0. Returns `sparsevec`. - `halfvec_avg` — element-wise mean with running state upcast to `vector(d)`; returns `halfvec(d)` on read. - `sparsevec_sum` and `halfvec_sum` — for pre-aggregation workflows. These are algebraically identical to `vector_avg` / `vector_sum` but respect the native type on output. A stream table can maintain tiered storage: raw `vector(1536)` → `halfvec(1536)` → `sparsevec(1536)` (for binary-quantised re-ranking), each layer incrementally maintained and independently indexed. --- ## Reactive vector-similarity subscriptions Extending reactive subscriptions (v0.35), add support for reactive queries over distance predicates: ```sql LISTEN fraud_alert; SELECT pgtrickle.create_reactive_subscription( 'fraud_alert', $$ SELECT new_txn.id FROM transactions_embedded new_txn JOIN known_fraud_vectors k ON new_txn.embedding <=> k.embedding < 0.05 $$ ); ``` The subscription fires whenever a *new* transaction embedding lands within ε of a known-fraud cluster, emitting the transaction ID via `NOTIFY`. Unlike a plain SELECT, the subscription is differential: only *newly matched* rows fire, no spurious re-emits on every refresh. --- ## Hybrid-search corpus pattern Combine vector similarity with BM25 full-text search and metadata filters on a single flat stream table: ```sql -- One-statement denormalised corpus maintenance SELECT pgtrickle.create_stream_table( 'hybrid_search_corpus', $$ SELECT doc.id, doc.title, doc.body, doc.embedding, array_agg(tag.name) AS tags, perm.tenant_id, perm.read_groups FROM documents doc JOIN doc_tags tag ON tag.doc_id = doc.id JOIN doc_permissions perm ON perm.doc_id = doc.id WHERE doc.active GROUP BY doc.id, perm.tenant_id, perm.read_groups $$, refresh_mode => 'DIFFERENTIAL', schedule => '10 seconds' ); CREATE INDEX ON hybrid_search_corpus USING hnsw (embedding vector_cosine_ops); CREATE INDEX ON hybrid_search_corpus USING gin (to_tsvector('english', body)); CREATE INDEX ON hybrid_search_corpus (tenant_id); ``` Cookbook: `docs/tutorials/HYBRID_SEARCH_PATTERNS.md`. --- ## Performance benchmarks Add benchmark suite `benches/pgvector_bench.rs`: - **Differential refresh + HNSW insert throughput:** measure rows/sec for incremental embedding corpus updates vs. bulk REINDEX. - **Vector aggregate reducer cost:** microseconds per element for `vector_avg` vs. plain `sum` reducer. - **Drift detection overhead:** per-refresh cost of tracking `rows_changed_since_last_reindex`. CI integration: publish results to the benchmark dashboard alongside refresh latency benchmarks. --- ## Also in v0.42.0 - Any deferred items from v0.41.0 that did not meet the exit criteria - Integration tests for hybrid-search corpus under write load (Nexmark-style) with latency baseline established before the 50 ms assertion is written --- ## Scope v0.42.0 is a medium release at the upper end. The new test files (tiered storage, reactive distance, Nexmark hybrid) are substantive integration work, and the benchmark suite adds CI infrastructure. If capacity is tight, cookbook polish and dashboard integration may slip; the aggregate and reactive-subscribe feature work should not. > **Note:** Reactive distance subscriptions require the underlying v0.35.0 > reactive subscription implementation to remain fully signed off before this > extension work begins. --- *Previous: [v0.41.0 — Embedding Pipeline Infrastructure & ANN Maintenance](v0.41.0.md)* *Next: [v0.43.0 — Embedding API & Advanced RAG Patterns](v0.43.0.md)*