> **Plain-language companion:** [v0.43.0.md](v0.43.0.md)

## v0.43.0 — Embedding API & Advanced RAG Patterns

**Status: Planned.** Derived from [plans/ecosystem/PLAN_PGVECTOR.md](plans/ecosystem/PLAN_PGVECTOR.md) §6.4, rescheduled after the assessment-driven hardening arc.

> **Release Theme**
> v0.43.0 turns the embedding infrastructure built in v0.41.0-v0.42.0 into a
> higher-level ergonomic surface. The `embedding_stream_table()` API enables
> one-call RAG corpus setup; materialised k-NN graph work remains a parallel
> research spike; per-tenant ANN patterns and outbox-emitted embedding events
> complete the application-facing story.

---

### Features

| ID | Title | Effort | Priority |
|----|-------|--------|----------|
| VA-1 | `embedding_stream_table()` ergonomic API: one-call RAG corpus setup | L | P1 |
| VA-2 | Materialised k-NN graph research spike for fixed-pivot retrieval | L | P3 |
| VA-3 | Per-tenant ANN indexing patterns: RLS-scoped embedding corpora | M | P2 |
| VA-4 | Outbox-emitted embedding events for downstream consumption | M | P2 |
| VA-5 | Starter repo and ecosystem positioning *(best-effort, not a release gate)* | S | P2 |

**VA-1 — `embedding_stream_table()` ergonomic API.** Add a high-level function that auto-generates the denormalization query, creates the stream table, provisions indexes, configures post-refresh actions, and returns a dry-run preview when requested. The API should cover the common single-source-plus-joins RAG patterns without taking SQL control away from power users.

**VA-2 — Materialised k-NN graph research spike.** Explore whether pre-computing neighbour relationships for fixed pivot vectors is worth the storage and maintenance overhead relative to ANN index scans. This is research, not a release gate: the deliverable is a measured trade-off analysis and, optionally, a proof of concept if the numbers look promising.
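A minimal Python sketch of the VA-1 planning step, under stated assumptions: the `EmbeddingSpec` and `Join` shapes, `build_query()`, `infer_index()`, and this Python `embedding_stream_table()` stand-in are all hypothetical and do not describe pg_trickle's SQL API; the embedding column is assumed to live on the source table; and post-refresh actions are left out. The HNSW operator-class names, the 2,000-dimension `vector` and 4,000-dimension `halfvec` index limits, and the `::halfvec(n)` expression-index pattern come from pgvector. The point it illustrates is the separation the dry-run exit criterion relies on: all SQL is planned by pure functions, and the `apply` callback (a placeholder for whatever creates the stream table and runs the DDL) is never called when `dry_run` is true.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


# Hypothetical spec shape: the real embedding_stream_table() parameters
# are not defined by this plan.
@dataclass
class Join:
    table: str
    alias: str
    on: str                 # join condition, e.g. "a.id = s.author_id"
    columns: list[str]      # joined columns copied into the corpus


@dataclass
class EmbeddingSpec:
    name: str                            # stream table to create
    source: str                          # source table, aliased as "s"
    key: str
    text_columns: list[str]
    embedding_column: str = "embedding"  # assumed to live on the source table
    embedding_type: str = "vector"       # "vector" or "halfvec"
    dimensions: int = 1536
    metric: str = "cosine"
    joins: list[Join] = field(default_factory=list)


# pgvector HNSW operator-class suffixes and per-type dimension limits.
OPCLASS_SUFFIX = {"cosine": "cosine_ops", "l2": "l2_ops", "inner_product": "ip_ops"}
HNSW_MAX_DIMS = {"vector": 2000, "halfvec": 4000}


def build_query(spec: EmbeddingSpec) -> str:
    """Denormalize the source row plus joined columns into one SELECT."""
    cols = [f"s.{c}" for c in [spec.key, *spec.text_columns, spec.embedding_column]]
    from_clause = f"FROM {spec.source} AS s"
    for j in spec.joins:
        cols += [f"{j.alias}.{c}" for c in j.columns]
        from_clause += f" LEFT JOIN {j.table} AS {j.alias} ON {j.on}"
    return f"SELECT {', '.join(cols)} {from_clause}"


def infer_index(spec: EmbeddingSpec) -> str:
    """Pick an HNSW index definition for a vector or halfvec column."""
    if spec.embedding_type not in HNSW_MAX_DIMS:
        raise ValueError(f"no index heuristic for type {spec.embedding_type!r}")
    col, suffix = spec.embedding_column, OPCLASS_SUFFIX[spec.metric]
    if spec.embedding_type == "vector" and spec.dimensions <= HNSW_MAX_DIMS["vector"]:
        target, opclass = col, f"vector_{suffix}"
    elif spec.dimensions <= HNSW_MAX_DIMS["halfvec"]:
        # Native halfvec, or a vector too wide for HNSW: index at half precision.
        if spec.embedding_type == "halfvec":
            target = col
        else:
            target = f"({col}::halfvec({spec.dimensions}))"
        opclass = f"halfvec_{suffix}"
    else:
        raise ValueError(f"{spec.dimensions} dimensions exceed HNSW limits")
    return f"CREATE INDEX ON {spec.name} USING hnsw ({target} {opclass});"


def embedding_stream_table(
    spec: EmbeddingSpec,
    apply: Callable[[str, str], None] | None = None,
    dry_run: bool = False,
) -> list[tuple[str, str]]:
    """Return the planned (step, sql) pairs; call `apply` only when not a dry run."""
    plan = [("stream_table_query", build_query(spec)), ("index", infer_index(spec))]
    if not dry_run:
        if apply is None:
            raise ValueError("apply callback required when dry_run is False")
        for step, sql in plan:
            apply(step, sql)  # placeholder: create the stream table, then run the DDL
    return plan
```

Keeping planning pure makes `dry_run` side-effect-free by construction. One caveat the index-heuristics documentation would need to spell out: with the half-precision expression index, queries must order by the same `embedding::halfvec(n)` expression for pgvector to use that index.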
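One way to frame the VA-2 trade-off, sketched in Python as a thinking aid rather than a committed design: `PivotKnnGraph` and its methods are hypothetical, cosine distance (what pgvector's `<=>` operator computes) is assumed, and the exact corpus scan stands in for an ANN index scan. Lookups become reads of a stored list, but storage grows as pivots × k, inserts must be checked against every pivot, and a delete that removes a stored neighbour forces a rescan for that pivot.

```python
import heapq
import math


def cosine_distance(a, b):
    """1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)


class PivotKnnGraph:
    """Hypothetical materialised k-NN graph: top-k corpus rows per fixed pivot."""

    def __init__(self, pivots, corpus, k):
        self.pivots = dict(pivots)   # pivot_id -> vector (fixed)
        self.corpus = dict(corpus)   # row_id -> vector (churns)
        self.k = k
        self.full_rescans = 0
        self.neighbours = {p: self._rescan(p) for p in self.pivots}
        self.full_rescans = 0        # count only maintenance rescans from here on

    def _rescan(self, pivot_id):
        # Exact scan over the corpus; stands in for an ANN index scan.
        self.full_rescans += 1
        pv = self.pivots[pivot_id]
        return heapq.nsmallest(
            self.k, ((cosine_distance(pv, v), rid) for rid, v in self.corpus.items())
        )

    def lookup(self, pivot_id):
        # Read path: return the stored list, no distance computation at query time.
        return [rid for _, rid in self.neighbours[pivot_id]]

    def insert(self, row_id, vec):
        # New rows only (model an update as delete + insert). A new row can
        # only displace the current k-th neighbour, so no rescan is needed.
        self.corpus[row_id] = vec
        for p, pv in self.pivots.items():
            entries = self.neighbours[p]
            candidate = (cosine_distance(pv, vec), row_id)
            if len(entries) < self.k or candidate < entries[-1]:
                entries.append(candidate)
                entries.sort()
                del entries[self.k:]

    def delete(self, row_id):
        # Losing a stored neighbour leaves a gap that only a rescan can fill.
        del self.corpus[row_id]
        for p in self.pivots:
            if any(rid == row_id for _, rid in self.neighbours[p]):
                self.neighbours[p] = self._rescan(p)

    def storage_rows(self):
        # Materialised size is bounded by len(pivots) * k.
        return sum(len(entries) for entries in self.neighbours.values())
```

Counting `full_rescans` and `storage_rows()` under a realistic insert/delete mix, and timing `lookup()` against direct index scans, is the kind of evidence T-VA2 asks for before any implementation is considered.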
**VA-3 — Per-tenant ANN indexing patterns.** Document the production pattern for multi-tenant RAG using RLS and tenant-scoped vector corpora, backed by worked examples. The emphasis is on safe defaults and explicit security review, not just on query examples.

**VA-4 — Outbox-emitted embedding events.** Extend the outbox surface so embedding changes can be emitted as downstream application events. This lets external agents or search services react to embedding churn without polling.

**VA-5 — Starter repo and ecosystem positioning.** Provide a public starter repository and best-effort ecosystem material that show how pg_trickle fits alongside pgvector and pgai after the hardening arc. This is useful, but it must never block the release.

### Test Coverage

| ID | Title | Effort | Priority |
|----|-------|--------|----------|
| T-VA1 | Integration test: `embedding_stream_table()` generates correct stream tables | M | P1 |
| T-VA2 | Research benchmark: k-NN graph trade-off measurement | L | P2 |
| T-VA3 | Multi-tenant security test for ANN stream tables | M | P1 |
| T-VA4 | Outbox event emission for embedding changes | M | P2 |

**T-VA1.** Call `embedding_stream_table()` with a realistic joins-and-aggregates spec, verify the generated stream table and indexes, and compare the results with those of the equivalent manually written definition.

**T-VA2.** If the k-NN research spike includes code, benchmark pivot-neighbour queries against direct ANN index scans so the trade-off is evidence-based.

**T-VA3.** Verify that tenant-scoped ANN stream tables respect RLS policies and do not leak cross-tenant embeddings.

**T-VA4.** Create an embedding-aware outbox stream and verify that downstream subscribers see correctly structured events when embeddings change.

### Conflicts & Risks

- **VA-1** hides real query complexity. `dry_run => true` and explicit index heuristics are essential so expert users can audit the generated surface.
- **VA-2** is intentionally non-blocking research. Keep it from turning into an unbounded scope sink for a release that already includes a large ergonomic API.
- **VA-3** must be reviewed against the security model documented in the hardening arc; RLS examples are only useful if their trust boundaries are explicit.

### Exit Criteria

- [ ] VA-1: `embedding_stream_table()` generates correct SQL, indexes, and monitoring configuration
- [ ] VA-1: `dry_run => true` returns the generated SQL without side effects
- [ ] VA-1: index-inference heuristics for `vector` and `halfvec` columns are documented
- [ ] VA-2: research findings for materialised k-NN graphs are published; implementation remains optional
- [ ] VA-3: multi-tenant ANN documentation and security tests are complete
- [ ] VA-4: embedding outbox events are emitted and validated correctly
- [ ] VA-5: starter repo or equivalent public examples are available *(best-effort; does not block the release)*
- [ ] Extension upgrade path tested (`0.42.0 → 0.43.0`)
- [ ] `just check-version-sync` passes

---