# Roadmap

This roadmap is directional. It describes the work pgGraph is designed around, not a dated release commitment.

pgGraph's roadmap keeps one boundary fixed: PostgreSQL remains the source of truth. New features should make graph-shaped SQL workloads easier to operate inside PostgreSQL, without turning pgGraph into a separate graph database or a second query language.

## Current Baseline

The current engine already supports the core SQL workflow:

| Area | Current behavior |
|---|---|
| Registration | Manual table and edge registration plus schema auto-discovery. |
| Querying | Bounded traversal, shortest paths, weighted shortest paths, path reconstruction, source search, workflow helpers, and connected components. |
| Maintenance | Explicit build, persist, load, sync apply, vacuum, stats, and background job entry points. |
| Sync | Trigger-buffered change capture with explicit foreground or background apply. |
| Persistence | Atomic `.pggraph` artifacts under PostgreSQL data storage. Immutable forward graph arrays and resolution are read-only mmap-backed on load; reverse CSR and bincode metadata are backend-local heap structures. |
| Safety | ACL preflight, SQLSTATE mapping, circuit breakers, OOM checks, and panic-to-PostgreSQL-error boundaries. |

```text
current stable core
  |
  +-- SQL-first graph queries
  +-- trigger-buffered sync
  +-- read-only mapped launch artifacts
  +-- release and safety hardening
```

## Open Source V1 Focus

The immediate open-source focus is deliberately practical: make the existing system easy to install, easy to verify, and hard to misunderstand.

| Area | Direction |
|---|---|
| Documentation | Keep SQL user docs and Rust contributor docs in the Nextra docs tree as the single source of truth. |
| Packaging | Tighten source install, package validation, fresh install smoke tests, and PostgreSQL 14-18 release evidence. |
| CI/CD | Start with unit/static checks on every change, then add PostgreSQL-backed pgrx integration jobs, then promote heavy release gates to scheduled or manual release validation. |
| Examples | Maintain small examples for quick evaluation and realistic examples for operational guidance. |
| API clarity | Document every exported SQL function, reserved option, error class, and security boundary. |
| Operations | Keep backup/restore, crash recovery, concurrency, memory, and package checks reproducible from documented scripts. |

## Near-Term Work

```mermaid
flowchart LR
  A["V1 documentation and packaging"] --> B["Operational hardening"]
  B --> C["Build and sync improvements"]
  C --> D["More query ergonomics"]
  D --> E["Larger-scale validation"]
```

| Area | Planned improvement |
|---|---|
| Build observability | Better progress, sizing, and failure diagnostics around `graph.build()` and persisted artifacts. |
| CI/CD rollout | First gate `cargo fmt --check`, `cargo clippy`, `cargo test`, `cargo doc`, and docs drift checks; next gate `cargo pgrx test`; final gate synthetic/heavy release scripts. |
| Sync operations | Clearer scheduling, monitoring, and failure handling around trigger-buffered sync and background maintenance jobs. |
| Artifact tooling | More direct inspection and compatibility checks for persisted graph files. |
| Query ergonomics | More examples around strict traversal specs, typed filters, workflow helpers, and pagination. |
| Benchmarks | More end-to-end SQL benchmarks that separate PostgreSQL execution, graph traversal, hydration, client overhead, and PostgreSQL SQL/PGQ comparisons as PostgreSQL 19 matures. |

## Alpha Hardening Status

The critical-path alpha hardening pass has landed the correctness, durability, SQL contract, sync freshness, and operator diagnostics work that was blocking a clearer public baseline.


| Track | Status | Notes |
|---|---|---|
| Data correctness | Complete | Edge endpoint validation and exact hydration numeric comparisons fail closed instead of silently producing wrong answers. |
| SQL contract | Complete for the critical path | Weighted shortest paths return stable step rows; search and traversal pagination contracts are documented and tested. |
| Persistence safety | Complete for the critical path | Persisted PK UTF-8, resolution collisions, mmap region bounds, and missing `PGDATA` behavior are validated. |
| SQL boundary safety | Complete for search | Source search binds user values through SPI parameters and inspects generated SQL in tests. Other catalog-derived dynamic SQL remains tracked in Known Issues. |
| Sync freshness | Complete for topology reads | Topology reads auto-apply pending trigger sync rows by default up to a captured high-water mark; `graph.query_freshness = 'off'` preserves manual mode. |
| Scheduler integration | Available | `graph.run_scheduled_maintenance()` is scheduler-safe and documented for `pg_cron` or external schedulers. The Docker image enables `pg_cron` and installs a five-minute maintenance schedule. |
| Operator diagnostics | Complete for the critical path | `graph.status()`, `graph.sync_health()`, build jobs, and maintenance jobs expose read-only reasons and progress diagnostics. |

## Continuous Improvement

There is always another way to optimize a graph engine, especially one that runs inside PostgreSQL backend processes and must respect PostgreSQL memory, locking, ACL, and durability semantics. The launch priority is therefore deliberate: make the critical correctness, persistence, safety, and operational paths work first, then continue improving performance and ergonomics in the open.

The current public follow-up list is tracked in [Known Issues](./known-issues).
Work is ranked by user impact:

| Priority | Focus | Examples |
|---|---|---|
| P0 | Operational visibility and resource predictability | Build/vacuum diagnostics, sync backlog visibility, clearer read-only states, better memory estimates. |
| P1 | Query performance follow-ups | Traversal scratch reuse, connected-component row materialization, hydration setup, source-search rechecks. |
| P2 | SQL ergonomics and documented semantics | No active public rows after structured filters gained text predicates, membership, range, and null semantics. |
| P3 | Internals and maintainability | pg test locality once pgrx supports the needed out-of-line schema module shape. |

The bias is to optimize with evidence. For hot paths, we prefer benchmarks or release-style SQL tests that show whether an allocation, SPI call, sort, clone, or hash lookup is actually material to user workloads.

## Semantic-Guided Search

Semantic-guided search is a distinct direction for pgGraph: combine graph structure, relationship constraints, and pgVector-backed node embeddings to find relevant paths without letting high-degree nodes dominate bounded traversal.

The first shape should be a `guided_path()` style API rather than a promise of exact A* behavior. It should keep PostgreSQL as the source of truth while ranking candidate expansions with multiple signals:

| Signal | Purpose |
|---|---|
| Vector distance | Prefer nodes semantically closer to the target or query embedding. |
| Edge and path cost | Preserve compatibility with weighted graph search where callers have meaningful costs. |
| Degree penalty | Avoid expanding super nodes unless their semantic or structural value justifies the branching cost. |
| Edge type preference | Let callers constrain or bias search toward useful relationship classes. |
| Bidirectional expansion | Shrink the search space where directed-edge semantics allow safe forward/backward search. |
| Beam width and circuit breakers | Keep latency and memory predictable with bounded candidate sets, max-node caps, max-depth caps, and diagnostics. |

This path should be documented as approximate, relevance-ranked search. Exact shortest-path behavior remains the job of `shortest_path()` and `weighted_shortest_path()`.

Future work can add full A* as a specialized mode, but only for cost models where the heuristic can be made admissible. pgVector distance alone is likely better treated as a semantic ranking signal unless edge costs are intentionally designed around the same metric.

## Reserved Features

Some configuration values and internals are intentionally shaped for future work, but are not current behavior.

| Feature | Current status | Roadmap direction |
|---|---|---|
| WAL-driven sync | `graph.sync_mode = 'wal'` is reserved and rejected when selected. | Logical replication based sync for deployments that should avoid trigger-buffer volume. |
| COPY build scanner | `graph.build_scan_mode = 'copy'` is reserved and rejected when selected. | A server-side COPY scanner once it can preserve the same validation, error, and memory guarantees as the current SPI path. |
| Online mutable graph segments | Not part of the current query path. | Investigate append or delta structures that reduce rebuild pressure without weakening traversal correctness. |
| pgVector embedding cache | Node embeddings are not part of the graph artifact or traversal hot path. | Investigate a sidecar or persisted embedding cache keyed by internal node IDs so semantic-guided search can rank candidates without source-table lookups inside the search loop. |
| Full A* path search | Not current behavior. Existing shortest paths use bidirectional BFS and weighted paths use Dijkstra. | Consider as a specialized semantic-guided search mode only when the heuristic has an explicit admissibility contract. |
| Additional graph analytics | Connected components exist today; PageRank, Louvain, betweenness, and similar global algorithms are not built in. | Consider companion analytics paths where they fit PostgreSQL operations and do not slow the hot traversal path. |
| Distributed graph execution | Not a V1 goal. | Revisit only after the single-database engine, packaging, and operational model are mature. |

## Non-Goals

pgGraph is not trying to become:

| Non-goal | Reason |
|---|---|
| A replacement storage engine | PostgreSQL tables, indexes, constraints, ACLs, and RLS remain the system of record. |
| A Cypher, Gremlin, SPARQL, or GQL implementation | The public interface is SQL functions in the `graph` schema. |
| A distributed graph database | The current architecture is local to one PostgreSQL database. |
| A global analytics engine in the OLTP query path | Expensive whole-graph algorithms need different scheduling and resource controls than bounded traversal. |

## Documentation Policy

Going forward, project documentation should live under `docs/`.

- SQL behavior, installation, operations, and examples belong in `docs/user_guide/`.
- Rust internals, persistence, safety, memory, tests, and release process belong in `docs/contributor_guide/`.
- The top-level overview and system design stay directly under `docs/`.
- Quickstart and roadmap pages live directly under `docs/`; user guide and contributor guide pages live under `docs/user_guide/` and `docs/contributor_guide/`.
- Repository-local README files should stay small pointers or be removed when they duplicate the docs tree.