# Sync And Maintenance
The base graph is immutable CSR once built. Sync captures source table changes
into durable log tables. In `mutable_overlay` mode, committed edge changes are
published as durable projection segments so other backends can observe them
after `graph.apply_sync()` without a full rebuild. Vacuum and maintenance still
rebuild the base graph to fold accumulated changes back into the core stores.
## Modes
| Mode | Current behavior |
|---|---|
| `manual` | No automatic trigger install. Use `graph.build()` to refresh from source tables. |
| `trigger` | Default. `graph.build()` and `graph.enable_sync()` install triggers that write to `graph._sync_log`; topology reads auto-apply pending rows by default, and `graph.apply_sync()` or `graph.maintenance()` remain available for explicit operator workflows. |
| `wal` | Reserved and rejected by current sync code. |
```sql
ALTER SYSTEM SET graph.sync_mode = 'manual';
SELECT pg_reload_conf();
```
## Trigger Sync
```sql
SELECT graph.enable_sync();
```
With the default `graph.sync_mode = 'trigger'`, `graph.build()` creates trigger
functions and attaches INSERT, UPDATE, DELETE, and TRUNCATE triggers to
registered tables. `graph.enable_sync()` is still available when operators want
to install or reinstall those triggers explicitly. Trigger rows are durable in
`graph._sync_log`.
```text
source table DML
|
v
AFTER trigger
|
v
graph._sync_log
|
+--> graph.apply_sync()
|
+--> graph.maintenance()
```
## Apply Sync
```sql
SELECT * FROM graph.apply_sync();
```
Return columns:
| Column | Meaning |
|---|---|
| `inserts_applied` | Insert sync rows applied |
| `updates_applied` | Update sync rows applied |
| `deletes_applied` | Delete sync rows applied |
Current apply behavior includes:
| Change | Behavior |
|---|---|
| Inserted node row | Materializes mmap node store if needed, appends node, updates resolution delta, filter values, and tenant membership |
| Updated node row | Handles primary-key changes, property/filter refresh, tenant refresh, and edge mutations where possible |
| Deleted node row | Tombstones node in owned store |
| Edge mutations in `mutable_overlay` | Published into durable projection segments and loaded through the layered read path |
| Edge mutations outside durable projection coverage | Stored in the legacy edge overlay buffer until rebuild/vacuum |
| Truncate | Marks rebuild/vacuum need |
`graph.apply_sync()` drains the durable sync rows that were pending when the
call started. Internally, replay is bounded by `graph.sync_batch_size`, so large
backlogs are processed in smaller batches without changing the public admin
contract.
The base CSR store is immutable. Durable projection segments make committed
mutable-overlay edge changes visible across backends after sync is applied, but
`graph.vacuum()` or `graph.maintenance()` is still the normal path for merging
accumulated projection state into a clean base graph.
## Query Freshness
`graph.query_freshness` controls whether topology reads apply pending
trigger-sync rows before executing. The default is `apply_pending_sync`, so
topology reads catch up to a captured `_sync_log` high-water mark before reading
when `graph.sync_mode = 'trigger'`.
| Value | Behavior |
|---|---|
| `apply_pending_sync` | Topology reads apply rows up to a captured `_sync_log` high-water mark before reading. |
| `off` | Compatibility mode; reads do not apply pending sync rows automatically. |
| `error_on_pending` | Topology reads fail when pending sync rows exist. |
The topology reads covered by query freshness are `graph.traverse()` variants,
shortest-path queries, weighted shortest-path queries, component APIs, and
`graph.traverse_search()` before it traverses from search matches. Primitive
`graph.search()` remains source-table SQL search and does not use the topology
freshness helper. `graph.apply_sync()` remains the explicit operator/admin path
for all pending sync rows.
## Sync Health
```sql
SELECT *
FROM graph.sync_health();
```
`graph.sync_health()` returns one row of scheduler-friendly signals:
| Column | Meaning |
|---|---|
| `query_freshness` | Current topology-read freshness mode |
| `sync_batch_size` | Maximum durable sync rows replayed per internal batch |
| `applied_sync_id` | Last durable sync log row applied by this backend-local engine |
| `max_sync_log_id` | Current high-water row in `graph._sync_log` |
| `pending_sync_rows` | Durable sync rows newer than `applied_sync_id` |
| `disabled_trigger_count` | Registered graph triggers currently disabled |
| `edge_buffer_used` | Pending edge overlay mutations |
| `edge_buffer_size` | Configured edge overlay capacity |
| `projection_mode` | Built projection mode: `csr_readonly` or `mutable_overlay` |
| `overlay_tombstone_count` | Pending overlay delete/tombstone count |
| `overlay_memory_bytes` | Estimated backend-local overlay heap memory |
| `compaction_recommended` | `true` when durable overlay count, tombstones, memory, or vacuum state crosses the configured compaction policy |
| `tx_delta_dirty` | Whether this backend has transaction-local graph deltas |
| `tx_delta_added_nodes` / `tx_delta_deleted_nodes` | Transaction-local node delta counts |
| `tx_delta_added_edges` / `tx_delta_deleted_edges` | Transaction-local edge delta counts |
| `tx_delta_memory_bytes` | Estimated transaction-delta heap memory |
| `read_only_reason` | Typed reason such as `memory_limit` or `edge_buffer_full` when read-only |
| `apply_sync_recommended` | Pending rows can be replayed by `graph.apply_sync()` while the engine is writable |
| `maintenance_recommended` | Overlay, read-only, vacuum, or rebuild state should be compacted |
| `durable_ingest_recommended` | Durable projection rows are waiting to be ingested |
| `durable_compaction_recommended` | Durable segment fanout exceeds the configured compaction threshold |
| `durable_gc_recommended` | Manifest-declared obsolete projection files can be evaluated by GC |
| `durable_repair_recommended` | Active projection artifacts need targeted repair or full rebuild |
Use `graph.run_scheduled_maintenance()` from `pg_cron` or an external scheduler
when you want one call that applies sync if recommended and then starts
background maintenance when compaction or rebuild work remains:
```sql
SELECT * FROM graph.run_scheduled_maintenance();
```
For PostgreSQL installations that use `pg_cron`:
```sql
CREATE EXTENSION IF NOT EXISTS pg_cron;
SELECT cron.schedule(
'pggraph-maintenance',
'*/1 * * * *',
$$SELECT * FROM graph.run_scheduled_maintenance();$$
);
```
Use a slower cadence when write volume is low:
```sql
SELECT cron.schedule(
'pggraph-maintenance',
'*/5 * * * *',
$$SELECT * FROM graph.run_scheduled_maintenance();$$
);
```
The extension does not run an always-on internal scheduler or background loop.
Operators own the maintenance cadence and lifecycle through `pg_cron`,
Kubernetes CronJobs, systemd timers, Docker initialization SQL, or an
application-side scheduler. Keep the scheduled statement to the single
admin-only call above so pgGraph can continue to apply sync and start
maintenance using its built-in health checks.
## Edge Buffer Limits
`graph.edge_buffer_size` limits legacy/backend-local pending edge mutations.
For persisted `mutable_overlay` graphs, committed edge changes normally use
durable projection segments instead of this buffer. When the legacy buffer is
full, the engine enters read-only mode, reports `PG008` at the capacity breach,
and exposes `read_only_reason = 'edge_buffer_full'`.
```sql
ALTER SYSTEM SET graph.edge_buffer_size = 500000;
SELECT pg_reload_conf();
```
Recovery:
```sql
SELECT * FROM graph.vacuum();
-- or
SELECT * FROM graph.maintenance();
```
## Vacuum
```sql
SELECT * FROM graph.vacuum();
```
Return columns:
| Column | Meaning |
|---|---|
| `nodes_before` | Node slots before rebuild |
| `nodes_after` | Node slots after rebuild |
| `tombstones_removed` | Removed inactive node slots |
| `edges_rebuilt` | Directed edges in rebuilt CSR |
| `vacuum_time_ms` | Wall-clock rebuild time |
The current implementation performs a full rebuild from source tables. This is
correct for immutable CSR and reclaims tombstones and overlay state.
Vacuum has a double-memory period because the old and new engines coexist until
the swap completes. Keep `graph.memory_limit_mb` high enough for both engines.
## Maintenance
Foreground maintenance:
```sql
SELECT * FROM graph.maintenance(concurrently := false);
```
Background maintenance:
```sql
SELECT * FROM graph.maintenance(concurrently := true);
SELECT * FROM graph.maintenance_status('');
```
If background maintenance fails after the worker starts, pgGraph records the
failed status and error detail from a fresh worker transaction so
`graph.maintenance_status('')` remains the operator source of truth.
Return columns:
| Column | Meaning |
|---|---|
| `job_id` | Durable maintenance job ID, or zero UUID for foreground path |
| `status` | `queued`, `running`, `completed`, or `failed` |
| `sync_rows_applied` | Sync rows folded into the rebuild |
| `nodes_after` | Node count after maintenance |
| `edges_after` | Edge count after maintenance |
| `vacuum_time_ms` | Rebuild duration |
| `progress_phase` | Coarse operator phase such as `queued`, `rebuilding`, `persisting`, `validating_persistence`, `completed`, or `failed` |
| `progress_message` | Short operator-readable progress or failure detail |
| `error` | Failure text if any |
`graph.maintenance_status(NULL)` returns the latest 50 maintenance jobs.
## Status Columns For Sync
```sql
SELECT
sync_mode,
sync_status,
edge_buffer_used,
needs_vacuum,
needs_rebuild,
projection_mode,
tx_delta_dirty,
applied_sync_id,
pending_sync_rows,
disabled_trigger_count,
invalid_reason
FROM graph.status();
```
| Column | Meaning |
|---|---|
| `sync_status` | `idle`, `syncing`, or `read_only` |
| `edge_buffer_used` | Pending edge overlay mutations |
| `needs_vacuum` | Overlay/tombstone state should be merged |
| `needs_rebuild` | Catalog/schema drift requires full rebuild |
| `projection_mode` | Built projection mode |
| `tx_delta_dirty` | Whether this backend has transaction-local graph deltas |
| `applied_sync_id` | Last durable sync log row applied by this backend-local engine |
| `pending_sync_rows` | Durable sync rows newer than `applied_sync_id` |
| `disabled_trigger_count` | Registered graph triggers currently disabled |
Use `graph.active_generation_count()` to inspect unexpired backend heartbeats
that retain durable projection generations for generation-aware cleanup.
Use `graph.projection_status()` for the full durable projection diagnostic row:
```sql
SELECT *
FROM graph.projection_status();
```
The row reports the active manifest generation and watermark, pending durable
rows, segment and chunk counts/bytes, tombstone ratio, compaction backlog,
obsolete file counts/bytes, active generation count, artifact validation state,
persisted last-ingestion/compaction/GC/repair timestamps, and
ingestion/compaction/GC/repair recommendations. `graph.sync_health()` keeps only
the lightweight durable recommendation booleans; use `graph.projection_status()`
for full artifact validation.
Run `graph.projection_gc()` to delete manifest-declared obsolete projection
files after the configured generation retention floor and active-generation
heartbeats no longer protect them:
```sql
SELECT *
FROM graph.projection_gc();
```
`protected_candidates` counts obsolete files that were still referenced by a
retained valid manifest or active backend generation. `deleted_files` and
`deleted_bytes` report files actually removed during that call.
Run `graph.projection_repair()` when load or maintenance diagnostics indicate
corrupt durable projection metadata:
```sql
SELECT *
FROM graph.projection_repair();
```
The result reports `healthy`, `targeted_chunk_repair`, `full_rebuild`, or
`no_projection`. Full rebuilds use PostgreSQL source tables, persist a fresh
`.pggraph` artifact, and publish a new base-only projection generation before
the backend reloads it.
## Operational Patterns
Small manual graph:
```sql
ALTER SYSTEM SET graph.sync_mode = 'manual';
SELECT pg_reload_conf();
SELECT * FROM graph.build();
-- periodically:
SELECT * FROM graph.build();
```
Default trigger-sync graph:
```sql
SELECT * FROM graph.build();
-- after workload:
SELECT * FROM graph.apply_sync();
SELECT * FROM graph.maintenance();
```
High-write graph:
```text
Use trigger sync for visibility.
Run graph.apply_sync() when graph.sync_health().apply_sync_recommended is true.
Run graph.maintenance(concurrently := true) when maintenance_recommended is true.
Watch edge_buffer_used, pending_sync_rows, disabled_trigger_count, read_only, and read_only_reason.
```