# Sync And Maintenance
The base graph is immutable CSR once built. Sync captures source table changes
into durable log tables and applies what can be represented as backend-local
overlays. Vacuum and maintenance rebuild the base graph to fold changes back
into the core stores.
## Modes
| Mode | Current behavior |
|---|---|
| `manual` | No automatic trigger install. Use `graph.build()` to refresh from source tables. |
| `trigger` | `graph.enable_sync()` installs triggers that write to `graph._sync_log`; use `graph.apply_sync()` or `graph.maintenance()` to apply/rebuild. |
| `wal` | Reserved and rejected by current sync code. |
```sql
ALTER SYSTEM SET graph.sync_mode = 'trigger';
SELECT pg_reload_conf();
```
## Enable Trigger Sync
```sql
SELECT graph.enable_sync();
```
This creates trigger functions and attaches INSERT, UPDATE, DELETE, and TRUNCATE
triggers to registered tables. Trigger rows are durable in `graph._sync_log`.
```text
source table DML
|
v
AFTER trigger
|
v
graph._sync_log
|
+--> graph.apply_sync()
|
+--> graph.maintenance()
```
## Apply Sync
```sql
SELECT * FROM graph.apply_sync();
```
Return columns:
| Column | Meaning |
|---|---|
| `inserts_applied` | Insert sync rows applied |
| `updates_applied` | Update sync rows applied |
| `deletes_applied` | Delete sync rows applied |
Current apply behavior includes:
| Change | Behavior |
|---|---|
| Inserted node row | Materializes mmap node store if needed, appends node, updates resolution delta, filter values, and tenant membership |
| Updated node row | Handles primary-key changes, property/filter refresh, tenant refresh, and edge mutations where possible |
| Deleted node row | Tombstones node in owned store |
| Edge mutations | Stored in edge overlay buffer until rebuild/vacuum |
| Truncate | Marks rebuild/vacuum need |
`graph.apply_sync()` drains the durable sync rows that were pending when the
call started. Internally, replay is bounded by `graph.sync_batch_size`, so large
backlogs are processed in smaller batches without changing the public admin
contract.
The base CSR store is immutable. Edge changes use overlay buffers for query
visibility in the backend that applies sync, but `graph.vacuum()` or
`graph.maintenance()` is the normal path for merging them into a clean base
graph.
## Query Freshness
`graph.query_freshness` controls whether topology reads apply pending
trigger-sync rows before executing. The default is `apply_pending_sync`, so
topology reads catch up to a captured `_sync_log` high-water mark before reading
when `graph.sync_mode = 'trigger'`.
| Value | Behavior |
|---|---|
| `apply_pending_sync` | Topology reads apply rows up to a captured `_sync_log` high-water mark before reading. |
| `off` | Compatibility mode; reads do not apply pending sync rows automatically. |
| `error_on_pending` | Topology reads fail when pending sync rows exist. |
The topology reads covered by query freshness are `graph.traverse()` variants,
shortest-path queries, weighted shortest-path queries, component APIs, and
`graph.traverse_search()` before it traverses from search matches. Primitive
`graph.search()` remains source-table SQL search and does not use the topology
freshness helper. `graph.apply_sync()` remains the explicit operator/admin path
for all pending sync rows.
## Sync Health
```sql
SELECT *
FROM graph.sync_health();
```
`graph.sync_health()` returns one row of scheduler-friendly signals:
| Column | Meaning |
|---|---|
| `query_freshness` | Current topology-read freshness mode |
| `sync_batch_size` | Maximum durable sync rows replayed per internal batch |
| `applied_sync_id` | Last durable sync log row applied by this backend-local engine |
| `max_sync_log_id` | Current high-water row in `graph._sync_log` |
| `pending_sync_rows` | Durable sync rows newer than `applied_sync_id` |
| `disabled_trigger_count` | Registered graph triggers currently disabled |
| `edge_buffer_used` | Pending edge overlay mutations |
| `edge_buffer_size` | Configured edge overlay capacity |
| `read_only_reason` | Typed reason such as `memory_limit` or `edge_buffer_full` when read-only |
| `apply_sync_recommended` | Pending rows can be replayed by `graph.apply_sync()` while the engine is writable |
| `maintenance_recommended` | Overlay, read-only, vacuum, or rebuild state should be compacted |
Use `graph.run_scheduled_maintenance()` from `pg_cron` or an external scheduler
when you want one call that applies sync if recommended and then starts
background maintenance when compaction or rebuild work remains:
```sql
SELECT * FROM graph.run_scheduled_maintenance();
```
For PostgreSQL installations that use `pg_cron`:
```sql
CREATE EXTENSION IF NOT EXISTS pg_cron;
SELECT cron.schedule(
'pggraph-maintenance',
'*/1 * * * *',
$$SELECT * FROM graph.run_scheduled_maintenance();$$
);
```
Use a slower cadence when write volume is low:
```sql
SELECT cron.schedule(
'pggraph-maintenance',
'*/5 * * * *',
$$SELECT * FROM graph.run_scheduled_maintenance();$$
);
```
The extension does not run an always-on internal scheduler or background loop.
Operators own the maintenance cadence and lifecycle through `pg_cron`,
Kubernetes CronJobs, systemd timers, Docker initialization SQL, or an
application-side scheduler. Keep the scheduled statement to the single
admin-only call above so pgGraph can continue to apply sync and start
maintenance using its built-in health checks.
## Edge Buffer Limits
`graph.edge_buffer_size` limits pending edge mutations. When the buffer is full,
the engine enters read-only mode, reports `PG008` at the capacity breach, and
exposes `read_only_reason = 'edge_buffer_full'`.
```sql
ALTER SYSTEM SET graph.edge_buffer_size = 500000;
SELECT pg_reload_conf();
```
Recovery:
```sql
SELECT * FROM graph.vacuum();
-- or
SELECT * FROM graph.maintenance();
```
## Vacuum
```sql
SELECT * FROM graph.vacuum();
```
Return columns:
| Column | Meaning |
|---|---|
| `nodes_before` | Node slots before rebuild |
| `nodes_after` | Node slots after rebuild |
| `tombstones_removed` | Removed inactive node slots |
| `edges_rebuilt` | Directed edges in rebuilt CSR |
| `vacuum_time_ms` | Wall-clock rebuild time |
The current implementation performs a full rebuild from source tables. This is
correct for immutable CSR and reclaims tombstones and overlay state.
Vacuum has a double-memory period because the old and new engines coexist until
the swap completes. Keep `graph.memory_limit_mb` high enough for both engines.
## Maintenance
Foreground maintenance:
```sql
SELECT * FROM graph.maintenance(concurrently := false);
```
Background maintenance:
```sql
SELECT * FROM graph.maintenance(concurrently := true);
SELECT * FROM graph.maintenance_status('');
```
If background maintenance fails after the worker starts, pgGraph records the
failed status and error detail from a fresh worker transaction so
`graph.maintenance_status('')` remains the operator source of truth.
Return columns:
| Column | Meaning |
|---|---|
| `job_id` | Durable maintenance job ID, or zero UUID for foreground path |
| `status` | `queued`, `running`, `completed`, or `failed` |
| `sync_rows_applied` | Sync rows folded into the rebuild |
| `nodes_after` | Node count after maintenance |
| `edges_after` | Edge count after maintenance |
| `vacuum_time_ms` | Rebuild duration |
| `progress_phase` | Coarse operator phase such as `queued`, `rebuilding`, `persisting`, `validating_persistence`, `completed`, or `failed` |
| `progress_message` | Short operator-readable progress or failure detail |
| `error` | Failure text if any |
`graph.maintenance_status(NULL)` returns the latest 50 maintenance jobs.
## Status Columns For Sync
```sql
SELECT
sync_mode,
sync_status,
edge_buffer_used,
needs_vacuum,
needs_rebuild,
applied_sync_id,
pending_sync_rows,
disabled_trigger_count,
invalid_reason
FROM graph.status();
```
| Column | Meaning |
|---|---|
| `sync_status` | `idle`, `syncing`, or `read_only` |
| `edge_buffer_used` | Pending edge overlay mutations |
| `needs_vacuum` | Overlay/tombstone state should be merged |
| `needs_rebuild` | Catalog/schema drift requires full rebuild |
| `applied_sync_id` | Last durable sync log row applied by this backend-local engine |
| `pending_sync_rows` | Durable sync rows newer than `applied_sync_id` |
| `disabled_trigger_count` | Registered graph triggers currently disabled |
## Operational Patterns
Small manual graph:
```sql
SELECT * FROM graph.build();
-- periodically:
SELECT * FROM graph.build();
```
Trigger-sync graph:
```sql
ALTER SYSTEM SET graph.sync_mode = 'trigger';
SELECT pg_reload_conf();
SELECT graph.enable_sync();
SELECT * FROM graph.build();
-- after workload:
SELECT * FROM graph.apply_sync();
SELECT * FROM graph.maintenance();
```
High-write graph:
```text
Use trigger sync for visibility.
Run graph.apply_sync() when graph.sync_health().apply_sync_recommended is true.
Run graph.maintenance(concurrently := true) when maintenance_recommended is true.
Watch edge_buffer_used, pending_sync_rows, disabled_trigger_count, read_only, and read_only_reason.
```