# v0.27.0 — Operability, Observability, and Disaster Recovery

> **Full technical details:** [v0.27.0.md-full.md](v0.27.0.md-full.md)

**Status: Planned** | **Scope: Medium** (~3–4 weeks)

> Snapshot and point-in-time restore for stream tables, predictive schedule
> recommendations, cluster-wide worker visibility, OpenMetrics conformance,
> and an upgrade to pgrx 0.18.

---

## What problem does this solve?

As pg_trickle approached its 1.0 milestone, a final set of operational gaps
became the focus:

- bootstrapping a fresh replica with a stream table's content without
  re-running the defining query from scratch (too slow for large tables)
- turning the accumulated cost model history into actionable schedule
  recommendations
- making worker allocation visible across all databases in a cluster
- ensuring the Prometheus metrics endpoint is formally conformant

---

## Stream Table Snapshot and Point-in-Time Restore

**`pgtrickle.snapshot_stream_table(name, target)`** exports the complete state
of a stream table into an archival companion table: its current rows, its
frontier (the position in the change history up to which it has been
refreshed), and its metadata. This snapshot can be taken at any time and
transferred to another database instance.

**`pgtrickle.restore_from_snapshot(name, source)`** rehydrates a stream table
from a snapshot on a fresh instance. The stream table is populated from the
snapshot, and the first refresh cycle after restore runs differentially
(catching up only from the snapshot's frontier) rather than recomputing
everything from scratch.

*In plain terms:* if you add a new database replica, you no longer need to
wait for the stream tables to rebuild from scratch (which could take minutes
or hours for large tables). Copy the snapshot, restore it, and the replica's
stream tables are current within milliseconds.

`pgtrickle.list_snapshots(name)` and `pgtrickle.drop_snapshot(table)` manage
the snapshot lifecycle.

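
The snapshot-then-catch-up idea described above can be illustrated with a toy in-memory model. This is only a sketch of the concept, not pg_trickle's implementation: the `StreamTable` and `Snapshot` classes, the list-based change log, and the use of `None` to mark a delete are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    rows: dict      # materialised rows at snapshot time
    frontier: int   # position in the change log already reflected in `rows`


@dataclass
class StreamTable:
    rows: dict = field(default_factory=dict)
    frontier: int = 0

    def refresh(self, change_log):
        """Apply only the changes past the frontier; return how many were applied."""
        pending = change_log[self.frontier:]
        for key, value in pending:
            if value is None:
                self.rows.pop(key, None)  # None marks a delete in this toy log
            else:
                self.rows[key] = value
        self.frontier = len(change_log)
        return len(pending)

    def snapshot(self):
        """Capture rows and frontier together so a restore knows where to resume."""
        return Snapshot(rows=dict(self.rows), frontier=self.frontier)

    @classmethod
    def restore(cls, snap):
        """Rehydrate a table from a snapshot without replaying earlier changes."""
        return cls(rows=dict(snap.rows), frontier=snap.frontier)


# The primary processes three changes, then a snapshot is taken.
log = [("a", 1), ("b", 2), ("a", 3)]
primary = StreamTable()
primary.refresh(log)
snap = primary.snapshot()

# Two more changes arrive before the replica is bootstrapped.
log += [("b", None), ("c", 9)]
replica = StreamTable.restore(snap)
applied = replica.refresh(log)  # replays only the two changes after the frontier
```

Because the frontier travels with the rows, the restored table replays two changes instead of all five, which is the property that makes the first refresh after a restore differential.
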

---

## Predictive Maintenance Window Planner

With months of refresh history accumulated from the cost model (v0.22.0
onwards), pg_trickle can now turn that history into recommendations:

**`pgtrickle.recommend_schedule(name)`** analyses the stream table's refresh
performance history and returns:

- A recommended refresh interval (shorter if the current one is too long for
  the observed latency, longer if it is unnecessarily tight)
- A suggested cron expression for off-peak scheduling
- A confidence score (0–1, based on how much history is available)

**`pgtrickle.schedule_recommendations()`** returns one row per stream table,
sorted by how far the current schedule deviates from the recommendation,
making it easy to find the most mis-configured stream tables at a glance.

**Spike-forecast alerts:** when the cost model predicts the next refresh will
breach the stream table's SLA by more than 20%, a
`pg_trickle_alert predicted_sla_breach` notification is sent, with a debounce
to avoid alert storms.

---

## Cluster-Wide Worker Observability

`pgtrickle.cluster_worker_summary()` reads from shared memory and returns one
row per database in the cluster (worker count, queue depth, quota, and
utilisation percentage). It is accessible from any database connection
without cross-database SPI.

All Prometheus metrics now carry `db_oid` and `db_name` labels, enabling
per-database panels in Grafana dashboards across a multi-database cluster.

A new `docs/integrations/multi-tenant.md` guide covers recommended worker
quota allocation and Grafana configuration for multi-database deployments.

---

## OpenMetrics Conformance

The Prometheus metrics endpoint introduced in v0.21.0 had not been formally
validated against the OpenMetrics specification. A conformance test now parses
the `/metrics` output and fails if any format violations are found.

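
To give a feel for what such a check involves, here is a minimal sketch of a format checker. It is not the project's conformance test and covers only a small subset of the OpenMetrics text format: the trailing `# EOF` marker, the metric type named on `# TYPE` lines, and the basic shape of sample lines. The function name `find_violations` and the metric name `pg_trickle_refreshes` are hypothetical; only the `db_name` label comes from the text above.

```python
import re

# Metric types defined by the OpenMetrics text format.
KNOWN_TYPES = {"unknown", "gauge", "counter", "stateset", "info",
               "histogram", "gaugehistogram", "summary"}

# Simplified sample line: name, optional {labels}, value, optional timestamp.
# Assumption: label values contain no '}' and no exemplars are present.
SAMPLE_RE = re.compile(
    r"(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?P<labels>\{[^}]*\})?"
    r" (?P<value>\S+)"
    r"( \S+)?"
)


def find_violations(text):
    """Return a list of human-readable problems found in an exposition."""
    violations = []
    if not text.endswith("# EOF\n"):
        violations.append("exposition must end with '# EOF'")
    for number, line in enumerate(text.splitlines(), start=1):
        if line == "" or line == "# EOF":
            continue
        if line.startswith("# TYPE "):
            parts = line.split(" ")
            if len(parts) != 4 or parts[3] not in KNOWN_TYPES:
                violations.append(f"line {number}: bad TYPE line")
            continue
        if line.startswith("#"):
            continue  # HELP, UNIT and other '#' lines are not examined here
        match = SAMPLE_RE.fullmatch(line)
        if match is None:
            violations.append(f"line {number}: malformed sample")
            continue
        try:
            float(match.group("value"))  # also accepts NaN, +Inf and -Inf
        except ValueError:
            violations.append(f"line {number}: value is not a number")
    return violations


GOOD = (
    "# TYPE pg_trickle_refreshes counter\n"
    'pg_trickle_refreshes_total{db_name="app"} 42\n'
    "# EOF\n"
)
BAD = (
    "# TYPE pg_trickle_refreshes tally\n"
    'pg_trickle_refreshes_total{db_name="app"} forty-two\n'
)

good_report = find_violations(GOOD)
bad_report = find_violations(BAD)
```

A test built this way fails whenever the report is non-empty. A full validator would also need to check label escaping, suffix rules per metric type, exemplars, and ordering, which this sketch deliberately leaves out.
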
Port-conflict and timeout errors from the metrics server are now typed
(`MetricsServerError::PortInUse`, `MetricsServerError::Timeout`) rather than
bare panics. Malformed HTTP requests to the metrics endpoint return a
`400 Bad Request` response instead of crashing.

`pgtrickle.metrics_summary()` provides a cross-database aggregate view of key
counters, suitable for a cluster-overview Grafana dashboard.

---

## pgrx 0.18 Upgrade

The pgrx library (the framework that pg_trickle uses to interact with
PostgreSQL internals) was upgraded from 0.17 to 0.18. This brings updated SPI
interfaces, improved proc-macro support, and compatibility with the latest
PostgreSQL 18 API changes.

---

## Scope

v0.27.0 is the final pre-1.0 operability release. The snapshot/restore API
solves a real operational pain point for replica bootstrapping. The schedule
planner turns accumulated data into actionable recommendations. Cluster-wide
observability and OpenMetrics conformance round out the production-readiness
story ahead of the stable v1.0 release.