# Integration & Unit Test Suite — Deep Evaluation Report > **Date:** 2025-03-16 > **Scope:** 10 integration test files (~123 tests, ~3,700 lines) + > unit tests across `src/` (~1,300 tests) > **Goal:** Assess coverage confidence and identify mitigations to harden > the non-E2E test layers --- ## Implementation Status > **Updated:** 2026-03-17 (pass 3) > **Status:** All items complete. All P0–P3 items implemented or intentionally resolved. | Priority | Item | Status | Notes | |----------|------|--------|-------| | P0-1 | Extract Multiset Comparison Helper | ✅ DONE | `assert_sets_equal()` added to `tests/common/mod.rs` | | P0-2 | Add LISTEN/NOTIFY Round-Trip Test | ✅ DONE | Real `PgListener` round-trip in `monitoring_tests.rs`, replaces fire-and-forget | | P0-3 | Strengthen Workflow Data Validation | ✅ DONE | `test_full_refresh_workflow` now uses `assert_sets_equal` for row-level correctness | | P0-4 | Add Semi/Anti Join Unit Tests | ✅ DONE | Expanded from 3→9 tests each; nested, multi-column, filter, complementary | | P1-1 | Widen Staleness Test Tolerance | ✅ DONE | Changed 59–65s window to 50–120s in `test_staleness_calculation` | | P1-2 | Add Column Type Verification | ✅ DONE | Two new tests in `extension_tests.rs` verify column types via `information_schema` | | P1-3 | Add Error Escalation Threshold Test | ✅ DONE | `test_error_escalation_exact_threshold` + `test_suspended_to_active_recovery` added to `resilience_tests.rs` | | P2-1 | DDL Drift Detection Test | ✅ DONE | `test_catalog_ddl_no_phantom_columns` + `test_catalog_ddl_no_phantom_tables` in `catalog_compat_tests.rs`; pure-Rust via `include_str!` | | P2-2 | Multi-Table Join Chain Unit Tests | ✅ DONE | `(A LEFT JOIN B) INNER JOIN C`, `A INNER JOIN (B INNER JOIN C)` in `join.rs`; `A FULL JOIN (B SEMI JOIN C)`, `(A FULL JOIN B) INNER JOIN C` in `full_join.rs` | | P2-3 | Scheduler Job Lifecycle Integration Test | ✅ DONE | `pgt_scheduler_jobs` added to `CATALOG_DDL`; `test_scheduler_job_lifecycle_queued_to_succeeded` + `test_scheduler_job_lifecycle_retryable_failure` in `workflow_tests.rs` | | P2-4 | Extend Property Tests to DVM Operators | ✅ DONE | `proptest!` blocks added to `join.rs`, `semi_join.rs`, `anti_join.rs`; 6 new property tests: UNION ALL invariant, output-col-count, left-col-equality, EXISTS/NOT EXISTS presence | | P3-1 | Remove or Fold Smoke Tests | ✅ DONE | Kept as first-to-fail diagnostics; updated module doc to explain the intentional redundancy and diagnostic value; test names standardized | | P3-2 | Add Workflow Test for ST Drop Cascade | ✅ DONE | `test_workflow_st_drop_cascade` in `workflow_tests.rs`; verifies CASCADE on deps, storage table drop | | P3-3 | Standardize Test Naming | ✅ DONE | `workflow_tests.rs` older tests renamed to `test_workflow_*` convention; `smoke_tests.rs` renamed to `test_infra_*` convention | --- ## Table of Contents 1. [Executive Summary](#executive-summary) 2. [Test Infrastructure](#test-infrastructure) 3. [Integration Tests — Per-File Analysis](#integration-tests--per-file-analysis) 4. [Unit Tests — Module Landscape](#unit-tests--module-landscape) 5. [Cross-Cutting Findings](#cross-cutting-findings) 6. [Priority Mitigations](#priority-mitigations) --- ## Executive Summary The integration test suite consists of **~123 test functions** across 10 files, running against a Testcontainers PostgreSQL 18.3-alpine instance via sqlx. Unit tests provide **~1,300 tests** across `src/` modules with no database dependency. **Integration confidence: MODERATE (≈55%)** **Unit test confidence: HIGH (≈80%)** ### Integration Test Strengths | Area | Assessment | |------|-----------| | Property-based logic tests | **Excellent** — `property_tests.rs` uses proptest with 500+ cases per property | | Scenario multiset validation | **Excellent** — `scenario_tests.rs` uses `EXCEPT ALL / UNION ALL` pattern for 7/10 tests | | Trigger detection regression | **Strong** — 10 tests covering all exclusion rules + recent prefix bug regression | | Catalog constraint enforcement | **Good** — UNIQUE, CHECK constraints validated in `catalog_tests.rs` | ### Integration Test Weaknesses | Severity | Finding | Impact | |----------|---------|--------| | **CRITICAL** | Extension binary never loaded — all tests use mock catalog DDL | Tests validate SQL assumptions about the catalog schema, not the actual extension | | **HIGH** | 63% of tests (77/123) use weak assertions (boolean/count/existence only) | Many tests verify structure but not data correctness | | **HIGH** | Only `scenario_tests.rs` has multiset comparison (7 tests) | Other files that mutate data never verify row-level correctness | | **HIGH** | All resilience/workflow/monitoring tests use mock data | No test triggers real refresh failures, real CDC, or real crashes | | **MEDIUM** | NOTIFY test (`monitoring_tests.rs`) never receives the payload | Only sends `pg_notify()`, never verifies receipt via `LISTEN` | | **MEDIUM** | Time-dependent staleness assertions vulnerable to CI clock skew | `test_staleness_calculation` asserts 59–65 seconds for a 60-second-old row | | **LOW** | `smoke_tests.rs` duplicates basic Testcontainers validation | Already proven by any other test that starts a container | ### Unit Test Strengths | Area | Assessment | |------|-----------| | Query parsing | **Exceptional** — `parser.rs` has 60+ tests with thorough edge cases | | DVM operators | **Strong** — aggregate (40), recursive CTE (45), inner join (11), left join (11), full join (7) | | Schedule parsing | **Comprehensive** — `api.rs` ~95 tests covering cron/duration/at/repeats variants | | Refresh decision logic | **Solid** — `determine_refresh_action()` covers all 5 state combinations | ### Unit Test Weaknesses | Severity | Finding | Impact | |----------|---------|--------| | **HIGH** | Semi/Anti join operators severely under-tested (3 tests each) | Complex join types have minimal coverage vs INNER/LEFT (11 each) | | **HIGH** | `execute_differential_refresh()` only tested for error paths | Success case (MERGE execution) completely untested at unit level | | **MEDIUM** | Scheduler job lifecycle untestable without SPI mock | enqueue/claim/complete flow has no unit coverage | | **MEDIUM** | No multi-table join chains tested | All join tests are 2-table; no `(A ⋈ B) ⋈ C` compositions | | **LOW** | Distinct operator has only 3 tests | Minimal but likely sufficient for simple dedup logic | --- ## Test Infrastructure ### Shared Helpers — `tests/common/mod.rs` The integration test harness is built around `TestDb`, a thin wrapper over `sqlx::PgPool` and a Testcontainers `postgres:18.3-alpine` container. **Two initialization modes:** | Mode | Method | What You Get | |------|--------|-------------| | Bare | `TestDb::new()` | Fresh PostgreSQL — no pg_trickle objects at all | | Catalog | `TestDb::with_catalog()` | Bare PG + hardcoded `CATALOG_DDL` SQL (schemas, tables, views, functions) | **Key helper methods:** - `execute(sql)` — run SQL, panic on error - `try_execute(sql)` → `Result` — run SQL, return result - `query_scalar::(sql)` — single-value query, panic on empty - `query_scalar_opt::(sql)` → `Option` — single-value, return None if empty - `count(table)` → `i64` — `SELECT COUNT(*) FROM table` **Critical limitation:** `TestDb::with_catalog()` loads a hardcoded DDL string that mimics the extension's catalog tables (schemas `pgtrickle`, `pgtrickle_changes`; tables like `pgt_stream_tables`, `pgt_dependencies`, etc.) but does **not** load the compiled extension. This means: - ❌ No SQL-callable functions (`pgtrickle.create_stream_table()`, etc.) - ❌ No background worker - ❌ No CDC triggers - ❌ No GUC variables - ❌ No event triggers Integration tests therefore validate **catalog schema correctness** and **pure Rust logic**, not the extension's runtime behavior. Runtime behavior is the domain of E2E tests. **Missing from harness:** - No `assert_st_matches_query()` helper (unlike the E2E harness) - No `PgListener`-based NOTIFY verification - No multiset comparison helper — `scenario_tests.rs` reimplements this inline --- ## Integration Tests — Per-File Analysis ### 1. `smoke_tests.rs` — 3 tests, 49 lines **Purpose:** Validates that the Testcontainers infrastructure works. | Test | What It Does | Assertion Quality | |------|-------------|-------------------| | `test_container_starts_and_connects` | `SELECT version()` contains "PostgreSQL" | ✅ Adequate (infra) | | `test_create_table_and_insert` | Creates table, inserts 2 rows, counts | ✅ Adequate (infra) | | `test_schemas_can_be_created` | Creates schema, queries pg_namespace | ✅ Adequate (infra) | **Verdict:** ✅ **Pass.** These are infrastructure sanity checks and serve their purpose. However, they are redundant once any other test succeeds — consider whether keeping them has value beyond documentation. --- ### 2. `catalog_tests.rs` — 16 tests, 514 lines **Purpose:** Validates the mock catalog DDL: schemas exist, tables have correct constraints, CRUD operations work, dependencies cascade, change buffer tables have the right structure. **Init mode:** `TestDb::with_catalog()` | Test | What It Claims | What It Checks | Verdict | |------|---------------|---------------|---------| | `test_catalog_schemas_created` | Schemas exist | `EXISTS(pg_namespace)` for 2 schemas | ⚠️ Weak (existence only) | | `test_catalog_tables_exist` | Tables exist | Loop + `EXISTS(pg_class)` for 5 tables | ⚠️ Weak | | `test_stream_tables_info_view_exists` | View exists | `EXISTS(pg_class WHERE relkind='v')` | ⚠️ Weak | | `test_insert_stream_table` | Can INSERT an ST row | COUNT=1, single-field string check | ⚠️ Weak (doesn't verify all columns) | | `test_unique_name_constraint` | UNIQUE(pgt_name, pgt_schema) enforced | Duplicate insert returns error | ✅ Pass | | `test_refresh_mode_check_constraint` | CHECK on refresh_mode | Invalid value rejected | ✅ Pass | | `test_status_check_constraint` | CHECK on status | Invalid value rejected | ✅ Pass | | `test_dependency_insertion_and_cascade_delete` | CASCADE DELETE | Insert dep, delete parent, count=0 | ⚠️ Weak | | `test_refresh_history_recording` | Refresh history tracked | INSERT + UPDATE `rows_inserted=42` | ⚠️ Weak | | `test_refresh_history_action_check_constraint` | CHECK on action | Invalid action rejected | ✅ Pass | | `test_stream_tables_info_view` | Staleness computed | `stale=true` for old data | ⚠️ Weak | | `test_change_tracking_crud` | CRUD on tracking table | Array append + equality | ⚠️ Weak | | `test_status_transitions` | Status updates work | UPDATE loop + COUNT | ⚠️ Weak | | `test_change_buffer_table_schema` | Buffer table creation | Creates table, inserts 1 row | ⚠️ Weak (doesn't validate schema matches CDC output) | | `test_multiple_sts_sharing_source` | Multiple STs can share a source | Creates 2 STs, counts deps | ⚠️ Weak | | `test_staleness_info_view` | Staleness calculation | Duration extraction from age | ⚠️ Weak | **Scorecard:** 3/16 strong, 13/16 weak. **Key gaps:** - No multiset comparison — INSERT tests verify count or a single field, never the full row - Cascade delete only checks count, not that the correct rows were deleted - Change buffer schema test doesn't validate that the schema matches what the actual CDC trigger would produce - No test for cross-schema name uniqueness edge cases --- ### 3. `catalog_compat_tests.rs` — 21 tests, 477 lines **Purpose:** Pins PostgreSQL 18.x catalog behavior assumptions that pg_trickle relies on (type casting, relkind values, advisory locks, etc.). **Init mode:** `TestDb::new()` (no catalog needed) | Test Group | # Tests | What It Validates | Verdict | |-----------|---------|-------------------|---------| | `pg_get_viewdef_*` | 3 | View definition has/lacks semicolon, subquery wrapping | ✅ Adequate | | `nspname_*` | 3 | `nspname::text` cast works (Oid 19 → 25 interop) | ✅ Important regression guard | | `relkind_*` | 5 | relkind values for r/v/m/p/I are as expected | ✅ Pass | | `array_length_*` | 2 | `array_length()` returns INT4 (not INT8) | ✅ Pass | | `union_all_*` | 2 | UNION ALL preserves column names, allows aggregation | ⚠️ Weak (count only) | | `advisory_lock_roundtrip` | 1 | Lock acquire + release cycle | ✅ Pass | | `pg_available_extensions_shape` | 1 | Extension metadata columns exist | ✅ Pass | | `trigger_*` | 2 | User trigger detection query works | ✅ Pass | | `table_rewrite_event_trigger` | 1 | Event trigger fires on ALTER TABLE | ✅ Pass | | `pg_get_viewdef_trailing_whitespace` | 1 | Trailing whitespace behavior | ✅ Pass | **Verdict:** ✅ **Good.** These tests are valuable as PostgreSQL version compatibility guards. When PG 19 ships, any behavioral changes that break pg_trickle assumptions will surface here. The `nspname` cast tests are especially important given the Oid 19/25 mismatch that caused real bugs. --- ### 4. `extension_tests.rs` — 11 tests, 236 lines **Purpose:** Validates pg_trickle-specific catalog objects exist with the expected structure (columns, indexes, JSONB frontier, pg_lsn type). **Init mode:** `TestDb::with_catalog()` | Test | What It Claims | What It Checks | Verdict | |------|---------------|---------------|---------| | `test_pg_trickle_schema_exists` | Schema exists | `EXISTS` check | ⚠️ Weak | | `test_all_catalog_objects_exist` | All catalog objects exist | Loop + COUNT ≥ 3 indexes | ⚠️ Weak | | `test_xxhash_deterministic` | xxh64 hash is deterministic | Pure Rust: `hash(a)==hash(a)`, `hash(a)!=hash(b)` | ✅ Strong | | `test_pg_hashtext_works_in_container` | PG hashtext works | SQL: two calls equal | ✅ Pass | | `test_stream_tables_columns` | All expected columns exist | Loop + `EXISTS` per column | ⚠️ Weak (doesn't verify types) | | `test_change_tracking_columns` | Tracking columns exist | Loop + `EXISTS` | ⚠️ Weak | | `test_dependency_columns` | Dependency columns exist | Loop + `EXISTS` | ⚠️ Weak | | `test_refresh_history_columns` | History columns exist | Loop + `EXISTS` | ⚠️ Weak | | `test_frontier_jsonb_column` | JSONB roundtrip works | INSERT + UPDATE + field extraction | ✅ Pass | | `test_pg_lsn_type_works` | pg_lsn comparison works | Cast + comparison operators | ✅ Pass | | `test_xxhash_sql_function` | xxh64 SQL function deterministic | SQL calls + equality | ✅ Pass | **Key gap:** Column existence tests don't verify data types or constraints. A column named `pgt_status` of type `integer` instead of `text` would pass. --- ### 5. `monitoring_tests.rs` — 10 tests, 421 lines **Purpose:** Validates monitoring queries (refresh stats aggregation, staleness calculation, notification, lateral join stats views). **Init mode:** `TestDb::with_catalog()` | Test | What It Claims | What It Checks | Verdict | |------|---------------|---------------|---------| | `test_refresh_stats_aggregation` | Stats aggregate correctly | FILTER/COUNT on mock data | ⚠️ Weak (pieces tested separately) | | `test_refresh_history_by_name` | History ordered correctly | ORDER BY + LIMIT | ✅ Pass | | `test_staleness_calculation` | Staleness epoch correct | Assert 59–65 seconds for 60s old data | ⚠️ Fragile (clock-dependent) | | `test_stale_flag` | Stale detection works | Boolean: 120s > 30s threshold | ✅ Pass | | `test_stream_tables_info_view` | Info view returns data | COUNT=1 + stale field | ⚠️ Weak | | `test_notify_pg_trickle_alert` | NOTIFY works | Executes `pg_notify()` — never verifies receipt | ⚠️ Incomplete | | `test_full_stats_lateral_join` | LATERAL JOIN stats work | 11-column tuple unpacking | ✅ Strong | | `test_monitoring_empty_state` | Empty tables handled | COUNT=0 queries | ✅ Pass | | `test_alert_channels_exist` | Alert channels documented | String contains check | ⚠️ Weak | | `test_staleness_with_null_timestamp` | NULL data_timestamp handled | COALESCE behavior | ✅ Pass | **Key gaps:** - `test_notify_pg_trickle_alert` is a fire-and-forget test — it never subscribes with `LISTEN` to verify the payload was delivered - `test_staleness_calculation` uses a ±5 second tolerance window that may fail under CI load --- ### 6. `property_tests.rs` — ~30 tests, 470 lines **Purpose:** Property-based testing of pure Rust logic using `proptest`. **Init mode:** None (no database) | Property Category | # Tests | Key Properties Verified | |-------------------|---------|----------------------| | LSN comparison | 6 | Reflexive, antisymmetric, trichotomy, total order, ≥ consistent | | Frontier JSON | 4 | Roundtrip serialization, set/get idempotency | | Canonical period | 2 | 48 × 2^n bounds invariant | | SQL generation | 5 | `quote_ident` roundtrip, column list completeness, determinism | | Enum roundtrip | 2 | PgtStatus + RefreshMode from_str(to_str(x)) == x | | DAG topology | 8 | Topological order respects edges, cycle detection, schedule resolution | | Hash properties | 3 | Determinism, NULL encoding distinction, separator uniqueness | **Verdict:** ✅ **Excellent.** This is the gold standard test file. All tests use proptest with 500+ randomly generated cases per property. No gaps identified. The DAG property tests are particularly valuable — they verify that topological sort always produces a valid ordering, even for randomly generated graphs. --- ### 7. `resilience_tests.rs` — 7 tests, 338 lines **Purpose:** Validates crash recovery, error escalation, advisory locks, and failure isolation between stream tables. **Init mode:** `TestDb::with_catalog()` | Test | What It Claims | What It Checks | Verdict | |------|---------------|---------------|---------| | `test_crash_recovery_marks_running_as_failed` | RUNNING → FAILED on restart | Status + error_message + end_time set | ✅ Pass | | `test_advisory_lock_mechanism` | Advisory locks work | acquire + release cycle | ✅ Pass | | `test_error_escalation_to_suspension` | Errors increment → suspend | UPDATE loop + auto-suspend check | ⚠️ Weak (doesn't verify threshold) | | `test_error_count_resets_on_success` | Errors reset on success | `consecutive_errors=0` after reset | ⚠️ Weak | | `test_needs_reinit_lifecycle` | reinit flag toggles | UPDATE TRUE/FALSE cycle | ⚠️ Weak | | `test_refresh_history_status_transitions` | RUNNING → COMPLETED | UPDATE with rows_inserted=42 | ⚠️ Weak | | `test_error_handling_independent_per_st` | Errors isolated per ST | Suspend one, verify other ACTIVE | ✅ Pass | **Key gaps:** - Error escalation test doesn't verify the exact threshold (how many errors trigger suspension?) - No test for SUSPENDED → ACTIVE recovery path - All tests use mock SQL updates — no test triggers a real refresh failure --- ### 8. `scenario_tests.rs` — 10 tests, 489 lines **Purpose:** End-to-end-like scenarios testing full refresh workflows with multiset data validation. **Init mode:** `TestDb::with_catalog()` | Test | What It Claims | Assert Type | Verdict | |------|---------------|------------|---------| | `test_scenario_create_and_full_refresh` | ST = query after refresh | EXCEPT/UNION ALL multiset | ✅✅ Strong | | `test_scenario_refresh_after_insert` | ST matches after INSERT | EXCEPT/UNION ALL | ✅✅ Strong | | `test_scenario_refresh_after_update` | ST matches after UPDATE | EXCEPT/UNION ALL + value check | ✅✅ Strong | | `test_scenario_refresh_after_delete` | ST matches after DELETE | EXCEPT/UNION ALL | ✅✅ Strong | | `test_scenario_filtered_st` | Filtered ST matches WHERE clause | EXCEPT/UNION ALL | ✅✅ Strong | | `test_scenario_join_st` | JOIN ST is correct | EXCEPT/UNION ALL | ✅✅ Strong | | `test_scenario_aggregate_st` | Aggregate ST correct | EXCEPT/UNION ALL + specific values | ✅✅ Strong+ | | `test_scenario_refresh_history` | History tracked | COUNT queries | ⚠️ Weak | | `test_scenario_st_info_view` | Info view correct | Boolean check | ⚠️ Weak | | `test_scenario_no_data_refresh` | NO_DATA action recorded | String + count | ⚠️ Weak | **Verdict:** ✅ **Excellent for tests 1–7.** The multiset comparison pattern is the gold standard: ```sql SELECT NOT EXISTS ( (SELECT cols FROM st_table EXCEPT SELECT cols FROM defining_query) UNION ALL (SELECT cols FROM defining_query EXCEPT SELECT cols FROM st_table) ) ``` This catches: missing rows, extra rows, duplicate discrepancies, and column value mutations. Tests 1–7 are the only integration tests that validate actual data correctness. **Note:** These tests simulate a full refresh by manually copying data (`INSERT INTO st_table SELECT ...`) — they do not invoke the extension's refresh function. This is appropriate for the integration tier. --- ### 9. `trigger_detection_tests.rs` — 10 tests, 378 lines **Purpose:** Validates the SQL query used by `src/cdc.rs` to detect user-created triggers (excluding pg_trickle's own `pgt_*` and `pg_trickle_*` triggers). **Init mode:** `TestDb::new()` | Test | Scenario | Verdict | |------|----------|---------| | `test_trigger_detection_no_triggers` | No triggers → false | ✅ Pass | | `test_trigger_detection_with_user_trigger` | User ROW trigger → true | ✅ Pass | | `test_trigger_detection_ignores_pgt_prefix` | `pgt_cdc_*` → false | ✅ Pass | | `test_trigger_detection_ignores_statement_level` | STATEMENT trigger → false | ✅ Pass | | `test_trigger_detection_mixed_triggers` | Mix of excluded + 1 user → true | ✅ Pass | | `test_trigger_detection_before_trigger` | BEFORE trigger detected | ✅ Pass | | `test_trigger_detection_after_drop` | Drop trigger → state updates | ✅ Pass | | `test_trigger_detection_ignores_pg_trickle_prefix` | `pg_trickle_cdc_*` → false | ✅ Regression guard | | `test_trigger_detection_mixed_internal_and_user` | Both prefixes + user → true | ✅ Pass | | `test_trigger_detection_constraint_trigger` | Constraint trigger handling | ✅ Pass | **Verdict:** ✅ **Strong.** 10 tests cover all exclusion rules and include a regression test for the `pg_trickle_` prefix bug. Only weakness is that all assertions are boolean — but for trigger detection, boolean is the correct abstraction level. --- ### 10. `workflow_tests.rs` — 5 tests, 326 lines **Purpose:** Validates full lifecycle workflows: create ST → insert source data → refresh → verify history → drop. **Init mode:** `TestDb::with_catalog()` | Test | What It Claims | What It Checks | Verdict | |------|---------------|---------------|---------| | `test_full_refresh_workflow` | Full lifecycle | COUNT queries for status + history | ⚠️ Weak | | `test_source_data_changes_tracked` | CDC mock: I/U/D in buffer | Change sequence + LSN order | ⚠️ Weak (mock only) | | `test_chained_stream_tables` | Dependency DAG | Creates 2 STs in chain, queries graph | ⚠️ Weak | | `test_error_escalation_and_suspension` | Error loop + suspension | 3 failures → suspend, verify history | ⚠️ Weak | | `test_full_lifecycle_with_drop` | Create → populate → drop | Table exists → table gone | ⚠️ Weak | **Key gap:** `test_full_refresh_workflow` creates a source table, inserts data, simulates a "refresh" by copying data, marks the ST as populated — but **never verifies that the ST storage table contains the correct rows**. It only checks that `is_populated=true` and that a history record was created. This test should adopt the `scenario_tests.rs` EXCEPT/UNION ALL pattern. --- ## Unit Tests — Module Landscape ### Overview ~1,300 tests across `src/` modules. Here are the top modules by test count and coverage quality: ### Tier 1: Exceptional Coverage (>50 tests) | Module | Tests | Coverage | Notes | |--------|-------|----------|-------| | `api.rs` | ~95 | ⭐⭐⭐⭐ | Schedule parsing exhaustive (cron/duration/at/repeats). Query validation thorough. All pre-DB logic. | | `dvm/parser.rs` | ~60 | ⭐⭐⭐⭐⭐ | Aggregate detection, window function extraction, set operation analysis, subquery handling. Gold standard for parse edge cases. | | `refresh.rs` | ~50 | ⭐⭐⭐ | `determine_refresh_action()` covers all 5 state combos. `resolve_lsn_placeholders()` thorough. **Gap:** `execute_differential_refresh()` — only error paths tested, success path untested. | | `dvm/operators/recursive_cte.rs` | ~45 | ⭐⭐⭐⭐ | Multi-level recursion, terminal detection, max cycle limits. | | `dvm/operators/aggregate.rs` | ~40 | ⭐⭐⭐⭐⭐ | GROUP BY dedup, DISTINCT COUNT, window post-filter extraction. Comprehensive per-type coverage. | ### Tier 2: Solid Coverage (15–50 tests) | Module | Tests | Coverage | Notes | |--------|-------|----------|-------| | `hooks.rs` | ~23 | ⭐⭐⭐ | DDL classification (12 types), schema change detection. | | `ivm.rs` | ~24 | ⭐⭐⭐ | Simple chain detection, insert/delete SQL generation. | | `scheduler.rs` | ~20 | ⭐⭐ | Group policies, falling_behind ratios, worker_extra parsing. **Gap:** No job lifecycle (enqueue/claim/complete). | | `cdc.rs` | ~20 | ⭐⭐⭐ | Trigger naming, PK hash, bitmask generation, LSN range parsing. | | `version.rs` | ~19 | ⭐⭐⭐ | Frontier serialization, LSN merge, data timestamp selection. | | `catalog.rs` | ~16 | ⭐⭐⭐ | Enum serialization roundtrips. | ### Tier 3: Basic or Minimal Coverage (<15 tests) | Module | Tests | Coverage | Notes | |--------|-------|----------|-------| | `dvm/operators/join.rs` | ~11 | ⭐⭐⭐ | INNER join only — equijoin keys, pre-change snapshot, Part 1a/1b splits. | | `dvm/operators/outer_join.rs` | ~11 | ⭐⭐⭐ | LEFT join — nested joins, null padding, R₀ reconstruction. | | `dvm/operators/scan.rs` | ~10 | ⭐⭐⭐ | Partition elimination, type checking. | | `dvm/operators/window.rs` | ~8 | ⭐⭐⭐ | Partition/frame detection. | | `dvm/operators/lateral_subquery.rs` | ~8 | ⭐⭐ | Basic correlation tests only. | | `dvm/operators/filter.rs` | ~8 | ⭐⭐ | Basic pushdown. | | `dvm/operators/full_join.rs` | ~7 | ⭐⭐ | 10-part decomposition, delta flags. | | `dag.rs` | ~8 | ⭐⭐⭐ | Topo sort, cycle detection. Property tests in `property_tests.rs` supplement. | | `hash.rs` | ~7 | ⭐⭐⭐ | SHA256 determinism. Property tests supplement. | | `monitor.rs` | ~8 | ⭐⭐ | Envelope formatting only. | | `error.rs` | ~7 | ⭐⭐ | Display/Debug formatting only. | | `wal_decoder.rs` | ~5 | ⭐ | Stubs/placeholders. | ### Join Operator Coverage Matrix This is a critical coverage gap. Each join type has its own DVM operator file, but test coverage varies dramatically: | Join Type | File | Tests | Coverage | Notes | |-----------|------|-------|----------|-------| | **INNER** | `operators/join.rs` | 11 | ⭐⭐⭐ | Equijoin keys, EXCEPT ALL pre-change, Part 1a/1b | | **LEFT** | `operators/outer_join.rs` | 11 | ⭐⭐⭐ | Nested joins, null padding, 7-part decomposition | | **FULL** | `operators/full_join.rs` | 7 | ⭐⭐ | 10-part decomposition, delta flags, nested joins | | **SEMI** | `operators/semi_join.rs` | 3 | ⭐ | **CRITICAL GAP:** Basic EXISTS check only | | **ANTI** | `operators/anti_join.rs` | 3 | ⭐ | **CRITICAL GAP:** Basic NOT EXISTS check only | **Missing across all join types:** - No multi-column non-equi-join conditions - No 3+ table join chains - No unified test comparing behavior across join types with identical data --- ## Cross-Cutting Findings ### Finding 1: Mock Catalog DDL ≠ Real Extension (CRITICAL) The most fundamental limitation of the integration test layer is that `TestDb::with_catalog()` loads a hardcoded DDL string, not the actual compiled extension. This means: - The DDL string can drift from the real extension's DDL without detection - Constraint names, column types, default values, and index definitions are assumed correct but never verified against the actual `CREATE EXTENSION` - Any SQL function called in tests is not the real extension function — it's a mock or doesn't exist at all **Mitigation:** This is an architectural choice (integration tests should be fast and DB-binary-free). The risk is mitigated by E2E tests that load the real extension. However, a **DDL drift detection test** would be valuable — see Priority Mitigations. ### Finding 2: Assertion Quality Distribution | Quality Level | Count | % | Description | |--------------|-------|---|-------------| | ✅✅ Strong (multiset) | 7 | 6% | `scenario_tests` EXCEPT/UNION ALL | | ✅ Adequate | 40 | 32% | Specific value checks, constraint enforcement, type verification | | ⚠️ Weak | 76 | 62% | Boolean/count/existence only | 62% of integration test assertions are weak. For catalog structure tests, weak assertions (existence checks) are often appropriate — you just need to know the table exists. But for data-flow tests (`workflow_tests.rs`, `monitoring_tests.rs`), weak assertions create false confidence. ### Finding 3: No Shared Multiset Comparison Helper `scenario_tests.rs` has excellent multiset comparison using inline SQL: ```sql SELECT NOT EXISTS ( (SELECT cols FROM a EXCEPT SELECT cols FROM b) UNION ALL (SELECT cols FROM b EXCEPT SELECT cols FROM a) ) ``` This pattern is reimplemented ad-hoc. Other files that would benefit from it (workflow_tests, catalog_tests with data validation) don't use it because there's no shared helper. ### Finding 4: Time-Dependent Assertions `monitoring_tests.rs::test_staleness_calculation` inserts a row with `NOW() - INTERVAL '60 seconds'` as the timestamp, then asserts the calculated staleness is between 59 and 65 seconds. Under heavy CI load or container startup delays, this window may be too narrow. ### Finding 5: Property Tests Are the Strongest Layer `property_tests.rs` with its 30 proptest-driven tests provides the highest confidence-per-test-count ratio of any file in the suite. The combination of random input generation (500+ cases per property) with mathematical invariant checking (reflexivity, antisymmetry, trichotomy for LSN ordering) is a model for how other pure-logic modules should be tested. ### Finding 6: Unit Tests Miss SPI-Dependent Code Paths Several important code paths are untestable at the unit level because they require SPI (PostgreSQL's Server Programming Interface): | Code Path | Module | Why Untestable | |-----------|--------|---------------| | `execute_differential_refresh()` success | `refresh.rs` | Executes MERGE via SPI | | Job enqueue/claim/complete | `scheduler.rs` | Reads/writes catalog via SPI | | CDC trigger installation | `cdc.rs` | Creates triggers via SPI | | Extension object creation | `api.rs` | Calls `CREATE` via SPI | This is an inherent limitation — these paths can only be tested via E2E tests (or future SPI mocking infrastructure). --- ## Priority Mitigations ### P0 — Critical (Confidence Improvement) #### P0-1: Extract Multiset Comparison Helper Move the `scenario_tests.rs` EXCEPT/UNION ALL pattern into `tests/common/mod.rs` as a reusable method: ```rust impl TestDb { async fn assert_sets_equal(&self, table_a: &str, table_b: &str, cols: &[&str]) { let col_list = cols.join(", "); let sql = format!( "SELECT NOT EXISTS ( (SELECT {cols} FROM {a} EXCEPT SELECT {cols} FROM {b}) UNION ALL (SELECT {cols} FROM {b} EXCEPT SELECT {cols} FROM {a}) )", cols = col_list, a = table_a, b = table_b ); let matches: bool = self.query_scalar(&sql).await; assert!(matches, "Set mismatch between {} and {}", table_a, table_b); } } ``` Then apply it to `workflow_tests.rs::test_full_refresh_workflow` and any other test that simulates refresh + data mutation. **Impact:** Converts 5+ weak tests to strong tests. #### P0-2: Add LISTEN/NOTIFY Round-Trip Test Replace the fire-and-forget `test_notify_pg_trickle_alert` with an actual round-trip test using `sqlx::PgListener`: ```rust #[tokio::test] async fn test_notify_roundtrip() { let db = TestDb::new().await; let mut listener = PgListener::connect_with(&db.pool).await.unwrap(); listener.listen("pg_trickle_alert").await.unwrap(); db.execute("SELECT pg_notify('pg_trickle_alert', '{\"event\": \"refresh\"}')").await; let notification = tokio::time::timeout( Duration::from_secs(5), listener.recv() ).await.unwrap().unwrap(); assert_eq!(notification.payload(), r#"{"event": "refresh"}"#); } ``` **Impact:** Converts an incomplete test to a real integration test. #### P0-3: Strengthen Workflow Data Validation `workflow_tests.rs::test_full_refresh_workflow` should verify that the storage table contains the expected rows after simulated refresh, not just check `is_populated=true`. Use the P0-1 helper. #### P0-4: Add Semi/Anti Join Unit Tests Increase `semi_join.rs` and `anti_join.rs` from 3 tests each to 8–10, matching the coverage level of `join.rs` and `outer_join.rs`. Key scenarios to add: - Nested semi-join (A SEMI JOIN (B SEMI JOIN C)) - Semi-join with multi-column conditions - Anti-join with NULL handling (NOT IN vs NOT EXISTS semantics) - Semi/anti join with aggregate subqueries ### P1 — High (Correctness Hardening) #### P1-1: Widen Staleness Test Tolerance Change `test_staleness_calculation` from ±5s (59–65) to ±10s (50–70) or use a relative tolerance: ```rust let staleness: f64 = db.query_scalar("...").await; assert!(staleness >= 50.0 && staleness <= 120.0, "Staleness {} out of range for 60s-old row", staleness); ``` **Impact:** Eliminates flaky CI failures from clock skew. #### P1-2: Add Column Type Verification to Extension Tests `extension_tests.rs` checks column existence but not types. Add type assertions: ```sql SELECT column_name, data_type FROM information_schema.columns WHERE table_schema = 'pgtrickle' AND table_name = 'pgt_stream_tables' ORDER BY ordinal_position ``` Then compare against expected `(name, type)` pairs. **Impact:** Catches schema drift between mock DDL and expectations. #### P1-3: Add Error Escalation Threshold Test `resilience_tests.rs::test_error_escalation_to_suspension` should assert the exact threshold at which suspension triggers (e.g., after the Nth consecutive error), not just that suspension eventually happens. #### P1-4: Add Refresh Success Path Unit Test `refresh.rs` has ~50 unit tests but `execute_differential_refresh()` is only tested for error paths. While the success path requires SPI (and thus can only be fully tested in E2E), the pre-SPI decision logic (template selection, LSN resolution) could be extracted and tested. ### P2 — Medium (Coverage Expansion) #### P2-1: Add DDL Drift Detection Test Create a test that compares the hardcoded `CATALOG_DDL` string in `tests/common/mod.rs` against the extension's actual DDL (extracted from `sql/pg_trickle--X.Y.Z.sql` files): ```rust #[test] fn test_catalog_ddl_matches_extension_sql() { let catalog_ddl = CATALOG_DDL; let extension_sql = std::fs::read_to_string("sql/pg_trickle--current.sql").unwrap(); // Extract table definitions from both, compare column names and types // (approximate check — exact DDL match is too fragile) } ``` **Impact:** Prevents mock catalog from silently drifting from reality. #### P2-2: Add Multi-Table Join Chain Unit Tests No join operator test covers 3+ table compositions. Add tests for: - `(A INNER JOIN B) LEFT JOIN C` - `(A LEFT JOIN B) INNER JOIN C` (elimination opportunity) - `A FULL JOIN (B SEMI JOIN C)` #### P2-3: Add Scheduler Job Lifecycle Integration Test Even without SPI mocking, the integration harness can test the job lifecycle via raw SQL against the mock catalog: ```rust async fn test_scheduler_job_lifecycle() { let db = TestDb::with_catalog().await; // INSERT job → UPDATE status QUEUED→RUNNING → UPDATE RUNNING→COMPLETED // Verify state transitions, timing, and error counts } ``` #### P2-4: Extend Property Tests to DVM Operators The proptest pattern from `property_tests.rs` could be applied to DVM operator SQL output: - For any randomly generated 2-table join, verify that `diff_inner_join()` output SQL mentions all source columns - For any aggregate expression, verify output references all GROUP BY columns ### P3 — Low (Polish) #### P3-1: Remove or Fold Smoke Tests `smoke_tests.rs` is 49 lines testing Testcontainers basics already proven by every other test. Consider folding into a doc-comment example or removing. #### P3-2: Add Workflow Test for ST Drop Cascade `workflow_tests.rs::test_full_lifecycle_with_drop` should verify that dropping an ST also removes its change buffer table, dependencies, and history records. #### P3-3: Standardize Test Naming Some tests use `test___` naming (per AGENTS.md), others use `test_` without the component prefix. Standardize across all files. --- ## Appendix: Test Count Summary ### Integration Tests | File | Tests | Lines | Strong | Weak | Init Mode | |------|-------|-------|--------|------|-----------| | `smoke_tests.rs` | 3 | 49 | 0 | 3 | `new()` | | `catalog_tests.rs` | 16 | 514 | 3 | 13 | `with_catalog()` | | `catalog_compat_tests.rs` | 21 | 477 | 0 | 21 | `new()` | | `extension_tests.rs` | 11 | 236 | 3 | 8 | `with_catalog()` | | `monitoring_tests.rs` | 10 | 421 | 3 | 7 | `with_catalog()` | | `property_tests.rs` | ~30 | 470 | 30 | 0 | None | | `resilience_tests.rs` | 7 | 338 | 2 | 5 | `with_catalog()` | | `scenario_tests.rs` | 10 | 489 | 7 | 3 | `with_catalog()` | | `trigger_detection_tests.rs` | 10 | 378 | 0 | 10 | `new()` | | `workflow_tests.rs` | 5 | 326 | 0 | 5 | `with_catalog()` | | **Total** | **~123** | **~3,698** | **48** | **75** | — | ### Unit Tests (Top Modules) | Module | Tests | Quality | |--------|-------|---------| | `api.rs` | ~95 | ⭐⭐⭐⭐ | | `dvm/parser.rs` | ~60 | ⭐⭐⭐⭐⭐ | | `refresh.rs` | ~50 | ⭐⭐⭐ | | `dvm/operators/recursive_cte.rs` | ~45 | ⭐⭐⭐⭐ | | `dvm/operators/aggregate.rs` | ~40 | ⭐⭐⭐⭐⭐ | | `ivm.rs` | ~24 | ⭐⭐⭐ | | `hooks.rs` | ~23 | ⭐⭐⭐ | | `scheduler.rs` | ~20 | ⭐⭐ | | `cdc.rs` | ~20 | ⭐⭐⭐ | | `version.rs` | ~19 | ⭐⭐⭐ | | `catalog.rs` | ~16 | ⭐⭐⭐ | | Other modules | ~190 | Mixed | | **Total** | **~1,300** | — |