--- title: Verify Index Integrity description: Check BM25 indexes for corruption and structural issues canonical: https://docs.paradedb.com/documentation/indexing/verify-index --- ParadeDB provides `amcheck`-style index verification functions to detect corruption and validate the structural integrity of BM25 indexes. These functions are useful for: - Proactive corruption detection before issues become critical - Validating index health after hardware failures or unexpected shutdowns - Verifying backup integrity - Debugging index-related issues ## Basic Verification The `pdb.verify_index` function performs structural integrity checks on a BM25 index: ```sql SELECT * FROM pdb.verify_index('search_idx'); ``` This returns a table with three columns: | Column | Type | Description | | ------------ | ------- | --------------------------------------------- | | `check_name` | text | Name of the verification check | | `passed` | boolean | Whether the check passed | | `details` | text | Additional information about the check result | ### Example Output ``` check_name | passed | details ----------------------------------------+--------+----------------------------------------------- search_idx: schema_valid | t | Index schema loaded successfully search_idx: index_readable | t | Index reader opened successfully search_idx: checksums_valid | t | All segment checksums validated successfully search_idx: segment_metadata_valid | t | 3 segments validated successfully ``` ## Heap Reference Validation To verify that all indexed entries still exist in the heap table, use the `heapallindexed` option: ```sql SELECT * FROM pdb.verify_index('search_idx', heapallindexed := true); ``` This adds an additional check that validates every indexed `ctid` (tuple identifier) references a valid row in the table. This is particularly useful for detecting index entries that reference deleted or non-existent rows. The `heapallindexed` option can be slow on large indexes as it must verify every document. Consider using `sample_rate` for quick spot checks on large indexes. ## Options ### Sampling for Large Indexes For large indexes, you can check a random sample of documents instead of all documents: ```sql -- Check 10% of documents SELECT * FROM pdb.verify_index('search_idx', heapallindexed := true, sample_rate := 0.1 ); ``` ### Progress Reporting For long-running verifications, enable progress reporting to see status updates: ```sql SELECT * FROM pdb.verify_index('search_idx', heapallindexed := true, report_progress := true ); ``` Progress messages are emitted via PostgreSQL's `WARNING` channel. ### Verbose Mode For detailed logging including segment-by-segment progress and resume hints, enable verbose mode: ```sql SELECT * FROM pdb.verify_index('search_idx', heapallindexed := true, report_progress := true, verbose := true ); ``` ### Stop on First Error To stop verification immediately when the first error is found (similar to `pg_amcheck --on-error-stop`): ```sql SELECT * FROM pdb.verify_index('search_idx', on_error_stop := true); ``` ## Parallel Verification A single `verify_index` call processes segments sequentially within one PostgreSQL backend. For very large indexes, you can distribute verification across multiple database connections by specifying which segments each connection should check using the `segment_ids` parameter. This allows you to utilize multiple CPU cores by running verification in parallel processes. ### Listing Segments First, list all segments in the index: ```sql SELECT * FROM pdb.index_segments('search_idx'); ``` ``` partition_name | segment_idx | segment_id | num_docs | num_deleted | max_doc ----------------+-------------+------------+----------+-------------+--------- search_idx | 0 | b7e661af | 10000 | 0 | 10000 search_idx | 1 | b4fc1b40 | 10000 | 0 | 10000 search_idx | 2 | 9894b412 | 10000 | 0 | 10000 search_idx | 3 | 4d0168d6 | 5000 | 0 | 5000 ``` ### Verifying Specific Segments Then verify specific segments using the `segment_ids` parameter: ```sql -- Worker 1: Verify even segments SELECT * FROM pdb.verify_index('search_idx', heapallindexed := true, segment_ids := ARRAY[0, 2] ); -- Worker 2: Verify odd segments SELECT * FROM pdb.verify_index('search_idx', heapallindexed := true, segment_ids := ARRAY[1, 3] ); ``` ### Automation Example Distribute verification across N workers: ```sql -- Get segments for worker 0 (of 4 workers) SELECT array_agg(segment_idx) AS segments FROM pdb.index_segments('search_idx') WHERE segment_idx % 4 = 0; -- Run verification with those segments SELECT * FROM pdb.verify_index('search_idx', heapallindexed := true, segment_ids := ( SELECT array_agg(segment_idx) FROM pdb.index_segments('search_idx') WHERE segment_idx % 4 = 0 ) ); ``` ## Verifying All BM25 Indexes To verify all BM25 indexes in the database at once: ```sql SELECT * FROM pdb.verify_all_indexes(); ``` ### Filtering by Pattern Filter indexes by schema or name pattern (using SQL `LIKE` syntax): ```sql -- Verify indexes in the 'public' schema only SELECT * FROM pdb.verify_all_indexes(schema_pattern := 'public'); -- Verify indexes matching a name pattern SELECT * FROM pdb.verify_all_indexes(index_pattern := 'search_%'); -- Combine filters SELECT * FROM pdb.verify_all_indexes( schema_pattern := 'app_%', index_pattern := '%_idx', heapallindexed := true ); ``` ## Listing All BM25 Indexes To see all BM25 indexes in the database with summary statistics: ```sql SELECT * FROM pdb.indexes(); ``` ``` schemaname | tablename | indexname | indexrelid | num_segments | total_docs ------------+-------------+---------------+------------+--------------+------------ public | products | products_idx | 16421 | 3 | 50000 public | documents | documents_idx | 16435 | 5 | 125000 app | articles | articles_idx | 16448 | 2 | 10000 ``` ## Function Reference ### `pdb.verify_index` Verifies a single BM25 index. | Parameter | Type | Default | Description | | ----------------- | -------- | ---------- | ---------------------------------------------------------- | | `index` | regclass | (required) | The index to verify | | `heapallindexed` | boolean | `false` | Check that all indexed ctids exist in the heap | | `sample_rate` | float | `NULL` | Fraction of documents to check (0.0-1.0). NULL = check all | | `report_progress` | boolean | `false` | Emit progress messages | | `verbose` | boolean | `false` | Emit detailed segment-level progress and resume hints | | `on_error_stop` | boolean | `false` | Stop on first error found | | `segment_ids` | int[] | `NULL` | Specific segment indices to check. NULL = all segments | ### `pdb.verify_all_indexes` Verifies all BM25 indexes in the database. | Parameter | Type | Default | Description | | ----------------- | ------- | ------- | ---------------------------------------------------------- | | `schema_pattern` | text | `NULL` | Filter by schema name (SQL LIKE pattern). NULL = all | | `index_pattern` | text | `NULL` | Filter by index name (SQL LIKE pattern). NULL = all | | `heapallindexed` | boolean | `false` | Check that all indexed ctids exist in the heap | | `sample_rate` | float | `NULL` | Fraction of documents to check (0.0-1.0). NULL = check all | | `report_progress` | boolean | `false` | Emit progress messages | | `on_error_stop` | boolean | `false` | Stop on first error found | ### `pdb.index_segments` Lists all segments in a BM25 index. | Parameter | Type | Default | Description | | --------- | -------- | ---------- | -------------------- | | `index` | regclass | (required) | The index to inspect | Returns: | Column | Type | Description | | ---------------- | ------ | ------------------------------------------------ | | `partition_name` | text | Name of the index partition | | `segment_idx` | int | Segment index (use with `segment_ids` parameter) | | `segment_id` | text | Tantivy segment UUID | | `num_docs` | bigint | Number of live documents | | `num_deleted` | bigint | Number of deleted documents | | `max_doc` | bigint | Maximum document ID | ### `pdb.indexes` Lists all BM25 indexes in the database. Returns: | Column | Type | Description | | -------------- | ------ | ----------------------------------- | | `schemaname` | text | Schema containing the index | | `tablename` | text | Table the index is on | | `indexname` | text | Name of the index | | `indexrelid` | oid | OID of the index | | `num_segments` | int | Number of Tantivy segments | | `total_docs` | bigint | Total documents across all segments |