--- title: Verify Index Integrity description: Check BM25 indexes for corruption and structural issues canonical: https://docs.paradedb.com/documentation/indexing/verify-index --- ParadeDB provides `amcheck`-style index verification functions to detect corruption and validate the structural integrity of BM25 indexes. These functions are useful for: - Proactive corruption detection before issues become critical - Validating index health after hardware failures or unexpected shutdowns - Verifying backup integrity - Debugging index-related issues ## Basic Verification The `pdb.verify_index` function performs structural integrity checks on a BM25 index: ```sql SQL SELECT * FROM pdb.verify_index('search_idx'); ``` ```ts Drizzle import { diagnostics } from "@paradedb/drizzle-paradedb"; await db.execute(diagnostics.verifyIndex("search_idx")); ``` ```python Django from paradedb import paradedb_verify_index paradedb_verify_index("search_idx") ``` ```python SQLAlchemy from paradedb.sqlalchemy import diagnostics diagnostics.paradedb_verify_index(engine, "search_idx") ``` ```ruby Rails ParadeDB.paradedb_verify_index("search_idx") ``` ```cs EF Core await dbContext.Database.VerifyIndex("search_idx").ToListAsync(); ``` This returns a table with three columns: | Column | Type | Description | | ------------ | ------- | --------------------------------------------- | | `check_name` | text | Name of the verification check | | `passed` | boolean | Whether the check passed | | `details` | text | Additional information about the check result | ### Example Output ``` check_name | passed | details ----------------------------------------+--------+----------------------------------------------- search_idx: schema_valid | t | Index schema loaded successfully search_idx: index_readable | t | Index reader opened successfully search_idx: checksums_valid | t | All segment checksums validated successfully search_idx: segment_metadata_valid | t | 3 segments validated successfully ``` ## Heap Reference Validation To verify that all indexed entries still exist in the heap table, use the `heapallindexed` option: ```sql SQL SELECT * FROM pdb.verify_index('search_idx', heapallindexed := true); ``` ```ts Drizzle import { diagnostics } from "@paradedb/drizzle-paradedb"; await db.execute( diagnostics.verifyIndex("search_idx", { heapAllIndexed: true, }), ); ``` ```python Django from paradedb import paradedb_verify_index paradedb_verify_index("search_idx", heapallindexed=True) ``` ```python SQLAlchemy from paradedb.sqlalchemy import diagnostics diagnostics.paradedb_verify_index( engine, "search_idx", heapallindexed=True, ) ``` ```ruby Rails ParadeDB.paradedb_verify_index( "search_idx", heapallindexed: true ) ``` ```cs EF Core await dbContext.Database.VerifyIndex( "search_idx", new VerifyIndexOptions { HeapAllIndexed = true } ).ToListAsync(); ``` This adds an additional check that validates every indexed `ctid` (tuple identifier) references a valid row in the table. This is particularly useful for detecting index entries that reference deleted or non-existent rows. The `heapallindexed` option can be slow on large indexes as it must verify every document. Consider using `sample_rate` for quick spot checks on large indexes. ## Options ### Sampling for Large Indexes For large indexes, you can check a random sample of documents instead of all documents: ```sql SQL -- Check 10% of documents SELECT * FROM pdb.verify_index('search_idx', heapallindexed := true, sample_rate := 0.1 ); ``` ```ts Drizzle import { diagnostics } from "@paradedb/drizzle-paradedb"; await db.execute( diagnostics.verifyIndex("search_idx", { heapAllIndexed: true, sampleRate: 0.1, }), ); ``` ```python Django from paradedb import paradedb_verify_index paradedb_verify_index( "search_idx", heapallindexed=True, sample_rate=0.1, ) ``` ```python SQLAlchemy from paradedb.sqlalchemy import diagnostics diagnostics.paradedb_verify_index( engine, "search_idx", heapallindexed=True, sample_rate=0.1, ) ``` ```ruby Rails ParadeDB.paradedb_verify_index( "search_idx", heapallindexed: true, sample_rate: 0.1 ) ``` ```cs EF Core await dbContext.Database.VerifyIndex( "search_idx", new VerifyIndexOptions { HeapAllIndexed = true, SampleRate = 0.1, } ).ToListAsync(); ``` ### Progress Reporting For long-running verifications, enable progress reporting to see status updates: ```sql SQL SELECT * FROM pdb.verify_index('search_idx', heapallindexed := true, report_progress := true ); ``` ```ts Drizzle import { diagnostics } from "@paradedb/drizzle-paradedb"; await db.execute( diagnostics.verifyIndex("search_idx", { heapAllIndexed: true, reportProgress: true, }), ); ``` ```python Django from paradedb import paradedb_verify_index paradedb_verify_index( "search_idx", heapallindexed=True, report_progress=True, ) ``` ```python SQLAlchemy from paradedb.sqlalchemy import diagnostics diagnostics.paradedb_verify_index( engine, "search_idx", heapallindexed=True, report_progress=True, ) ``` ```ruby Rails ParadeDB.paradedb_verify_index( "search_idx", heapallindexed: true, report_progress: true ) ``` ```cs EF Core await dbContext.Database.VerifyIndex( "search_idx", new VerifyIndexOptions { HeapAllIndexed = true, ReportProgress = true, } ).ToListAsync(); ``` Progress messages are emitted via PostgreSQL's `NOTICE` channel. ### Verbose Mode For detailed logging including segment-by-segment progress and resume hints, enable verbose mode: ```sql SQL SELECT * FROM pdb.verify_index('search_idx', heapallindexed := true, report_progress := true, verbose := true ); ``` ```ts Drizzle import { diagnostics } from "@paradedb/drizzle-paradedb"; await db.execute( diagnostics.verifyIndex("search_idx", { heapAllIndexed: true, reportProgress: true, verbose: true, }), ); ``` ```python Django from paradedb import paradedb_verify_index paradedb_verify_index( "search_idx", heapallindexed=True, report_progress=True, verbose=True, ) ``` ```python SQLAlchemy from paradedb.sqlalchemy import diagnostics diagnostics.paradedb_verify_index( engine, "search_idx", heapallindexed=True, report_progress=True, verbose=True, ) ``` ```ruby Rails ParadeDB.paradedb_verify_index( "search_idx", heapallindexed: true, report_progress: true, verbose: true ) ``` ```cs EF Core await dbContext.Database.VerifyIndex( "search_idx", new VerifyIndexOptions { HeapAllIndexed = true, ReportProgress = true, Verbose = true, } ).ToListAsync(); ``` ### Stop on First Error To stop verification immediately when the first error is found (similar to `pg_amcheck --on-error-stop`): ```sql SQL SELECT * FROM pdb.verify_index('search_idx', on_error_stop := true); ``` ```ts Drizzle import { diagnostics } from "@paradedb/drizzle-paradedb"; await db.execute( diagnostics.verifyIndex("search_idx", { onErrorStop: true, }), ); ``` ```python Django from paradedb import paradedb_verify_index paradedb_verify_index("search_idx", on_error_stop=True) ``` ```python SQLAlchemy from paradedb.sqlalchemy import diagnostics diagnostics.paradedb_verify_index( engine, "search_idx", on_error_stop=True, ) ``` ```ruby Rails ParadeDB.paradedb_verify_index( "search_idx", on_error_stop: true ) ``` ```cs EF Core await dbContext.Database.VerifyIndex( "search_idx", new VerifyIndexOptions { OnErrorStop = true } ).ToListAsync(); ``` ## Parallel Verification A single `verify_index` call processes segments sequentially within one PostgreSQL backend. For very large indexes, you can distribute verification across multiple database connections by specifying which segments each connection should check using the `segment_ids` parameter. This allows you to utilize multiple CPU cores by running verification in parallel processes. ### Listing Segments First, list all segments in the index: ```sql SQL SELECT * FROM pdb.index_segments('search_idx'); ``` ```ts Drizzle import { diagnostics } from "@paradedb/drizzle-paradedb"; await db.execute(diagnostics.indexSegments("search_idx")); ``` ```python Django from paradedb import paradedb_index_segments paradedb_index_segments("search_idx") ``` ```python SQLAlchemy from paradedb.sqlalchemy import diagnostics diagnostics.paradedb_index_segments(engine, "search_idx") ``` ```ruby Rails ParadeDB.paradedb_index_segments("search_idx") ``` ```cs EF Core await dbContext.Database.IndexSegments("search_idx").ToListAsync(); ``` ``` partition_name | segment_idx | segment_id | num_docs | num_deleted | max_doc ----------------+-------------+------------+----------+-------------+--------- search_idx | 0 | b7e661af | 10000 | 0 | 10000 search_idx | 1 | b4fc1b40 | 10000 | 0 | 10000 search_idx | 2 | 9894b412 | 10000 | 0 | 10000 search_idx | 3 | 4d0168d6 | 5000 | 0 | 5000 ``` ### Verifying Specific Segments Then verify specific segments using the `segment_ids` parameter: ```sql SQL -- Worker 1: Verify even segments SELECT * FROM pdb.verify_index('search_idx', heapallindexed := true, segment_ids := ARRAY[0, 2] ); -- Worker 2: Verify odd segments SELECT * FROM pdb.verify_index('search_idx', heapallindexed := true, segment_ids := ARRAY[1, 3] ); ``` ```ts Drizzle import { diagnostics } from "@paradedb/drizzle-paradedb"; const segments = await db.execute(diagnostics.indexSegments("search_idx")); const evenSegments = segments .map((row) => row.segment_idx) .filter((segmentIdx) => segmentIdx % 2 === 0); const oddSegments = segments .map((row) => row.segment_idx) .filter((segmentIdx) => segmentIdx % 2 === 1); // Worker 1: Verify even segments await db.execute( diagnostics.verifyIndex("search_idx", { heapAllIndexed: true, segmentIds: evenSegments, }), ); // Worker 2: Verify odd segments if (oddSegments.length) { await db.execute( diagnostics.verifyIndex("search_idx", { heapAllIndexed: true, segmentIds: oddSegments, }), ); } ``` ```python Django from paradedb import paradedb_verify_index # Worker 1: Verify even segments paradedb_verify_index( "search_idx", heapallindexed=True, segment_ids=[0, 2], ) # Worker 2: Verify odd segments paradedb_verify_index( "search_idx", heapallindexed=True, segment_ids=[1, 3], ) ``` ```python SQLAlchemy from paradedb.sqlalchemy import diagnostics # Worker 1: Verify even segments diagnostics.paradedb_verify_index( engine, "search_idx", heapallindexed=True, segment_ids=[0, 2], ) # Worker 2: Verify odd segments diagnostics.paradedb_verify_index( engine, "search_idx", heapallindexed=True, segment_ids=[1, 3], ) ``` ```ruby Rails # Worker 1: Verify even segments ParadeDB.paradedb_verify_index( "search_idx", heapallindexed: true, segment_ids: [0, 2] ) # Worker 2: Verify odd segments ParadeDB.paradedb_verify_index( "search_idx", heapallindexed: true, segment_ids: [1, 3] ) ``` ```cs EF Core var segments = await dbContext.Database.IndexSegments("search_idx").ToListAsync(); var evenSegments = segments .Select(segment => segment.SegmentIdx) .Where(segmentIdx => segmentIdx % 2 == 0) .ToArray(); var oddSegments = segments .Select(segment => segment.SegmentIdx) .Where(segmentIdx => segmentIdx % 2 == 1) .ToArray(); await dbContext.Database.VerifyIndex( "search_idx", new VerifyIndexOptions { HeapAllIndexed = true, SegmentIds = evenSegments, } ).ToListAsync(); if (oddSegments.Length > 0) { await dbContext.Database.VerifyIndex( "search_idx", new VerifyIndexOptions { HeapAllIndexed = true, SegmentIds = oddSegments, } ).ToListAsync(); } ``` ### Automation Example Distribute verification across N workers: ```sql SQL -- Get segments for worker 0 (of 4 workers) SELECT array_agg(segment_idx) AS segments FROM pdb.index_segments('search_idx') WHERE segment_idx % 4 = 0; -- Run verification with those segments SELECT * FROM pdb.verify_index('search_idx', heapallindexed := true, segment_ids := ( SELECT array_agg(segment_idx) FROM pdb.index_segments('search_idx') WHERE segment_idx % 4 = 0 ) ); ``` ```ts Drizzle import { diagnostics } from "@paradedb/drizzle-paradedb"; const segments = await db.execute(diagnostics.indexSegments("search_idx")); await db.execute( diagnostics.verifyIndex("search_idx", { heapAllIndexed: true, segmentIds: segments .map((row) => row.segment_idx) .filter((segmentIdx) => segmentIdx % 4 === 0), }), ); ``` ```python Django from paradedb import paradedb_index_segments, paradedb_verify_index paradedb_verify_index( "search_idx", heapallindexed=True, segment_ids=[ row["segment_idx"] for row in paradedb_index_segments("search_idx") if row["segment_idx"] % 4 == 0 ], ) ``` ```python SQLAlchemy from paradedb.sqlalchemy import diagnostics diagnostics.paradedb_verify_index( engine, "search_idx", heapallindexed=True, segment_ids=[ row["segment_idx"] for row in diagnostics.paradedb_index_segments(engine, "search_idx") if row["segment_idx"] % 4 == 0 ], ) ``` ```ruby Rails ParadeDB.paradedb_verify_index( "search_idx", heapallindexed: true, segment_ids: ParadeDB.paradedb_index_segments("search_idx").filter_map { |row| row["segment_idx"] if row["segment_idx"] % 4 == 0 }) ``` ```cs EF Core var segments = await dbContext.Database.IndexSegments("search_idx").ToListAsync(); await dbContext.Database.VerifyIndex( "search_idx", new VerifyIndexOptions { HeapAllIndexed = true, SegmentIds = segments .Select(segment => segment.SegmentIdx) .Where(segmentIdx => segmentIdx % 4 == 0) .ToArray(), } ).ToListAsync(); ``` ## Verifying All BM25 Indexes To verify all BM25 indexes in the database at once: ```sql SQL SELECT * FROM pdb.verify_all_indexes(); ``` ```ts Drizzle import { diagnostics } from "@paradedb/drizzle-paradedb"; await db.execute(diagnostics.verifyAllIndexes()); ``` ```python Django from paradedb import paradedb_verify_all_indexes paradedb_verify_all_indexes() ``` ```python SQLAlchemy from paradedb.sqlalchemy import diagnostics diagnostics.paradedb_verify_all_indexes(engine) ``` ```ruby Rails ParadeDB.paradedb_verify_all_indexes() ``` ```cs EF Core await dbContext.Database.VerifyAllIndexes().ToListAsync(); ``` ### Filtering by Pattern Filter indexes by schema or name pattern (using SQL `LIKE` syntax): ```sql SQL -- Verify indexes in the 'public' schema only SELECT * FROM pdb.verify_all_indexes(schema_pattern := 'public'); -- Verify indexes matching a name pattern SELECT * FROM pdb.verify_all_indexes(index_pattern := 'search_%'); -- Combine filters SELECT * FROM pdb.verify_all_indexes( schema_pattern := 'app_%', index_pattern := '%_idx', heapallindexed := true ); ``` ```ts Drizzle import { diagnostics } from "@paradedb/drizzle-paradedb"; // Verify indexes in the 'public' schema only await db.execute( diagnostics.verifyAllIndexes({ schemaPattern: "public", }), ); // Verify indexes matching a name pattern await db.execute( diagnostics.verifyAllIndexes({ indexPattern: "search_%", }), ); // Combine filters await db.execute( diagnostics.verifyAllIndexes({ schemaPattern: "public", indexPattern: "search_%", heapAllIndexed: true, }), ); ``` ```python Django from paradedb import paradedb_verify_all_indexes # Verify indexes in the 'public' schema only paradedb_verify_all_indexes(schema_pattern="public") # Verify indexes matching a name pattern paradedb_verify_all_indexes(index_pattern="search_%") # Combine filters paradedb_verify_all_indexes( schema_pattern="app_%", index_pattern="%_idx", heapallindexed=True, ) ``` ```python SQLAlchemy from paradedb.sqlalchemy import diagnostics # Verify indexes in the 'public' schema only diagnostics.paradedb_verify_all_indexes( engine, schema_pattern="public", ) # Verify indexes matching a name pattern diagnostics.paradedb_verify_all_indexes( engine, index_pattern="search_%", ) # Combine filters diagnostics.paradedb_verify_all_indexes( engine, schema_pattern="app_%", index_pattern="%_idx", heapallindexed=True, ) ``` ```ruby Rails # Verify indexes in the 'public' schema only ParadeDB.paradedb_verify_all_indexes(schema_pattern: "public") # Verify indexes matching a name pattern ParadeDB.paradedb_verify_all_indexes(index_pattern: "search_%") # Combine filters ParadeDB.paradedb_verify_all_indexes( schema_pattern: "app_%", index_pattern: "%_idx", heapallindexed: true ) ``` ```cs EF Core // Verify indexes in the 'public' schema only await dbContext.Database.VerifyAllIndexes( new VerifyAllIndexesOptions { SchemaPattern = "public" } ).ToListAsync(); // Verify indexes matching a name pattern await dbContext.Database.VerifyAllIndexes( new VerifyAllIndexesOptions { IndexPattern = "search_%" } ).ToListAsync(); // Combine filters await dbContext.Database.VerifyAllIndexes( new VerifyAllIndexesOptions { SchemaPattern = "public", IndexPattern = "search_%", HeapAllIndexed = true, } ).ToListAsync(); ``` ## Listing All BM25 Indexes To see all BM25 indexes in the database with summary statistics: ```sql SQL SELECT * FROM pdb.indexes(); ``` ```ts Drizzle import { diagnostics } from "@paradedb/drizzle-paradedb"; await db.execute(diagnostics.indexes()); ``` ```python Django from paradedb import paradedb_indexes paradedb_indexes() ``` ```python SQLAlchemy from paradedb.sqlalchemy import diagnostics diagnostics.paradedb_indexes(engine) ``` ```ruby Rails ParadeDB.paradedb_indexes() ``` ```cs EF Core await dbContext.Database .Indexes() .Select(index => new { index.SchemaName, index.TableName, index.IndexName, index.NumSegments, index.TotalDocs, }) .ToListAsync(); ``` ``` schemaname | tablename | indexname | indexrelid | num_segments | total_docs ------------+-------------+---------------+------------+--------------+------------ public | products | products_idx | 16421 | 3 | 50000 public | documents | documents_idx | 16435 | 5 | 125000 app | articles | articles_idx | 16448 | 2 | 10000 ``` ## Function Reference ### `pdb.verify_index` Verifies a single BM25 index. | Parameter | Type | Default | Description | | ----------------- | -------- | ---------- | ---------------------------------------------------------- | | `index` | regclass | (required) | The index to verify | | `heapallindexed` | boolean | `false` | Check that all indexed ctids exist in the heap | | `sample_rate` | float | `NULL` | Fraction of documents to check (0.0-1.0). NULL = check all | | `report_progress` | boolean | `false` | Emit progress messages | | `verbose` | boolean | `false` | Emit detailed segment-level progress and resume hints | | `on_error_stop` | boolean | `false` | Stop on first error found | | `segment_ids` | int[] | `NULL` | Specific segment indices to check. NULL = all segments | ### `pdb.verify_all_indexes` Verifies all BM25 indexes in the database. | Parameter | Type | Default | Description | | ----------------- | ------- | ------- | ---------------------------------------------------------- | | `schema_pattern` | text | `NULL` | Filter by schema name (SQL LIKE pattern). NULL = all | | `index_pattern` | text | `NULL` | Filter by index name (SQL LIKE pattern). NULL = all | | `heapallindexed` | boolean | `false` | Check that all indexed ctids exist in the heap | | `sample_rate` | float | `NULL` | Fraction of documents to check (0.0-1.0). NULL = check all | | `report_progress` | boolean | `false` | Emit progress messages | | `on_error_stop` | boolean | `false` | Stop on first error found | ### `pdb.index_segments` Lists all segments in a BM25 index. | Parameter | Type | Default | Description | | --------- | -------- | ---------- | -------------------- | | `index` | regclass | (required) | The index to inspect | Returns: | Column | Type | Description | | ---------------- | ------ | ------------------------------------------------ | | `partition_name` | text | Name of the index partition | | `segment_idx` | int | Segment index (use with `segment_ids` parameter) | | `segment_id` | text | Tantivy segment UUID | | `num_docs` | bigint | Number of live documents | | `num_deleted` | bigint | Number of deleted documents | | `max_doc` | bigint | Maximum document ID | ### `pdb.indexes` Lists all BM25 indexes in the database. Returns: | Column | Type | Description | | -------------- | ------ | ----------------------------------- | | `schemaname` | text | Schema containing the index | | `tablename` | text | Table the index is on | | `indexname` | text | Name of the index | | `indexrelid` | oid | OID of the index | | `num_segments` | int | Number of Tantivy segments | | `total_docs` | bigint | Total documents across all segments |