---
title: Verify Index Integrity
description: Check BM25 indexes for corruption and structural issues
canonical: https://docs.paradedb.com/documentation/indexing/verify-index
---
ParadeDB provides `amcheck`-style index verification functions to detect corruption and validate the structural integrity of BM25 indexes.
These functions are useful for:
- Proactive corruption detection before issues become critical
- Validating index health after hardware failures or unexpected shutdowns
- Verifying backup integrity
- Debugging index-related issues
## Basic Verification
The `pdb.verify_index` function performs structural integrity checks on a BM25 index:
```sql SQL
SELECT * FROM pdb.verify_index('search_idx');
```
```python Django
from paradedb import paradedb_verify_index
paradedb_verify_index("search_idx")
```
```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics
diagnostics.paradedb_verify_index(engine, "search_idx")
```
```ruby Rails
ParadeDB.paradedb_verify_index("search_idx")
```
This returns a table with three columns:
| Column | Type | Description |
| ------------ | ------- | --------------------------------------------- |
| `check_name` | text | Name of the verification check |
| `passed` | boolean | Whether the check passed |
| `details` | text | Additional information about the check result |
### Example Output
```
check_name | passed | details
----------------------------------------+--------+-----------------------------------------------
search_idx: schema_valid | t | Index schema loaded successfully
search_idx: index_readable | t | Index reader opened successfully
search_idx: checksums_valid | t | All segment checksums validated successfully
search_idx: segment_metadata_valid | t | 3 segments validated successfully
```
## Heap Reference Validation
To verify that all indexed entries still exist in the heap table, use the `heapallindexed` option:
```sql SQL
SELECT * FROM pdb.verify_index('search_idx', heapallindexed := true);
```
```python Django
from paradedb import paradedb_verify_index
paradedb_verify_index("search_idx", heapallindexed=True)
```
```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics
diagnostics.paradedb_verify_index(
engine,
"search_idx",
heapallindexed=True,
)
```
```ruby Rails
ParadeDB.paradedb_verify_index(
"search_idx",
heapallindexed: true
)
```
This adds an additional check that validates every indexed `ctid` (tuple identifier) references a valid row in the table.
This is particularly useful for detecting index entries that reference deleted or non-existent rows.
The `heapallindexed` option can be slow on large indexes as it must verify
every document. Consider using `sample_rate` for quick spot checks on large
indexes.
## Options
### Sampling for Large Indexes
For large indexes, you can check a random sample of documents instead of all documents:
```sql SQL
-- Check 10% of documents
SELECT * FROM pdb.verify_index('search_idx',
heapallindexed := true,
sample_rate := 0.1
);
```
```python Django
from paradedb import paradedb_verify_index
paradedb_verify_index(
"search_idx",
heapallindexed=True,
sample_rate=0.1,
)
```
```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics
diagnostics.paradedb_verify_index(
engine,
"search_idx",
heapallindexed=True,
sample_rate=0.1,
)
```
```ruby Rails
ParadeDB.paradedb_verify_index(
"search_idx",
heapallindexed: true,
sample_rate: 0.1
)
```
### Progress Reporting
For long-running verifications, enable progress reporting to see status updates:
```sql SQL
SELECT * FROM pdb.verify_index('search_idx',
heapallindexed := true,
report_progress := true
);
```
```python Django
from paradedb import paradedb_verify_index
paradedb_verify_index(
"search_idx",
heapallindexed=True,
report_progress=True,
)
```
```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics
diagnostics.paradedb_verify_index(
engine,
"search_idx",
heapallindexed=True,
report_progress=True,
)
```
```ruby Rails
ParadeDB.paradedb_verify_index(
"search_idx",
heapallindexed: true,
report_progress: true
)
```
Progress messages are emitted via PostgreSQL's `NOTICE` channel.
### Verbose Mode
For detailed logging including segment-by-segment progress and resume hints, enable verbose mode:
```sql SQL
SELECT * FROM pdb.verify_index('search_idx',
heapallindexed := true,
report_progress := true,
verbose := true
);
```
```python Django
from paradedb import paradedb_verify_index
paradedb_verify_index(
"search_idx",
heapallindexed=True,
report_progress=True,
verbose=True,
)
```
```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics
diagnostics.paradedb_verify_index(
engine,
"search_idx",
heapallindexed=True,
report_progress=True,
verbose=True,
)
```
```ruby Rails
ParadeDB.paradedb_verify_index(
"search_idx",
heapallindexed: true,
report_progress: true,
verbose: true
)
```
### Stop on First Error
To stop verification immediately when the first error is found (similar to `pg_amcheck --on-error-stop`):
```sql SQL
SELECT * FROM pdb.verify_index('search_idx', on_error_stop := true);
```
```python Django
from paradedb import paradedb_verify_index
paradedb_verify_index("search_idx", on_error_stop=True)
```
```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics
diagnostics.paradedb_verify_index(
engine,
"search_idx",
on_error_stop=True,
)
```
```ruby Rails
ParadeDB.paradedb_verify_index(
"search_idx",
on_error_stop: true
)
```
## Parallel Verification
A single `verify_index` call processes segments sequentially within one PostgreSQL backend.
For very large indexes, you can distribute verification across multiple database connections
by specifying which segments each connection should check using the `segment_ids` parameter.
This allows you to utilize multiple CPU cores by running verification in parallel processes.
### Listing Segments
First, list all segments in the index:
```sql SQL
SELECT * FROM pdb.index_segments('search_idx');
```
```python Django
from paradedb import paradedb_index_segments
paradedb_index_segments("search_idx")
```
```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics
diagnostics.paradedb_index_segments(engine, "search_idx")
```
```ruby Rails
ParadeDB.paradedb_index_segments("search_idx")
```
```
partition_name | segment_idx | segment_id | num_docs | num_deleted | max_doc
----------------+-------------+------------+----------+-------------+---------
search_idx | 0 | b7e661af | 10000 | 0 | 10000
search_idx | 1 | b4fc1b40 | 10000 | 0 | 10000
search_idx | 2 | 9894b412 | 10000 | 0 | 10000
search_idx | 3 | 4d0168d6 | 5000 | 0 | 5000
```
### Verifying Specific Segments
Then verify specific segments using the `segment_ids` parameter:
```sql SQL
-- Worker 1: Verify even segments
SELECT * FROM pdb.verify_index('search_idx',
heapallindexed := true,
segment_ids := ARRAY[0, 2]
);
-- Worker 2: Verify odd segments
SELECT * FROM pdb.verify_index('search_idx',
heapallindexed := true,
segment_ids := ARRAY[1, 3]
);
```
```python Django
from paradedb import paradedb_verify_index
# Worker 1: Verify even segments
paradedb_verify_index(
"search_idx",
heapallindexed=True,
segment_ids=[0, 2],
)
# Worker 2: Verify odd segments
paradedb_verify_index(
"search_idx",
heapallindexed=True,
segment_ids=[1, 3],
)
```
```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics
# Worker 1: Verify even segments
diagnostics.paradedb_verify_index(
engine,
"search_idx",
heapallindexed=True,
segment_ids=[0, 2],
)
# Worker 2: Verify odd segments
diagnostics.paradedb_verify_index(
engine,
"search_idx",
heapallindexed=True,
segment_ids=[1, 3],
)
```
```ruby Rails
# Worker 1: Verify even segments
ParadeDB.paradedb_verify_index(
"search_idx",
heapallindexed: true,
segment_ids: [0, 2]
)
# Worker 2: Verify odd segments
ParadeDB.paradedb_verify_index(
"search_idx",
heapallindexed: true,
segment_ids: [1, 3]
)
```
### Automation Example
Distribute verification across N workers:
```sql SQL
-- Get segments for worker 0 (of 4 workers)
SELECT array_agg(segment_idx) AS segments
FROM pdb.index_segments('search_idx')
WHERE segment_idx % 4 = 0;
-- Run verification with those segments
SELECT * FROM pdb.verify_index('search_idx',
heapallindexed := true,
segment_ids := (
SELECT array_agg(segment_idx)
FROM pdb.index_segments('search_idx')
WHERE segment_idx % 4 = 0
)
);
```
```python Django
from paradedb import paradedb_index_segments, paradedb_verify_index
paradedb_verify_index(
"search_idx",
heapallindexed=True,
segment_ids=[
row["segment_idx"]
for row in paradedb_index_segments("search_idx")
if row["segment_idx"] % 4 == 0
],
)
```
```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics
diagnostics.paradedb_verify_index(
engine,
"search_idx",
heapallindexed=True,
segment_ids=[
row["segment_idx"]
for row in diagnostics.paradedb_index_segments(engine, "search_idx")
if row["segment_idx"] % 4 == 0
],
)
```
```ruby Rails
ParadeDB.paradedb_verify_index(
"search_idx",
heapallindexed: true,
segment_ids: ParadeDB.paradedb_index_segments("search_idx").filter_map { |row|
row["segment_idx"] if row["segment_idx"] % 4 == 0
})
```
## Verifying All BM25 Indexes
To verify all BM25 indexes in the database at once:
```sql SQL
SELECT * FROM pdb.verify_all_indexes();
```
```python Django
from paradedb import paradedb_verify_all_indexes
paradedb_verify_all_indexes()
```
```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics
diagnostics.paradedb_verify_all_indexes(engine)
```
```ruby Rails
ParadeDB.paradedb_verify_all_indexes()
```
### Filtering by Pattern
Filter indexes by schema or name pattern (using SQL `LIKE` syntax):
```sql SQL
-- Verify indexes in the 'public' schema only
SELECT * FROM pdb.verify_all_indexes(schema_pattern := 'public');
-- Verify indexes matching a name pattern
SELECT * FROM pdb.verify_all_indexes(index_pattern := 'search_%');
-- Combine filters
SELECT * FROM pdb.verify_all_indexes(
schema_pattern := 'app_%',
index_pattern := '%_idx',
heapallindexed := true
);
```
```python Django
from paradedb import paradedb_verify_all_indexes
# Verify indexes in the 'public' schema only
paradedb_verify_all_indexes(schema_pattern="public")
# Verify indexes matching a name pattern
paradedb_verify_all_indexes(index_pattern="search_%")
# Combine filters
paradedb_verify_all_indexes(
schema_pattern="app_%",
index_pattern="%_idx",
heapallindexed=True,
)
```
```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics
# Verify indexes in the 'public' schema only
diagnostics.paradedb_verify_all_indexes(
engine,
schema_pattern="public",
)
# Verify indexes matching a name pattern
diagnostics.paradedb_verify_all_indexes(
engine,
index_pattern="search_%",
)
# Combine filters
diagnostics.paradedb_verify_all_indexes(
engine,
schema_pattern="app_%",
index_pattern="%_idx",
heapallindexed=True,
)
```
```ruby Rails
# Verify indexes in the 'public' schema only
ParadeDB.paradedb_verify_all_indexes(schema_pattern: "public")
# Verify indexes matching a name pattern
ParadeDB.paradedb_verify_all_indexes(index_pattern: "search_%")
# Combine filters
ParadeDB.paradedb_verify_all_indexes(
schema_pattern: "app_%",
index_pattern: "%_idx",
heapallindexed: true
)
```
## Listing All BM25 Indexes
To see all BM25 indexes in the database with summary statistics:
```sql SQL
SELECT * FROM pdb.indexes();
```
```python Django
from paradedb import paradedb_indexes
paradedb_indexes()
```
```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics
diagnostics.paradedb_indexes(engine)
```
```ruby Rails
ParadeDB.paradedb_indexes()
```
```
schemaname | tablename | indexname | indexrelid | num_segments | total_docs
------------+-------------+---------------+------------+--------------+------------
public | products | products_idx | 16421 | 3 | 50000
public | documents | documents_idx | 16435 | 5 | 125000
app | articles | articles_idx | 16448 | 2 | 10000
```
## Function Reference
### `pdb.verify_index`
Verifies a single BM25 index.
| Parameter | Type | Default | Description |
| ----------------- | -------- | ---------- | ---------------------------------------------------------- |
| `index` | regclass | (required) | The index to verify |
| `heapallindexed` | boolean | `false` | Check that all indexed ctids exist in the heap |
| `sample_rate` | float | `NULL` | Fraction of documents to check (0.0-1.0). NULL = check all |
| `report_progress` | boolean | `false` | Emit progress messages |
| `verbose` | boolean | `false` | Emit detailed segment-level progress and resume hints |
| `on_error_stop` | boolean | `false` | Stop on first error found |
| `segment_ids` | int[] | `NULL` | Specific segment indices to check. NULL = all segments |
### `pdb.verify_all_indexes`
Verifies all BM25 indexes in the database.
| Parameter | Type | Default | Description |
| ----------------- | ------- | ------- | ---------------------------------------------------------- |
| `schema_pattern` | text | `NULL` | Filter by schema name (SQL LIKE pattern). NULL = all |
| `index_pattern` | text | `NULL` | Filter by index name (SQL LIKE pattern). NULL = all |
| `heapallindexed` | boolean | `false` | Check that all indexed ctids exist in the heap |
| `sample_rate` | float | `NULL` | Fraction of documents to check (0.0-1.0). NULL = check all |
| `report_progress` | boolean | `false` | Emit progress messages |
| `on_error_stop` | boolean | `false` | Stop on first error found |
### `pdb.index_segments`
Lists all segments in a BM25 index.
| Parameter | Type | Default | Description |
| --------- | -------- | ---------- | -------------------- |
| `index` | regclass | (required) | The index to inspect |
Returns:
| Column | Type | Description |
| ---------------- | ------ | ------------------------------------------------ |
| `partition_name` | text | Name of the index partition |
| `segment_idx` | int | Segment index (use with `segment_ids` parameter) |
| `segment_id` | text | Tantivy segment UUID |
| `num_docs` | bigint | Number of live documents |
| `num_deleted` | bigint | Number of deleted documents |
| `max_doc` | bigint | Maximum document ID |
### `pdb.indexes`
Lists all BM25 indexes in the database.
Returns:
| Column | Type | Description |
| -------------- | ------ | ----------------------------------- |
| `schemaname` | text | Schema containing the index |
| `tablename` | text | Table the index is on |
| `indexname` | text | Name of the index |
| `indexrelid` | oid | OID of the index |
| `num_segments` | int | Number of Tantivy segments |
| `total_docs` | bigint | Total documents across all segments |