---
title: Verify Index Integrity
description: Check BM25 indexes for corruption and structural issues
canonical: https://docs.paradedb.com/documentation/indexing/verify-index
---
ParadeDB provides `amcheck`-style index verification functions to detect corruption and validate the structural integrity of BM25 indexes.
These functions are useful for:
- Proactive corruption detection before issues become critical
- Validating index health after hardware failures or unexpected shutdowns
- Verifying backup integrity
- Debugging index-related issues
## Basic Verification
The `pdb.verify_index` function performs structural integrity checks on a BM25 index:
```sql SQL
SELECT * FROM pdb.verify_index('search_idx');
```
```ts Drizzle
import { diagnostics } from "@paradedb/drizzle-paradedb";
await db.execute(diagnostics.verifyIndex("search_idx"));
```
```python Django
from paradedb import paradedb_verify_index
paradedb_verify_index("search_idx")
```
```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics
diagnostics.paradedb_verify_index(engine, "search_idx")
```
```ruby Rails
ParadeDB.paradedb_verify_index("search_idx")
```
```cs EF Core
await dbContext.Database.VerifyIndex("search_idx").ToListAsync();
```
This returns a table with three columns:
| Column | Type | Description |
| ------------ | ------- | --------------------------------------------- |
| `check_name` | text | Name of the verification check |
| `passed` | boolean | Whether the check passed |
| `details` | text | Additional information about the check result |
### Example Output
```
check_name | passed | details
----------------------------------------+--------+-----------------------------------------------
search_idx: schema_valid | t | Index schema loaded successfully
search_idx: index_readable | t | Index reader opened successfully
search_idx: checksums_valid | t | All segment checksums validated successfully
search_idx: segment_metadata_valid | t | 3 segments validated successfully
```
## Heap Reference Validation
To verify that all indexed entries still exist in the heap table, use the `heapallindexed` option:
```sql SQL
SELECT * FROM pdb.verify_index('search_idx', heapallindexed := true);
```
```ts Drizzle
import { diagnostics } from "@paradedb/drizzle-paradedb";
await db.execute(
diagnostics.verifyIndex("search_idx", {
heapAllIndexed: true,
}),
);
```
```python Django
from paradedb import paradedb_verify_index
paradedb_verify_index("search_idx", heapallindexed=True)
```
```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics
diagnostics.paradedb_verify_index(
engine,
"search_idx",
heapallindexed=True,
)
```
```ruby Rails
ParadeDB.paradedb_verify_index(
"search_idx",
heapallindexed: true
)
```
```cs EF Core
await dbContext.Database.VerifyIndex(
"search_idx",
new VerifyIndexOptions { HeapAllIndexed = true }
).ToListAsync();
```
This adds an additional check that validates every indexed `ctid` (tuple identifier) references a valid row in the table.
This is particularly useful for detecting index entries that reference deleted or non-existent rows.
The `heapallindexed` option can be slow on large indexes as it must verify
every document. Consider using `sample_rate` for quick spot checks on large
indexes.
## Options
### Sampling for Large Indexes
For large indexes, you can check a random sample of documents instead of all documents:
```sql SQL
-- Check 10% of documents
SELECT * FROM pdb.verify_index('search_idx',
heapallindexed := true,
sample_rate := 0.1
);
```
```ts Drizzle
import { diagnostics } from "@paradedb/drizzle-paradedb";
await db.execute(
diagnostics.verifyIndex("search_idx", {
heapAllIndexed: true,
sampleRate: 0.1,
}),
);
```
```python Django
from paradedb import paradedb_verify_index
paradedb_verify_index(
"search_idx",
heapallindexed=True,
sample_rate=0.1,
)
```
```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics
diagnostics.paradedb_verify_index(
engine,
"search_idx",
heapallindexed=True,
sample_rate=0.1,
)
```
```ruby Rails
ParadeDB.paradedb_verify_index(
"search_idx",
heapallindexed: true,
sample_rate: 0.1
)
```
```cs EF Core
await dbContext.Database.VerifyIndex(
"search_idx",
new VerifyIndexOptions
{
HeapAllIndexed = true,
SampleRate = 0.1,
}
).ToListAsync();
```
### Progress Reporting
For long-running verifications, enable progress reporting to see status updates:
```sql SQL
SELECT * FROM pdb.verify_index('search_idx',
heapallindexed := true,
report_progress := true
);
```
```ts Drizzle
import { diagnostics } from "@paradedb/drizzle-paradedb";
await db.execute(
diagnostics.verifyIndex("search_idx", {
heapAllIndexed: true,
reportProgress: true,
}),
);
```
```python Django
from paradedb import paradedb_verify_index
paradedb_verify_index(
"search_idx",
heapallindexed=True,
report_progress=True,
)
```
```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics
diagnostics.paradedb_verify_index(
engine,
"search_idx",
heapallindexed=True,
report_progress=True,
)
```
```ruby Rails
ParadeDB.paradedb_verify_index(
"search_idx",
heapallindexed: true,
report_progress: true
)
```
```cs EF Core
await dbContext.Database.VerifyIndex(
"search_idx",
new VerifyIndexOptions
{
HeapAllIndexed = true,
ReportProgress = true,
}
).ToListAsync();
```
Progress messages are emitted via PostgreSQL's `NOTICE` channel.
### Verbose Mode
For detailed logging including segment-by-segment progress and resume hints, enable verbose mode:
```sql SQL
SELECT * FROM pdb.verify_index('search_idx',
heapallindexed := true,
report_progress := true,
verbose := true
);
```
```ts Drizzle
import { diagnostics } from "@paradedb/drizzle-paradedb";
await db.execute(
diagnostics.verifyIndex("search_idx", {
heapAllIndexed: true,
reportProgress: true,
verbose: true,
}),
);
```
```python Django
from paradedb import paradedb_verify_index
paradedb_verify_index(
"search_idx",
heapallindexed=True,
report_progress=True,
verbose=True,
)
```
```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics
diagnostics.paradedb_verify_index(
engine,
"search_idx",
heapallindexed=True,
report_progress=True,
verbose=True,
)
```
```ruby Rails
ParadeDB.paradedb_verify_index(
"search_idx",
heapallindexed: true,
report_progress: true,
verbose: true
)
```
```cs EF Core
await dbContext.Database.VerifyIndex(
"search_idx",
new VerifyIndexOptions
{
HeapAllIndexed = true,
ReportProgress = true,
Verbose = true,
}
).ToListAsync();
```
### Stop on First Error
To stop verification immediately when the first error is found (similar to `pg_amcheck --on-error-stop`):
```sql SQL
SELECT * FROM pdb.verify_index('search_idx', on_error_stop := true);
```
```ts Drizzle
import { diagnostics } from "@paradedb/drizzle-paradedb";
await db.execute(
diagnostics.verifyIndex("search_idx", {
onErrorStop: true,
}),
);
```
```python Django
from paradedb import paradedb_verify_index
paradedb_verify_index("search_idx", on_error_stop=True)
```
```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics
diagnostics.paradedb_verify_index(
engine,
"search_idx",
on_error_stop=True,
)
```
```ruby Rails
ParadeDB.paradedb_verify_index(
"search_idx",
on_error_stop: true
)
```
```cs EF Core
await dbContext.Database.VerifyIndex(
"search_idx",
new VerifyIndexOptions { OnErrorStop = true }
).ToListAsync();
```
## Parallel Verification
A single `verify_index` call processes segments sequentially within one PostgreSQL backend.
For very large indexes, you can distribute verification across multiple database connections
by specifying which segments each connection should check using the `segment_ids` parameter.
This allows you to utilize multiple CPU cores by running verification in parallel processes.
### Listing Segments
First, list all segments in the index:
```sql SQL
SELECT * FROM pdb.index_segments('search_idx');
```
```ts Drizzle
import { diagnostics } from "@paradedb/drizzle-paradedb";
await db.execute(diagnostics.indexSegments("search_idx"));
```
```python Django
from paradedb import paradedb_index_segments
paradedb_index_segments("search_idx")
```
```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics
diagnostics.paradedb_index_segments(engine, "search_idx")
```
```ruby Rails
ParadeDB.paradedb_index_segments("search_idx")
```
```cs EF Core
await dbContext.Database.IndexSegments("search_idx").ToListAsync();
```
```
partition_name | segment_idx | segment_id | num_docs | num_deleted | max_doc
----------------+-------------+------------+----------+-------------+---------
search_idx | 0 | b7e661af | 10000 | 0 | 10000
search_idx | 1 | b4fc1b40 | 10000 | 0 | 10000
search_idx | 2 | 9894b412 | 10000 | 0 | 10000
search_idx | 3 | 4d0168d6 | 5000 | 0 | 5000
```
### Verifying Specific Segments
Then verify specific segments using the `segment_ids` parameter:
```sql SQL
-- Worker 1: Verify even segments
SELECT * FROM pdb.verify_index('search_idx',
heapallindexed := true,
segment_ids := ARRAY[0, 2]
);
-- Worker 2: Verify odd segments
SELECT * FROM pdb.verify_index('search_idx',
heapallindexed := true,
segment_ids := ARRAY[1, 3]
);
```
```ts Drizzle
import { diagnostics } from "@paradedb/drizzle-paradedb";
const segments = await db.execute(diagnostics.indexSegments("search_idx"));
const evenSegments = segments
.map((row) => row.segment_idx)
.filter((segmentIdx) => segmentIdx % 2 === 0);
const oddSegments = segments
.map((row) => row.segment_idx)
.filter((segmentIdx) => segmentIdx % 2 === 1);
// Worker 1: Verify even segments
await db.execute(
diagnostics.verifyIndex("search_idx", {
heapAllIndexed: true,
segmentIds: evenSegments,
}),
);
// Worker 2: Verify odd segments
if (oddSegments.length) {
await db.execute(
diagnostics.verifyIndex("search_idx", {
heapAllIndexed: true,
segmentIds: oddSegments,
}),
);
}
```
```python Django
from paradedb import paradedb_verify_index
# Worker 1: Verify even segments
paradedb_verify_index(
"search_idx",
heapallindexed=True,
segment_ids=[0, 2],
)
# Worker 2: Verify odd segments
paradedb_verify_index(
"search_idx",
heapallindexed=True,
segment_ids=[1, 3],
)
```
```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics
# Worker 1: Verify even segments
diagnostics.paradedb_verify_index(
engine,
"search_idx",
heapallindexed=True,
segment_ids=[0, 2],
)
# Worker 2: Verify odd segments
diagnostics.paradedb_verify_index(
engine,
"search_idx",
heapallindexed=True,
segment_ids=[1, 3],
)
```
```ruby Rails
# Worker 1: Verify even segments
ParadeDB.paradedb_verify_index(
"search_idx",
heapallindexed: true,
segment_ids: [0, 2]
)
# Worker 2: Verify odd segments
ParadeDB.paradedb_verify_index(
"search_idx",
heapallindexed: true,
segment_ids: [1, 3]
)
```
```cs EF Core
var segments = await dbContext.Database.IndexSegments("search_idx").ToListAsync();
var evenSegments = segments
.Select(segment => segment.SegmentIdx)
.Where(segmentIdx => segmentIdx % 2 == 0)
.ToArray();
var oddSegments = segments
.Select(segment => segment.SegmentIdx)
.Where(segmentIdx => segmentIdx % 2 == 1)
.ToArray();
await dbContext.Database.VerifyIndex(
"search_idx",
new VerifyIndexOptions
{
HeapAllIndexed = true,
SegmentIds = evenSegments,
}
).ToListAsync();
if (oddSegments.Length > 0)
{
await dbContext.Database.VerifyIndex(
"search_idx",
new VerifyIndexOptions
{
HeapAllIndexed = true,
SegmentIds = oddSegments,
}
).ToListAsync();
}
```
### Automation Example
Distribute verification across N workers:
```sql SQL
-- Get segments for worker 0 (of 4 workers)
SELECT array_agg(segment_idx) AS segments
FROM pdb.index_segments('search_idx')
WHERE segment_idx % 4 = 0;
-- Run verification with those segments
SELECT * FROM pdb.verify_index('search_idx',
heapallindexed := true,
segment_ids := (
SELECT array_agg(segment_idx)
FROM pdb.index_segments('search_idx')
WHERE segment_idx % 4 = 0
)
);
```
```ts Drizzle
import { diagnostics } from "@paradedb/drizzle-paradedb";
const segments = await db.execute(diagnostics.indexSegments("search_idx"));
await db.execute(
diagnostics.verifyIndex("search_idx", {
heapAllIndexed: true,
segmentIds: segments
.map((row) => row.segment_idx)
.filter((segmentIdx) => segmentIdx % 4 === 0),
}),
);
```
```python Django
from paradedb import paradedb_index_segments, paradedb_verify_index
paradedb_verify_index(
"search_idx",
heapallindexed=True,
segment_ids=[
row["segment_idx"]
for row in paradedb_index_segments("search_idx")
if row["segment_idx"] % 4 == 0
],
)
```
```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics
diagnostics.paradedb_verify_index(
engine,
"search_idx",
heapallindexed=True,
segment_ids=[
row["segment_idx"]
for row in diagnostics.paradedb_index_segments(engine, "search_idx")
if row["segment_idx"] % 4 == 0
],
)
```
```ruby Rails
ParadeDB.paradedb_verify_index(
"search_idx",
heapallindexed: true,
segment_ids: ParadeDB.paradedb_index_segments("search_idx").filter_map { |row|
row["segment_idx"] if row["segment_idx"] % 4 == 0
})
```
```cs EF Core
var segments = await dbContext.Database.IndexSegments("search_idx").ToListAsync();
await dbContext.Database.VerifyIndex(
"search_idx",
new VerifyIndexOptions
{
HeapAllIndexed = true,
SegmentIds = segments
.Select(segment => segment.SegmentIdx)
.Where(segmentIdx => segmentIdx % 4 == 0)
.ToArray(),
}
).ToListAsync();
```
## Verifying All BM25 Indexes
To verify all BM25 indexes in the database at once:
```sql SQL
SELECT * FROM pdb.verify_all_indexes();
```
```ts Drizzle
import { diagnostics } from "@paradedb/drizzle-paradedb";
await db.execute(diagnostics.verifyAllIndexes());
```
```python Django
from paradedb import paradedb_verify_all_indexes
paradedb_verify_all_indexes()
```
```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics
diagnostics.paradedb_verify_all_indexes(engine)
```
```ruby Rails
ParadeDB.paradedb_verify_all_indexes()
```
```cs EF Core
await dbContext.Database.VerifyAllIndexes().ToListAsync();
```
### Filtering by Pattern
Filter indexes by schema or name pattern (using SQL `LIKE` syntax):
```sql SQL
-- Verify indexes in the 'public' schema only
SELECT * FROM pdb.verify_all_indexes(schema_pattern := 'public');
-- Verify indexes matching a name pattern
SELECT * FROM pdb.verify_all_indexes(index_pattern := 'search_%');
-- Combine filters
SELECT * FROM pdb.verify_all_indexes(
schema_pattern := 'app_%',
index_pattern := '%_idx',
heapallindexed := true
);
```
```ts Drizzle
import { diagnostics } from "@paradedb/drizzle-paradedb";
// Verify indexes in the 'public' schema only
await db.execute(
diagnostics.verifyAllIndexes({
schemaPattern: "public",
}),
);
// Verify indexes matching a name pattern
await db.execute(
diagnostics.verifyAllIndexes({
indexPattern: "search_%",
}),
);
// Combine filters
await db.execute(
diagnostics.verifyAllIndexes({
schemaPattern: "public",
indexPattern: "search_%",
heapAllIndexed: true,
}),
);
```
```python Django
from paradedb import paradedb_verify_all_indexes
# Verify indexes in the 'public' schema only
paradedb_verify_all_indexes(schema_pattern="public")
# Verify indexes matching a name pattern
paradedb_verify_all_indexes(index_pattern="search_%")
# Combine filters
paradedb_verify_all_indexes(
schema_pattern="app_%",
index_pattern="%_idx",
heapallindexed=True,
)
```
```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics
# Verify indexes in the 'public' schema only
diagnostics.paradedb_verify_all_indexes(
engine,
schema_pattern="public",
)
# Verify indexes matching a name pattern
diagnostics.paradedb_verify_all_indexes(
engine,
index_pattern="search_%",
)
# Combine filters
diagnostics.paradedb_verify_all_indexes(
engine,
schema_pattern="app_%",
index_pattern="%_idx",
heapallindexed=True,
)
```
```ruby Rails
# Verify indexes in the 'public' schema only
ParadeDB.paradedb_verify_all_indexes(schema_pattern: "public")
# Verify indexes matching a name pattern
ParadeDB.paradedb_verify_all_indexes(index_pattern: "search_%")
# Combine filters
ParadeDB.paradedb_verify_all_indexes(
schema_pattern: "app_%",
index_pattern: "%_idx",
heapallindexed: true
)
```
```cs EF Core
// Verify indexes in the 'public' schema only
await dbContext.Database.VerifyAllIndexes(
new VerifyAllIndexesOptions { SchemaPattern = "public" }
).ToListAsync();
// Verify indexes matching a name pattern
await dbContext.Database.VerifyAllIndexes(
new VerifyAllIndexesOptions { IndexPattern = "search_%" }
).ToListAsync();
// Combine filters
await dbContext.Database.VerifyAllIndexes(
new VerifyAllIndexesOptions
{
SchemaPattern = "public",
IndexPattern = "search_%",
HeapAllIndexed = true,
}
).ToListAsync();
```
## Listing All BM25 Indexes
To see all BM25 indexes in the database with summary statistics:
```sql SQL
SELECT * FROM pdb.indexes();
```
```ts Drizzle
import { diagnostics } from "@paradedb/drizzle-paradedb";
await db.execute(diagnostics.indexes());
```
```python Django
from paradedb import paradedb_indexes
paradedb_indexes()
```
```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics
diagnostics.paradedb_indexes(engine)
```
```ruby Rails
ParadeDB.paradedb_indexes()
```
```cs EF Core
await dbContext.Database
.Indexes()
.Select(index => new
{
index.SchemaName,
index.TableName,
index.IndexName,
index.NumSegments,
index.TotalDocs,
})
.ToListAsync();
```
```
schemaname | tablename | indexname | indexrelid | num_segments | total_docs
------------+-------------+---------------+------------+--------------+------------
public | products | products_idx | 16421 | 3 | 50000
public | documents | documents_idx | 16435 | 5 | 125000
app | articles | articles_idx | 16448 | 2 | 10000
```
## Function Reference
### `pdb.verify_index`
Verifies a single BM25 index.
| Parameter | Type | Default | Description |
| ----------------- | -------- | ---------- | ---------------------------------------------------------- |
| `index` | regclass | (required) | The index to verify |
| `heapallindexed` | boolean | `false` | Check that all indexed ctids exist in the heap |
| `sample_rate` | float | `NULL` | Fraction of documents to check (0.0-1.0). NULL = check all |
| `report_progress` | boolean | `false` | Emit progress messages |
| `verbose` | boolean | `false` | Emit detailed segment-level progress and resume hints |
| `on_error_stop` | boolean | `false` | Stop on first error found |
| `segment_ids` | int[] | `NULL` | Specific segment indices to check. NULL = all segments |
### `pdb.verify_all_indexes`
Verifies all BM25 indexes in the database.
| Parameter | Type | Default | Description |
| ----------------- | ------- | ------- | ---------------------------------------------------------- |
| `schema_pattern` | text | `NULL` | Filter by schema name (SQL LIKE pattern). NULL = all |
| `index_pattern` | text | `NULL` | Filter by index name (SQL LIKE pattern). NULL = all |
| `heapallindexed` | boolean | `false` | Check that all indexed ctids exist in the heap |
| `sample_rate` | float | `NULL` | Fraction of documents to check (0.0-1.0). NULL = check all |
| `report_progress` | boolean | `false` | Emit progress messages |
| `on_error_stop` | boolean | `false` | Stop on first error found |
### `pdb.index_segments`
Lists all segments in a BM25 index.
| Parameter | Type | Default | Description |
| --------- | -------- | ---------- | -------------------- |
| `index` | regclass | (required) | The index to inspect |
Returns:
| Column | Type | Description |
| ---------------- | ------ | ------------------------------------------------ |
| `partition_name` | text | Name of the index partition |
| `segment_idx` | int | Segment index (use with `segment_ids` parameter) |
| `segment_id` | text | Tantivy segment UUID |
| `num_docs` | bigint | Number of live documents |
| `num_deleted` | bigint | Number of deleted documents |
| `max_doc` | bigint | Maximum document ID |
### `pdb.indexes`
Lists all BM25 indexes in the database.
Returns:
| Column | Type | Description |
| -------------- | ------ | ----------------------------------- |
| `schemaname` | text | Schema containing the index |
| `tablename` | text | Table the index is on |
| `indexname` | text | Name of the index |
| `indexrelid` | oid | OID of the index |
| `num_segments` | int | Number of Tantivy segments |
| `total_docs` | bigint | Total documents across all segments |