---
title: Verify Index Integrity
description: Check BM25 indexes for corruption and structural issues
canonical: https://docs.paradedb.com/documentation/indexing/verify-index
---

ParadeDB provides `amcheck`-style index verification functions to detect corruption and validate the structural integrity of BM25 indexes.
These functions are useful for:

- Proactive corruption detection before issues become critical
- Validating index health after hardware failures or unexpected shutdowns
- Verifying backup integrity
- Debugging index-related issues

## Basic Verification

The `pdb.verify_index` function performs structural integrity checks on a BM25 index:

<CodeGroup>
```sql SQL
SELECT * FROM pdb.verify_index('search_idx');
```

```python Django
from paradedb import paradedb_verify_index

paradedb_verify_index("search_idx")
```

```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics

diagnostics.paradedb_verify_index(engine, "search_idx")
```

```ruby Rails
ParadeDB.paradedb_verify_index("search_idx")
```

</CodeGroup>

This returns a table with three columns:

| Column       | Type    | Description                                   |
| ------------ | ------- | --------------------------------------------- |
| `check_name` | text    | Name of the verification check                |
| `passed`     | boolean | Whether the check passed                      |
| `details`    | text    | Additional information about the check result |

### Example Output

```
               check_name               | passed |                    details
----------------------------------------+--------+-----------------------------------------------
 search_idx: schema_valid               | t      | Index schema loaded successfully
 search_idx: index_readable             | t      | Index reader opened successfully
 search_idx: checksums_valid            | t      | All segment checksums validated successfully
 search_idx: segment_metadata_valid     | t      | 3 segments validated successfully
```

## Heap Reference Validation

To verify that all indexed entries still exist in the heap table, use the `heapallindexed` option:

<CodeGroup>
```sql SQL
SELECT * FROM pdb.verify_index('search_idx', heapallindexed := true);
```

```python Django
from paradedb import paradedb_verify_index

paradedb_verify_index("search_idx", heapallindexed=True)
```

```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics

diagnostics.paradedb_verify_index(
    engine,
    "search_idx",
    heapallindexed=True,
)
```

```ruby Rails
ParadeDB.paradedb_verify_index(
  "search_idx",
  heapallindexed: true
)
```

</CodeGroup>

This adds an additional check that validates every indexed `ctid` (tuple identifier) references a valid row in the table.
This is particularly useful for detecting index entries that reference deleted or non-existent rows.

<Warning>
  The `heapallindexed` option can be slow on large indexes as it must verify
  every document. Consider using `sample_rate` for quick spot checks on large
  indexes.
</Warning>

## Options

### Sampling for Large Indexes

For large indexes, you can check a random sample of documents instead of all documents:

<CodeGroup>
```sql SQL
-- Check 10% of documents
SELECT * FROM pdb.verify_index('search_idx',
    heapallindexed := true,
    sample_rate := 0.1
);
```

```python Django
from paradedb import paradedb_verify_index

paradedb_verify_index(
    "search_idx",
    heapallindexed=True,
    sample_rate=0.1,
)
```

```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics

diagnostics.paradedb_verify_index(
    engine,
    "search_idx",
    heapallindexed=True,
    sample_rate=0.1,
)
```

```ruby Rails
ParadeDB.paradedb_verify_index(
  "search_idx",
  heapallindexed: true,
  sample_rate: 0.1
)
```

</CodeGroup>

### Progress Reporting

For long-running verifications, enable progress reporting to see status updates:

<CodeGroup>
```sql SQL
SELECT * FROM pdb.verify_index('search_idx',
    heapallindexed := true,
    report_progress := true
);
```

```python Django
from paradedb import paradedb_verify_index

paradedb_verify_index(
    "search_idx",
    heapallindexed=True,
    report_progress=True,
)
```

```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics

diagnostics.paradedb_verify_index(
    engine,
    "search_idx",
    heapallindexed=True,
    report_progress=True,
)
```

```ruby Rails
ParadeDB.paradedb_verify_index(
  "search_idx",
  heapallindexed: true,
  report_progress: true
)
```

</CodeGroup>

Progress messages are emitted via PostgreSQL's `NOTICE` channel.

### Verbose Mode

For detailed logging including segment-by-segment progress and resume hints, enable verbose mode:

<CodeGroup>
```sql SQL
SELECT * FROM pdb.verify_index('search_idx',
    heapallindexed := true,
    report_progress := true,
    verbose := true
);
```

```python Django
from paradedb import paradedb_verify_index

paradedb_verify_index(
    "search_idx",
    heapallindexed=True,
    report_progress=True,
    verbose=True,
)
```

```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics

diagnostics.paradedb_verify_index(
    engine,
    "search_idx",
    heapallindexed=True,
    report_progress=True,
    verbose=True,
)
```

```ruby Rails
ParadeDB.paradedb_verify_index(
  "search_idx",
  heapallindexed: true,
  report_progress: true,
  verbose: true
)
```

</CodeGroup>

### Stop on First Error

To stop verification immediately when the first error is found (similar to `pg_amcheck --on-error-stop`):

<CodeGroup>
```sql SQL
SELECT * FROM pdb.verify_index('search_idx', on_error_stop := true);
```

```python Django
from paradedb import paradedb_verify_index

paradedb_verify_index("search_idx", on_error_stop=True)
```

```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics

diagnostics.paradedb_verify_index(
    engine,
    "search_idx",
    on_error_stop=True,
)
```

```ruby Rails
ParadeDB.paradedb_verify_index(
  "search_idx",
  on_error_stop: true
)
```

</CodeGroup>

## Parallel Verification

A single `verify_index` call processes segments sequentially within one PostgreSQL backend.
For very large indexes, you can distribute verification across multiple database connections
by specifying which segments each connection should check using the `segment_ids` parameter.
This allows you to utilize multiple CPU cores by running verification in parallel processes.

### Listing Segments

First, list all segments in the index:

<CodeGroup>
```sql SQL
SELECT * FROM pdb.index_segments('search_idx');
```

```python Django
from paradedb import paradedb_index_segments

paradedb_index_segments("search_idx")
```

```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics

diagnostics.paradedb_index_segments(engine, "search_idx")
```

```ruby Rails
ParadeDB.paradedb_index_segments("search_idx")
```

</CodeGroup>

```
 partition_name | segment_idx | segment_id | num_docs | num_deleted | max_doc
----------------+-------------+------------+----------+-------------+---------
 search_idx     |           0 | b7e661af   |    10000 |           0 |   10000
 search_idx     |           1 | b4fc1b40   |    10000 |           0 |   10000
 search_idx     |           2 | 9894b412   |    10000 |           0 |   10000
 search_idx     |           3 | 4d0168d6   |     5000 |           0 |    5000
```

### Verifying Specific Segments

Then verify specific segments using the `segment_ids` parameter:

<CodeGroup>
```sql SQL
-- Worker 1: Verify even segments
SELECT * FROM pdb.verify_index('search_idx',
    heapallindexed := true,
    segment_ids := ARRAY[0, 2]
);

-- Worker 2: Verify odd segments
SELECT * FROM pdb.verify_index('search_idx',
    heapallindexed := true,
    segment_ids := ARRAY[1, 3]
);

```

```python Django
from paradedb import paradedb_verify_index

# Worker 1: Verify even segments
paradedb_verify_index(
    "search_idx",
    heapallindexed=True,
    segment_ids=[0, 2],
)

# Worker 2: Verify odd segments
paradedb_verify_index(
    "search_idx",
    heapallindexed=True,
    segment_ids=[1, 3],
)
```

```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics

# Worker 1: Verify even segments
diagnostics.paradedb_verify_index(
    engine,
    "search_idx",
    heapallindexed=True,
    segment_ids=[0, 2],
)

# Worker 2: Verify odd segments
diagnostics.paradedb_verify_index(
    engine,
    "search_idx",
    heapallindexed=True,
    segment_ids=[1, 3],
)
```

```ruby Rails
# Worker 1: Verify even segments
ParadeDB.paradedb_verify_index(
  "search_idx",
  heapallindexed: true,
  segment_ids: [0, 2]
)

# Worker 2: Verify odd segments
ParadeDB.paradedb_verify_index(
  "search_idx",
  heapallindexed: true,
  segment_ids: [1, 3]
)
```

</CodeGroup>

### Automation Example

Distribute verification across N workers:

<CodeGroup>
```sql SQL
-- Get segments for worker 0 (of 4 workers)
SELECT array_agg(segment_idx) AS segments
FROM pdb.index_segments('search_idx')
WHERE segment_idx % 4 = 0;

-- Run verification with those segments
SELECT * FROM pdb.verify_index('search_idx',
    heapallindexed := true,
    segment_ids := (
        SELECT array_agg(segment_idx)
        FROM pdb.index_segments('search_idx')
        WHERE segment_idx % 4 = 0
    )
);

```

```python Django
from paradedb import paradedb_index_segments, paradedb_verify_index

paradedb_verify_index(
    "search_idx",
    heapallindexed=True,
    segment_ids=[
        row["segment_idx"]
        for row in paradedb_index_segments("search_idx")
        if row["segment_idx"] % 4 == 0
    ],
)
```

```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics

diagnostics.paradedb_verify_index(
    engine,
    "search_idx",
    heapallindexed=True,
    segment_ids=[
        row["segment_idx"]
        for row in diagnostics.paradedb_index_segments(engine, "search_idx")
        if row["segment_idx"] % 4 == 0
    ],
)
```

```ruby Rails
ParadeDB.paradedb_verify_index(
  "search_idx",
  heapallindexed: true,
  segment_ids: ParadeDB.paradedb_index_segments("search_idx").filter_map { |row|
    row["segment_idx"] if row["segment_idx"] % 4 == 0
  })
```

</CodeGroup>

## Verifying All BM25 Indexes

To verify all BM25 indexes in the database at once:

<CodeGroup>
```sql SQL
SELECT * FROM pdb.verify_all_indexes();
```

```python Django
from paradedb import paradedb_verify_all_indexes

paradedb_verify_all_indexes()
```

```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics

diagnostics.paradedb_verify_all_indexes(engine)
```

```ruby Rails
ParadeDB.paradedb_verify_all_indexes()
```
</CodeGroup>

### Filtering by Pattern

Filter indexes by schema or name pattern (using SQL `LIKE` syntax):

<CodeGroup>
```sql SQL
-- Verify indexes in the 'public' schema only
SELECT * FROM pdb.verify_all_indexes(schema_pattern := 'public');

-- Verify indexes matching a name pattern
SELECT * FROM pdb.verify_all_indexes(index_pattern := 'search_%');

-- Combine filters
SELECT * FROM pdb.verify_all_indexes(
    schema_pattern := 'app_%',
    index_pattern := '%_idx',
    heapallindexed := true
);
```

```python Django
from paradedb import paradedb_verify_all_indexes

# Verify indexes in the 'public' schema only
paradedb_verify_all_indexes(schema_pattern="public")

# Verify indexes matching a name pattern
paradedb_verify_all_indexes(index_pattern="search_%")

# Combine filters
paradedb_verify_all_indexes(
    schema_pattern="app_%",
    index_pattern="%_idx",
    heapallindexed=True,
)
```

```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics

# Verify indexes in the 'public' schema only
diagnostics.paradedb_verify_all_indexes(
    engine,
    schema_pattern="public",
)

# Verify indexes matching a name pattern
diagnostics.paradedb_verify_all_indexes(
    engine,
    index_pattern="search_%",
)

# Combine filters
diagnostics.paradedb_verify_all_indexes(
    engine,
    schema_pattern="app_%",
    index_pattern="%_idx",
    heapallindexed=True,
)
```

```ruby Rails
# Verify indexes in the 'public' schema only
ParadeDB.paradedb_verify_all_indexes(schema_pattern: "public")

# Verify indexes matching a name pattern
ParadeDB.paradedb_verify_all_indexes(index_pattern: "search_%")

# Combine filters
ParadeDB.paradedb_verify_all_indexes(
  schema_pattern: "app_%",
  index_pattern: "%_idx",
  heapallindexed: true
)
```

</CodeGroup>

## Listing All BM25 Indexes

To see all BM25 indexes in the database with summary statistics:

<CodeGroup>
```sql SQL
SELECT * FROM pdb.indexes();
```

```python Django
from paradedb import paradedb_indexes

paradedb_indexes()
```

```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics

diagnostics.paradedb_indexes(engine)
```

```ruby Rails
ParadeDB.paradedb_indexes()
```

</CodeGroup>

```
 schemaname |  tablename  |   indexname   | indexrelid | num_segments | total_docs
------------+-------------+---------------+------------+--------------+------------
 public     | products    | products_idx  |      16421 |            3 |      50000
 public     | documents   | documents_idx |      16435 |            5 |     125000
 app        | articles    | articles_idx  |      16448 |            2 |      10000
```

## Function Reference

### `pdb.verify_index`

Verifies a single BM25 index.

| Parameter         | Type     | Default    | Description                                                |
| ----------------- | -------- | ---------- | ---------------------------------------------------------- |
| `index`           | regclass | (required) | The index to verify                                        |
| `heapallindexed`  | boolean  | `false`    | Check that all indexed ctids exist in the heap             |
| `sample_rate`     | float    | `NULL`     | Fraction of documents to check (0.0-1.0). NULL = check all |
| `report_progress` | boolean  | `false`    | Emit progress messages                                     |
| `verbose`         | boolean  | `false`    | Emit detailed segment-level progress and resume hints      |
| `on_error_stop`   | boolean  | `false`    | Stop on first error found                                  |
| `segment_ids`     | int[]    | `NULL`     | Specific segment indices to check. NULL = all segments     |

### `pdb.verify_all_indexes`

Verifies all BM25 indexes in the database.

| Parameter         | Type    | Default | Description                                                |
| ----------------- | ------- | ------- | ---------------------------------------------------------- |
| `schema_pattern`  | text    | `NULL`  | Filter by schema name (SQL LIKE pattern). NULL = all       |
| `index_pattern`   | text    | `NULL`  | Filter by index name (SQL LIKE pattern). NULL = all        |
| `heapallindexed`  | boolean | `false` | Check that all indexed ctids exist in the heap             |
| `sample_rate`     | float   | `NULL`  | Fraction of documents to check (0.0-1.0). NULL = check all |
| `report_progress` | boolean | `false` | Emit progress messages                                     |
| `on_error_stop`   | boolean | `false` | Stop on first error found                                  |

### `pdb.index_segments`

Lists all segments in a BM25 index.

| Parameter | Type     | Default    | Description          |
| --------- | -------- | ---------- | -------------------- |
| `index`   | regclass | (required) | The index to inspect |

Returns:

| Column           | Type   | Description                                      |
| ---------------- | ------ | ------------------------------------------------ |
| `partition_name` | text   | Name of the index partition                      |
| `segment_idx`    | int    | Segment index (use with `segment_ids` parameter) |
| `segment_id`     | text   | Tantivy segment UUID                             |
| `num_docs`       | bigint | Number of live documents                         |
| `num_deleted`    | bigint | Number of deleted documents                      |
| `max_doc`        | bigint | Maximum document ID                              |

### `pdb.indexes`

Lists all BM25 indexes in the database.

Returns:

| Column         | Type   | Description                         |
| -------------- | ------ | ----------------------------------- |
| `schemaname`   | text   | Schema containing the index         |
| `tablename`    | text   | Table the index is on               |
| `indexname`    | text   | Name of the index                   |
| `indexrelid`   | oid    | OID of the index                    |
| `num_segments` | int    | Number of Tantivy segments          |
| `total_docs`   | bigint | Total documents across all segments |