# ParadeDB Benchmarks

Benchmarking suite for ParadeDB. It executes a series of common full-text and faceted queries over a generated table
with text, numeric, timestamp, and JSON columns.

## Prerequisites

The benchmarking scripts require a Postgres database with [`pg_search`](/pg_search) installed. If you are building `pg_search` with
`cargo pgrx`, make sure to pass the `--release` flag so that the benchmarks measure an optimized build.

## Usage

The following command generates a test table, builds a BM25 index, runs benchmarking queries, and outputs the results to a Markdown file.

```bash
cargo run -- --url POSTGRES_URL
```

For more options:

```bash
cargo run -- --help
```

## Datasets

Each benchmark run uses a single dataset located under `datasets/$name`, with data generated by a `datasets/$name/generate.sql` file.

The queries benchmarked for a dataset are located at `datasets/$name/queries/$type/*.sql`, where `$type` is usually `pg_search`. Each file represents a single logical query. When a file contains multiple queries, the first is treated as the canonical, idiomatic way to write it, and any additional queries are treated as alternative formulations. The canonical query may not always be the fastest (yet!), but we strive to make it perform as well as any non-idiomatic, slightly contorted alternative.
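
The layout described above can be sketched in a scratch directory. This is only an illustration: the dataset name `example`, the file `count.sql`, the placeholder SQL, and the helpers `list_queries` and `canonical_query` are all hypothetical, and the "statement ends with `;` at the end of a line" rule is a naive assumption, not necessarily how the benchmark harness parses query files.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical dataset name and scratch directory, for illustration only.
name="example"
root="$(mktemp -d)"
trap 'rm -rf "$root"' EXIT

# Layout: datasets/$name/generate.sql plus datasets/$name/queries/$type/*.sql.
mkdir -p "$root/datasets/$name/queries/pg_search"
printf '%s\n' '-- placeholder: SQL that generates the test table' \
  > "$root/datasets/$name/generate.sql"

# One query file: the first statement is the canonical form,
# the second is an alternative way to write the same query.
cat > "$root/datasets/$name/queries/pg_search/count.sql" <<'SQL'
SELECT 1;
SELECT 1 + 0;
SQL

# Print the query files for a dataset and query type, one per line.
list_queries() {
  local root="$1" name="$2" type="$3"
  find "$root/datasets/$name/queries/$type" -maxdepth 1 -name '*.sql' | sort
}

# Print the canonical (first) statement of a query file.
# Naive assumption: a statement ends with ';' at the end of a line.
canonical_query() {
  awk '{ print } /;[[:space:]]*$/ { exit }' "$1"
}

list_queries "$root" "$name" pg_search
canonical_query "$root/datasets/$name/queries/pg_search/count.sql"
```

Keeping the alternatives in the same file as the canonical query keeps every formulation of one logical query side by side, which makes it easy to compare them when reading results.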