--- title: Overview --- ## Basic Usage You can change how individual fields are tokenized and stored by passing JSON strings to the `WITH` clause of `CREATE INDEX`. For instance, the following statement configures an ngram tokenizer for the `description` field. ```sql CREATE INDEX search_idx ON mock_items USING bm25 (id, description) WITH ( key_field = 'id', text_fields = '{ "description": { "tokenizer": {"type": "ngram", "min_gram": 2, "max_gram": 3, "prefix_only": false} } }' ); ``` The key(s) of the JSON string correspond to field names, and the values are the configuration options. If a configuration option or field name is not specified, the default values are used (see [all configuration options](#all-configuration-options)). ## Configure Multiple Fields To configure multiple fields, simply pass more keys to the JSON string. For instance, the following statement specifies tokenizers for both the `description` and `category` fields. ```sql CREATE INDEX search_idx ON mock_items USING bm25 (id, description, category) WITH ( key_field = 'id', text_fields = '{ "description": { "tokenizer": {"type": "ngram", "min_gram": 2, "max_gram": 3, "prefix_only": false} }, "category": { "tokenizer": {"type": "ngram", "min_gram": 2, "max_gram": 3, "prefix_only": false} } }' ); ``` ## All Configuration Options ### Text Fields Options for columns of type `VARCHAR`, `TEXT`, `UUID`, and their corresponding array types should be passed to `text_fields`. ```sql CREATE INDEX search_idx ON mock_items USING bm25 (id, description) WITH ( key_field = 'id', text_fields = '{ "description": { "fast": true, "tokenizer": {"type": "ngram", "min_gram": 2, "max_gram": 3, "prefix_only": false} } }' ); ``` The nested configuration JSON for `text_fields` accepts the following keys. See [fast fields](/documentation/indexing/fast_fields) for when this option should be set to `true`. See [tokenizers](/documentation/indexing/tokenizers) for how to configure the tokenizer. See [record](/documentation/indexing/record) for a list of available record types. See [normalizers](/documentation/indexing/fast_fields#normalizers) for how to configure the normalizer. Whether the field is indexed. Must be `true` in order for the field to be tokenized and searchable. Whether the original value of the field is stored. Required only for use with the [More Like This](/documentation/advanced/specialized/more_like_this) query. Fieldnorms store information about the length of the text field. Must be `true` to calculate the BM25 score. ### JSON Fields Options for columns of type `JSON` and `JSONB` should be passed to `json_fields`. ```sql CREATE INDEX search_idx ON mock_items USING bm25 (id, metadata) WITH ( key_field = 'id', json_fields = '{ "metadata": { "fast": true } }' ); ``` The nested configuration JSON for `json_fields` accepts the following keys. See [fast fields](/documentation/indexing/fast_fields) for when this option should be set to `true`. See [tokenizers](/documentation/indexing/tokenizers) for how to configure the tokenizer. See [record](/documentation/indexing/record) for a list of available record types. See [normalizers](/documentation/indexing/fast_fields#normalizers) for how to configure the normalizer. If `true`, JSON keys containing a `.` will be expanded. For instance, if `expand_dots` is `true`, `{"metadata.color": "red"}` will be indexed as if it was `{"metadata": {"color": "red"}}`. Whether the field is indexed. Must be `true` in order for the field to be tokenized and searchable. Whether the original value of the field is stored. Required only for use with the [More Like This](/documentation/advanced/specialized/more_like_this) query. Fieldnorms store information about the length of the text field. Must be `true` to calculate the BM25 score. ## Advanced Options In addition to text and JSON, ParadeDB exposes options for numeric, datetime, boolean, range, and enum fields. For most use cases, it is not necessary to change these options. ### Numeric Fields Options for columns of type `SMALLINT`, `INTEGER`, `BIGINT`, `OID`, `REAL`, `DOUBLE PRECISION`, `NUMERIC`, and their corresponding array types should be passed to `numeric_fields`. ```sql CREATE INDEX search_idx ON mock_items USING bm25 (id, rating) WITH ( key_field = 'id', numeric_fields = '{ "rating": {"fast": true} }' ); ``` Whether the field is indexed. Must be `true` in order for the field to be tokenized and searchable. Whether the original value of the field is stored. Required only for use with the [More Like This](/documentation/advanced/specialized/more_like_this) query. Fast fields can be random-accessed rapidly. Fields used for aggregation must have `fast` set to `true`. Fast fields are also useful for accelerated scoring and filtering. ### Boolean Fields Options for columns of type `BOOLEAN` and `BOOLEAN[]` should be passed to `boolean_fields`. ```sql CREATE INDEX search_idx ON mock_items USING bm25 (id, in_stock) WITH ( key_field = 'id', boolean_fields = '{ "in_stock": {"fast": true} }' ); ``` `CREATE_INDEX` accepts several configuration options for `boolean_fields`: Whether the field is indexed. Must be `true` in order for the field to be tokenized and searchable. Whether the original value of the field is stored. Required only for use with the [More Like This](/documentation/advanced/specialized/more_like_this) query. Fast fields can be random-accessed rapidly. Fields used for aggregation must have `fast` set to `true`. Fast fields are also useful for accelerated scoring and filtering. ### Datetime Fields Options for columns of type `DATE`, `TIMESTAMP`, `TIMESTAMPTZ`, `TIME`, `TIMETZ`, and their corresponding array types should be passed to `datetime_fields`. ```sql CREATE INDEX search_idx ON mock_items USING bm25 (id, created_at) WITH ( key_field = 'id', datetime_fields = '{ "created_at": {"fast": true} }' ); ``` `CREATE INDEX` accepts several configuration options for `datetime_fields`: Whether the field is indexed. Must be `true` in order for the field to be tokenized and searchable. Whether the original value of the field is stored. Required only for use with the [More Like This](/documentation/advanced/specialized/more_like_this) query. Fast fields can be random-accessed rapidly. Fields used for aggregation must have `fast` set to `true`. Fast fields are also useful for accelerated scoring and filtering. ### Range Fields Options for columns of type `int4range`, `int8range`, `numrange`, `tsrange`, and `tstzrange` should be passed to `range_fields`. The [range term](/documentation/advanced/term/range_term) query is used to filter over these fields. ```sql CREATE INDEX search_idx ON mock_items USING bm25 (id, weight_range) WITH ( key_field = 'id', range_fields = '{ "weight_range": {"stored": true} }' ); ``` `CREATE INDEX` accepts several configuration options for `range_fields`: Whether the original value of the field is stored. Required only for use with the [More Like This](/documentation/advanced/specialized/more_like_this) query. ### Enumerated Types Options for custom Postgres [enums](https://www.postgresql.org/docs/current/datatype-enum.html) should be passed to `numeric_fields`. Enums should be queried with [term queries](/documentation/advanced/term/term). If the ordering of the enum is changed with `ADD VALUE ... [ BEFORE | AFTER ]`, the BM25 index should be dropped and recreated to account for the new enum ordinal values. ## Deprecated Syntax Migration In ParadeDB `v0.13.0`, the old `paradedb.create_bm25` function was deprecated in favor of the `CREATE INDEX` syntax. To make migration to the new `CREATE INDEX` syntax easier, a new `paradedb.format_create_bm25` function has been introduced. This function accepts the same arguments as the deprecated `paradedb.create_bm25` function and outputs an equivalent `CREATE INDEX` statement which can be copy, pasted, and executed. ```sql SELECT * FROM paradedb.format_create_bm25( index_name => 'search_idx', table_name => 'mock_items', key_field => 'id', text_fields => paradedb.field('description') || paradedb.field('category'), numeric_fields => paradedb.field('rating'), boolean_fields => paradedb.field('in_stock'), datetime_fields => paradedb.field('created_at'), json_fields => paradedb.field('metadata'), range_fields => paradedb.field('weight_range') ); ``` `paradedb.format_create_bm25` does **not** create the index. It simply outputs a `CREATE INDEX` statement for you to run.