---
title: Columnar Storage
description: Column-oriented indexing for fast filtering, sorting, and aggregates
canonical: https://docs.paradedb.com/documentation/indexing/columnar
---

By default, all non-text and non-JSON fields are indexed using ParadeDB's columnar format. This enables fast [filtering pushdown](/documentation/filtering#filter-pushdown), [Top K ordering](/documentation/sorting/topk), and [aggregates](/documentation/aggregates/overview) over these fields.

For example, in the following index definition, `rating` and `id` are columnar indexed because they are integers, whereas `description` is not because it is text.

```sql SQL
CREATE INDEX search_idx ON mock_items
USING bm25 (id, description, rating)
WITH (key_field = 'id');
```

```python Django
from django.db import connection
from paradedb.indexes import BM25Index

with connection.schema_editor() as schema_editor:
    schema_editor.add_index(
        MockItem,
        BM25Index(
            fields={
                "id": {},
                "description": {},
                "rating": {},
            },
            key_field="id",
            name="search_idx",
        ),
    )
```

```python SQLAlchemy
from sqlalchemy import Index
from paradedb.sqlalchemy import indexing

idx = Index(
    "search_idx",
    indexing.BM25Field(MockItem.id),
    indexing.BM25Field(MockItem.description),
    indexing.BM25Field(MockItem.rating),
    postgresql_using="bm25",
    postgresql_with={"key_field": "id"},
)

with engine.begin() as conn:
    idx.create(conn)
```

```ruby Rails
ActiveRecord::Base.connection.add_bm25_index(
  :mock_items,
  fields: {
    id: {},
    description: {},
    rating: {}
  },
  key_field: :id,
  name: :search_idx
)
```

To enable columnar indexing for text and JSON fields, cast the field to a [tokenizer](/documentation/tokenizers/overview) with `columnar` set to `true`.
```sql SQL
CREATE INDEX search_idx ON mock_items
USING bm25 (id, (description::pdb.unicode_words('columnar=true')), rating)
WITH (key_field = 'id');
```

```python Django
from django.db import connection
from paradedb.indexes import BM25Index

with connection.schema_editor() as schema_editor:
    schema_editor.add_index(
        MockItem,
        BM25Index(
            fields={
                "id": {},
                "description": {
                    "tokenizer": "unicode_words",
                    "named_args": {"columnar": True},
                },
                "rating": {},
            },
            key_field="id",
            name="search_idx",
        ),
    )
```

```python SQLAlchemy
from sqlalchemy import Index
from paradedb.sqlalchemy import indexing

idx = Index(
    "search_idx",
    indexing.BM25Field(MockItem.id),
    indexing.BM25Field(
        MockItem.description,
        tokenizer=indexing.tokenize.from_config(
            {
                "tokenizer": "unicode_words",
                "named_args": {"columnar": True},
            }
        ),
    ),
    indexing.BM25Field(MockItem.rating),
    postgresql_using="bm25",
    postgresql_with={"key_field": "id"},
)

with engine.begin() as conn:
    idx.create(conn)
```

```ruby Rails
ActiveRecord::Base.connection.add_bm25_index(
  :mock_items,
  fields: {
    id: {},
    description: {
      tokenizer: :unicode_words,
      named_args: { columnar: true }
    },
    rating: {}
  },
  key_field: :id,
  name: :search_idx
)
```

The `columnar` option for tokenizers is available in versions `0.22.0` and above.

Columnar defaults to `false` for all tokenizers besides [literal](/documentation/tokenizers/available-tokenizers/literal) and [literal normalized](/documentation/tokenizers/available-tokenizers/literal-normalized), which default to `true` and do not require an explicit setting. The reason is that tokenized fields can represent large documents and would be expensive to store column-wise, whereas literal and literal normalized fields are typically single-value and much more compact.

The columnar field stores the raw text value regardless of the tokenizer. For example, if `Hello world` is split into tokens `hello` and `world`, the columnar value remains `Hello world`.
This is important because operations like filtering and sorting require the original field value, not the tokens.

Internally, Tantivy refers to columnar fields as fast fields. Our [legacy docs](/legacy/indexing/create-index) also refer to these fields as fast.
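
To make the difference between tokens and the columnar value concrete, here is a toy Python model. It is only a sketch under stated assumptions: the `tokenize` helper, the `inverted` and `columnar` dictionaries, and the sample rows are hypothetical illustrations, not ParadeDB or Tantivy APIs, and the real tokenizers and storage formats differ.

```python
# Toy model only: this is NOT ParadeDB or Tantivy code.
import re

# Hypothetical sample rows: document id -> raw text value.
docs = {1: "Hello world", 2: "Apple pie", 3: "hello again"}


def tokenize(text):
    """Hypothetical stand-in for a word tokenizer: lowercase, then split on non-word characters."""
    return re.findall(r"\w+", text.lower())


# Inverted index: token -> set of document ids (used for full-text matching).
inverted = {}
for doc_id, text in docs.items():
    for token in tokenize(text):
        inverted.setdefault(token, set()).add(doc_id)

# Columnar store: document id -> raw, untokenized value (used for filtering and sorting).
columnar = dict(docs)

# Full-text matching goes through the tokens...
matches = inverted["hello"]

# ...while sorting reads the original values from the columnar store.
ordered = sorted(matches, key=lambda doc_id: columnar[doc_id])

# An exact-value filter also needs the raw value; the tokens alone cannot answer it.
exact = [doc_id for doc_id, value in columnar.items() if value == "Hello world"]
```

In this sketch, the token `hello` matches both `Hello world` and `hello again`, but only the columnar store can tell the two apart for an exact-value filter or a sort on the original text.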