---
title: Columnar Storage
description: Column-oriented indexing for fast filtering, sorting, and aggregates
canonical: https://docs.paradedb.com/documentation/indexing/columnar
---

By default, all non-text and non-JSON fields are indexed using ParadeDB's columnar format. This enables fast [filtering pushdown](/documentation/filtering#filter-pushdown), [Top K ordering](/documentation/sorting/topk), and [aggregates](/documentation/aggregates/overview) over these fields.

For example, in the following index definition, `rating` and `id` are columnar indexed because they are integers, whereas `description` is not because it is text.

```sql SQL
CREATE INDEX search_idx ON mock_items
USING bm25 (id, description, rating)
WITH (key_field = 'id');
```

```ts Drizzle
import { indexing } from "@paradedb/drizzle-paradedb";

indexing
  .bm25Index("search_idx")
  .on(mockItems.id, mockItems.description, mockItems.rating);
```

```python Django
from django.db import connection
from paradedb.indexes import BM25Index

with connection.schema_editor() as schema_editor:
    schema_editor.add_index(
        MockItem,
        BM25Index(
            fields={
                "id": {},
                "description": {},
                "rating": {},
            },
            key_field="id",
            name="search_idx",
        ),
    )
```

```python SQLAlchemy
from sqlalchemy import Index
from paradedb.sqlalchemy import indexing

idx = Index(
    "search_idx",
    indexing.BM25Field(MockItem.id),
    indexing.BM25Field(MockItem.description),
    indexing.BM25Field(MockItem.rating),
    postgresql_using="bm25",
    postgresql_with={"key_field": "id"},
)

with engine.begin() as conn:
    idx.create(conn)
```

```ruby Rails
ActiveRecord::Base.connection.add_bm25_index(
  :mock_items,
  fields: {
    id: {},
    description: {},
    rating: {}
  },
  key_field: :id,
  name: :search_idx
)
```

```cs EF Core
modelBuilder.Entity()
    .HasBm25Index("search_idx", e => e.Id)
    .HasField(e => e.Description)
    .HasField(e => e.Rating);
```

To enable columnar indexing for text and JSON fields, cast the field to a [tokenizer](/documentation/tokenizers/overview) with `columnar` set to `true`.

```sql SQL
CREATE INDEX search_idx ON mock_items
USING bm25 (id, (description::pdb.unicode_words('columnar=true')), rating)
WITH (key_field = 'id');
```

```ts Drizzle
import { indexing, tokenizer } from "@paradedb/drizzle-paradedb";

indexing
  .bm25Index("search_idx")
  .on(
    mockItems.id,
    indexing.bm25Field(
      mockItems.description,
      tokenizer.unicodeWords({ columnar: true }),
    ),
    mockItems.rating,
  );
```

```python Django
from django.db import connection
from paradedb.indexes import BM25Index
from paradedb.search import Tokenizer

with connection.schema_editor() as schema_editor:
    schema_editor.add_index(
        MockItem,
        BM25Index(
            fields={
                "id": {},
                "description": {
                    "tokenizer": Tokenizer.unicode_words(
                        options={"columnar": True}
                    ),
                },
                "rating": {},
            },
            key_field="id",
            name="search_idx",
        ),
    )
```

```python SQLAlchemy
from sqlalchemy import Index
from paradedb.sqlalchemy import indexing, tokenizer

idx = Index(
    "search_idx",
    indexing.BM25Field(MockItem.id),
    indexing.BM25Field(
        MockItem.description,
        tokenizer=tokenizer.unicode_words(options={"columnar": True}),
    ),
    indexing.BM25Field(MockItem.rating),
    postgresql_using="bm25",
    postgresql_with={"key_field": "id"},
)

with engine.begin() as conn:
    idx.create(conn)
```

```ruby Rails
ActiveRecord::Base.connection.add_bm25_index(
  :mock_items,
  fields: {
    id: {},
    description: {
      tokenizer: Tokenizer.unicode_words(options: { columnar: true })
    },
    rating: {}
  },
  key_field: :id,
  name: :search_idx
)
```

```cs EF Core
modelBuilder.Entity()
    .HasBm25Index("search_idx", e => e.Id)
    .HasField(e => e.Description, Tokenizer.Unicode(new() { ["columnar"] = true }))
    .HasField(e => e.Rating);
```

The `columnar` option for tokenizers is available in versions `0.22.0` and above.

Columnar defaults to `false` for all tokenizers besides [literal](/documentation/tokenizers/available-tokenizers/literal) and [literal normalized](/documentation/tokenizers/available-tokenizers/literal-normalized), which default to `true` and do not require an explicit setting.

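The default rules described so far can be summarized as a small decision function. This is only an illustrative sketch, not ParadeDB code: the function name `is_columnar`, the type and tokenizer strings, and the assumption that an explicit `columnar` setting always overrides the tokenizer default are all hypothetical.

```python
from typing import Optional

# Hypothetical type names standing in for "text and JSON fields".
TEXT_OR_JSON_TYPES = {"text", "varchar", "json", "jsonb"}

# Tokenizers whose `columnar` option defaults to true (names are illustrative).
COLUMNAR_BY_DEFAULT = {"literal", "literal_normalized"}


def is_columnar(
    field_type: str,
    tokenizer: Optional[str] = None,
    columnar: Optional[bool] = None,
) -> bool:
    """Return whether a field would be columnar under the documented defaults."""
    # Non-text, non-JSON fields are columnar by default.
    if field_type not in TEXT_OR_JSON_TYPES:
        return True
    # Assumption: an explicit `columnar` option takes precedence.
    if columnar is not None:
        return columnar
    # Otherwise only the literal tokenizers default to columnar.
    return tokenizer in COLUMNAR_BY_DEFAULT
```

Under these assumptions, `is_columnar("integer")` and `is_columnar("text", "literal")` are both true, while `is_columnar("text", "unicode_words")` is false unless `columnar=True` is passed.
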
The defaults differ because tokenized fields can hold large documents, which are expensive to store column-wise, whereas literal and literal normalized fields are typically single-value and much more compact.

The columnar field stores the raw text value regardless of the tokenizer. For example, if `Hello world` is split into tokens `hello` and `world`, the columnar value remains `Hello world`. This is important because operations like filtering and sorting require the original field value, not the tokens.

Internally, Tantivy refers to columnar fields as fast fields. Our [legacy docs](/legacy/indexing/create-index) also refer to these fields as fast.
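
To make the distinction between tokens and columnar values concrete, here is a toy model in plain Python. It is not how ParadeDB or Tantivy is implemented: the `tokenize` helper, the `inverted` and `columnar` dictionaries, and the sample documents are all made up for illustration. Tokens are used for matching, and the untouched raw value is used for sorting and exact filtering.

```python
import re
from collections import defaultdict

# Made-up sample data: document id -> raw text.
docs = {1: "Hello world", 2: "Goodbye world", 3: "apple pie"}


def tokenize(text):
    # Crude stand-in for a real tokenizer: lowercase, split on non-word characters.
    return re.findall(r"\w+", text.lower())


inverted = defaultdict(set)  # token -> ids of documents containing it
columnar = {}                # document id -> raw, untokenized value

for doc_id, text in docs.items():
    for token in tokenize(text):
        inverted[token].add(doc_id)
    columnar[doc_id] = text

# Matching uses the tokens...
matches = inverted["world"]

# ...while sorting and exact filtering use the raw columnar values.
ordered = sorted(matches, key=lambda doc_id: columnar[doc_id])
exact = [doc_id for doc_id in docs if columnar[doc_id] == "Hello world"]
```

Here `ordered` ranks the matching documents by their original strings, and `exact` finds the document whose full value is `Hello world`. Neither operation could be answered from the tokens `hello` and `world` alone.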