---
title: Columnar Storage
description: Column-oriented indexing for fast filtering, sorting, and aggregates
canonical: https://docs.paradedb.com/documentation/indexing/columnar
---
By default, all non-text and non-JSON fields are indexed using ParadeDB's columnar format.
This enables fast [filtering pushdown](/documentation/filtering#filter-pushdown), [Top K ordering](/documentation/sorting/topk), and
[aggregates](/documentation/aggregates/overview) over these fields. For example, in the following index definition, `rating` and `id` are columnar indexed
because they are integers, whereas `description` is not because it is text.
```sql SQL
CREATE INDEX search_idx ON mock_items
USING bm25 (id, description, rating)
WITH (key_field = 'id');
```
```python Django
from django.db import connection
from paradedb.indexes import BM25Index
with connection.schema_editor() as schema_editor:
schema_editor.add_index(
MockItem,
BM25Index(
fields={
"id": {},
"description": {},
"rating": {},
},
key_field="id",
name="search_idx",
),
)
```
```python SQLAlchemy
from sqlalchemy import Index
from paradedb.sqlalchemy import indexing
idx = Index(
"search_idx",
indexing.BM25Field(MockItem.id),
indexing.BM25Field(MockItem.description),
indexing.BM25Field(MockItem.rating),
postgresql_using="bm25",
postgresql_with={"key_field": "id"},
)
with engine.begin() as conn:
idx.create(conn)
```
```ruby Rails
ActiveRecord::Base.connection.add_bm25_index(
:mock_items,
fields: {
id: {},
description: {},
rating: {}
},
key_field: :id,
name: :search_idx
)
```
To enable columnar indexing for text and JSON fields, cast the field to a [tokenizer](/documentation/tokenizers/overview) with `columnar` set to `true`.
```sql SQL
CREATE INDEX search_idx ON mock_items
USING bm25 (id, (description::pdb.unicode_words('columnar=true')), rating)
WITH (key_field = 'id');
```
```python Django
from django.db import connection
from paradedb.indexes import BM25Index
with connection.schema_editor() as schema_editor:
schema_editor.add_index(
MockItem,
BM25Index(
fields={
"id": {},
"description": {
"tokenizer": "unicode_words",
"named_args": {"columnar": True},
},
"rating": {},
},
key_field="id",
name="search_idx",
),
)
```
```python SQLAlchemy
from sqlalchemy import Index
from paradedb.sqlalchemy import indexing
idx = Index(
"search_idx",
indexing.BM25Field(MockItem.id),
indexing.BM25Field(
MockItem.description,
tokenizer=indexing.tokenize.from_config(
{
"tokenizer": "unicode_words",
"named_args": {"columnar": True},
}
),
),
indexing.BM25Field(MockItem.rating),
postgresql_using="bm25",
postgresql_with={"key_field": "id"},
)
with engine.begin() as conn:
idx.create(conn)
```
```ruby Rails
ActiveRecord::Base.connection.add_bm25_index(
:mock_items,
fields: {
id: {},
description: {
tokenizer: :unicode_words,
named_args: { columnar: true }
},
rating: {}
},
key_field: :id,
name: :search_idx
)
```
The `columnar` option for tokenizers is available in versions `0.22.0` and
above.
Columnar defaults to `false` for all tokenizers besides [literal](/documentation/tokenizers/available-tokenizers/literal) and
[literal normalized](/documentation/tokenizers/available-tokenizers/literal-normalized), which default to
`true` and do not require an explicit setting.
The reason is that tokenized fields can represent large documents and would be expensive to store column-wise,
whereas literal and literal normalized fields are typically single-value and much more compact.
The columnar field stores the raw text value regardless of the tokenizer. For example, if `Hello world` is
split into tokens `hello` and `world`, the columnar value remains `Hello world`.
This is important because operations like filtering and sorting require the original field value, not the tokens.
Internally, Tantivy refers to columnar fields as fast fields. Our [legacy
docs](/legacy/indexing/create-index) also refer to these fields as fast.