---
title: Columnar Storage
description: Column-oriented indexing for fast filtering, sorting, and aggregates
canonical: https://docs.paradedb.com/documentation/indexing/columnar
---

By default, all non-text and non-JSON fields are indexed using ParadeDB's columnar format.
This enables fast [filter pushdown](/documentation/filtering#filter-pushdown), [Top K ordering](/documentation/sorting/topk), and
[aggregates](/documentation/aggregates/overview) over these fields. For example, in the following index definition, `rating` and `id` are columnar indexed
because they are integers, whereas `description` is not because it is text.

<CodeGroup>
```sql SQL
CREATE INDEX search_idx ON mock_items
USING bm25 (id, description, rating)
WITH (key_field = 'id');
```

```ts Drizzle
import { indexing } from "@paradedb/drizzle-paradedb";

indexing
  .bm25Index("search_idx")
  .on(mockItems.id, mockItems.description, mockItems.rating);
```

```python Django
from django.db import connection
from paradedb.indexes import BM25Index

with connection.schema_editor() as schema_editor:
    schema_editor.add_index(
        MockItem,
        BM25Index(
            fields={
                "id": {},
                "description": {},
                "rating": {},
            },
            key_field="id",
            name="search_idx",
        ),
    )
```

```python SQLAlchemy
from sqlalchemy import Index
from paradedb.sqlalchemy import indexing

idx = Index(
    "search_idx",
    indexing.BM25Field(MockItem.id),
    indexing.BM25Field(MockItem.description),
    indexing.BM25Field(MockItem.rating),
    postgresql_using="bm25",
    postgresql_with={"key_field": "id"},
)

with engine.begin() as conn:
    idx.create(conn)
```

```ruby Rails
ActiveRecord::Base.connection.add_bm25_index(
  :mock_items,
  fields: {
    id: {},
    description: {},
    rating: {}
  },
  key_field: :id,
  name: :search_idx
)
```

```cs EF Core
modelBuilder.Entity<MockItem>()
    .HasBm25Index("search_idx", e => e.Id)
    .HasField(e => e.Description)
    .HasField(e => e.Rating);
```

</CodeGroup>

To enable columnar indexing for text and JSON fields, cast the field to a [tokenizer](/documentation/tokenizers/overview) with `columnar` set to `true`.

<CodeGroup>
```sql SQL
CREATE INDEX search_idx ON mock_items
USING bm25 (id, (description::pdb.unicode_words('columnar=true')), rating)
WITH (key_field = 'id');
```

```ts Drizzle
import { indexing, tokenizer } from "@paradedb/drizzle-paradedb";

indexing
  .bm25Index("search_idx")
  .on(
    mockItems.id,
    indexing.bm25Field(
      mockItems.description,
      tokenizer.unicodeWords({ columnar: true }),
    ),
    mockItems.rating,
  );
```

```python Django
from django.db import connection
from paradedb.indexes import BM25Index
from paradedb.search import Tokenizer

with connection.schema_editor() as schema_editor:
    schema_editor.add_index(
        MockItem,
        BM25Index(
            fields={
                "id": {},
                "description": {
                    "tokenizer": Tokenizer.unicode_words(
                        options={"columnar": True}
                    ),
                },
                "rating": {},
            },
            key_field="id",
            name="search_idx",
        ),
    )
```

```python SQLAlchemy
from sqlalchemy import Index
from paradedb.sqlalchemy import indexing, tokenizer

idx = Index(
    "search_idx",
    indexing.BM25Field(MockItem.id),
    indexing.BM25Field(
        MockItem.description,
        tokenizer=tokenizer.unicode_words(options={"columnar": True}),
    ),
    indexing.BM25Field(MockItem.rating),
    postgresql_using="bm25",
    postgresql_with={"key_field": "id"},
)

with engine.begin() as conn:
    idx.create(conn)
```

```ruby Rails
ActiveRecord::Base.connection.add_bm25_index(
  :mock_items,
  fields: {
    id: {},
    description: {
      tokenizer: Tokenizer.unicode_words(options: { columnar: true })
    },
    rating: {}
  },
  key_field: :id,
  name: :search_idx
)
```

```cs EF Core
modelBuilder.Entity<MockItem>()
    .HasBm25Index("search_idx", e => e.Id)
    .HasField(e => e.Description, Tokenizer.Unicode(new() { ["columnar"] = true }))
    .HasField(e => e.Rating);
```

</CodeGroup>

<Note>
  The `columnar` option for tokenizers is available in versions `0.22.0` and
  above.
</Note>

The `columnar` option defaults to `false` for all tokenizers except [literal](/documentation/tokenizers/available-tokenizers/literal) and
[literal normalized](/documentation/tokenizers/available-tokenizers/literal-normalized), which default to
`true` and do not require an explicit setting.

The defaults differ because tokenized fields can hold large documents, which are expensive to store column-wise,
whereas literal and literal normalized fields typically hold a single short value and are much more compact.
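
The defaults described above can be summarized as a small decision function. This is only an illustrative sketch: `is_columnar`, the type set, and the tokenizer names are hypothetical and not part of any ParadeDB API. It also assumes that an explicit `columnar` setting takes precedence over a tokenizer's default.

```python
from typing import Optional

# Assumed grouping for this sketch; the actual set of text-like and JSON
# types that ParadeDB recognizes may differ.
TEXT_OR_JSON_TYPES = {"text", "varchar", "json", "jsonb"}

# Tokenizers that default to columnar=true according to this page.
COLUMNAR_BY_DEFAULT = {"literal", "literal_normalized"}


def is_columnar(
    pg_type: str,
    tokenizer: Optional[str] = None,
    columnar: Optional[bool] = None,
) -> bool:
    """Hypothetical helper: would this field be columnar indexed?"""
    # Non-text, non-JSON fields are always stored in the columnar format.
    if pg_type not in TEXT_OR_JSON_TYPES:
        return True
    # Assumption: an explicit columnar option wins over the default.
    if columnar is not None:
        return columnar
    # Otherwise only literal and literal normalized default to true.
    return tokenizer in COLUMNAR_BY_DEFAULT


print(is_columnar("integer"))                      # rating, id
print(is_columnar("text", "unicode_words"))        # tokenized text, no option
print(is_columnar("text", "unicode_words", True))  # columnar=true
print(is_columnar("text", "literal"))              # literal default
```
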

<Note>
The columnar field stores the raw text value regardless of the tokenizer. For example, if `Hello world` is
split into tokens `hello` and `world`, the columnar value remains `Hello world`.

This is important because operations like filtering and sorting require the original field value, not the tokens.

</Note>
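
To make the distinction between tokens and the columnar value concrete, here is a toy model in Python. It is a conceptual sketch only and does not reflect how ParadeDB or Tantivy store data internally; `ToyIndex` and its methods are hypothetical names, and the lowercase word split stands in for a real tokenizer.

```python
import re
from collections import defaultdict


class ToyIndex:
    """Toy model: an inverted index plus a columnar store of raw values."""

    def __init__(self) -> None:
        self.postings = defaultdict(set)  # token -> set of document ids
        self.column = {}                  # document id -> raw field value

    def add(self, doc_id: int, text: str) -> None:
        # Stand-in tokenizer: lowercase the text and split it into words.
        for token in re.findall(r"\w+", text.lower()):
            self.postings[token].add(doc_id)
        # The columnar side keeps the original value untouched.
        self.column[doc_id] = text

    def search(self, token: str) -> list:
        # Look up matching document ids by token.
        return sorted(self.postings.get(token, set()))

    def sort_by_value(self, doc_ids: list) -> list:
        # Sorting uses the raw values, not the tokens.
        return sorted(doc_ids, key=lambda d: self.column[d])


idx = ToyIndex()
idx.add(1, "Hello world")
idx.add(2, "Goodbye world")

matches = idx.search("world")      # [1, 2]
print(idx.sort_by_value(matches))  # [2, 1], "Goodbye world" sorts first
print(idx.column[1])               # Hello world
```

Searching goes through the tokens, while sorting and filtering read the untouched value from the column, which is why both representations are kept.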

<Note>
  Internally, Tantivy refers to columnar fields as fast fields. Our [legacy
  docs](/legacy/indexing/create-index) also refer to these fields as fast fields.
</Note>