---
title: How Tokenizers Work
---

Tokenizers split large chunks of text into small, searchable units called tokens. Different tokenizers use different strategies to split text.

The default tokenizer in ParadeDB is the [simple tokenizer](/v2/tokenizers/available_tokenizers/simple). It splits text on whitespace and punctuation, and also [lowercases](/v2/token_filters/lowercase) each token. To visualize how this tokenizer works, you can cast a text string to the tokenizer type, and then to `text[]`:

```sql
SELECT 'Hello world!'::pdb.simple::text[];
```

```ini Expected Response
     text
---------------
 {hello,world}
(1 row)
```

On the other hand, the [ngrams](/v2/tokenizers/available_tokenizers/ngrams) tokenizer splits text into "grams" of size `n`. In this example, `n = 3`:

```sql
SELECT 'Hello world!'::pdb.ngram(3,3)::text[];
```

```ini Expected Response
                      text
-------------------------------------------------
 {hel,ell,llo,"lo ","o w"," wo",wor,orl,rld,ld!}
(1 row)
```

Choosing the right tokenizer is crucial to getting the search results you want. For instance, the simple tokenizer works best for whole-word matching like "hello" or "world", while the ngram tokenizer enables partial matching.

To configure a tokenizer for a column in the index, cast the column to the desired tokenizer type:

```sql
CREATE INDEX search_idx ON mock_items
USING bm25 (id, (description::pdb.ngram(3,3)))
WITH (key_field='id');
```
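
To make the ngram behavior described earlier more concrete, here is a minimal sketch that emulates a lowercased 3-gram split using only built-in PostgreSQL functions (`lower`, `substr`, `generate_series`, `array_agg`). It illustrates the sliding-window idea and is not ParadeDB's actual implementation. The CTE name `input` and the column names `s` and `grams` are made up for this example.

```sql
-- Illustrative only: emulate a lowercased 3-gram split in plain PostgreSQL.
-- This does not use or describe ParadeDB internals.
WITH input AS (
  SELECT lower('Hello world!') AS s
)
SELECT array_agg(substr(s, i, 3) ORDER BY i) AS grams
FROM input,
     generate_series(1, length(s) - 2) AS i;  -- one window per start position
```

Because the window slides one character at a time, spaces and punctuation end up inside grams such as `"o w"` and `ld!`. Every three-character substring becomes its own token, which is what lets a partial term like `wor` match.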