---
title: Edge Ngram
description: Generates prefix n-grams per word, ideal for search-as-you-type
canonical: https://docs.paradedb.com/documentation/tokenizers/available-tokenizers/edge-ngrams
---

The edge ngram tokenizer first splits text into words at character-class boundaries, then generates n-grams anchored to the **beginning** of each word. This makes it ideal for "search-as-you-type" functionality, where users find matches as they type partial words.

The tokenizer takes two required arguments: the minimum and maximum gram length. For each word, it emits prefix tokens from `min_gram` to `max_gram` characters long (clamped to the word length). Words shorter than `min_gram` are skipped.

```sql
CREATE INDEX search_idx ON mock_items
USING bm25 (id, (description::pdb.edge_ngram(2,5)))
WITH (key_field='id');
```

To get a feel for this tokenizer, run the following command and replace the text with your own:

```sql
SELECT 'Quick Fox'::pdb.edge_ngram(2,5)::text[];
```

```ini Expected Response
            text
-----------------------------
 {qu,qui,quic,quick,fo,fox}
(1 row)
```

## Token Chars

By default, the edge ngram tokenizer treats letters and digits as token content and everything else (spaces, punctuation, symbols) as word delimiters. You can customize this with `token_chars`, which accepts a comma-separated list of character classes: `letter`, `digit`, `whitespace`, `punctuation`, `symbol`. Character classification uses Unicode general categories, matching Elasticsearch's behavior.

For example, including `punctuation` keeps hyphens as part of words:

```sql
SELECT 'Quick-Fox'::pdb.edge_ngram(2,5,'token_chars=letter,digit,punctuation')::text[];
```

```ini Expected Response
         text
-------------------------
 {qu,qui,quic,quick}
(1 row)
```
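To make the behavior above concrete, here is a minimal Python sketch of the same algorithm: split at characters outside the token classes, lowercase, then emit prefixes from `min_gram` to `max_gram` (clamped to the word length). This is an illustration only, not ParadeDB's actual implementation, and the `edge_ngram` function name and `token_chars` default are assumptions made for the sketch:

```python
import unicodedata

def edge_ngram(text, min_gram, max_gram, token_chars=("letter", "digit")):
    """Illustrative edge ngram tokenizer: NOT ParadeDB source code."""

    # Map the first letter of a Unicode general category (e.g. 'Lu', 'Nd')
    # to the character classes the tokenizer accepts.
    classes = {"L": "letter", "N": "digit", "Z": "whitespace",
               "P": "punctuation", "S": "symbol"}

    def is_token_char(ch):
        return classes.get(unicodedata.category(ch)[0]) in token_chars

    tokens = []
    word = ""
    for ch in text.lower():
        if is_token_char(ch):
            word += ch
            continue
        # Delimiter reached: flush the accumulated word.
        if len(word) >= min_gram:  # words shorter than min_gram are skipped
            for n in range(min_gram, min(max_gram, len(word)) + 1):
                tokens.append(word[:n])
        word = ""
    # Flush the final word.
    if len(word) >= min_gram:
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:n])
    return tokens
```

With the defaults, `edge_ngram("Quick Fox", 2, 5)` reproduces the tokens shown above; adding `"punctuation"` to `token_chars` keeps `"Quick-Fox"` as a single word, so only its prefixes are emitted.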