---
title: Fuzzy
description: Allow for typos in the query string
canonical: https://docs.paradedb.com/documentation/full-text/fuzzy
---

Fuzziness lets tokens count as a match even when they are not identical, which tolerates typos in the query string.

While fuzzy matching works for non-Latin characters (Chinese, Japanese, Korean, etc.), it may not give the expected results (large result sets may be returned), because Levenshtein distance relies on individual character differences. If you need this functionality, please thumbs-up this [issue](https://github.com/paradedb/paradedb/issues/3782) and leave a comment with your use case.

## Overview

To add fuzziness to a query, cast it to the `fuzzy(n)` type, where `n` is the [edit distance](#how-it-works). Fuzziness is supported for [match](/documentation/full-text/match) and [term](/documentation/full-text/term) queries.

```sql SQL
-- Fuzzy match disjunction
SELECT id, description
FROM mock_items
WHERE description ||| 'runing shose'::pdb.fuzzy(2)
LIMIT 5;

-- Fuzzy match conjunction
SELECT id, description
FROM mock_items
WHERE description &&& 'runing shose'::pdb.fuzzy(2)
LIMIT 5;

-- Fuzzy Term
SELECT id, description
FROM mock_items
WHERE description === 'shose'::pdb.fuzzy(2)
LIMIT 5;
```

```python Django
from paradedb import Match, ParadeDB, Term

# Fuzzy match disjunction
MockItem.objects.filter(
    description=ParadeDB(Match('runing shose', operator='OR', distance=2))
).values('id', 'description')[:5]

# Fuzzy match conjunction
MockItem.objects.filter(
    description=ParadeDB(Match('runing shose', operator='AND', distance=2))
).values('id', 'description')[:5]

# Fuzzy term
MockItem.objects.filter(
    description=ParadeDB(Term('shose', distance=2))
).values('id', 'description')[:5]
```

```ruby Rails
# Fuzzy match disjunction
MockItem.search(:description)
  .matching_any('runing shose', distance: 2)
  .select(:id, :description)
  .limit(5)

# Fuzzy match conjunction
MockItem.search(:description)
  .matching_all('runing shose',
                distance: 2)
  .select(:id, :description)
  .limit(5)

# Fuzzy term
MockItem.search(:description)
  .term("shose", distance: 2)
  .select(:id, :description)
  .limit(5)
```

## How It Works

By default, the [match](/documentation/full-text/match) and [term](/documentation/full-text/term) queries require exact token matches between the query and indexed text. When a query is cast to `fuzzy(n)`, this requirement is relaxed: tokens are matched if their Levenshtein distance, or edit distance, is less than or equal to `n`.

Edit distance is a measure of how many single-character operations are needed to turn one string into another. The allowed operations are:

- **Insertion** adds a character, e.g., "shoe" → "shoes" (insert "s") has an edit distance of `1`
- **Deletion** removes a character, e.g., "runnning" → "running" (delete one "n") has an edit distance of `1`
- **Transposition** swaps two adjacent characters, e.g., "shose" → "shoes" (swap "s" and "e") has an edit distance of `2` by default

For performance reasons, the maximum allowed edit distance is `2`.

Casting a query to `fuzzy(0)` is the same as an exact token match.

## Fuzzy Prefix

`fuzzy` also supports prefix matching. For instance, "runn" is a prefix of "running" because it matches the beginning of the token exactly. "rann" is a **fuzzy prefix** of "running" because it matches the beginning within an edit distance of `1`.

To treat the query string as a prefix, set the second argument of `fuzzy` to either `t` or `"true"`:

```sql SQL
SELECT id, description
FROM mock_items
WHERE description === 'rann'::pdb.fuzzy(1, t)
LIMIT 5;
```

```python Django
from paradedb import ParadeDB, Term

MockItem.objects.filter(
    description=ParadeDB(Term('rann', distance=1, prefix=True))
).values('id', 'description')[:5]
```

```ruby Rails
MockItem.search(:description)
  .term("rann", distance: 1, prefix: true)
  .select(:id, :description)
  .limit(5)
```

Postgres requires that `true` be double-quoted, i.e. `fuzzy(1, "true")`.
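
The fuzzy prefix rule described above ("rann" matches the beginning of "running" within an edit distance of `1`) can be illustrated with a small plain-Python sketch. This is only a model of the idea, not ParadeDB's implementation: the helper names `prefix_edit_distance` and `is_fuzzy_prefix` are hypothetical, and the sketch assumes a fuzzy prefix means "some prefix of the token is within the given edit distance of the query", using textbook Levenshtein operations (insertion, deletion, substitution).

```python
def prefix_edit_distance(query: str, token: str) -> int:
    """Smallest Levenshtein distance between `query` and any prefix of `token`."""
    # previous[j] = distance between the part of the query processed so far
    # and the first j characters of the token.
    previous = list(range(len(token) + 1))
    for i, q_char in enumerate(query, start=1):
        current = [i]  # distance from query[:i] to the empty prefix
        for j, t_char in enumerate(token, start=1):
            current.append(min(
                previous[j] + 1,                            # drop q_char
                current[j - 1] + 1,                         # add t_char
                previous[j - 1] + int(q_char != t_char),    # substitute or match
            ))
        previous = current
    # Every column corresponds to one prefix of the token; take the best one.
    return min(previous)


def is_fuzzy_prefix(query: str, token: str, distance: int) -> bool:
    """Hypothetical helper: True if `query` is a fuzzy prefix of `token`."""
    return prefix_edit_distance(query, token) <= distance


if __name__ == "__main__":
    print(is_fuzzy_prefix("runn", "running", 0))  # True: exact prefix
    print(is_fuzzy_prefix("rann", "running", 1))  # True: one substitution away from "runn"
    print(is_fuzzy_prefix("rann", "running", 0))  # False: not an exact prefix
```

Taking the minimum over all prefixes is what separates prefix matching from whole-token matching: "rann" is several edits away from the full token "running", but only one edit away from its prefix "runn".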

When used with [match](/documentation/full-text/match) queries, fuzzy prefix treats all tokens in the query string as prefixes. For instance, the following query means "find all documents containing the fuzzy prefix `rann` AND the fuzzy prefix `slee`":

```sql SQL
SELECT id, description
FROM mock_items
WHERE description &&& 'slee rann'::pdb.fuzzy(1, t)
LIMIT 5;
```

```python Django
from paradedb import Match, ParadeDB

MockItem.objects.filter(
    description=ParadeDB(Match('slee rann', operator='AND', distance=1, prefix=True))
).values('id', 'description')[:5]
```

```ruby Rails
MockItem.search(:description)
  .matching_all("slee rann", distance: 1, prefix: true)
  .select(:id, :description)
  .limit(5)
```

## Transposition Cost

By default, the cost of a transposition (i.e. "shose" → "shoes") is `2`. Setting the third argument of `fuzzy` to `t` lowers the cost of a transposition to `1`:

```sql SQL
SELECT id, description
FROM mock_items
WHERE description === 'shose'::pdb.fuzzy(1, f, t)
LIMIT 5;
```

```python Django
from paradedb import ParadeDB, Term

MockItem.objects.filter(
    description=ParadeDB(Term('shose', distance=1, transposition_cost_one=True))
).values('id', 'description')[:5]
```

```ruby Rails
MockItem.search(:description)
  .term("shose", distance: 1, transposition_cost_one: true)
  .select(:id, :description)
  .limit(5)
```

The default value for the second and third arguments of `fuzzy` is `f`, which means `fuzzy(1)` is equivalent to `fuzzy(1, f, f)`.
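
The edit distances quoted on this page can be reproduced with a short dynamic-programming sketch. It is a textbook illustration rather than ParadeDB's own code: the function name `edit_distance` is hypothetical, its `transposition_cost_one` flag simply mirrors the option name used in the examples above, and the sketch assumes standard Levenshtein operations (insertion, deletion, and single-character substitution) with an optional cost of `1` for swapping two adjacent characters.

```python
def edit_distance(a: str, b: str, transposition_cost_one: bool = False) -> int:
    """Edit distance between `a` and `b`.

    Insertions, deletions and substitutions each cost 1. When
    `transposition_cost_one` is True, swapping two adjacent characters
    also costs 1; otherwise a swap is paid for as two separate edits.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # dist[i][j] = distance between the first i chars of a and the first j chars of b
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete i characters
    for j in range(cols):
        dist[0][j] = j  # insert j characters
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # substitution or match
            )
            swapped = (
                i > 1 and j > 1
                and a[i - 1] == b[j - 2]
                and a[i - 2] == b[j - 1]
            )
            if transposition_cost_one and swapped:
                # Adjacent swap counted as a single edit
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)
    return dist[-1][-1]


if __name__ == "__main__":
    print(edit_distance("shoe", "shoes"))        # 1 (insertion)
    print(edit_distance("runnning", "running"))  # 1 (deletion)
    print(edit_distance("shose", "shoes"))       # 2 (default transposition cost)
    print(edit_distance("shose", "shoes", transposition_cost_one=True))  # 1
```

Under these assumptions, "shose" only matches "shoes" at `fuzzy(1)` when the transposition cost is lowered, which is the behavior the `fuzzy(1, f, t)` example relies on.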