--- title: Tantivy Aggregates --- Tantivy aggregates are a ParadeDB enterprise feature. [Contact us](mailto:sales@paradedb.com) for access. All fields referenced by a Tantivy aggregate JSON string must be indexed as [fast fields](/documentation/indexing/fast_fields). In addition to plain SQL aggregates, ParadeDB also has the ability to compute aggregates over a single BM25 index by accepting JSON query strings. These aggregates can be more performant than plain SQL aggregates over some datasets. ## Syntax `paradedb.aggregate` accepts three arguments: the name of the BM25 index, a full text search query builder function, and a Tantivy aggregate JSON. ```sql SELECT * FROM paradedb.aggregate( '', , '' ); ``` The name of the BM25 index as a string. A full text search query builder function. The aggregate will be computed over the results of this function. A Tantivy aggregate JSON string. See the sections below for how to construct these JSONs. ## Count A count aggregation tallies the number of values for the specified field across all documents. ```sql SELECT * FROM paradedb.aggregate( 'search_idx', paradedb.all(), '{ "rating_total": { "sum": {"field": "rating"} } }' ); ``` The field name to compute the count on. The value to use for documents missing the field. By default, missing values are ignored. ## Average An average aggregation calculates the mean of the specified numeric field values across all documents. ```sql SELECT * FROM paradedb.aggregate( 'search_idx', paradedb.all(), '{ "avg_rating": { "avg": {"field": "rating"} } }' ); ``` The field name to compute the average on. The value to use for documents missing the field. By default, missing values are ignored. ## Sum A sum aggregation computes the total sum of the specified numeric field values across all documents. ```sql SELECT * FROM paradedb.aggregate( 'search_idx', paradedb.all(), '{ "rating_total": { "sum": {"field": "rating"} } }' ); ``` The field name to compute the sum on. The value to use for documents missing the field. By default, missing values are ignored. ## Min A min aggregation finds the smallest value for the specified numeric field across all documents. ```sql SELECT * FROM paradedb.aggregate( 'search_idx', paradedb.all(), '{ "min_rating": { "min": {"field": "rating"} } }' ); ``` The field name to compute the minimum on. The value to use for documents missing the field. By default, missing values are ignored. ## Max A max aggregation finds the largest value for the specified numeric field across all documents. ```sql SELECT * FROM paradedb.aggregate( 'search_idx', paradedb.all(), '{ "max_rating": { "max": {"field": "rating"} } }' ); ``` The field name to compute the maximum on. The value to use for documents missing the field. By default, missing values are ignored. ## Stats A stats aggregation provides a collection of statistical metrics for the specified numeric field, including count, sum, average, min, and max. ```sql SELECT * FROM paradedb.aggregate( 'search_idx', paradedb.all(), '{ "rating_stats": { "stats": {"field": "rating"} } }' ); ``` The field name to compute the stats on. The value to use for documents missing the field. By default, missing values are ignored. ## Percentiles The percentiles aggregation calculates the values below which given percentages of the data fall, providing insights into the distribution of a dataset. ```sql SELECT * FROM paradedb.aggregate( 'search_idx', paradedb.all(), '{ "rating_percentiles": { "percentiles": {"field": "rating"} } }' ); ``` The field name to compute the percentiles on. The percentiles to compute. Whether to return the percentiles as a hash map. The value to use for documents missing the field. By default, missing values are ignored. ## Cardinality A cardinality aggregation estimates the number of unique values in the specified field using the HyperLogLog++ algorithm. This is useful for understanding the uniqueness of values in a large dataset. The cardinality aggregation provides an approximate count, which is accurate within a small error range. This trade-off allows for efficient computation even on very large datasets. ```sql SELECT * FROM paradedb.aggregate( 'search_idx', paradedb.all(), '{ "unique_users": { "cardinality": {"field": "user_id", "missing": "unknown"} } }' ); ``` The field name to compute the cardinality on. The value to use for documents missing the field. By default, missing values are ignored. ## Histogram Histogram is a bucket aggregation where buckets are created dynamically based on a specified interval. Each document value is rounded down to its bucket. For example, if you have a price of 18 and an interval of 5, the document will fall into the bucket with the key 15. The formula used for this is: `((val - offset) / interval).floor() * interval + offset`. ```sql SELECT * FROM paradedb.aggregate( 'search_idx', paradedb.all(), '{ "rating_histogram": { "histogram": {"field": "rating", "interval": 1} } }' ); ``` The field to aggregate on. The interval to chunk your data range. Each bucket spans a value range of [0..interval). Must be a positive value. Shift the grid of buckets by the specified offset. The minimum number of documents in a bucket to be returned. Limits the data range to [min, max] closed interval. Extends the value range of the buckets. Whether to return the buckets as a hash map. Whether the values are normalized to ns for date time values. ## Date Histogram Similar to histogram, but can only be used with datetime types. Currently, only fixed time intervals are supported. ```sql SELECT * FROM paradedb.aggregate( 'search_idx', paradedb.all(), '{ "created_at_histogram": { "date_histogram": {"field": "created_at", "fixed_interval": "1h"} } }' ); ``` The field to aggregate on. The interval to chunk your data range. Each bucket spans a value range of [0..fixed_interval). Accepted values should end in `ms`, `s`, `m`, `h`, or `d`. Shift the grid of buckets by the specified offset. The minimum number of documents in a bucket to be returned. Limits the data range to [min, max] closed interval. Extends the value range of the buckets. Whether to return the buckets as a hash map. ## Range Range allows you to define custom buckets for specific ranges. ```sql SELECT * FROM paradedb.aggregate( 'search_idx', paradedb.all(), '{ "ranges": { "range": {"field": "rating", "ranges": [ { "to": 3.0 }, { "from": 3.0, "to": 7.0 }, { "from": 7.0, "to": 20.0 }, { "from": 20.0 } ]} } }' ); ``` The field to aggregate on. A list of ranges to aggregate on. Whether to return the buckets as a hash map. ## Terms Terms creates a bucket for every unique term and counts the number of occurrences. ```sql SELECT * FROM paradedb.aggregate( 'search_idx', paradedb.all(), '{ "rating_terms": { "terms": {"field": "rating"} } }' ); ``` The field to aggregate on. The number of terms to return. The number of terms to fetch from each segment. Whether to include the document count error. The minimum number of documents in a term to be returned. The order in which to return the terms. The value to use for documents missing the field. ## Nested Aggregations Buckets can contain sub-aggregations. For example, creating buckets with the range aggregation and then calculating the average on each bucket: ```sql SELECT * FROM paradedb.aggregate( 'search_idx', paradedb.all(), '{ "range_rating": { "range": { "field": "rating", "ranges": [ { "from": 1, "to": 3 }, { "from": 3, "to": 5 } ] }, "aggs": { "average_in_range": { "avg": { "field": "rating"} } } } }' ); ```