Search Shortcut cmd + k | ctrl + k
Search cmd+k ctrl+k
1.0 (stable)
Aggregate Functions

## Examples

Produce a single row containing the sum of the `amount` column:

``````SELECT sum(amount)
FROM sales;
``````

Produce one row per unique region, containing the sum of `amount` for each group:

``````SELECT region, sum(amount)
FROM sales
GROUP BY region;
``````

Return only the regions that have a sum of `amount` higher than 100:

``````SELECT region
FROM sales
GROUP BY region
HAVING sum(amount) > 100;
``````

Return the number of unique values in the `region` column:

``````SELECT count(DISTINCT region)
FROM sales;
``````

Return two values, the total sum of `amount` and the sum of `amount` minus columns where the region is `north` using the `FILTER` clause:

``````SELECT sum(amount), sum(amount) FILTER (region != 'north')
FROM sales;
``````

Returns a list of all regions in order of the `amount` column:

``````SELECT list(region ORDER BY amount DESC)
FROM sales;
``````

Returns the amount of the first sale using the `first()` aggregate function:

``````SELECT first(amount ORDER BY date ASC)
FROM sales;
``````

## Syntax

Aggregates are functions that combine multiple rows into a single value. Aggregates are different from scalar functions and window functions because they change the cardinality of the result. As such, aggregates can only be used in the `SELECT` and `HAVING` clauses of a SQL query.

### `DISTINCT` Clause in Aggregate Functions

When the `DISTINCT` clause is provided, only distinct values are considered in the computation of the aggregate. This is typically used in combination with the `count` aggregate to get the number of distinct elements; but it can be used together with any aggregate function in the system.

### `ORDER BY` Clause in Aggregate Functions

An `ORDER BY` clause can be provided after the last argument of the function call. Note the lack of the comma separator before the clause.

``````SELECT ⟨aggregate_function⟩(⟨arg⟩, ⟨sep⟩ ORDER BY ⟨ordering_criteria⟩);
``````

This clause ensures that the values being aggregated are sorted before applying the function. Most aggregate functions are order-insensitive, and for them this clause is parsed and discarded. However, there are some order-sensitive aggregates that can have non-deterministic results without ordering, e.g., `first`, `last`, `list` and `string_agg` / `group_concat` / `listagg`. These can be made deterministic by ordering the arguments.

For example:

``````CREATE TABLE tbl AS
SELECT s FROM range(1, 4) r(s);

SELECT string_agg(s, ', ' ORDER BY s DESC) AS countdown
FROM tbl;
``````
countdown
3, 2, 1

## General Aggregate Functions

The table below shows the available general aggregate functions.

Function Description
`any_value(arg)` Returns the first non-null value from `arg`. This function is affected by ordering.
`arbitrary(arg)` Returns the first value (null or non-null) from `arg`. This function is affected by ordering.
`arg_max(arg, val)` Finds the row with the maximum `val`. Calculates the `arg` expression at that row. This function is affected by ordering.
`arg_min(arg, val)` Finds the row with the minimum `val`. Calculates the `arg` expression at that row. This function is affected by ordering.
`array_agg(arg)` Returns a `LIST` containing all the values of a column. This function is affected by ordering.
`avg(arg)` Calculates the average value for all tuples in `arg`.
`bit_and(arg)` Returns the bitwise AND of all bits in a given expression.
`bit_or(arg)` Returns the bitwise OR of all bits in a given expression.
`bit_xor(arg)` Returns the bitwise XOR of all bits in a given expression.
`bitstring_agg(arg)` Returns a bitstring with bits set for each distinct value.
`bool_and(arg)` Returns `true` if every input value is `true`, otherwise `false`.
`bool_or(arg)` Returns `true` if any input value is `true`, otherwise `false`.
`count(arg)` Calculates the number of tuples in `arg`.
`favg(arg)` Calculates the average using a more accurate floating point summation (Kahan Sum).
`first(arg)` Returns the first value (null or non-null) from `arg`. This function is affected by ordering.
`fsum(arg)` Calculates the sum using a more accurate floating point summation (Kahan Sum).
`geomean(arg)` Calculates the geometric mean for all tuples in `arg`.
`histogram(arg)` Returns a `MAP` of key-value pairs representing buckets and counts.
`last(arg)` Returns the last value of a column. This function is affected by ordering.
`list(arg)` Returns a `LIST` containing all the values of a column. This function is affected by ordering.
`max(arg)` Returns the maximum value present in `arg`.
`max_by(arg, val)` Finds the row with the maximum `val`. Calculates the `arg` expression at that row. This function is affected by ordering.
`min(arg)` Returns the minimum value present in `arg`.
`min_by(arg, val)` Finds the row with the minimum `val`. Calculates the `arg` expression at that row. This function is affected by ordering.
`product(arg)` Calculates the product of all tuples in `arg`.
`string_agg(arg, sep)` Concatenates the column string values with a separator. This function is affected by ordering.
`sum(arg)` Calculates the sum value for all tuples in `arg`.
`sum_no_overflow(arg)` Calculates the sum value for all tuples in `arg` without overflow checks. Unlike `sum`, which works on floating-point values, `sum_no_overflow` only accepts `INTEGER` and `DECIMAL` values.

### `any_value(arg)`

 Description Returns the first non-null value from `arg`. This function is affected by ordering. Example `any_value(A)` Alias(es) -

### `arbitrary(arg)`

 Description Returns the first value (null or non-null) from `arg`. This function is affected by ordering. Example `arbitrary(A)` Alias(es) `first(A)`

### `arg_max(arg, val)`

 Description Finds the row with the maximum `val`. Calculates the `arg` expression at that row. This function is affected by ordering. Example `arg_max(A, B)` Alias(es) `argMax(arg, val)`, `max_by(arg, val)`

### `arg_min(arg, val)`

 Description Finds the row with the minimum `val`. Calculates the `arg` expression at that row. This function is affected by ordering. Example `arg_min(A, B)` Alias(es) `argMin(arg, val)`, `min_by(arg, val)`

### `array_agg(arg)`

 Description Returns a `LIST` containing all the values of a column. This function is affected by ordering. Example `array_agg(A)` Alias(es) `list`

### `avg(arg)`

 Description Calculates the average value for all tuples in `arg`. Example `avg(A)` Alias(es) `mean`

### `bit_and(arg)`

 Description Returns the bitwise `AND` of all bits in a given expression. Example `bit_and(A)` Alias(es) -

### `bit_or(arg)`

 Description Returns the bitwise `OR` of all bits in a given expression. Example `bit_or(A)` Alias(es) -

### `bit_xor(arg)`

 Description Returns the bitwise `XOR` of all bits in a given expression. Example `bit_xor(A)` Alias(es) -

### `bitstring_agg(arg)`

 Description Returns a bitstring with bits set for each distinct value. Example `bitstring_agg(A)` Alias(es) -

### `bool_and(arg)`

 Description Returns `true` if every input value is `true`, otherwise `false`. Example `bool_and(A)` Alias(es) -

### `bool_or(arg)`

 Description Returns `true` if any input value is `true`, otherwise `false`. Example `bool_or(A)` Alias(es) -

### `count(arg)`

 Description Calculates the number of tuples in `arg`. If no `arg` is provided, the expression is evaluated as `count(*)`. Example `count(A)` Alias(es) -

### `favg(arg)`

 Description Calculates the average using a more accurate floating point summation (Kahan Sum). Example `favg(A)` Alias(es) -

### `first(arg)`

 Description Returns the first value (null or non-null) from `arg`. This function is affected by ordering. Example `first(A)` Alias(es) `arbitrary(A)`

### `fsum(arg)`

 Description Calculates the sum using a more accurate floating point summation (Kahan Sum). Example `fsum(A)` Alias(es) `sumKahan`, `kahan_sum`

### `geomean(arg)`

 Description Calculates the geometric mean for all tuples in `arg`. Example `geomean(A)` Alias(es) `geometric_mean(A)`

### `histogram(arg)`

 Description Returns a `MAP` of key-value pairs representing buckets and counts. Example `histogram(A)` Alias(es) -

### `last(arg)`

 Description Returns the last value of a column. This function is affected by ordering. Example `last(A)` Alias(es) -

### `list(arg)`

 Description Returns a `LIST` containing all the values of a column. This function is affected by ordering. Example `list(A)` Alias(es) `array_agg`

### `max(arg)`

 Description Returns the maximum value present in `arg`. Example `max(A)` Alias(es) -

### `max_by(arg, val)`

 Description Finds the row with the maximum `val`. Calculates the `arg` expression at that row. This function is affected by ordering. Example `max_by(A, B)` Alias(es) `argMax(arg, val)`, `arg_max(arg, val)`

### `min(arg)`

 Description Returns the minimum value present in `arg`. Example `min(A)` Alias(es) -

### `min_by(arg, val)`

 Description Finds the row with the minimum `val`. Calculates the `arg` expression at that row. This function is affected by ordering. Example `min_by(A, B)` Alias(es) `argMin(arg, val)`, `arg_min(arg, val)`

### `product(arg)`

 Description Calculates the product of all tuples in `arg`. Example `product(A)` Alias(es) -

### `string_agg(arg, sep)`

 Description Concatenates the column string values with a separator. This function is affected by ordering. Example `string_agg(S, ',')` Alias(es) `group_concat(arg, sep)`, `listagg(arg, sep)`

### `sum(arg)`

 Description Calculates the sum value for all tuples in `arg`. Example `sum(A)` Alias(es) -

### `sum_no_overflow(arg)`

 Description Calculates the sum value for all tuples in `arg` without overflow checks. Unlike `sum`, which works on floating-point values, `sum_no_overflow` only accepts `INTEGER` and `DECIMAL` values. Example `sum_no_overflow(A)` Alias(es) -

## Approximate Aggregates

The table below shows the available approximate aggregate functions.

Function Description Example
`approx_count_distinct(x)` Gives the approximate count of distinct elements using HyperLogLog. `approx_count_distinct(A)`
`approx_quantile(x, pos)` Gives the approximate quantile using T-Digest. `approx_quantile(A, 0.5)`
`reservoir_quantile(x, quantile, sample_size = 8192)` Gives the approximate quantile using reservoir sampling, the sample size is optional and uses 8192 as a default size. `reservoir_quantile(A, 0.5, 1024)`

## Statistical Aggregates

The table below shows the available statistical aggregate functions. They all ignore `NULL` values (in the case of a single input column `x`), or pairs where either input is `NULL` (in the case of two input columns `y` and `x`).

Function Description
`corr(y, x)` The correlation coefficient.
`covar_pop(y, x)` The population covariance, which does not include bias correction.
`covar_samp(y, x)` The sample covariance, which includes Bessel's bias correction.
`entropy(x)` The log-2 entropy.
`kurtosis_pop(x)` The excess kurtosis (Fisher’s definition) without bias correction.
`kurtosis(x)` The excess kurtosis (Fisher's definition) with bias correction according to the sample size.
`mad(x)` The median absolute deviation. Temporal types return a positive `INTERVAL`.
`median(x)` The middle value of the set. For even value counts, quantitative values are averaged and ordinal values return the lower value.
`mode(x)` The most frequent value.
`quantile_cont(x, pos)` The interpolated `pos`-quantile of `x` for `0 <= pos <= 1`, i.e., orders the values of `x` and returns the `pos * (n_nonnull_values - 1)`th (zero-indexed) element (or an interpolation between the adjacent elements if the index is not an integer). If `pos` is a `LIST` of `FLOAT`s, then the result is a `LIST` of the corresponding interpolated quantiles.
`quantile_disc(x, pos)` The discrete `pos`-quantile of `x` for `0 <= pos <= 1`, i.e., orders the values of `x` and returns the `floor(pos * (n_nonnull_values - 1))`th (zero-indexed) element. If `pos` is a `LIST` of `FLOAT`s, then the result is a `LIST` of the corresponding discrete quantiles.
`regr_avgx(y, x)` The average of the independent variable for non-`NULL` pairs, where x is the independent variable and y is the dependent variable.
`regr_avgy(y, x)` The average of the dependent variable for non-`NULL` pairs, where x is the independent variable and y is the dependent variable.
`regr_count(y, x)` The number of non-`NULL` pairs.
`regr_intercept(y, x)` The intercept of the univariate linear regression line, where x is the independent variable and y is the dependent variable.
`regr_r2(y, x)` The coefficient of determination, where x is the independent variable and y is the dependent variable.
`regr_slope(y, x)` The slope of the linear regression line, where x is the independent variable and y is the dependent variable.
`regr_sxx(y, x)` The sample variance, which includes Bessel's bias correction, of the independent variable for non-`NULL` pairs, where x is the independent variable and y is the dependent variable.
`regr_sxy(y, x)` The sample covariance, which includes Bessel's bias correction.
`regr_syy(y, x)` The sample variance, which includes Bessel's bias correction, of the dependent variable for non-`NULL` pairs , where x is the independent variable and y is the dependent variable.
`skewness(x)` The skewness.
`stddev_pop(x)` The population standard deviation.
`stddev_samp(x)` The sample standard deviation.
`var_pop(x)` The population variance, which does not include bias correction.
`var_samp(x)` The sample variance, which includes Bessel's bias correction.

### `corr(y, x)`

 Description The correlation coefficient. Formula `covar_pop(y, x) / (stddev_pop(x) * stddev_pop(y))` Alias(es) -

### `covar_pop(y, x)`

 Description The population covariance, which does not include bias correction. Formula `(sum(x*y) - sum(x) * sum(y) / regr_count(y, x)) / regr_count(y, x)`, `covar_samp(y, x) * (1 - 1 / regr_count(y, x))` Alias(es) -

### `covar_samp(y, x)`

 Description The sample covariance, which includes Bessel's bias correction. Formula `(sum(x*y) - sum(x) * sum(y) / regr_count(y, x)) / (regr_count(y, x) - 1)`, `covar_pop(y, x) / (1 - 1 / regr_count(y, x))` Alias(es) `regr_sxy(y, x)`

### `entropy(x)`

 Description The log-2 entropy. Formula - Alias(es) -

### `kurtosis_pop(x)`

 Description The excess kurtosis (Fisher’s definition) without bias correction. Formula - Alias(es) -

### `kurtosis(x)`

 Description The excess kurtosis (Fisher's definition) with bias correction according to the sample size. Formula - Alias(es) -

### `mad(x)`

 Description The median absolute deviation. Temporal types return a positive `INTERVAL`. Formula `median(abs(x - median(x)))` Alias(es) -

### `median(x)`

 Description The middle value of the set. For even value counts, quantitative values are averaged and ordinal values return the lower value. Formula `quantile_cont(x, 0.5)` Alias(es) -

### `mode(x)`

 Description The most frequent value. Formula - Alias(es) -

### `quantile_cont(x, pos)`

 Description The interpolated `pos`-quantile of `x` for `0 <= pos <= 1`, i.e., orders the values of `x` and returns the `pos * (n_nonnull_values - 1)`th (zero-indexed) element (or an interpolation between the adjacent elements if the index is not an integer). If `pos` is a `LIST` of `FLOAT`s, then the result is a `LIST` of the corresponding interpolated quantiles. Formula - Alias(es) -

### `quantile_disc(x, pos)`

 Description The discrete `pos`-quantile of `x` for `0 <= pos <= 1`, i.e., orders the values of `x` and returns the `floor(pos * (n_nonnull_values - 1))`th (zero-indexed) element. If `pos` is a `LIST` of `FLOAT`s, then the result is a `LIST` of the corresponding discrete quantiles. Formula - Alias(es) `quantile`

### `regr_avgx(y, x)`

 Description The average of the independent variable for non-`NULL` pairs, where x is the independent variable and y is the dependent variable. Formula - Alias(es) -

### `regr_avgy(y, x)`

 Description The average of the dependent variable for non-`NULL` pairs, where x is the independent variable and y is the dependent variable. Formula - Alias(es) -

### `regr_count(y, x)`

 Description The number of non-`NULL` pairs. Formula - Alias(es) -

### `regr_intercept(y, x)`

 Description The intercept of the univariate linear regression line, where x is the independent variable and y is the dependent variable. Formula `regr_avgy(y, x) - regr_slope(y, x) * regr_avgx(y, x)` Alias(es) -

### `regr_r2(y, x)`

 Description The coefficient of determination, where x is the independent variable and y is the dependent variable. Formula - Alias(es) -

### `regr_slope(y, x)`

 Description Returns the slope of the linear regression line, where x is the independent variable and y is the dependent variable. Formula `regr_sxy(y, x) / regr_sxx(y, x)` Alias(es) -

### `regr_sxx(y, x)`

 Description The sample variance, which includes Bessel's bias correction, of the independent variable for non-`NULL` pairs, where x is the independent variable and y is the dependent variable. Formula - Alias(es) -

### `regr_sxy(y, x)`

 Description The sample covariance, which includes Bessel's bias correction. Formula - Alias(es) -

### `regr_syy(y, x)`

 Description The sample variance, which includes Bessel's bias correction, of the dependent variable for non-`NULL` pairs, where x is the independent variable and y is the dependent variable. Formula - Alias(es) -

### `skewness(x)`

 Description The skewness. Formula - Alias(es) -

### `stddev_pop(x)`

 Description The population standard deviation. Formula `sqrt(var_pop(x))` Alias(es) -

### `stddev_samp(x)`

 Description The sample standard deviation. Formula `sqrt(var_samp(x))` Alias(es) `stddev(x)`

### `var_pop(x)`

 Description The population variance, which does not include bias correction. Formula `(sum(x^2) - sum(x)^2 / count(x)) / count(x)`, `var_samp(y, x) * (1 - 1 / count(x))` Alias(es) -

### `var_samp(x)`

 Description The sample variance, which includes Bessel's bias correction. Formula `(sum(x^2) - sum(x)^2 / count(x)) / (count(x) - 1)`, `var_pop(y, x) / (1 - 1 / count(x))` Alias(es) `variance(arg, val)`

## Ordered Set Aggregate Functions

The table below shows the available "ordered set" aggregate functions. These functions are specified using the `WITHIN GROUP (ORDER BY sort_expression)` syntax, and they are converted to an equivalent aggregate function that takes the ordering expression as the first argument.

Function Equivalent
`mode() WITHIN GROUP (ORDER BY column [(ASC|DESC)])` `mode(column ORDER BY column [(ASC|DESC)])`
`percentile_cont(fraction) WITHIN GROUP (ORDER BY column [(ASC|DESC)])` `quantile_cont(column, fraction ORDER BY column [(ASC|DESC)])`
`percentile_cont(fractions) WITHIN GROUP (ORDER BY column [(ASC|DESC)])` `quantile_cont(column, fractions ORDER BY column [(ASC|DESC)])`
`percentile_disc(fraction) WITHIN GROUP (ORDER BY column [(ASC|DESC)])` `quantile_disc(column, fraction ORDER BY column [(ASC|DESC)])`
`percentile_disc(fractions) WITHIN GROUP (ORDER BY column [(ASC|DESC)])` `quantile_disc(column, fractions ORDER BY column [(ASC|DESC)])`

## Miscellaneous Aggregate Functions

Function Description Alias
`grouping()` For queries with `GROUP BY` and either `ROLLUP` or `GROUPING SETS`: Returns an integer identifying which of the argument expressions where used to group on to create the current supper-aggregate row. `grouping_id()`