Documentation
/ SQL
/ Functions
Aggregate Functions
Produce a single row containing the sum of the amount
column:
SELECT sum(amount)
FROM sales;
Produce one row per unique region, containing the sum of amount
for each group:
SELECT region, sum(amount)
FROM sales
GROUP BY region;
Return only the regions that have a sum of amount
higher than 100:
SELECT region
FROM sales
GROUP BY region
HAVING sum(amount) > 100;
Return the number of unique values in the region
column:
SELECT count(DISTINCT region)
FROM sales;
Return two values, the total sum of amount
and the sum of amount
minus columns where the region is north
using the FILTER
clause:
SELECT sum(amount), sum(amount) FILTER (region != 'north')
FROM sales;
Returns a list of all regions in order of the amount
column:
SELECT list(region ORDER BY amount DESC)
FROM sales;
Returns the amount of the first sale using the first()
aggregate function:
SELECT first(amount ORDER BY date ASC)
FROM sales;
Aggregates are functions that combine multiple rows into a single value. Aggregates are different from scalar functions and window functions because they change the cardinality of the result. As such, aggregates can only be used in the SELECT
and HAVING
clauses of a SQL query.
When the DISTINCT
clause is provided, only distinct values are considered in the computation of the aggregate. This is typically used in combination with the count
aggregate to get the number of distinct elements; but it can be used together with any aggregate function in the system.
An ORDER BY
clause can be provided after the last argument of the function call. Note the lack of the comma separator before the clause.
SELECT ⟨aggregate_function⟩(⟨arg⟩, ⟨sep⟩ ORDER BY ⟨ordering_criteria⟩);
This clause ensures that the values being aggregated are sorted before applying the function.
Most aggregate functions are order-insensitive, and for them this clause is parsed and discarded.
However, there are some order-sensitive aggregates that can have non-deterministic results without ordering, e.g., first
, last
, list
and string_agg
/ group_concat
/ listagg
.
These can be made deterministic by ordering the arguments.
For example:
CREATE TABLE tbl AS
SELECT s FROM range(1, 4) r(s);
SELECT string_agg(s, ', ' ORDER BY s DESC) AS countdown
FROM tbl;
The table below shows the available general aggregate functions.
Function |
Description |
any_value(arg) |
Returns the first non-null value from arg . This function is affected by ordering. |
arbitrary(arg) |
Returns the first value (null or non-null) from arg . This function is affected by ordering. |
arg_max(arg, val) |
Finds the row with the maximum val and calculates the arg expression at that row. Rows where the value of the arg or val expression is NULL are ignored. This function is affected by ordering. |
arg_max(arg, val, n) |
The generalized case of arg_max for n values: returns a LIST containing the arg expressions for the top n rows ordered by val descending. This function is affected by ordering. |
arg_max_null(arg, val) |
Finds the row with the maximum val and calculates the arg expression at that row. Rows where the val expression evaluates to NULL are ignored. This function is affected by ordering. |
arg_min(arg, val) |
Finds the row with the minimum val and calculates the arg expression at that row. Rows where the value of the arg or val expression is NULL are ignored. This function is affected by ordering. |
arg_min(arg, val, n) |
Returns a LIST containing the arg expressions for the "bottom" n rows ordered by val ascending. This function is affected by ordering. |
arg_min_null(arg, val) |
Finds the row with the minimum val and calculates the arg expression at that row. Rows where the val expression evaluates to NULL are ignored. This function is affected by ordering. |
array_agg(arg) |
Returns a LIST containing all the values of a column. This function is affected by ordering. |
avg(arg) |
Calculates the average of all non-null values in arg . |
bit_and(arg) |
Returns the bitwise AND of all bits in a given expression. |
bit_or(arg) |
Returns the bitwise OR of all bits in a given expression. |
bit_xor(arg) |
Returns the bitwise XOR of all bits in a given expression. |
bitstring_agg(arg) |
Returns a bitstring whose length corresponds to the range of the non-null (integer) values, with bits set at the location of each (distinct) value. |
bool_and(arg) |
Returns true if every input value is true , otherwise false . |
bool_or(arg) |
Returns true if any input value is true , otherwise false . |
count() |
Returns the number of rows in a group. |
count(arg) |
Returns the number of non-null values in arg . |
favg(arg) |
Calculates the average using a more accurate floating point summation (Kahan Sum). |
first(arg) |
Returns the first value (null or non-null) from arg . This function is affected by ordering. |
fsum(arg) |
Calculates the sum using a more accurate floating point summation (Kahan Sum). |
geomean(arg) |
Calculates the geometric mean of all non-null values in arg . |
histogram(arg) |
Returns a MAP of key-value pairs representing buckets and counts. |
histogram(arg, boundaries) |
Returns a MAP of key-value pairs representing the provided upper boundaries and counts of elements in the corresponding left-open and right-closed partition of the datatype. A boundary at the largest value of the datatype is automatically added when elements larger than all provided boundaries appear, see is_histogram_other_bin . Boundaries may be provided, e.g., via equi_width_bins . |
histogram_exact(arg, elements) |
Returns a MAP of key-value pairs representing the requested elements and their counts. A catch-all element specific to the data-type is automatically added to count other elements when they appear, see is_histogram_other_bin . |
last(arg) |
Returns the last value of a column. This function is affected by ordering. |
list(arg) |
Returns a LIST containing all the values of a column. This function is affected by ordering. |
max(arg) |
Returns the maximum value present in arg . |
max(arg, n) |
Returns a LIST containing the arg values for the "top" n rows ordered by arg descending. |
max_by(arg, val) |
Finds the row with the maximum val . Calculates the arg expression at that row. This function is affected by ordering. |
max_by(arg, val, n) |
Returns a LIST containing the arg expressions for the "top" n rows ordered by val descending. |
min(arg) |
Returns the minimum value present in arg . |
min(arg, n) |
Returns a LIST containing the arg values for the "bottom" n rows ordered by arg ascending. |
min_by(arg, val) |
Finds the row with the minimum val . Calculates the arg expression at that row. This function is affected by ordering. |
min_by(arg, val, n) |
Returns a LIST containing the arg expressions for the "bottom" n rows ordered by val ascending. |
product(arg) |
Calculates the product of all non-null values in arg . |
string_agg(arg, sep) |
Concatenates the column string values with a separator. This function is affected by ordering. |
sum(arg) |
Calculates the sum of all non-null values in arg . |
Description |
Returns the first non-NULL value from arg . This function is affected by ordering. |
Example |
any_value(A) |
Alias(es) |
- |
Description |
Returns the first value (NULL or non-NULL ) from arg . This function is affected by ordering. |
Example |
arbitrary(A) |
Alias(es) |
first(A) |
Description |
Finds the row with the maximum val and calculates the arg expression at that row. Rows where the value of the arg or val expression is NULL are ignored. This function is affected by ordering. |
Example |
arg_max(A, B) |
Alias(es) |
argMax(arg, val) , max_by(arg, val) |
Description |
The generalized case of arg_max for n values: returns a LIST containing the arg expressions for the top n rows ordered by val descending. This function is affected by ordering. |
Example |
arg_max(A, B, 2) |
Alias(es) |
argMax(arg, val, n) , max_by(arg, val, n) |
Description |
Finds the row with the maximum val and calculates the arg expression at that row. Rows where the val expression evaluates to NULL are ignored. This function is affected by ordering. |
Example |
arg_max_null(A, B) |
Alias(es) |
- |
Description |
Finds the row with the minimum val and calculates the arg expression at that row. Rows where the value of the arg or val expression is NULL are ignored. This function is affected by ordering. |
Example |
arg_min(A, B) |
Alias(es) |
argmin(arg, val) , min_by(arg, val) |
Description |
The generalized case of arg_min for n values: returns a LIST containing the arg expressions for the top n rows ordered by val descending. This function is affected by ordering. |
Example |
arg_min(A, B, 2) |
Alias(es) |
argmin(arg, val, n) , min_by(arg, val, n) |
Description |
Finds the row with the minimum val and calculates the arg expression at that row. Rows where the val expression evaluates to NULL are ignored. This function is affected by ordering. |
Example |
arg_min_null(A, B) |
Alias(es) |
- |
Description |
Returns a LIST containing all the values of a column. This function is affected by ordering. |
Example |
array_agg(A) |
Alias(es) |
list |
Description |
Calculates the average of all non-null values in arg . |
Example |
avg(A) |
Alias(es) |
mean |
Description |
Returns the bitwise AND of all bits in a given expression. |
Example |
bit_and(A) |
Alias(es) |
- |
Description |
Returns the bitwise OR of all bits in a given expression. |
Example |
bit_or(A) |
Alias(es) |
- |
Description |
Returns the bitwise XOR of all bits in a given expression. |
Example |
bit_xor(A) |
Alias(es) |
- |
Description |
Returns a bitstring whose length corresponds to the range of the non-null (integer) values, with bits set at the location of each (distinct) value. |
Example |
bitstring_agg(A) |
Alias(es) |
- |
Description |
Returns true if every input value is true , otherwise false . |
Example |
bool_and(A) |
Alias(es) |
- |
Description |
Returns true if any input value is true , otherwise false . |
Example |
bool_or(A) |
Alias(es) |
- |
Description |
Returns the number of rows in a group. |
Example |
count() |
Alias(es) |
count(*) |
Description |
Returns the number of non-null values in arg . |
Example |
count(A) |
Alias(es) |
- |
Description |
Calculates the average using a more accurate floating point summation (Kahan Sum). |
Example |
favg(A) |
Alias(es) |
- |
Description |
Returns the first value (null or non-null) from arg . This function is affected by ordering. |
Example |
first(A) |
Alias(es) |
arbitrary(A) |
Description |
Calculates the sum using a more accurate floating point summation (Kahan Sum). |
Example |
fsum(A) |
Alias(es) |
sumKahan , kahan_sum |
Description |
Calculates the geometric mean of all non-null values in arg . |
Example |
geomean(A) |
Alias(es) |
geometric_mean(A) |
Description |
Returns a MAP of key-value pairs representing buckets and counts. |
Example |
histogram(A) |
Alias(es) |
- |
Description |
Returns a MAP of key-value pairs representing the provided upper boundaries and counts of elements in the corresponding left-open and right-closed partition of the datatype. A boundary at the largest value of the datatype is automatically added when elements larger than all provided boundaries appear, see is_histogram_other_bin . Boundaries may be provided, e.g., via equi_width_bins . |
Example |
histogram(A, [0, 1, 10]) |
Alias(es) |
- |
Description |
Returns a MAP of key-value pairs representing the requested elements and their counts. A catch-all element specific to the data-type is automatically added to count other elements when they appear, see is_histogram_other_bin . |
Example |
histogram_exact(A, [0, 1, 10]) |
Alias(es) |
- |
Description |
Returns the last value of a column. This function is affected by ordering. |
Example |
last(A) |
Alias(es) |
- |
Description |
Returns a LIST containing all the values of a column. This function is affected by ordering. |
Example |
list(A) |
Alias(es) |
array_agg |
Description |
Returns the maximum value present in arg . |
Example |
max(A) |
Alias(es) |
- |
Description |
Returns a LIST containing the arg values for the "top" n rows ordered by arg descending. |
Example |
max(A, 2) |
Alias(es) |
- |
Description |
Finds the row with the maximum val . Calculates the arg expression at that row. This function is affected by ordering. |
Example |
max_by(A, B) |
Alias(es) |
argMax(arg, val) , arg_max(arg, val) |
Description |
Returns a LIST containing the arg expressions for the "top" n rows ordered by val descending. |
Example |
max_by_n(A, B, 2) |
Alias(es) |
argMax(arg, val, n) , arg_max(arg, val, n) |
Description |
Returns the minimum value present in arg . |
Example |
min(A) |
Alias(es) |
- |
Description |
Returns a LIST containing the arg values for the "bottom" n rows ordered by arg ascending. |
Example |
min(A, 2) |
Alias(es) |
- |
Description |
Finds the row with the minimum val . Calculates the arg expression at that row. This function is affected by ordering. |
Example |
min_by(A, B) |
Alias(es) |
argMin(arg, val) , arg_min(arg, val) |
Description |
Returns a LIST containing the arg expressions for the "bottom" n rows ordered by val ascending. |
Example |
min_by(A, B, 2) |
Alias(es) |
argMin(arg, val, n) , arg_min(arg, val, n) |
Description |
Calculates the product of all non-null values in arg . |
Example |
product(A) |
Alias(es) |
- |
Description |
Concatenates the column string values with a separator. This function is affected by ordering. |
Example |
string_agg(S, ',') |
Alias(es) |
group_concat(arg, sep) , listagg(arg, sep) |
Description |
Calculates the sum of all non-null values in arg . |
Example |
sum(A) |
Alias(es) |
- |
All general aggregate functions except for list
and first
(and their aliases array_agg
and arbitrary
, respectively) ignore NULL
s.
To exclude NULL
s from list
, you can use a FILTER
clause.
To ignore NULL
s from first
, you can use the any_value
aggregate.
All general aggregate functions except count
return NULL
on empty groups and groups without non-NULL
inputs.
In particular, list
does not return an empty list, sum
does not return zero, and string_agg
does not return an empty string in this case.
The table below shows the available approximate aggregate functions.
Function |
Description |
Example |
approx_count_distinct(x) |
Gives the approximate count of distinct elements using HyperLogLog. |
approx_count_distinct(A) |
approx_quantile(x, pos) |
Gives the approximate quantile using T-Digest. |
approx_quantile(A, 0.5) |
reservoir_quantile(x, quantile, sample_size = 8192) |
Gives the approximate quantile using reservoir sampling, the sample size is optional and uses 8192 as a default size. |
reservoir_quantile(A, 0.5, 1024) |
The table below shows the available statistical aggregate functions.
They all ignore NULL
values (in the case of a single input column x
), or pairs where either input is NULL
(in the case of two input columns y
and x
).
Function |
Description |
corr(y, x) |
The correlation coefficient. |
covar_pop(y, x) |
The population covariance, which does not include bias correction. |
covar_samp(y, x) |
The sample covariance, which includes Bessel's bias correction. |
entropy(x) |
The log-2 entropy. |
kurtosis_pop(x) |
The excess kurtosis (Fisher’s definition) without bias correction. |
kurtosis(x) |
The excess kurtosis (Fisher's definition) with bias correction according to the sample size. |
mad(x) |
The median absolute deviation. Temporal types return a positive INTERVAL . |
median(x) |
The middle value of the set. For even value counts, quantitative values are averaged and ordinal values return the lower value. |
mode(x) |
The most frequent value. This function is affected by ordering. |
quantile_cont(x, pos) |
The interpolated pos -quantile of x for 0 <= pos <= 1 . Returns the pos * (n_nonnull_values - 1) th (zero-indexed, in the specified order) value of x or an interpolation between the adjacent values if the index is not an integer. Intuitively, arranges the values of x as equispaced points on a line, starting at 0 and ending at 1, and returns the (interpolated) value at pos . If pos is a LIST of FLOAT s, then the result is a LIST of the corresponding interpolated quantiles. |
quantile_disc(x, pos) |
The discrete pos -quantile of x for 0 <= pos <= 1 . Returns the greatest(ceil(pos * n_nonnull_values) - 1, 0) th (zero-indexed, in the specified order) value of x . Intuitively, assigns to each value of x an equisized sub-interval (left-open and right-closed except for the initial interval) of the interval [0, 1] , and picks the value of the sub-interval that contains pos . If pos is a LIST of FLOAT s, then the result is a LIST of the corresponding discrete quantiles. |
regr_avgx(y, x) |
The average of the independent variable for non-NULL pairs, where x is the independent variable and y is the dependent variable. |
regr_avgy(y, x) |
The average of the dependent variable for non-NULL pairs, where x is the independent variable and y is the dependent variable. |
regr_count(y, x) |
The number of non-NULL pairs. |
regr_intercept(y, x) |
The intercept of the univariate linear regression line, where x is the independent variable and y is the dependent variable. |
regr_r2(y, x) |
The squared Pearson correlation coefficient between y and x. Also: The coefficient of determination in a linear regression, where x is the independent variable and y is the dependent variable. |
regr_slope(y, x) |
The slope of the linear regression line, where x is the independent variable and y is the dependent variable. |
regr_sxx(y, x) |
The population variance, which includes Bessel's bias correction, of the independent variable for non-NULL pairs, where x is the independent variable and y is the dependent variable. |
regr_sxy(y, x) |
The population covariance, which includes Bessel's bias correction. |
regr_syy(y, x) |
The population variance, which includes Bessel's bias correction, of the dependent variable for non-NULL pairs , where x is the independent variable and y is the dependent variable. |
skewness(x) |
The skewness. |
stddev_pop(x) |
The population standard deviation. |
stddev_samp(x) |
The sample standard deviation. |
var_pop(x) |
The population variance, which does not include bias correction. |
var_samp(x) |
The sample variance, which includes Bessel's bias correction. |
Description |
The correlation coefficient. |
Formula |
covar_pop(y, x) / (stddev_pop(x) * stddev_pop(y)) |
Alias(es) |
- |
Description |
The population covariance, which does not include bias correction. |
Formula |
(sum(x*y) - sum(x) * sum(y) / regr_count(y, x)) / regr_count(y, x) , covar_samp(y, x) * (1 - 1 / regr_count(y, x)) |
Alias(es) |
- |
Description |
The sample covariance, which includes Bessel's bias correction. |
Formula |
(sum(x*y) - sum(x) * sum(y) / regr_count(y, x)) / (regr_count(y, x) - 1) , covar_pop(y, x) / (1 - 1 / regr_count(y, x)) |
Alias(es) |
regr_sxy(y, x) |
Description |
The log-2 entropy. |
Formula |
- |
Alias(es) |
- |
Description |
The excess kurtosis (Fisher’s definition) without bias correction. |
Formula |
- |
Alias(es) |
- |
Description |
The excess kurtosis (Fisher's definition) with bias correction according to the sample size. |
Formula |
- |
Alias(es) |
- |
Description |
The median absolute deviation. Temporal types return a positive INTERVAL . |
Formula |
median(abs(x - median(x))) |
Alias(es) |
- |
Description |
The middle value of the set. For even value counts, quantitative values are averaged and ordinal values return the lower value. |
Formula |
quantile_cont(x, 0.5) |
Alias(es) |
- |
Description |
The interpolated pos -quantile of x for 0 <= pos <= 1 . Returns the pos * (n_nonnull_values - 1) th (zero-indexed, in the specified order) value of x or an interpolation between the adjacent values if the index is not an integer. Intuitively, arranges the values of x as equispaced points on a line, starting at 0 and ending at 1, and returns the (interpolated) value at pos . If pos is a LIST of FLOAT s, then the result is a LIST of the corresponding interpolated quantiles. |
Formula |
- |
Alias(es) |
- |
Description |
The discrete pos -quantile of x for 0 <= pos <= 1 . Returns the greatest(ceil(pos * n_nonnull_values) - 1, 0) th (zero-indexed, in the specified order) value of x . Intuitively, assigns to each value of x an equisized sub-interval (left-open and right-closed except for the initial interval) of the interval [0, 1] , and picks the value of the sub-interval that contains pos . If pos is a LIST of FLOAT s, then the result is a LIST of the corresponding discrete quantiles. |
Formula |
- |
Alias(es) |
quantile |
Description |
The average of the independent variable for non-NULL pairs, where x is the independent variable and y is the dependent variable. |
Formula |
- |
Alias(es) |
- |
Description |
The average of the dependent variable for non-NULL pairs, where x is the independent variable and y is the dependent variable. |
Formula |
- |
Alias(es) |
- |
Description |
The number of non-NULL pairs. |
Formula |
- |
Alias(es) |
- |
Description |
The intercept of the univariate linear regression line, where x is the independent variable and y is the dependent variable. |
Formula |
regr_avgy(y, x) - regr_slope(y, x) * regr_avgx(y, x) |
Alias(es) |
- |
Description |
The squared Pearson correlation coefficient between y and x. Also: The coefficient of determination in a linear regression, where x is the independent variable and y is the dependent variable. |
Formula |
- |
Alias(es) |
- |
Description |
Returns the slope of the linear regression line, where x is the independent variable and y is the dependent variable. |
Formula |
regr_sxy(y, x) / regr_sxx(y, x) |
Alias(es) |
- |
Description |
The population variance, which includes Bessel's bias correction, of the independent variable for non-NULL pairs, where x is the independent variable and y is the dependent variable. |
Formula |
- |
Alias(es) |
- |
Description |
The population covariance, which includes Bessel's bias correction. |
Formula |
- |
Alias(es) |
- |
Description |
The population variance, which includes Bessel's bias correction, of the dependent variable for non-NULL pairs, where x is the independent variable and y is the dependent variable. |
Formula |
- |
Alias(es) |
- |
Description |
The skewness. |
Formula |
- |
Alias(es) |
- |
Description |
The population standard deviation. |
Formula |
sqrt(var_pop(x)) |
Alias(es) |
- |
Description |
The sample standard deviation. |
Formula |
sqrt(var_samp(x)) |
Alias(es) |
stddev(x) |
Description |
The population variance, which does not include bias correction. |
Formula |
(sum(x^2) - sum(x)^2 / count(x)) / count(x) , var_samp(y, x) * (1 - 1 / count(x)) |
Alias(es) |
- |
Description |
The sample variance, which includes Bessel's bias correction. |
Formula |
(sum(x^2) - sum(x)^2 / count(x)) / (count(x) - 1) , var_pop(y, x) / (1 - 1 / count(x)) |
Alias(es) |
variance(arg, val) |
The table below shows the available “ordered set” aggregate functions.
These functions are specified using the WITHIN GROUP (ORDER BY sort_expression)
syntax,
and they are converted to an equivalent aggregate function that takes the ordering expression
as the first argument.
Function |
Equivalent |
mode() WITHIN GROUP (ORDER BY column [(ASC|DESC)]) |
mode(column ORDER BY column [(ASC|DESC)]) |
percentile_cont(fraction) WITHIN GROUP (ORDER BY column [(ASC|DESC)]) |
quantile_cont(column, fraction ORDER BY column [(ASC|DESC)]) |
percentile_cont(fractions) WITHIN GROUP (ORDER BY column [(ASC|DESC)]) |
quantile_cont(column, fractions ORDER BY column [(ASC|DESC)]) |
percentile_disc(fraction) WITHIN GROUP (ORDER BY column [(ASC|DESC)]) |
quantile_disc(column, fraction ORDER BY column [(ASC|DESC)]) |
percentile_disc(fractions) WITHIN GROUP (ORDER BY column [(ASC|DESC)]) |
quantile_disc(column, fractions ORDER BY column [(ASC|DESC)]) |
Function |
Description |
Alias |
grouping() |
For queries with GROUP BY and either ROLLUP or GROUPING SETS : Returns an integer identifying which of the argument expressions where used to group on to create the current supper-aggregate row. |
grouping_id() |