Search Shortcut cmd + k | ctrl + k
Search cmd+k ctrl+k
1.0 (stable)
GROUPING SETS

`GROUPING SETS`, `ROLLUP` and `CUBE` can be used in the `GROUP BY` clause to perform a grouping over multiple dimensions within the same query. Note that this syntax is not compatible with `GROUP BY ALL`.

## Examples

Compute the average income along the provided four different dimensions:

``````-- the syntax () denotes the empty set (i.e., computing an ungrouped aggregate)
SELECT city, street_name, avg(income)
FROM addresses
GROUP BY GROUPING SETS ((city, street_name), (city), (street_name), ());
``````

Compute the average income along the same dimensions:

``````SELECT city, street_name, avg(income)
FROM addresses
GROUP BY CUBE (city, street_name);
``````

Compute the average income along the dimensions `(city, street_name)`, `(city)` and `()`:

``````SELECT city, street_name, avg(income)
FROM addresses
GROUP BY ROLLUP (city, street_name);
``````

## Description

`GROUPING SETS` perform the same aggregate across different `GROUP BY clauses` in a single query.

``````CREATE TABLE students (course VARCHAR, type VARCHAR);
INSERT INTO students (course, type)
VALUES
('CS', 'Bachelor'), ('CS', 'Bachelor'), ('CS', 'PhD'), ('Math', 'Masters'),
('CS', NULL), ('CS', NULL), ('Math', NULL);
``````
``````SELECT course, type, count(*)
FROM students
GROUP BY GROUPING SETS ((course, type), course, type, ());
``````
course type count_star()
Math NULL 1
NULL NULL 7
CS PhD 1
CS Bachelor 2
Math Masters 1
CS NULL 2
Math NULL 2
CS NULL 5
NULL NULL 3
NULL Masters 1
NULL Bachelor 2
NULL PhD 1

In the above query, we group across four different sets: `course, type`, `course`, `type` and `()` (the empty group). The result contains `NULL` for a group which is not in the grouping set for the result, i.e., the above query is equivalent to the following UNION statement:

Group by course, type:

``````SELECT course, type, count(*)
FROM students
GROUP BY course, type
UNION ALL
``````

Group by type:

``````SELECT NULL AS course, type, count(*)
FROM students
GROUP BY type
UNION ALL
``````

Group by course:

``````SELECT course, NULL AS type, count(*)
FROM students
GROUP BY course
UNION ALL
``````

Group by nothing:

``````SELECT NULL AS course, NULL AS type, count(*)
FROM students;
``````

`CUBE` and `ROLLUP` are syntactic sugar to easily produce commonly used grouping sets.

The `ROLLUP` clause will produce all "sub-groups" of a grouping set, e.g., `ROLLUP (country, city, zip)` produces the grouping sets `(country, city, zip), (country, city), (country), ()`. This can be useful for producing different levels of detail of a group by clause. This produces `n+1` grouping sets where n is the amount of terms in the `ROLLUP` clause.

`CUBE` produces grouping sets for all combinations of the inputs, e.g., `CUBE (country, city, zip)` will produce `(country, city, zip), (country, city), (country, zip), (city, zip), (country), (city), (zip), ()`. This produces `2^n` grouping sets.

## Identifying Grouping Sets with `GROUPING_ID()`

The super-aggregate rows generated by `GROUPING SETS`, `ROLLUP` and `CUBE` can often be identified by `NULL`-values returned for the respective column in the grouping. But if the columns used in the grouping can themselves contain actual `NULL`-values, then it can be challenging to distinguish whether the value in the resultset is a "real" `NULL`-value coming out of the data itself, or a `NULL`-value generated by the grouping construct. The `GROUPING_ID()` or `GROUPING()` function is designed to identify which groups generated the super-aggregate rows in the result.

`GROUPING_ID()` is an aggregate function that takes the column expressions that make up the grouping(s). It returns a `BIGINT` value. The return value is `0` for the rows that are not super-aggregate rows. But for the super-aggregate rows, it returns an integer value that identifies the combination of expressions that make up the group for which the super-aggregate is generated. At this point, an example might help. Consider the following query:

``````WITH days AS (
SELECT
year("generate_series")    AS y,
quarter("generate_series") AS q,
month("generate_series")   AS m
FROM generate_series(DATE '2023-01-01', DATE '2023-12-31', INTERVAL 1 DAY)
)
SELECT y, q, m, GROUPING_ID(y, q, m) AS "grouping_id()"
FROM days
GROUP BY GROUPING SETS (
(y, q, m),
(y, q),
(y),
()
)
ORDER BY y, q, m;
``````

These are the results:

y q m grouping_id()
2023 1 1 0
2023 1 2 0
2023 1 3 0
2023 1 NULL 1
2023 2 4 0
2023 2 5 0
2023 2 6 0
2023 2 NULL 1
2023 3 7 0
2023 3 8 0
2023 3 9 0
2023 3 NULL 1
2023 4 10 0
2023 4 11 0
2023 4 12 0
2023 4 NULL 1
2023 NULL NULL 3
NULL NULL NULL 7

In this example, the lowest level of grouping is at the month level, defined by the grouping set `(y, q, m)`. Result rows corresponding to that level are simply aggregate rows and the `GROUPING_ID(y, q, m)` function returns `0` for those. The grouping set `(y, q)` results in super-aggregate rows over the month level, leaving a `NULL`-value for the `m` column, and for which `GROUPING_ID(y, q, m)` returns `1`. The grouping set `(y)` results in super-aggregate rows over the quarter level, leaving `NULL`-values for the `m` and `q` column, for which `GROUPING_ID(y, q, m)` returns `3`. Finally, the `()` grouping set results in one super-aggregate row for the entire resultset, leaving `NULL`-values for `y`, `q` and `m` and for which `GROUPING_ID(y, q, m)` returns `7`.

To understand the relationship between the return value and the grouping set, you can think of `GROUPING_ID(y, q, m)` writing to a bitfield, where the first bit corresponds to the last expression passed to `GROUPING_ID()`, the second bit to the one-but-last expression passed to `GROUPING_ID()`, and so on. This may become clearer by casting `GROUPING_ID()` to `BIT`:

``````WITH days AS (
SELECT
year("generate_series")    AS y,
quarter("generate_series") AS q,
month("generate_series")   AS m
FROM generate_series(DATE '2023-01-01', DATE '2023-12-31', INTERVAL 1 DAY)
)
SELECT
y, q, m,
GROUPING_ID(y, q, m) AS "grouping_id(y, q, m)",
right(GROUPING_ID(y, q, m)::BIT::VARCHAR, 3) AS "y_q_m_bits"
FROM days
GROUP BY GROUPING SETS (
(y, q, m),
(y, q),
(y),
()
)
ORDER BY y, q, m;
``````

Which returns these results:

y q m grouping_id(y, q, m) y_q_m_bits
2023 1 1 0 000
2023 1 2 0 000
2023 1 3 0 000
2023 1 NULL 1 001
2023 2 4 0 000
2023 2 5 0 000
2023 2 6 0 000
2023 2 NULL 1 001
2023 3 7 0 000
2023 3 8 0 000
2023 3 9 0 000
2023 3 NULL 1 001
2023 4 10 0 000
2023 4 11 0 000
2023 4 12 0 000
2023 4 NULL 1 001
2023 NULL NULL 3 011
NULL NULL NULL 7 111

Note that the number of expressions passed to `GROUPING_ID()`, or the order in which they are passed is independent from the actual group definitions appearing in the `GROUPING SETS`-clause (or the groups implied by `ROLLUP` and `CUBE`). As long as the expressions passed to `GROUPING_ID()` are expressions that appear some where in the `GROUPING SETS`-clause, `GROUPING_ID()` will set a bit corresponding to the position of the expression whenever that expression is rolled up to a super-aggregate.

## Syntax

##### About this page

Last modified: 2024-07-20