Search Shortcut cmd + k | ctrl + k
Search cmd+k ctrl+k
0.10 (stable)
PIVOT Statement

The PIVOT statement allows distinct values within a column to be separated into their own columns. The values within those new columns are calculated using an aggregate function on the subset of rows that match each distinct value.

DuckDB implements both the SQL Standard PIVOT syntax and a simplified PIVOT syntax that automatically detects the columns to create while pivoting. PIVOT_WIDER may also be used in place of the PIVOT keyword.

The UNPIVOT statement is the inverse of the PIVOT statement.

Simplified PIVOT Syntax

The full syntax diagram is below, but the simplified PIVOT syntax can be summarized using spreadsheet pivot table naming conventions as:

PIVOT dataset
ON columns
USING values
GROUP BY rows
ORDER BY columns_with_order_directions
LIMIT number_of_rows;

The ON, USING, and GROUP BY clauses are each optional, but they may not all be omitted.

Example Data

All examples use the dataset produced by the queries below:

CREATE TABLE Cities (Country VARCHAR, Name VARCHAR, Year INTEGER, Population INTEGER);
INSERT INTO Cities VALUES ('NL', 'Amsterdam', 2000, 1005);
INSERT INTO Cities VALUES ('NL', 'Amsterdam', 2010, 1065);
INSERT INTO Cities VALUES ('NL', 'Amsterdam', 2020, 1158);
INSERT INTO Cities VALUES ('US', 'Seattle', 2000, 564);
INSERT INTO Cities VALUES ('US', 'Seattle', 2010, 608);
INSERT INTO Cities VALUES ('US', 'Seattle', 2020, 738);
INSERT INTO Cities VALUES ('US', 'New York City', 2000, 8015);
INSERT INTO Cities VALUES ('US', 'New York City', 2010, 8175);
INSERT INTO Cities VALUES ('US', 'New York City', 2020, 8772);
FROM Cities;
Country Name Year Population
NL Amsterdam 2000 1005
NL Amsterdam 2010 1065
NL Amsterdam 2020 1158
US Seattle 2000 564
US Seattle 2010 608
US Seattle 2020 738
US New York City 2000 8015
US New York City 2010 8175
US New York City 2020 8772

PIVOT ON and USING

Use the PIVOT statement below to create a separate column for each year and calculate the total population in each. The ON clause specifies which column(s) to split into separate columns. It is equivalent to the columns parameter in a spreadsheet pivot table.

The USING clause determines how to aggregate the values that are split into separate columns. This is equivalent to the values parameter in a spreadsheet pivot table. If the USING clause is not included, it defaults to count(*).

PIVOT Cities
ON Year
USING sum(Population);
Country Name 2000 2010 2020
NL Amsterdam 1005 1065 1158
US Seattle 564 608 738
US New York City 8015 8175 8772

In the above example, the sum aggregate is always operating on a single value. If we only want to change the orientation of how the data is displayed without aggregating, use the first aggregate function. In this example, we are pivoting numeric values, but the first function works very well for pivoting out a text column. (This is something that is difficult to do in a spreadsheet pivot table, but easy in DuckDB!)

This query produces a result that is identical to the one above:

PIVOT Cities ON Year USING first(Population);

PIVOT ON, USING, and GROUP BY

By default, the PIVOT statement retains all columns not specified in the ON or USING clauses. To include only certain columns and further aggregate, specify columns in the GROUP BY clause. This is equivalent to the rows parameter of a spreadsheet pivot table.

In the below example, the Name column is no longer included in the output, and the data is aggregated up to the Country level.

PIVOT Cities
ON Year
USING sum(Population)
GROUP BY Country;
Country 2000 2010 2020
NL 1005 1065 1158
US 8579 8783 9510

IN Filter for ON Clause

To only create a separate column for specific values within a column in the ON clause, use an optional IN expression. Let’s say for example that we wanted to forget about the year 2020 for no particular reason…

PIVOT Cities
ON Year IN (2000, 2010)
USING sum(Population)
GROUP BY Country;
Country 2000 2010
NL 1005 1065
US 8579 8783

Multiple Expressions per Clause

Multiple columns can be specified in the ON and GROUP BY clauses, and multiple aggregate expressions can be included in the USING clause.

Multiple ON Columns and ON Expressions

Multiple columns can be pivoted out into their own columns. DuckDB will find the distinct values in each ON clause column and create one new column for all combinations of those values (a Cartesian product).

In the below example, all combinations of unique countries and unique cities receive their own column. Some combinations may not be present in the underlying data, so those columns are populated with NULL values.

PIVOT Cities
ON Country, Name
USING sum(Population);
Year NL_Amsterdam NL_New York City NL_Seattle US_Amsterdam US_New York City US_Seattle
2000 1005 NULL NULL NULL 8015 564
2010 1065 NULL NULL NULL 8175 608
2020 1158 NULL NULL NULL 8772 738

To pivot only the combinations of values that are present in the underlying data, use an expression in the ON clause. Multiple expressions and/or columns may be provided.

Here, Country and Name are concatenated together and the resulting concatenations each receive their own column. Any arbitrary non-aggregating expression may be used. In this case, concatenating with an underscore is used to imitate the naming convention the PIVOT clause uses when multiple ON columns are provided (like in the prior example).

PIVOT Cities ON Country || '_' || Name USING sum(Population);
Year NL_Amsterdam US_New York City US_Seattle
2000 1005 8015 564
2010 1065 8175 608
2020 1158 8772 738

Multiple USING Expressions

An alias may also be included for each expression in the USING clause. It will be appended to the generated column names after an underscore (_). This makes the column naming convention much cleaner when multiple expressions are included in the USING clause.

In this example, both the sum and max of the Population column are calculated for each year and are split into separate columns.

PIVOT Cities
ON Year
USING sum(Population) AS total, max(Population) AS max
GROUP BY Country;
Country 2000_total 2000_max 2010_total 2010_max 2020_total 2020_max
US 8579 8015 8783 8175 9510 8772
NL 1005 1005 1065 1065 1158 1158

Multiple GROUP BY Columns

Multiple GROUP BY columns may also be provided. Note that column names must be used rather than column positions (1, 2, etc.), and that expressions are not supported in the GROUP BY clause.

PIVOT Cities
ON Year
USING sum(Population)
GROUP BY Country, Name;
Country Name 2000 2010 2020
NL Amsterdam 1005 1065 1158
US Seattle 564 608 738
US New York City 8015 8175 8772

Using PIVOT within a SELECT Statement

The PIVOT statement may be included within a SELECT statement as a CTE (a Common Table Expression, or WITH clause), or a subquery. This allows for a PIVOT to be used alongside other SQL logic, as well as for multiple PIVOTs to be used in one query.

No SELECT is needed within the CTE, the PIVOT keyword can be thought of as taking its place.

WITH pivot_alias AS (
    PIVOT Cities
    ON Year
    USING sum(Population)
    GROUP BY Country
)
SELECT * FROM pivot_alias;

A PIVOT may be used in a subquery and must be wrapped in parentheses. Note that this behavior is different than the SQL Standard Pivot, as illustrated in subsequent examples.

SELECT *
FROM (
    PIVOT Cities
    ON Year
    USING sum(Population)
    GROUP BY Country
) pivot_alias;

Multiple PIVOT Statements

Each PIVOT can be treated as if it were a SELECT node, so they can be joined together or manipulated in other ways.

For example, if two PIVOT statements share the same GROUP BY expression, they can be joined together using the columns in the GROUP BY clause into a wider pivot.

FROM (PIVOT Cities ON Year USING sum(Population) GROUP BY Country) year_pivot
JOIN (PIVOT Cities ON Name USING sum(Population) GROUP BY Country) name_pivot
USING (Country);
Country 2000 2010 2020 Amsterdam New York City Seattle
NL 1005 1065 1158 3228 NULL NULL
US 8579 8783 9510 NULL 24962 1910

Internals

Pivoting is implemented as a combination of SQL query re-writing and a dedicated PhysicalPivot operator for higher performance. Each PIVOT is implemented as set of aggregations into lists and then the dedicated PhysicalPivot operator converts those lists into column names and values. Additional pre-processing steps are required if the columns to be created when pivoting are detected dynamically (which occurs when the IN clause is not in use).

DuckDB, like most SQL engines, requires that all column names and types be known at the start of a query. In order to automatically detect the columns that should be created as a result of a PIVOT statement, it must be translated into multiple queries. ENUM types are used to find the distinct values that should become columns. Each ENUM is then injected into one of the PIVOT statement’s IN clauses.

After the IN clauses have been populated with ENUMs, the query is re-written again into a set of aggregations into lists.

For example:

PIVOT Cities
ON Year
USING sum(Population);

is initially translated into:

CREATE TEMPORARY TYPE __pivot_enum_0_0 AS ENUM (
    SELECT DISTINCT
        Year::VARCHAR
    FROM Cities
    ORDER BY
        Year
    );
PIVOT Cities
ON Year IN __pivot_enum_0_0
USING sum(Population);

and finally translated into:

SELECT Country, Name, list(Year), list(population_sum)
FROM (
    SELECT Country, Name, Year, sum(population) AS population_sum
    FROM Cities
    GROUP BY ALL
)
GROUP BY ALL;

This produces the result:

Country Name list(“YEAR”) list(population_sum)
NL Amsterdam [2000, 2010, 2020] [1005, 1065, 1158]
US Seattle [2000, 2010, 2020] [564, 608, 738]
US New York City [2000, 2010, 2020] [8015, 8175, 8772]

The PhysicalPivot operator converts those lists into column names and values to return this result:

Country Name 2000 2010 2020
NL Amsterdam 1005 1065 1158
US Seattle 564 608 738
US New York City 8015 8175 8772

Simplified PIVOT Full Syntax Diagram

Below is the full syntax diagram of the PIVOT statement.

SQL Standard PIVOT Syntax

The full syntax diagram is below, but the SQL Standard PIVOT syntax can be summarized as:

FROM dataset
PIVOT (
    values
    FOR
        column_1 IN (in_list)
        column_2 IN (in_list)
        ...
    GROUP BY rows
);

Unlike the simplified syntax, the IN clause must be specified for each column to be pivoted. If you are interested in dynamic pivoting, the simplified syntax is recommended.

Note that no commas separate the expressions in the FOR clause, but that value and GROUP BY expressions must be comma-separated!

Examples

This example uses a single value expression, a single column expression, and a single row expression:

FROM Cities
PIVOT (
    sum(Population)
    FOR
        Year IN (2000, 2010, 2020)
    GROUP BY Country
);
Country 2000 2010 2020
NL 1005 1065 1158
US 8579 8783 9510

This example is somewhat contrived, but serves as an example of using multiple value expressions and multiple columns in the FOR clause.

FROM Cities
PIVOT (
    sum(Population) AS total,
    count(Population) AS count
    FOR
        Year IN (2000, 2010)
        Country in ('NL', 'US')
);
Name 2000_NL_total 2000_NL_count 2000_US_total 2000_US_count 2010_NL_total 2010_NL_count 2010_US_total 2010_US_count
Amsterdam 1005 1 NULL 0 1065 1 NULL 0
Seattle NULL 0 564 1 NULL 0 608 1
New York City NULL 0 8015 1 NULL 0 8175 1

SQL Standard PIVOT Full Syntax Diagram

Below is the full syntax diagram of the SQL Standard version of the PIVOT statement.

About this page

Last modified: 2024-04-24