⌘+k ctrl+k
1.1.3 (stable)
Search Shortcut cmd + k | ctrl + k
FILTER Clause

The FILTER clause may optionally follow an aggregate function in a SELECT statement. This will filter the rows of data that are fed into the aggregate function in the same way that a WHERE clause filters rows, but localized to the specific aggregate function. FILTERs are not currently able to be used when the aggregate function is in a windowing context.

There are multiple types of situations where this is useful, including when evaluating multiple aggregates with different filters, and when creating a pivoted view of a dataset. FILTER provides a cleaner syntax for pivoting data when compared with the more traditional CASE WHEN approach discussed below.

Some aggregate functions also do not filter out NULL values, so using a FILTER clause will return valid results when at times the CASE WHEN approach will not. This occurs with the functions first and last, which are desirable in a non-aggregating pivot operation where the goal is to simply re-orient the data into columns rather than re-aggregate it. FILTER also improves NULL handling when using the list and array_agg functions, as the CASE WHEN approach will include NULL values in the list result, while the FILTER clause will remove them.

Examples

Return the following:

  • The total number of rows.
  • The number of rows where i <= 5
  • The number of rows where i is odd
SELECT
    count(*) AS total_rows,
    count(*) FILTER (i <= 5) AS lte_five,
    count(*) FILTER (i % 2 = 1) AS odds
FROM generate_series(1, 10) tbl(i);
total_rows lte_five odds
10 5 5

Different aggregate functions may be used, and multiple WHERE expressions are also permitted:

SELECT
    sum(i) FILTER (i <= 5) AS lte_five_sum,
    median(i) FILTER (i % 2 = 1) AS odds_median,
    median(i) FILTER (i % 2 = 1 AND i <= 5) AS odds_lte_five_median
FROM generate_series(1, 10) tbl(i);
lte_five_sum odds_median odds_lte_five_median
15 5.0 3.0

The FILTER clause can also be used to pivot data from rows into columns. This is a static pivot, as columns must be defined prior to runtime in SQL. However, this kind of statement can be dynamically generated in a host programming language to leverage DuckDB's SQL engine for rapid, larger than memory pivoting.

First generate an example dataset:

CREATE TEMP TABLE stacked_data AS
    SELECT
        i,
        CASE WHEN i <= rows * 0.25  THEN 2022
             WHEN i <= rows * 0.5   THEN 2023
             WHEN i <= rows * 0.75  THEN 2024
             WHEN i <= rows * 0.875 THEN 2025
             ELSE NULL
             END AS year
    FROM (
        SELECT
            i,
            count(*) OVER () AS rows
        FROM generate_series(1, 100_000_000) tbl(i)
    ) tbl;

“Pivot” the data out by year (move each year out to a separate column):

SELECT
    count(i) FILTER (year = 2022) AS "2022",
    count(i) FILTER (year = 2023) AS "2023",
    count(i) FILTER (year = 2024) AS "2024",
    count(i) FILTER (year = 2025) AS "2025",
    count(i) FILTER (year IS NULL) AS "NULLs"
FROM stacked_data;

This syntax produces the same results as the FILTER clauses above:

SELECT
    count(CASE WHEN year = 2022 THEN i END) AS "2022",
    count(CASE WHEN year = 2023 THEN i END) AS "2023",
    count(CASE WHEN year = 2024 THEN i END) AS "2024",
    count(CASE WHEN year = 2025 THEN i END) AS "2025",
    count(CASE WHEN year IS NULL THEN i END) AS "NULLs"
FROM stacked_data;
2022 2023 2024 2025 NULLs
25000000 25000000 25000000 12500000 12500000

However, the CASE WHEN approach will not work as expected when using an aggregate function that does not ignore NULL values. The first function falls into this category, so FILTER is preferred in this case.

“Pivot” the data out by year (move each year out to a separate column):

SELECT
    first(i) FILTER (year = 2022) AS "2022",
    first(i) FILTER (year = 2023) AS "2023",
    first(i) FILTER (year = 2024) AS "2024",
    first(i) FILTER (year = 2025) AS "2025",
    first(i) FILTER (year IS NULL) AS "NULLs"
FROM stacked_data;
2022 2023 2024 2025 NULLs
1474561 25804801 50749441 76431361 87500001

This will produce NULL values whenever the first evaluation of the CASE WHEN clause returns a NULL:

SELECT
    first(CASE WHEN year = 2022 THEN i END) AS "2022",
    first(CASE WHEN year = 2023 THEN i END) AS "2023",
    first(CASE WHEN year = 2024 THEN i END) AS "2024",
    first(CASE WHEN year = 2025 THEN i END) AS "2025",
    first(CASE WHEN year IS NULL THEN i END) AS "NULLs"
FROM stacked_data;
2022 2023 2024 2025 NULLs
1228801 NULL NULL NULL NULL

Aggregate Function Syntax (Including FILTER Clause)