The `FILTER`

clause may optionally follow an aggregate function in a `SELECT`

statement. This will filter the rows of data that are fed into the aggregate function in the same way that a `WHERE`

clause filters rows, but localized to the specific aggregate function. `FILTER`

s are not currently able to be used when the aggregate function is in a windowing context.

There are multiple types of situations where this is useful, including when evaluating multiple aggregates with different filters, and when creating a pivoted view of a dataset. `FILTER`

provides a cleaner syntax for pivoting data when compared with the more traditional `CASE WHEN`

approach discussed below.

Some aggregate functions also do not filter out null values, so using a `FILTER`

clause will return valid results when at times the `CASE WHEN`

approach will not. This occurs with the functions `FIRST`

and `LAST`

, which are desirable in a non-aggregating pivot operation where the goal is to simply re-orient the data into columns rather than re-aggregate it. `FILTER`

also improves null handling when using the `LIST`

and `ARRAY_AGG`

functions, as the `CASE WHEN`

approach will include null values in the list result, while the `FILTER`

clause will remove them.

## Examples

```
-- Compare total row count to:
-- The number of rows where i <= 5
-- The number of rows where i is odd
SELECT
count(*) as total_rows,
count(*) FILTER (WHERE i <= 5) as lte_five,
count(*) FILTER (WHERE i % 2 = 1) as odds
FROM generate_series(1,10) tbl(i);
```

total_rows | lte_five | odds |
---|---|---|

10 | 5 | 5 |

```
-- Different aggregate functions may be used, and multiple WHERE expressions are also permitted
-- The sum of i for rows where i <= 5
-- The median of i where i is odd
SELECT
sum(i) FILTER (WHERE i <= 5) as lte_five_sum,
median(i) FILTER (WHERE i % 2 = 1) as odds_median,
median(i) FILTER (WHERE i % 2 = 1 AND i <= 5) as odds_lte_five_median
FROM generate_series(1,10) tbl(i);
```

lte_five_sum | odds_median | odds_lte_five_median |
---|---|---|

15 | 5.0 | 3.0 |

The `FILTER`

clause can also be used to pivot data from rows into columns. This is a static pivot, as columns must be defined prior to runtime in SQL. However, this kind of statement can be dynamically generated in a host programming language to leverage DuckDB’s SQL engine for rapid, larger than memory pivoting.

```
--First generate an example dataset
CREATE TEMP TABLE stacked_data as
SELECT
i,
CASE WHEN i <= rows * 0.25 THEN 2022
WHEN i <= rows * 0.5 THEN 2023
WHEN i <= rows * 0.75 THEN 2024
WHEN i <= rows * 0.875 THEN 2025
ELSE NULL
END as year
FROM (
SELECT
i,
count(*) over () as rows
FROM generate_series(1,100000000) tbl(i)
) tbl;
--"Pivot" the data out by year (move each year out to a separate column)
SELECT
count(i) FILTER (WHERE year = 2022) as "2022",
count(i) FILTER (WHERE year = 2023) as "2023",
count(i) FILTER (WHERE year = 2024) as "2024",
count(i) FILTER (WHERE year = 2025) as "2025",
count(i) FILTER (WHERE year IS NULL) as "NULLs"
FROM stacked_data;
--This syntax produces the same results as the the FILTER clauses above
SELECT
count(CASE WHEN year = 2022 THEN i END) as "2022",
count(CASE WHEN year = 2023 THEN i END) as "2023",
count(CASE WHEN year = 2024 THEN i END) as "2024",
count(CASE WHEN year = 2025 THEN i END) as "2025",
count(CASE WHEN year IS NULL THEN i END) as "NULLs"
FROM stacked_data;
```

2022 | 2023 | 2024 | 2025 | NULLs |
---|---|---|---|---|

25000000 | 25000000 | 25000000 | 12500000 | 12500000 |

However, the `CASE WHEN`

approach will not work as expected when using an aggregate function that does not ignore `NULL`

values. The `FIRST`

function falls into this category, so `FILTER`

is preferred in this case.

```
--"Pivot" the data out by year (move each year out to a separate column)
SELECT
first(i) FILTER (WHERE year = 2022) as "2022",
first(i) FILTER (WHERE year = 2023) as "2023",
first(i) FILTER (WHERE year = 2024) as "2024",
first(i) FILTER (WHERE year = 2025) as "2025",
first(i) FILTER (WHERE year IS NULL) as "NULLs"
FROM stacked_data;
```

2022 | 2023 | 2024 | 2025 | NULLs |
---|---|---|---|---|

1474561 | 25804801 | 50749441 | 76431361 | 87500001 |

```
--This will produce NULL values whenever the first evaluation of the CASE WHEN clause returns a NULL
SELECT
first(CASE WHEN year = 2022 THEN i END) as "2022",
first(CASE WHEN year = 2023 THEN i END) as "2023",
first(CASE WHEN year = 2024 THEN i END) as "2024",
first(CASE WHEN year = 2025 THEN i END) as "2025",
first(CASE WHEN year IS NULL THEN i END) as "NULLs"
FROM stacked_data;
```

2022 | 2023 | 2024 | 2025 | NULLs |
---|---|---|---|---|

1228801 | NULL | NULL | NULL | NULL |