USING KEY in Recursive CTEs

Björn Bamberg and Torsten Grust
Published on 2025-05-23
Reading time: 17 minutes

TL;DR: Recursive CTEs in SQL allow for powerful iterative queries like graph traversals but can be memory-intensive and slow due to repeated row accumulation. DuckDB’s new USING KEY feature addresses this by treating the intermediate result as a keyed dictionary instead of an ever-growing set: existing entries can be updated in place. This can lead to dramatically better performance and memory efficiency, especially in graph algorithms like shortest path and distance-vector routing. It also simplifies query logic by providing direct access to the entire dictionary.

Assembling SQL Queries from Pieces: CTEs

As SQL queries become more complex, managing their readability, modularity and reusability increasingly becomes a challenge. Common Table Expressions (CTEs) were introduced to address these issues by allowing developers to define temporary, named result sets within a query. Similar to functions in programming, CTEs allow a large query to be broken down into logical building blocks, making it easier to understand, maintain and debug.

CTEs are particularly useful for structuring multi-step transformations that might otherwise require deeply nested subqueries or complex joins. By improving both the clarity and structure of SQL code, CTEs have become an essential tool in modern query writing – enabling the clear, declarative expression of even the most sophisticated logic.
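
To make this concrete, here is a small sketch of our own (the orders table and both CTE names are hypothetical): two named building blocks replace what would otherwise be nested subqueries:

WITH
    big_orders AS (       -- step 1: keep only sizable orders
        SELECT customer_id, total
        FROM orders       -- hypothetical table, for illustration only
        WHERE total > 1000
    ),
    per_customer AS (     -- step 2: aggregate step 1 per customer
        SELECT customer_id, count(*) AS n_orders, sum(total) AS volume
        FROM big_orders
        GROUP BY customer_id
    )
FROM per_customer
ORDER BY volume DESC;

Each building block can be read, tested, and reused on its own – exactly the modularity described above.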

Iterate Like It's 1999: Recursive CTEs

To enhance the expressive power of SQL, recursive CTEs were introduced in the SQL:1999 standard. These allow a query to reference the results from previous iterations within the same expression, enabling SQL to solve more complex problems such as graph traversal and other iterative computations.

This capability pushes SQL beyond basic data retrieval, allowing for the formulation of complex, iterative logic directly in SQL. In fact, recursive CTEs make SQL Turing-complete, meaning it can theoretically express any computation (given sufficient time and memory).

_But how do recursive CTEs work in DuckDB?_

Let’s look at a simple example to break down the mechanism. Suppose we want to calculate the largest power of 2 that is smaller than 100. We can use a recursive CTE power to generate powers of 2 iteratively until we reach that limit. For any row (a, b, c) in power we will have that a^b = c:

WITH RECURSIVE power(a, b, c) AS (
    SELECT 2, 0, 1       -- 2^0 = 1
        UNION
    SELECT a, b+1, a * c -- a^(b+1) = a * a^b
    FROM power           -- reads the working table (contains a single row)
    WHERE a * c < 100
)
FROM power;         -- reads the union table (contains all intermediate results)

We can divide a recursive CTE into two parts, separated by the UNION keyword. The part above the UNION is the non-recursive part (SELECT 2, 0, 1 in our example), and the part below is the recursive part.

In the recursive part, CTE power references itself. This self-reference points to what we call the working table. The working table always holds the rows produced in the immediately preceding iteration. And only those.

Here's how it works step by step:

  • First, the non-recursive part is executed, producing initial rows – in our example, just the row (2,0,1). These rows are stored in the working table.
  • Then, the recursive part is executed using the rows from the working table. Every new row the recursive part produces is stored in the intermediate table, which holds results from the current iteration.
  • If the intermediate table is empty, the iteration ends.
  • Otherwise, we clear the working table and replace it with the contents of the intermediate table – preparing for the next iteration.
  • We additionally add the contents of the intermediate table to the union table, which accumulates all intermediate results across iterations.

Here you can see the entries of the three tables involved in each iteration of the CTE:

| Iteration | Output of Recursive Step | Working Table | Intermediate Table | Union Table |
|----------:|--------------------------|---------------|--------------------|-------------|
| 0 | SELECT 2, 0, 1 | ∅ (no rows) | (2,0,1) | (2,0,1) |
| 1 | 2 * 1 = 2 | (2,0,1) | (2,1,2) | (2,0,1), (2,1,2) |
| 2 | 2 * 2 = 4 | (2,1,2) | (2,2,4) | (2,0,1), (2,1,2), (2,2,4) |
| 3 | 2 * 4 = 8 | (2,2,4) | (2,3,8) | (2,0,1), …, (2,3,8) |
| 4 | 2 * 8 = 16 | (2,3,8) | (2,4,16) | (2,0,1), …, (2,4,16) |
| 5 | 2 * 16 = 32 | (2,4,16) | (2,5,32) | (2,0,1), …, (2,5,32) |
| 6 | 2 * 32 = 64 | (2,5,32) | (2,6,64) | (2,0,1), …, (2,6,64) |
| 7 | 2 * 64 = 128 | (2,6,64) | stop! | (2,0,1), …, (2,6,64) |

When the recursive CTE completes, the union table holds the entire result set, including all intermediate rows from every iteration:

┌───────┬───────┬───────┐
│   a   │   b   │   c   │ -- a^b = c
│ int32 │ int32 │ int32 │
├───────┼───────┼───────┤
│     2 │     0 │     1 │
│     2 │     1 │     2 │
│     2 │     2 │     4 │
│     2 │     3 │     8 │
│     2 │     4 │    16 │
│     2 │     5 │    32 │
│     2 │     6 │    64 │
└───────┴───────┴───────┘

The union table provides a complete record of the history of the power computation. This can lead to unnecessary overhead, especially if all we need is the last row – the final result of the recursion. Storing every intermediate value can be inefficient, in particular when the intermediate results aren't required or when working with large datasets (that involve many rows or wide rows, e.g., in the presence of array-typed columns).

Recursive CTEs Suffer from Amnesia

While the union table holds this entire history once the computation finishes, recursive CTEs suffer from a case of "amnesia" while the iteration is still going on: the recursive part only ever sees the intermediate results of the immediately preceding iteration. This can be limiting, and many of us will have seen query authors work around the limitation with manually maintained arrays (or similar container structures) that carry information about previous iterations. This can be costly. Then again, letting the recursive part access the union table with all its accumulated – potentially sizable – intermediate results may easily incur performance problems while we iterate. A real conundrum.

USING a KEY to Cure Amnesia

Can we afford to allow the recursive part to access the union table? Yes, but we need a means to control its size. Operating the union table in append-only mode thus is a no-go. Instead, let the recursive part optionally overwrite existing rows in the union table with fresh information that has been computed in the current iteration. This can reduce the union table's size significantly (see our experiments below).

Starting with version 1.3, DuckDB features a USING KEY variant of recursive CTEs that incorporates just this idea.

If you would like to read more about its origins, check out CIDR 2023. For further information about the implementation, see SIGMOD 2025 (to be published).

The variant introduces two major differences compared to traditional recursive CTEs:

  1. It provides access to the working table (as always) as well as to the union table – which we now call the recurring table.
  2. Instead of simply appending new rows to the recurring table, the table now functions more like a dictionary (similar to a Python dict), allowing key-based updates.

To use this new feature, add the USING KEY (...) clause to your recursive CTE:

WITH RECURSIVE power(a, b, c) USING KEY (a)
    -- key: a, payload: b, c
AS (
    ...
)

With USING KEY, the schema of the recursive CTE is divided into key columns and payload columns. The key columns are specified using the USING KEY (column names) clause, while the remaining columns are treated as payload.

This division affects how the recurring table behaves. Instead of stubbornly appending new rows on each iteration, it acts more like a dictionary: if the recursive part returns a row that hasn't been seen before, it is added to the recurring table as usual. But if a row shares a key with an existing entry in the recurring table, the payload is updated – replacing the previous values for that key in the recurring table.

If multiple rows with the same key are produced in a single iteration, only the last one is retained. Therefore, you may wish to use the ORDER BY clause in the recursive part to control which row is kept.
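
To illustrate (a toy example of ours, not from the DuckDB documentation): the recursive part below produces three candidate payloads for the single key 1 in one iteration. Sorting the candidates in descending order makes the smallest value the last row produced – and thus the one that is retained:

WITH RECURSIVE best(k, v) USING KEY (k) AS (
    SELECT 1 AS k, NULL::INT AS v       -- seed: key 1, no payload yet
        UNION
    (SELECT best.k, cand.v
     FROM best, (VALUES (7), (42), (3)) AS cand(v)
     WHERE best.v IS NULL               -- run the recursive part exactly once
     ORDER BY cand.v DESC)              -- smallest candidate comes last and wins
)
FROM best;                              -- ⇒ the single row (1, 3)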

This approach allows recursive queries to maintain and update state more efficiently, especially for algorithms in which keeping the latest (or “best”) value for a given key is crucial.

Overwriting Old Intermediate Results

Let us pick up our recursive example query from above. This time we compute the powers of both bases 2 and 3, as long as they stay below 100:

WITH RECURSIVE power(a,b,c) USING KEY (a) AS (
    FROM (VALUES (2,0,1), (3,0,1))   -- 2^0 = 1, 3^0 = 1
        UNION
    SELECT a, b+1, a * c             -- a^(b+1) = a * a^b
    FROM power                       -- reads the working table (contains two rows)
    WHERE a * c < 100
)
FROM power;                          -- reads the recurring table (contains two rows)

We start with two rows in the non-recursive part: one with a key (base) of 2 and another with a key of 3. In the recursive part, we multiply the intermediate power by the base. This produces two rows, again with the keys 2 and 3.

Unlike traditional recursive CTEs, we do not append these two new rows to the recurring table. Instead, we update the existing rows with keys 2 and 3 in the recurring table, overwriting their payload values. We thus keep only two rows throughout the entire computation, each holding the current power value for its key (or base) in column a.

| Iteration | Output of Recursive Step | Working Table | Intermediate Table | Recurring Table |
|----------:|--------------------------|---------------|--------------------|-----------------|
| 0 | SELECT 2, 0, 1; SELECT 3, 0, 1 | ∅ (no rows) | (2,0,1), (3,0,1) | (2,0,1), (3,0,1) |
| 1 | 2 * 1 = 2; 3 * 1 = 3 | (2,0,1), (3,0,1) | (2,1,2), (3,1,3) | (2,1,2), (3,1,3) |
| 2 | 2 * 2 = 4; 3 * 3 = 9 | (2,1,2), (3,1,3) | (2,2,4), (3,2,9) | (2,2,4), (3,2,9) |
| 3 | 2 * 4 = 8; 3 * 9 = 27 | (2,2,4), (3,2,9) | (2,3,8), (3,3,27) | (2,3,8), (3,3,27) |
| 4 | 2 * 8 = 16; 3 * 27 = 81 | (2,3,8), (3,3,27) | (2,4,16), (3,4,81) | (2,4,16), (3,4,81) |
| 5 | 2 * 16 = 32; 3 * 81 = 243 | (2,4,16), (3,4,81) | (2,5,32) | (2,5,32), (3,4,81) |
| 6 | 2 * 32 = 64 | (2,5,32) | (2,6,64) | (2,6,64), (3,4,81) |
| 7 | 2 * 64 = 128 | (2,6,64) | stop! | (2,6,64), (3,4,81) |

As we can see, the size of the recurring table remains constant: irrelevant computation history is overwritten, which reduces memory usage. The final recurring table reads:

┌───────┬───────┬───────┐
│   a   │   b   │   c   │ -- a^b = c
│ int32 │ int32 │ int32 │
├───────┼───────┼───────┤
│     2 │     6 │    64 │
│     3 │     4 │    81 │
└───────┴───────┴───────┘

This behavior is especially useful with algorithms in which we are interested in the latest, best, or smallest value for a given key. The maximum number of rows in the recurring table is now bounded by the number of unique keys. Since the number of distinct keys in use is under the control of the recursive part, this can be a powerful advantage when working with sizable datasets.

A Change of Key

Now, should you happen to be interested in the history of the computation and are willing to invest the memory space, all that is required is a change of key. With

WITH RECURSIVE power(a, b, c)
    USING KEY (a, b)  -- formerly: USING KEY (a)
AS (
    ...
)
FROM power
ORDER BY a, b;

the iteration counter (or exponent) in column b is considered to be part of the key, too. The recurring table will thus track unique (a,b) combinations (i.e., base, exponent) and we are able to trace what was going on during iteration:

┌───────┬───────┬───────┐
│   a   │   b   │   c   │
│ int32 │ int32 │ int32 │
├───────┼───────┼───────┤
│     2 │     0 │     1 │
│     2 │     1 │     2 │
│     2 │     2 │     4 │
│     2 │     3 │     8 │
│     2 │     4 │    16 │
│     2 │     5 │    32 │
│     2 │     6 │    64 │
│     3 │     0 │     1 │
│     3 │     1 │     3 │
│     3 │     2 │     9 │
│     3 │     3 │    27 │
│     3 │     4 │    81 │
└───────┴───────┴───────┘
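
Spelled out in full, the query that produces this history reads as follows – the body is identical to the earlier two-base example; only the key specification changed:

WITH RECURSIVE power(a, b, c) USING KEY (a, b) AS (
    FROM (VALUES (2,0,1), (3,0,1))   -- 2^0 = 1, 3^0 = 1
        UNION
    SELECT a, b+1, a * c             -- a^(b+1) = a * a^b
    FROM power
    WHERE a * c < 100
)
FROM power
ORDER BY a, b;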

Accessing Relevant History

Another major difference from vanilla recursive CTEs: now that the size of the recurring table is under control, we can afford to reference it directly in the recursive part of the CTE. This allows us to access any intermediate result that has not been overwritten yet – regardless of the iteration that computed it. No amnesia anymore! To access the recurring table, simply prefix the CTE name with the pseudo-schema name recurring:

WITH RECURSIVE t(...) USING KEY (...) AS (
    ...
    FROM recurring.t  -- reads the recurring table while we iterate
)
...
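
To see why this matters, consider a small self-contained sketch (our own toy graph, not from the post): a breadth-first traversal in which the recurring table doubles as the set of already-visited nodes:

WITH RECURSIVE
edge(a, b) AS (
    FROM (VALUES (1, 2), (2, 3), (1, 3), (3, 4)) AS e(a, b)
),
bfs(node, dist) USING KEY (node) AS (
    SELECT 1 AS node, 0 AS dist              -- start the traversal at node 1
        UNION
    SELECT edge.b AS node, bfs.dist + 1 AS dist
    FROM bfs                                 -- working table: nodes reached last round
    JOIN edge ON (edge.a = bfs.node)
    LEFT JOIN recurring.bfs AS seen          -- consult the recurring table ...
         ON (seen.node = edge.b)
    WHERE seen.node IS NULL                  -- ... and expand unvisited nodes only
)
FROM bfs
ORDER BY dist, node;  -- ⇒ (1,0), (2,1), (3,1), (4,2)

Without recurring.bfs, each path would have to carry the set of visited nodes along itself (e.g., in an array-typed column) to prevent revisits.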

USING KEY Can Unlock Performance Advantages

To further highlight the differences between vanilla and key-based CTEs, let's consider a more complex example using a graph dataset – a social network graph.

The graphs were derived from the LDBC Social Network Benchmark (SNB), which provides a generator for synthetic social network data. To make the graphs more manageable for use with the WITH RECURSIVE query, we further filtered them by person name, reducing their density.

In this dataset, nodes represent people while edges correspond to relationships between them. The tables we’re working with are Person(id), which contains all existing ids in the network, and knows(person1id, person2id), in which each row holds a pair of people who know each other.

If you would like to try this out for yourself, begin by attaching the database to any DuckDB session.

ATTACH 'https://blobs.duckdb.org/data/using-key-graph.duckdb';
USE 'using-key-graph';
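
As a quick sanity check that the attach worked (using the table names described above):

SELECT (SELECT count(*) FROM Person) AS people,
       (SELECT count(*) FROM knows)  AS relationships;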

Our goal is to compute the shortest path between all pairs of people in the social network. Since this is a problem with inherent quadratic complexity, we need to keep a close eye on runtime and memory requirements. For each pair, we start by adding a row that has one person as the start node and the other as the target node. We then iteratively explore all individuals known by the start node, continuing the traversal until the target person is reached. Paths through the network are encoded by via nodes: to reach the target from the start node, first proceed to the via node (an immediate neighbor of the start node) – once there, use that node's via entry to continue your traversal.

A recursive CTE that implements this approach reads as follows:

WITH RECURSIVE paths(here, current, via, len, there, completed, found) AS (
  SELECT n1.id AS here, n1.id AS current, NULL::INT64 AS via,
         0 AS len, n2.id AS there, false AS completed, false AS found
    FROM Person AS n1 JOIN Person AS n2 ON (n2.id <> n1.id)
    UNION ALL
  SELECT paths.here,
         person2id AS current,
         coalesce(paths.via, knows.person2id) AS via,
         paths.len+1 AS len,
         paths.there,
         bool_or(knows.person2id = paths.there)
                 OVER (PARTITION BY paths.here, paths.there
                       ROWS BETWEEN UNBOUNDED PRECEDING
                                AND UNBOUNDED FOLLOWING) AS completed,
         knows.person2id = paths.there AS found
      FROM paths
      JOIN knows ON (paths.current = knows.person1id AND NOT paths.completed)
 )
 SELECT here, there, via, len
 FROM paths WHERE found
 ORDER BY len, here, there;

The working table of recursive CTE paths has seven columns:

  • here refers to the person where the traversal starts,
  • there refers to the target person we aim to reach,
  • via indicates the immediate neighbor of the start node of the current path,
  • len is the length of the current path, which is incremented by one for each step taken in the traversal.

The remaining three columns control the traversal logic and optimize the computation.

  • current refers to the node that is currently being explored during the traversal,
  • found indicates whether the current path has successfully reached the target person,
  • completed tracks whether any path with the same here and there has already reached the target. This prevents further traversal for that pair once a shortest path has been found, thus avoiding the exploration of longer paths.

If we search in a large graph with many edges, the union table in a vanilla recursive CTE can grow large, potentially exceeding memory limits. This not only causes significant performance issues but can even lead to query crashes in extreme cases.

The new key-based CTEs can avoid this problem by changing how the state of the search is maintained. This enables a range of new algorithms to be expressed efficiently in SQL, including algorithms for finding the shortest paths in large graphs.

One such algorithm is Distance Vector Routing (DVR), a method for computing the shortest paths in a network based on node-local routing tables that indicate where to “hop next”.

  • In DVR, each node maintains a routing table that records the length of the paths that reach other nodes.
  • We use the recurring table to store these routing tables for all nodes in the network: a row (here, there, via, len) indicates that the first hop on the path from node here to node there is node via. The length of the overall path is len.
  • When a shorter path to a target node is found, the corresponding routing table entry is updated.
  • Routing updates found in the last iteration are distributed to neighboring nodes through the working table.

To check if a newly incoming routing update improves a currently known path, we perform a lookup in the recurring table. If the new length is smaller than that of the known path, we update the entry and propagate the update to our immediate neighbors. This mechanism allows for efficient path finding even in very large graphs – see below how DVR outperforms the above approach based on a vanilla recursive CTE.

WITH RECURSIVE dvr(here, there, via, len) USING KEY (here, there) AS (
  -- initialize routing tables for all nodes, only the routes to
  -- immediate neighbors are known at this time
  SELECT n.person1id AS here, n.person2id AS there, n.person2id AS via, 1::DOUBLE AS len
  FROM   knows AS n
    UNION
  (SELECT n.person1id AS here, dvr.there, dvr.here AS via, 1 + dvr.len AS len
    FROM dvr -- working table - routing updates shared by neighbors
    JOIN knows AS n ON
        (n.person2id =  dvr.here AND    -- send update only to immediate neighbors
        n.person1id <> dvr.there)       -- no need to store a route to myself
    LEFT JOIN recurring.dvr AS rec ON   -- recurring table (current routing tables)
        (rec.here = n.person1id AND rec.there = dvr.there) -- identify affected routing table entry
    WHERE 1 + dvr.len < coalesce(rec.len, 'Infinity'::DOUBLE) --  does the routing update improve the entry in the routing table?
    ORDER BY len -- shortest path first
  )
)
FROM dvr
ORDER BY len, here, there;

The larger the social network graphs (A to G), the more pronounced the performance gap between vanilla recursive CTEs (REC) and the new USING KEY variant (KEY) becomes. The table below reports the number of rows produced across all iterations:

| graph | nodes | edges | KEY | REC |
|:-----:|------:|------:|--------:|------------:|
| A | 184 | 233 | 744 | 352,906 |
| B | 322 | 903 | 8,232 | 40,732,577 |
| C | 424 | 1,446 | 19,213 | 605,859,791 |
| D | 484 | 2,049 | 30,871 | ❌ |
| E | 1,119 | 8,809 | 255,425 | ❌ |
| F | 1,481 | 14,256 | 491,880 | ❌ |
| G | 1,618 | 16,619 | 607,926 | ❌ |

Even for the smallest graph, Graph A, the difference is significant: the REC CTE produces around 350,000 rows, while KEY generates only 744. As the graph size increases, the gap becomes even more striking. In Graph C, with 424 nodes and 1,446 edges, the REC approach produces more than 600 million rows, while the KEY method gets by with fewer than 20,000. Although this isn't the largest graph in our benchmark, the REC approach is already approaching out-of-memory conditions (OOM, ❌).

This substantial difference in memory usage is only part of the story. The performance of the REC approach also degrades quickly: while both CTEs perform similarly on small graphs, REC becomes significantly slower as the graph grows – eventually crashing – whereas KEY continues to scale smoothly.

And that’s the beauty of the new USING KEY CTEs: they enable the efficient expression of complex iterative algorithms by providing key-based control over the size of the intermediate results passed from iteration to iteration – memory pressure drops, and runtime performance can improve significantly. If you're working with recursive CTEs in DuckDB, be sure to take advantage of this powerful addition – it may make a significant difference in your queries.