⌘+k ctrl+k
1.1.3 (stable)
Search Shortcut cmd + k | ctrl + k
Non-Deterministic Behavior

Several operators in DuckDB exhibit non-deterministic behavior. Most notably, SQL uses set semantics, which allows results to be returned in a different order. DuckDB exploits this to improve performance, particularly when performing multi-threaded query execution. Other factors, such as using different compilers, operating systems, and hardware architectures, can also cause changes in ordering. This page documents the cases where non-determinism is an expected behavior. If you would like to make your queries determinisic, see the “Working Around Non-Determinism” section.

Set Semantics

One of the most common sources of non-determinism is the set semantics used by SQL. E.g., if you run the following query repeatedly, you may get two different results:

SELECT *
FROM (
    SELECT 'A' AS x
    UNION
    SELECT 'B' AS x
);

Both results A, B and B, A are correct.

Different Results on Different Platforms: array_distinct

The array_distinct function may return results in a different order on different platforms:

SELECT array_distinct(['A', 'A', 'B', NULL, NULL]) AS arr;

For this query, both [A, B] and [B, A] are valid results.

Floating-Point Aggregate Operations with Multi-Threading

Floating-point inaccuracies may produce different results when run in a multi-threaded configurations: For example, stddev and corr may produce non-deterministic results:

CREATE TABLE tbl AS
    SELECT 'ABCDEFG'[floor(random() * 7 + 1)::INT] AS s, 3.7 AS x, i AS y
    FROM range(1, 1_000_000) r(i);

SELECT s, stddev(x) AS standard_deviation, corr(x, y) AS correlation FROM tbl
GROUP BY s
ORDER BY s;

The expected standard deviations and correlations from this query are 0 for all values of s. However, when executed on multiple threads, the query may return small numbers (0 <= z < 10e-16) due to floating-point inaccuracies.

Working Around Non-Determinism

For the majority of use cases, non-determinism is not causing any issues. However, there are some cases where deterministic results are desirable. In these cases, try the following workarounds:

  1. Limit the number of threads to prevent non-determinism introduced by multi-threading.

    SET threads = 1;
    
  2. Enforce ordering. For example, you can use the ORDER BY ALL clause:

    SELECT *
    FROM (
        SELECT 'A' AS x
        UNION
        SELECT 'B' AS x
    )
    ORDER BY ALL;
    

    You can also sort lists using list_sort:

    SELECT list_sort(array_distinct(['A', 'A', 'B', NULL, NULL])) AS i
    ORDER BY i;
    

    It's also possible to introduce a deterministic shuffling.