- Installation
- Documentation
- Getting Started
- Connect
- Data Import
- Overview
- Data Sources
- CSV Files
- JSON Files
- Overview
- Creating JSON
- Loading JSON
- Writing JSON
- JSON Type
- JSON Functions
- Format Settings
- Installing and Loading
- SQL to / from JSON
- Caveats
- Multiple Files
- Parquet Files
- Partitioning
- Appender
- INSERT Statements
- Client APIs
- Overview
- C
- Overview
- Startup
- Configuration
- Query
- Data Chunks
- Vectors
- Values
- Types
- Prepared Statements
- Appender
- Table Functions
- Replacement Scans
- API Reference
- C++
- CLI
- Dart
- Go
- Java
- Julia
- Node.js (Neo)
- Node.js
- Python
- Overview
- Data Ingestion
- Conversion between DuckDB and Python
- DB API
- Relational API
- Function API
- Types API
- Expression API
- Spark API
- API Reference
- Known Python Issues
- R
- Rust
- Swift
- Wasm
- ADBC
- ODBC
- SQL
- Introduction
- Statements
- Overview
- ANALYZE
- ALTER TABLE
- ALTER VIEW
- ATTACH and DETACH
- CALL
- CHECKPOINT
- COMMENT ON
- COPY
- CREATE INDEX
- CREATE MACRO
- CREATE SCHEMA
- CREATE SECRET
- CREATE SEQUENCE
- CREATE TABLE
- CREATE VIEW
- CREATE TYPE
- DELETE
- DESCRIBE
- DROP
- EXPORT and IMPORT DATABASE
- INSERT
- PIVOT
- Profiling
- SELECT
- SET / RESET
- SET VARIABLE
- SUMMARIZE
- Transaction Management
- UNPIVOT
- UPDATE
- USE
- VACUUM
- LOAD / INSTALL
- Query Syntax
- SELECT
- FROM and JOIN
- WHERE
- GROUP BY
- GROUPING SETS
- HAVING
- ORDER BY
- LIMIT and OFFSET
- SAMPLE
- Unnesting
- WITH
- WINDOW
- QUALIFY
- VALUES
- FILTER
- Set Operations
- Prepared Statements
- Data Types
- Overview
- Array
- Bitstring
- Blob
- Boolean
- Date
- Enum
- Interval
- List
- Literal Types
- Map
- NULL Values
- Numeric
- Struct
- Text
- Time
- Timestamp
- Time Zones
- Union
- Typecasting
- Expressions
- Overview
- CASE Statement
- Casting
- Collations
- Comparisons
- IN Operator
- Logical Operators
- Star Expression
- Subqueries
- Functions
- Overview
- Aggregate Functions
- Array Functions
- Bitstring Functions
- Blob Functions
- Date Format Functions
- Date Functions
- Date Part Functions
- Enum Functions
- Interval Functions
- Lambda Functions
- List Functions
- Map Functions
- Nested Functions
- Numeric Functions
- Pattern Matching
- Regular Expressions
- Struct Functions
- Text Functions
- Time Functions
- Timestamp Functions
- Timestamp with Time Zone Functions
- Union Functions
- Utility Functions
- Window Functions
- Constraints
- Indexes
- Meta Queries
- DuckDB's SQL Dialect
- Samples
- Configuration
- Extensions
- Overview
- Core Extensions
- Community Extensions
- Working with Extensions
- Versioning of Extensions
- Arrow
- AutoComplete
- AWS
- Azure
- Delta
- Excel
- Full Text Search
- httpfs (HTTP and S3)
- Iceberg
- ICU
- inet
- jemalloc
- MySQL
- PostgreSQL
- Spatial
- SQLite
- Substrait
- TPC-DS
- TPC-H
- VSS
- Guides
- Overview
- Data Viewers
- Database Integration
- File Formats
- Overview
- CSV Import
- CSV Export
- Directly Reading Files
- Excel Import
- Excel Export
- JSON Import
- JSON Export
- Parquet Import
- Parquet Export
- Querying Parquet Files
- Network and Cloud Storage
- Overview
- HTTP Parquet Import
- S3 Parquet Import
- S3 Parquet Export
- S3 Iceberg Import
- S3 Express One
- GCS Import
- Cloudflare R2 Import
- DuckDB over HTTPS / S3
- Meta Queries
- Describe Table
- EXPLAIN: Inspect Query Plans
- EXPLAIN ANALYZE: Profile Queries
- List Tables
- Summarize
- DuckDB Environment
- ODBC
- Performance
- Overview
- Environment
- Import
- Schema
- Indexing
- Join Operations
- File Formats
- How to Tune Workloads
- My Workload Is Slow
- Benchmarks
- Python
- Installation
- Executing SQL
- Jupyter Notebooks
- SQL on Pandas
- Import from Pandas
- Export to Pandas
- Import from Numpy
- Export to Numpy
- SQL on Arrow
- Import from Arrow
- Export to Arrow
- Relational API on Pandas
- Multiple Python Threads
- Integration with Ibis
- Integration with Polars
- Using fsspec Filesystems
- SQL Editors
- SQL Features
- Snippets
- Glossary of Terms
- Browse Offline
- Operations Manual
- Overview
- Limits
- Non-Deterministic Behavior
- Embedding DuckDB
- DuckDB's Footprint
- Securing DuckDB
- Development
- DuckDB Repositories
- Testing
- Overview
- sqllogictest Introduction
- Writing Tests
- Debugging
- Result Verification
- Persistent Testing
- Loops
- Multiple Connections
- Catch
- Profiling
- Release Calendar
- Building
- Benchmark Suite
- Internals
- Sitemap
- Why DuckDB
- Media
- FAQ
- Code of Conduct
- Live Demo
Several operators in DuckDB exhibit non-deterministic behavior. Most notably, SQL uses set semantics, which allows results to be returned in a different order. DuckDB exploits this to improve performance, particularly when performing multi-threaded query execution. Other factors, such as using different compilers, operating systems, and hardware architectures, can also cause changes in ordering. This page documents the cases where non-determinism is an expected behavior. If you would like to make your queries determinisic, see the “Working Around Non-Determinism” section.
Set Semantics
One of the most common sources of non-determinism is the set semantics used by SQL. E.g., if you run the following query repeatedly, you may get two different results:
SELECT *
FROM (
SELECT 'A' AS x
UNION
SELECT 'B' AS x
);
Both results A
, B
and B
, A
are correct.
Different Results on Different Platforms: array_distinct
The array_distinct
function may return results in a different order on different platforms:
SELECT array_distinct(['A', 'A', 'B', NULL, NULL]) AS arr;
For this query, both [A, B]
and [B, A]
are valid results.
Floating-Point Aggregate Operations with Multi-Threading
Floating-point inaccuracies may produce different results when run in a multi-threaded configurations:
For example, stddev
and corr
may produce non-deterministic results:
CREATE TABLE tbl AS
SELECT 'ABCDEFG'[floor(random() * 7 + 1)::INT] AS s, 3.7 AS x, i AS y
FROM range(1, 1_000_000) r(i);
SELECT s, stddev(x) AS standard_deviation, corr(x, y) AS correlation FROM tbl
GROUP BY s
ORDER BY s;
The expected standard deviations and correlations from this query are 0 for all values of s
.
However, when executed on multiple threads, the query may return small numbers (0 <= z < 10e-16
) due to floating-point inaccuracies.
Working Around Non-Determinism
For the majority of use cases, non-determinism is not causing any issues. However, there are some cases where deterministic results are desirable. In these cases, try the following workarounds:
-
Limit the number of threads to prevent non-determinism introduced by multi-threading.
SET threads = 1;
-
Enforce ordering. For example, you can use the
ORDER BY ALL
clause:SELECT * FROM ( SELECT 'A' AS x UNION SELECT 'B' AS x ) ORDER BY ALL;
You can also sort lists using
list_sort
:SELECT list_sort(array_distinct(['A', 'A', 'B', NULL, NULL])) AS i ORDER BY i;
It's also possible to introduce a deterministic shuffling.