Frequently Asked Questions

Who makes DuckDB?

DuckDB was created by Dr. Mark Raasveldt & Dr. Hannes Mühleisen at the Centrum Wiskunde & Informatica (CWI) in Amsterdam, the Netherlands. Mark and Hannes have set up the DuckDB Foundation that collects donations and funds development and maintenance of DuckDB. Mark and Hannes are also co-founders of DuckDB Labs, which provides commercial services around DuckDB, and employs several core contributors of DuckDB.

Why call it DuckDB?

Ducks are amazing animals. They can fly, walk and swim. They can also live off pretty much everything. They are quite resilient to environmental challenges. A duck's song will bring people back from the dead and inspires database research. They are thus the perfect mascot for a versatile and resilient data management system.

Is DuckDB open-source?

DuckDB is fully open-source under the MIT license and its development takes place on GitHub in the duckdb/duckdb repository. All components of DuckDB are available in the free version under this license: there is no “enterprise version” of DuckDB.

Most of the intellectual property of DuckDB has been purposefully moved to a non-profit entity to disconnect the licensing of the project from the commercial company, DuckDB Labs. The DuckDB Foundation's statutes also ensure DuckDB remains open-source under the MIT license in perpetuity. The CWI (Centrum Wiskunde & Informatica) has a seat on the board of the DuckDB Foundation and donations to the DuckDB Foundation directly fund DuckDB development.

For more information on the organizations around DuckDB, see the next question–answer pair.

What is the relationship between DuckDB, the DuckDB Foundation, DuckDB Labs and MotherDuck?

DuckDB is the name of the MIT-licensed open-source project.

The DuckDB Foundation is a non-profit organization that holds the intellectual property of the DuckDB project. The DuckDB Foundation's statutes ensure DuckDB remains open-source under the MIT license in perpetuity.

DuckDB Labs is a company based in Amsterdam that provides commercial support services for DuckDB. DuckDB Labs employs the core contributors of the DuckDB project.

MotherDuck is a venture-backed company creating a hybrid cloud/local platform using DuckDB. MotherDuck contracts with DuckDB Labs for development services, and DuckDB Labs owns a portion of MotherDuck. See the partnership announcement for details. To learn more about MotherDuck, see the CIDR 2024 paper on MotherDuck and the MotherDuck documentation.

Where do I find the DuckDB logo and design guidelines?

Please head to the Design & Brand Assets page.

Where do I find DuckDB trademark use guidelines?

Please consult the trademark guidelines for DuckDB™.

I found a project with “duck” in its name. Is it officially affiliated with DuckDB?

The following projects are officially affiliated with DuckDB:

Other projects are likely not affiliated with the DuckDB project. Please check their websites, READMEs and licenses for more details.

What is the official name of the project?

In official communication, we refer to DuckDB exclusively as “DuckDB” and avoid other names and spellings such as “DDB”, “the Duck” and “DuckDb”. Of course, the alternatives are also widely understood and you are welcome to use them, but using “DuckDB” is preferred.

Can DuckDB save data to disk?

DuckDB supports persistent storage and stores the database as a single file, which includes all tables, views, indexes, macros, etc. present in the database. DuckDB's storage format uses a compressed columnar representation, which is compact but allows for efficient bulk updates. DuckDB can also run in in-memory mode, where no data is persisted to disk. DuckDB can also save data in DuckLake format through the ducklake extension.
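As a rough illustration of the two modes, the following sketch attaches a file-backed database and an in-memory database side by side. The file name my_database.duckdb, the aliases file_db and mem_db, and the table tbl are placeholders chosen for this example.

```sql
-- Attach (and create, if it does not exist yet) a persistent database
-- stored in a single file. 'my_database.duckdb' is a hypothetical file name.
ATTACH 'my_database.duckdb' AS file_db;
CREATE TABLE file_db.tbl AS SELECT 42 AS x;

-- Attach a separate in-memory database; its contents are not persisted to disk.
ATTACH ':memory:' AS mem_db;
CREATE TABLE mem_db.tbl AS SELECT 42 AS x;

-- Detach the file-backed database when it is no longer needed.
DETACH file_db;
```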

What type of storage should I run DuckDB on (e.g., local disks, network-attached storage)?

The type of storage used to run DuckDB has a significant performance impact. In general, using SSDs (SATA or NVMe SSDs) leads to superior performance compared to HDDs.

The best location for the storage varies greatly depending on the workload:

For read-only workloads, the DuckDB database can be stored on local disks and remote endpoints such as HTTPS and cloud object storage such as AWS S3 and similar providers.
For read-write workloads, storing the database on instance-attached storage yields the best performance. Network-attached cloud storage such as AWS EBS also works and its performance can be fine-tuned with the guaranteed IOPS settings. Based on our experience, we strongly advise against running DuckDB – or any other database management system – for read-write workloads on network-attached storage (NAS). These setups are often slow and result in spurious failures that are difficult to troubleshoot.
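The two cases above could look roughly as follows. This is a minimal sketch: the URL, the local path and the aliases are placeholders, and reading over HTTPS assumes that the httpfs extension is available.

```sql
-- Read-only workload: attach a database file served from a remote endpoint.
-- The URL is a placeholder; replace it with your own location.
ATTACH 'https://example.com/my_database.duckdb' AS remote_db (READ_ONLY);

-- Read-write workload: attach a database file on instance-attached storage.
-- The path is a placeholder for a local SSD mount point.
ATTACH '/mnt/local_ssd/my_database.duckdb' AS local_db;
```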

Is DuckDB an in-memory database?

It is a common misconception that DuckDB is an in-memory database. While DuckDB can work in-memory, it is not an in-memory database. DuckDB can make use of available memory for caching, and it also fully supports disk-based persistence and offloading larger-than-memory operations to disk.
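The following sketch shows two configuration options that relate to this behavior. The values are arbitrary examples rather than recommendations, and the directory path is a placeholder.

```sql
-- Cap the amount of memory DuckDB may use (example value).
SET memory_limit = '4GB';

-- Directory where DuckDB writes temporary files when an operation
-- does not fit within the memory limit (placeholder path).
SET temp_directory = '/tmp/duckdb_spill';
```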

Is DuckDB built on Arrow?

DuckDB does not use the Apache Arrow format internally. However, DuckDB supports reading from and writing to Arrow using the arrow community extension. It can also run SQL queries directly on Arrow using pyarrow.

Are DuckDB's database files portable between different DuckDB versions and clients?

Since version 0.10.0 (released in February 2024), DuckDB is backwards-compatible when reading database files, i.e., newer versions of DuckDB are always able to read database files created with an older version of DuckDB. DuckDB also provides partial forwards-compatibility on a best-effort basis. See the storage page for more details. Compatibility is also guaranteed between different DuckDB clients (e.g., Python and R): a database file created with one client can be read with other clients.

How does DuckDB handle concurrency? Can multiple processes write to DuckDB?

See the documentation on handling concurrency and the section on “Writing to DuckDB from Multiple Processes”.

To work on the same data set with multiple DuckDB clients, consider using the DuckLake format through the ducklake extension.
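A minimal local sketch of attaching a DuckLake is shown below. The catalog file metadata.ducklake, the data directory data_files/ and the alias my_lake are placeholders. A DuckDB file is used as the catalog here only to keep the example self-contained; setups with several concurrent clients generally need a catalog database that supports concurrent access, so please consult the DuckLake documentation before relying on this.

```sql
INSTALL ducklake;
LOAD ducklake;

-- Attach a DuckLake whose catalog lives in a local file (placeholder names).
ATTACH 'ducklake:metadata.ducklake' AS my_lake (DATA_PATH 'data_files/');
USE my_lake;

-- Tables created here are stored in the DuckLake format.
CREATE TABLE tbl AS SELECT 42 AS x;
```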

Is there an official DuckDB Docker image available?

You can run the DuckDB command line client using the official DuckDB Docker image.

Please note that in most cases you do not need a container to run DuckDB: you can simply deploy it in-process within your client application or as a standalone command-line binary.

How do I work with multiple DuckDB clients on the same computer?

You can install multiple DuckDB clients on the same computer. These clients are installed individually and can have different DuckDB versions. For example, you can use the DuckDB 1.3.2 package in R, DuckDB 1.4.0 as the CLI client, and the preview release in Python.

If you are unsure about the DuckDB version used in a process, run the PRAGMA version query, which prints the version of DuckDB.
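For instance, either of the following statements can be run in any client to report the version. The column alias in the second query is just an illustrative name.

```sql
-- Prints the library version (and the source identifier) of the running DuckDB.
PRAGMA version;

-- The same information is available as a scalar function.
SELECT version() AS duckdb_version;
```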

Where can I learn more about DuckDB?

DuckDB has official documentation, a blog and a library. In addition, there are a few third-party resources which can help you learn more about DuckDB:

To discover projects using DuckDB, we recommend visiting the awesome-duckdb repository.
There are a number of DuckDB books available.
The tldr pages initiative has a DuckDB entry.

Does DuckDB use SIMD?

DuckDB does not use explicit SIMD (single instruction, multiple data) instructions because they greatly complicate portability and compilation. Instead, DuckDB uses implicit SIMD, where we go to great lengths to write our C++ code in such a way that the compiler can auto-generate SIMD instructions for the specific hardware. As an example of why this is a good idea, it took 10 minutes to port DuckDB to the Apple Silicon architecture.

How does scalability work in DuckDB?

DuckDB is a single-node database system, hence it makes use of vertical scalability, i.e., making use of more resources (CPU, memory, and disk) to support larger datasets. DuckDB has been tested on machines with 100+ CPU cores and terabytes of memory.

DuckDB's native database format also scales to multiple terabytes of data but this needs some planning – see the “Working with Huge Databases” page.

For working with large-scale datasets and/or collaborating on the same dataset, consider using the DuckLake lakehouse format.

I would like to benchmark DuckDB against another system. How do I proceed?

We welcome experiments comparing DuckDB's performance to other systems. To ensure a fair comparison, we have a few recommendations. First, try to use the preview release, which often has significant performance improvements compared to the last stable release. Second, consider consulting our DBTest 2018 paper Fair Benchmarking Considered Difficult: Common Pitfalls In Database Performance Testing for guidelines on how to avoid common issues in benchmarks. Third, study the DuckDB Performance Guide, which has best practices for ensuring optimal performance. Finally, please report the DuckDB version (for stable versions, the version number; for nightly builds, the commit hash).

Is DuckDB intended for data science or data engineering workloads?

DuckDB was designed with both data science and data engineering workloads in mind. Therefore, you can use DuckDB's SQL syntax to be highly flexible, or very precise, depending on your needs.

For data science users, who often run queries in an interactive fashion, DuckDB offers several mechanisms for quickly exploring data sets. For example, CSV files can be loaded by auto-inferring their schema using CREATE TABLE tbl AS FROM 'input.csv'. Moreover, there are numerous SQL shorthands known as “friendly SQL” for more concise expressions, e.g., the GROUP BY ALL clause.
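A short sketch of this interactive style follows. It assumes that input.csv has a header row with a text column named category and a numeric column named amount; both column names are made up for the example.

```sql
-- Load the CSV file, letting DuckDB infer the column names and types.
CREATE TABLE tbl AS FROM 'input.csv';

-- GROUP BY ALL groups by every non-aggregated column in the SELECT list,
-- which is only category in this query.
SELECT category, sum(amount) AS total
FROM tbl
GROUP BY ALL;
```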

For data engineering use cases, DuckDB allows full control over the loading process, so it is possible to define the precise schema using a CREATE TABLE tbl schema statement and populate it using a COPY statement that specifies the CSV's dialect (delimiter, quotes, etc.). Most friendly SQL extensions are simple to rewrite to SQL queries that are fully compatible with PostgreSQL. For example, the GROUP BY ALL clause can be replaced with a GROUP BY clause and an explicit list of columns.
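The explicit counterpart might look like the sketch below. The column names and types, and the chosen dialect options, are assumptions made for illustration and need to match the actual file.

```sql
-- Define the schema up front instead of relying on auto-detection.
CREATE TABLE tbl (
    category VARCHAR,
    amount   DECIMAL(18, 2)
);

-- Load the file while spelling out the CSV dialect.
COPY tbl FROM 'input.csv' (FORMAT csv, HEADER true, DELIMITER ',', QUOTE '"');

-- GROUP BY ALL rewritten with an explicit column list.
SELECT category, sum(amount) AS total
FROM tbl
GROUP BY category;
```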

What are typical use cases for DuckDB?

DuckDB's use cases can be split into roughly three major categories. Namely, DuckDB can be used for interactive data analysis by a user (“data science”) and as a pipeline component for automated data processing (“data engineering”). DuckDB can also be deployed in novel architectures, where one traditionally couldn't run an analytical database management system but DuckDB is available thanks to its portability. These architectures include running DuckDB in browsers (using the WebAssembly client) and on smartphones. Additionally, DuckDB's extensions unlock use cases such as geospatial analysis and deep integration with other database systems. And finally, in some cases, DuckDB doesn't even need data to be a database.

I would like feature X to be implemented in DuckDB. How do I proceed?

Features in DuckDB can be implemented in different ways: in the main DuckDB project, as a core extension or a community extension. If you have a feature request for DuckDB, please follow these guidelines:

If you have a feature idea, please raise an issue in the “Ideas” section in DuckDB's GitHub Discussions. The DuckDB team monitors these ideas and, over time, implements the frequently requested features. For example, we recently published the Avro Community Extension to support reading Avro files, which was the most requested feature in the issue tracker.
If you would like to implement a feature in the main DuckDB project, please discuss it with the DuckDB team on GitHub Discussions or on our Discord server. The team can verify whether the idea and the proposed implementation line up with the project's long-term vision.
If you would like to implement a feature as an extension, consider submitting it to the Community Extensions repository.

Please note that DuckDB Labs, the company that employs the main DuckDB contributors, provides consultancy services for DuckDB, which can include implementing features in DuckDB or as DuckDB extensions.

Which DuckDB clients and versions are officially supported?

While the DuckDB database is a relatively small, lean codebase, it has a large surface area with dozens of clients and extensions. Currently, the official community support applies to the following components:

This support covers the following minor versions:

the latest LTS (long-term support) version, currently 1.4
the latest stable version, currently 1.4

For more details, see the DuckDB Community Support Policy.

How frequently are new DuckDB versions released?

New feature releases (e.g., v1.2.0) are released every 3–5 months. Bugfix releases (e.g., v1.1.3) are released every 2–4 weeks after a feature release. You can find the recent releases in the Release Calendar.

When is the next version going to be released and what features can I expect?

Please check the Release Calendar for the planned release date of the next stable version of DuckDB and the Development Roadmap for the features planned for the upcoming year.

How can I contribute to the DuckDB documentation?

The DuckDB website is hosted by GitHub Pages and is deployed from the repository at duckdb/duckdb-web. When the documentation is browsed from a desktop computer, every page has a “Page Source” button on the top that navigates you to its Markdown source file. Pull requests to fix issues or to expand the documentation section on DuckDB's features are very welcome. Before opening a pull request, please consult our Contributor Guide.

What are official sources on DuckDB?

In the following, we list official, authoritative sources on the DuckDB and the DuckLake projects. Exercise caution when using other sources. You should be particularly cautious when downloading binaries and installation scripts from other sources.

Websites:

duckdb.org and duckdb.io: DuckDB
duckdblabs.com: DuckDB Labs
ducklake.select and ducklake.dev: DuckLake

Social media: