DuckDB is an in-process
SQL OLAP database management system

Why DuckDB?


  • In-process, serverless
  • C++11, no dependencies, single file build
  • APIs for Python/R/Java/…


  • Transactions, persistence
  • Extensive SQL support
  • Direct Parquet & CSV querying


  • Vectorized engine
  • Optimized for analytics
  • Parallel query processing


  • Free & Open Source
  • Permissive MIT License

All the benefits of a database, none of the hassle.


Choose your environment for DuckDB

  • Python
  • R
  • Java
  • node.js
  • C++
  • CLI

pip install duckdb==0.3.1

Latest release: DuckDB 0.3.1

When to use DuckDB

  • Processing and storing tabular datasets, e.g. from CSV or Parquet files
  • Interactive data analysis, e.g. joining & aggregating multiple large tables
  • Concurrent large changes to multiple large tables, e.g. appending rows, adding/removing/updating columns
  • Large result set transfer to client

When to not use DuckDB

  • Non-rectangular data sets, e.g. graphs
  • High-volume transactional use cases (e.g. tracking orders in a webshop)
  • Large client/server installations for centralized enterprise data warehousing
  • Writing to a single database from multiple concurrent processes



DuckDB Time Zones: Supporting Calendar Extensions

TLDR: The DuckDB ICU extension now provides time zone support. Time zone support is a common request for temporal analytics, but the rules are complex and somewhat arbitrary. The most well supported library for locale-specific operations is the International Components for Unicode (ICU). DuckDB already provided collated string comparisons using […]

DuckDB quacks Arrow: A zero-copy data integration between Apache Arrow and DuckDB

TLDR: The zero-copy integration between DuckDB and Apache Arrow allows for rapid analysis of larger than memory datasets in Python and R using either SQL or relational APIs. This post is a collaboration with and cross-posted on the Arrow blog. Part of Apache Arrow is an in-memory data format optimized […]

DuckDB - The Lord of Enums: The Fellowship of the Categorical and Factors.

String types are one of the most commonly used types. However, often string columns have a limited number of distinct values. For example, a country column will never have more than a few hundred unique entries. Storing a data type as a plain string causes a waste of storage and […]