DuckDB is an in-process
SQL OLAP database management system

Why DuckDB?

Simple

  • In-process, serverless
  • C++11, no dependencies, single file build
  • APIs for Python/R/Java/…
more

Feature-rich

  • Transactions, persistence
  • Extensive SQL support
  • Direct Parquet & CSV querying
more

Fast

  • Vectorized engine
  • Optimized for analytics
  • Parallel query processing
more

Free

  • Free & Open Source
  • Permissive MIT License
more

All the benefits of a database, none of the hassle.

Installation

Choose your environment to use for DuckDB

  • Python
  • R
  • Java
  • node.js
  • C++
  • CLI
pip install duckdb==0.3.1

Latest release: DuckDB 0.3.1 System detected: Other Installations

When to use DuckDB

  • Processing and storing tabular datasets, e.g. from CSV or Parquet files
  • Interactive data analysis, e.g. Joining & aggregate multiple large tables
  • Concurrent large changes, to multiple large tables, e.g. appending rows, adding/removing/updating columns
  • Large result set transfer to client

When to not use DuckDB

  • Non-rectangular data sets, e.g. graphs
  • High-volume transactional use cases (e.g. tracking orders in a webshop)
  • Large client/server installations for centralized enterprise data warehousing
  • Writing to a single database from multiple concurrent processes

Blog

Archive
2022-01-06

DuckDB Time Zones: Supporting Calendar Extensions

TLDR: The DuckDB ICU extension now provides time zone support. Time zone support is a common request for temporal analytics, but the rules are complex and somewhat arbitrary. The most well supported library for locale-specific operations is the International Components for Unicode (ICU). DuckDB already provided collated string comparisons using […]

continue reading
2021-12-03

DuckDB quacks Arrow: A zero-copy data integration between Apache Arrow and DuckDB

TLDR: The zero-copy integration between DuckDB and Apache Arrow allows for rapid analysis of larger than memory datasets in Python and R using either SQL or relational APIs. This post is a collaboration with and cross-posted on the Arrow blog. Part of Apache Arrow is an in-memory data format optimized […]

continue reading
2021-11-26

DuckDB - The Lord of Enums:
The Fellowship of the Categorical and Factors.

String types are one of the most commonly used types. However, often string columns have a limited number of distinct values. For example, a country column will never have more than a few hundred unique entries. Storing a data type as a plain string causes a waste of storage and […]

continue reading