DuckDB is an in-process
SQL OLAP database management system

Why DuckDB?

Simple

  • In-process, serverless
  • C++11, no dependencies, single file build
  • APIs for Python/R/Java/…
more

Feature-rich

  • Transactions, persistence
  • Extensive SQL support
  • Direct Parquet & CSV querying
more

Fast

  • Vectorized engine
  • Optimized for analytics
  • Parallel query processing
more

Free

  • Free & Open Source
  • Permissive MIT License
more

All the benefits of a database, none of the hassle.

Installation

Choose your environment to use for DuckDB

  • Python
  • R
  • Java
  • node.js
  • C++
  • CLI
pip install duckdb==0.3.0

Latest release: DuckDB 0.3.0 System detected: Other Installations

When to use DuckDB

  • Processing and storing tabular datasets, e.g. from CSV or Parquet files
  • Interactive data analysis, e.g. Joining & aggregate multiple large tables
  • Concurrent large changes, to multiple large tables, e.g. appending rows, adding/removing/updating columns
  • Large result set transfer to client

When to not use DuckDB

  • Non-rectangular data sets, e.g. graphs
  • High-volume transactional use cases (e.g. tracking orders in a webshop)
  • Large client/server installations for centralized enterprise data warehousing
  • Writing to a single database from multiple concurrent processes

Blog

Archive
2021-10-13

Windowing in DuckDB

TLDR: DuckDB, a free and Open-Source analytical data management system, has a state-of-the-art windowing engine that can compute complex moving aggregates like inter-quartile ranges as well as simpler moving averages. Window functions (those using the OVER clause) are important tools for analysing data series, but they can be slow if […]

→ continue reading
2021-08-27

Fastest table sort in the West - Redesigning DuckDB’s sort

TLDR: DuckDB, a free and Open-Source analytical data management system, has a new highly efficient parallel sorting implementation that can sort much more data than fits in main memory. Database systems use sorting for many purposes, the most obvious purpose being when a user adds an ORDER BY clause to […]

→ continue reading
2021-06-25

Querying Parquet with Precision using DuckDB

TLDR: DuckDB, a free and open source analytical data management system, can run SQL queries directly on Parquet files and automatically take advantage of the advanced features of the Parquet format. Apache Parquet is the most common “Big Data” storage format for analytics. In Parquet files, data is stored in […]

→ continue reading