DuckDB is an in-process
SQL OLAP database management system

Why DuckDB?

Simple

  • In-process, serverless
  • C++11, no dependencies, single file build
  • APIs for Python/R/Java/…
more

Feature-rich

  • Transactions, persistence
  • Extensive SQL support
  • Direct Parquet & CSV querying
more

Fast

  • Vectorized engine
  • Optimized for analytics
  • Parallel query processing
more

Free

  • Free & Open Source
  • Permissive MIT License
more

All the benefits of a database, none of the hassle.

When to use DuckDB

  • Processing and storing tabular datasets, e.g. from CSV or Parquet files
  • Interactive data analysis, e.g. Joining & aggregate multiple large tables
  • Concurrent large changes, to multiple large tables, e.g. appending rows, adding/removing/updating columns
  • Large result set transfer to client

When to not use DuckDB

  • High-volume transactional use cases (e.g. tracking orders in a webshop)
  • Large client/server installations for centralized enterprise data warehousing
  • Writing to a single database from multiple concurrent processes

Blog

Archive
2022-05-04

Friendlier SQL with DuckDB

An elegant user experience is a key design goal of DuckDB. This goal guides much of DuckDB’s architecture: it is simple to install, seamless to integrate with other data structures like Pandas, Arrow, and R Dataframes, and requires no dependencies. Parallelization occurs automatically, and if a computation exceeds available memory, […]

continue reading
2022-03-07

Parallel Grouped Aggregation in DuckDB

TL;DR: DuckDB has a fully parallelized aggregate hash table that can efficiently aggregate over millions of groups. Grouped aggregations are a core data analysis command. It is particularly important for large-scale data analysis (“OLAP”) because it is useful for computing statistical summaries of huge tables. DuckDB contains a highly optimized […]

continue reading
2022-01-06

DuckDB Time Zones: Supporting Calendar Extensions

TLDR: The DuckDB ICU extension now provides time zone support. Time zone support is a common request for temporal analytics, but the rules are complex and somewhat arbitrary. The most well supported library for locale-specific operations is the International Components for Unicode (ICU). DuckDB already provided collated string comparisons using […]

continue reading