DuckDB is an in-process SQL OLAP database management system
Why DuckDB?
Simple
- In-process, serverless
- C++11, no dependencies, single file build
- APIs for Python/R/Java/…
All the benefits of a database, none of the hassle.
Installation
Choose the environment in which you want to use DuckDB
- Python
- R
- Java
- node.js
- C++
- CLI
- ODBC
Latest release: DuckDB 0.4.0
Python:
pip install duckdb==0.4.0
R:
install.packages("duckdb")
Java (Maven):
<dependency>
<groupId>org.duckdb</groupId>
<artifactId>duckdb_jdbc</artifactId>
<version>0.4.0</version>
</dependency>
node.js:
npm install duckdb
When to use DuckDB
- Processing and storing tabular datasets, e.g. from CSV or Parquet files
- Interactive data analysis, e.g. joining and aggregating multiple large tables
- Concurrent large changes to multiple large tables, e.g. appending rows, adding/removing/updating columns
- Large result set transfer to client
When to not use DuckDB
- High-volume transactional use cases (e.g. tracking orders in a webshop)
- Large client/server installations for centralized enterprise data warehousing
- Writing to a single database from multiple concurrent processes
Blog
Persistent Storage of Adaptive Radix Trees in DuckDB
TL;DR: DuckDB uses Adaptive Radix Tree (ART) Indexes to enforce constraints and to speed up query filters. Up to this point, indexes were not persisted, causing issues like loss of indexing information and high reload times for tables with data constraints. We now persist ART Indexes to disk, drastically diminishing […]
Range Joins in DuckDB
TL;DR: DuckDB has fully parallelised range joins that can efficiently join millions of range predicates. Range intersection joins are an important operation in areas such as temporal analytics, and occur when two inequality conditions are present in a join predicate. Database implementations often rely on slow O(N^2) algorithms that compare […]
Friendlier SQL with DuckDB
An elegant user experience is a key design goal of DuckDB. This goal guides much of DuckDB’s architecture: it is simple to install, seamless to integrate with other data structures like Pandas, Arrow, and R Dataframes, and requires no dependencies. Parallelization occurs automatically, and if a computation exceeds available memory, […]