DuckDB is an in-process SQL OLAP database management system
Why DuckDB?
Simple
- In-process, serverless
- C++11, no dependencies, single file build
- APIs for Python/R/Java/…
All the benefits of a database, none of the hassle.
Installation
Choose the environment in which you want to use DuckDB
- Python
- R
- Java
- node.js
- C++
- CLI
- ODBC
Latest release: DuckDB 0.3.4
Python: pip install duckdb==0.3.4
R: install.packages("duckdb")
Java (Maven):
<dependency>
    <groupId>org.duckdb</groupId>
    <artifactId>duckdb_jdbc</artifactId>
    <version>0.3.4</version>
</dependency>
node.js: npm install duckdb
When to use DuckDB
- Processing and storing tabular datasets, e.g. from CSV or Parquet files
- Interactive data analysis, e.g. joining and aggregating multiple large tables
- Concurrent large changes to multiple large tables, e.g. appending rows or adding, removing, and updating columns
- Large result set transfer to client
When to not use DuckDB
- High-volume transactional use cases (e.g. tracking orders in a webshop)
- Large client/server installations for centralized enterprise data warehousing
- Writing to a single database from multiple concurrent processes
Blog
Friendlier SQL with DuckDB
An elegant user experience is a key design goal of DuckDB. This goal guides much of DuckDB’s architecture: it is simple to install, seamless to integrate with other data structures like Pandas, Arrow, and R Dataframes, and requires no dependencies. Parallelization occurs automatically, and if a computation exceeds available memory, […]
Parallel Grouped Aggregation in DuckDB
TL;DR: DuckDB has a fully parallelized aggregate hash table that can efficiently aggregate over millions of groups. Grouped aggregations are a core data analysis command. They are particularly important for large-scale data analysis (“OLAP”) because they are useful for computing statistical summaries of huge tables. DuckDB contains a highly optimized […]
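The excerpt does not show how the aggregate hash table works, so the toy Python sketch below only illustrates the general idea of grouped aggregation with per-worker partial tables that are merged afterwards. It makes no claim about DuckDB's internals; the function names and the chunking scheme are assumptions chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def aggregate_chunk(chunk):
    """Phase 1: build a partial table (group key -> running sum) for one chunk."""
    partial = {}
    for key, value in chunk:
        partial[key] = partial.get(key, 0) + value
    return partial


def parallel_group_sum(rows, num_workers=4):
    """Sum values per group key using per-worker partial tables, then merge them."""
    chunk_size = max(1, -(-len(rows) // num_workers))  # ceiling division
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(aggregate_chunk, chunks))

    # Phase 2: merge the partial tables into a single result.
    merged = {}
    for partial in partials:
        for key, value in partial.items():
            merged[key] = merged.get(key, 0) + value
    return merged


# Equivalent in spirit to: SELECT key, SUM(value) FROM rows GROUP BY key
rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
result = parallel_group_sum(rows, num_workers=2)
print(result)
```

Because each worker only touches its own partial table, no locking is needed during the first phase. Python threads do not give real CPU parallelism here, so the sketch shows the structure of the approach rather than its performance.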
DuckDB Time Zones: Supporting Calendar Extensions
TL;DR: The DuckDB ICU extension now provides time zone support. Time zone support is a common request for temporal analytics, but the rules are complex and somewhat arbitrary. The most well supported library for locale-specific operations is the International Components for Unicode (ICU). DuckDB already provided collated string comparisons using […]