DuckDB is an in-process SQL OLAP database management system
Why DuckDB?
Simple and portable
- In-process, serverless
- C++11, no dependencies, single-file build
- APIs for Python, R, Java, Julia, Swift, …
- Runs on Windows, Linux, macOS, OpenBSD, …
Feature-rich
- Transactions, persistence
- Extensive SQL support
- Direct Parquet, CSV, and JSON querying
- Joins, aggregates, window functions
Fast
- Optimized for analytics
- Vectorized and parallel engine
- Larger-than-memory processing
- Parallel Parquet, CSV, and NDJSON loaders
All the benefits of a database, none of the hassle.
Installation
Choose your environment for DuckDB:
- Command Line
- Python
- R
- Java
- Node.js
- ODBC
Latest release: DuckDB 0.9.2
Python:
pip install duckdb==0.9.2
R:
install.packages("duckdb")
Java (Maven):
<dependency>
  <groupId>org.duckdb</groupId>
  <artifactId>duckdb_jdbc</artifactId>
  <version>0.9.2</version>
</dependency>
Node.js:
npm install duckdb
Command Line (Homebrew):
brew install duckdb
---
Direct download: https://github.com
When to use DuckDB
- Processing and storing tabular datasets, e.g., from CSV or Parquet files
- Interactive data analysis, e.g., join & aggregate multiple large tables
- Concurrent large changes to multiple large tables, e.g., appending rows, adding/removing/updating columns
- Large result set transfer to client
When not to use DuckDB
- High-volume transactional use cases (e.g., tracking orders in a webshop)
- Large client/server installations for centralized enterprise data warehousing
- Writing to a single database from multiple concurrent processes
- Multiple concurrent processes reading from a single writable database
Blog
Updates to the H2O.ai db-benchmark!
TL;DR: the H2O.ai db-benchmark has been updated with new results. In addition, the AWS EC2 instance used for benchmarking has been changed to a c6id.metal for improved repeatability and fairness across libraries. DuckDB is the fastest library for both join and group by queries at almost every data size. Skip […]
DuckDB's CSV Sniffer: Automatic Detection of Types and Dialects
TL;DR: DuckDB is primarily focused on performance, leveraging the capabilities of modern file formats. At the same time, we also pay attention to flexible, non-performance-driven formats like CSV files. To create a nice and pleasant experience when reading from CSV files, DuckDB implements a CSV sniffer that automatically detects CSV […]
DuckCon #4 in Amsterdam
We are excited to hold the next “DuckCon” DuckDB user group meeting for the first time in the birthplace of DuckDB, Amsterdam, the Netherlands. The meeting will take place on February 2, 2024 (Friday) in the OBA Congress Center’s Theater room, five minutes walking distance from Amsterdam Central Station. Conveniently, […]