DuckDB is an in-process SQL OLAP database management system
Why DuckDB?
Simple
- In-process, serverless
- C++11, no dependencies, single file build
- APIs for Python/R/Java/…
All the benefits of a database, none of the hassle.
Installation
Choose the environment in which you want to use DuckDB:
- Command Line
- Python
- R
- Java
- node.js
- Julia
- C++
- ODBC
Latest release: DuckDB 0.7.1
- Python: pip install duckdb==0.7.1
- R: install.packages("duckdb")
- Java (Maven):
<dependency>
<groupId>org.duckdb</groupId>
<artifactId>duckdb_jdbc</artifactId>
<version>0.7.1</version>
</dependency>
- node.js: npm install duckdb
- Julia:
using Pkg
Pkg.add("DuckDB")
- Other downloads: https://github.com/
When to use DuckDB
- Processing and storing tabular datasets, e.g. from CSV or Parquet files
- Interactive data analysis, e.g. joining and aggregating multiple large tables
- Concurrent large changes to multiple large tables, e.g. appending rows or adding, removing and updating columns
- Transferring large result sets to the client
When not to use DuckDB
- High-volume transactional use cases (e.g. tracking orders in a webshop)
- Large client/server installations for centralized enterprise data warehousing
- Writing to a single database from multiple concurrent processes
- Multiple concurrent processes reading from a single writable database
Blog
Shredding Deeply Nested JSON, One Vector at a Time
TL;DR: We’ve recently improved DuckDB’s JSON extension so JSON files can be directly queried as if they were tables. DuckDB has a JSON extension that can be installed and loaded through SQL: INSTALL 'json'; LOAD 'json'; The JSON extension supports various functions to create, read, and manipulate JSON strings. These […]
JupySQL Plotting with DuckDB
TL;DR: JupySQL provides a seamless SQL experience in Jupyter and uses DuckDB to visualize larger-than-memory datasets in matplotlib. Introduction: Data visualization is essential for every data practitioner since it allows us to find patterns that otherwise would be hard to see. The typical approach for plotting tabular datasets […]
Announcing DuckDB 0.7.0
The DuckDB team is happy to announce the latest DuckDB version (0.7.0) has been released. This release of DuckDB is named “Labradorius” after the Labrador Duck (Camptorhynchus labradorius) that was native to North America. To install the new version, please visit the installation guide. The full release notes can be […]