2023-11-03 Tom Ebergen

Updates to the H2O.ai db-benchmark!

TL;DR: the H2O.ai db-benchmark has been updated with new results. In addition, the AWS EC2 instance used for benchmarking has been changed to a c6id.metal for improved repeatability and fairness across libraries. DuckDB is the fastest library for both join and group by queries at almost every data size.

Skip directly to the results

The Benchmark Has Been Updated!

In April, DuckDB Labs published a blog post reporting updated H2O.ai db-benchmark results. Since then, the results haven’t been updated. The original plan was to update the results with every DuckDB release. DuckDB 0.9.1 was recently released, and DuckDB Labs has updated the benchmark. While updating the benchmark, however, we noticed that our initial setup did not lend itself to being fair to all solutions. The machine used had network storage and could suffer from noisy neighbors. To avoid these issues, the whole benchmark was re-run on a c6id.metal machine.

New Benchmark Environment: c6id.metal Instance

Initially, updating the benchmark produced strange results. Even using the same library versions from the prior update, some solutions regressed and others improved. We believe this variance came from the AWS EC2 instance we chose: an m4.10xlarge. The m4.10xlarge has 40 virtual CPUs and EBS storage. EBS storage is highly available network block storage for EC2 instances. When running compute-heavy benchmarks, a machine like the m4.10xlarge can suffer from the following issues:

  • Network storage is an issue for benchmarking solutions that interact with storage frequently. For the 500MB and 5GB workloads, network storage was not an issue on the m4.10xlarge since all solutions could execute the queries in memory. For the 50GB workload, however, network storage was an issue for the solutions that could not execute queries in memory. While the m4.10xlarge has dedicated EBS bandwidth, any read/write from storage still happens over the network, which is usually slower than physically mounted storage. Solutions that frequently read and write to storage for the 50GB queries end up doing this over the network, and this network time becomes a significant portion of the query's execution time. If the network has variable performance, the query performance is then also variable.

  • Noisy neighbors are a common issue when benchmarking on virtual CPUs. The previous machine most likely shared its compute hardware with other (neighboring) AWS EC2 instances. If these neighbors are also running compute-heavy workloads, the physical CPU caches are repeatedly invalidated/flushed by the neighboring instance and the benchmark instance. When the CPU cache is shared between two workloads on two instances, both workloads require extra reads from memory for data that would already be in the CPU cache on a non-virtual machine.

In order to be fair to all solutions, we decided to change the instance type to a metal instance with local storage. Metal instance types negate any noisy neighbor problems because the hardware is physical and not shared with any other AWS users/instances. Network storage problems are also fixed because solutions can read and write data to the local instance storage, which is physically mounted on the hardware.

Another benefit of the c6id.metal box is that it stresses parallel performance. There are 128 cores on the c6id.metal. Performance differences between solutions that can effectively use every core and solutions that cannot are clearly visible.

See the updated settings section for how settings were changed for each solution when run on the new machine.

Updating the Benchmark

Moving forward we will update the benchmark when PRs with new performance numbers are provided. The PR should include a description of the changes to a solution script or a version update and new entries in the time.csv and logs.csv files. These entries will be verified using a different c6id.metal instance, and if there is limited variance, the PR will be merged and the results will be updated!

Updated Settings

  1. ClickHouse
    • Storage: Any data that gets spilled to disk also needs to be on the NVMe drive. This has been changed in the new format_and_mount.sh script and the clickhouse/clickhouse-mount-config.xml file.
  2. Julia (juliadf & juliads)
    • Threads: The threads were hardcoded for juliadf/juliads to 20/40 threads. Now the maximum number of threads is used. No option was given to spill to disk, so this was not changed/researched.
  3. DuckDB
    • Storage: The DuckDB database file was specified to run on the NVMe mount.
  4. Spark
    • Storage: There is an option to spill to disk. I was unsure of how to modify the storage location so that it was on the NVMe drive. Open to a PR with storage location changes and improved results!

Many solutions do not spill to disk, so they did not require any modification to use the instance storage. Other solutions use parallel::ncores() or default to a maximum number of cores for parallelism. Solution scripts were run in their current form on github.com/duckdblabs/db-benchmark. Please read the Updating the Benchmark section on how to re-run your solution.

Results

The first results you see are the 50GB group by results. The benchmark runs every query twice per solution, and both runtimes are reported. The “first time” can be considered a cold run, and the “second time” a hot run. DuckDB and DuckDB-latest perform very well across all dataset sizes and variations.

The team at DuckDB Labs has been hard at work improving the performance of the out-of-core hash aggregates and joins. The most notable improvement is the performance of query 5 in the advanced group by queries. The cold run is almost an order of magnitude better than every other solution! DuckDB is also one of only two solutions to finish the 50GB join query. Some solutions are experiencing timeouts on the 50GB datasets. Solutions running the 50GB group by queries are killed after running for 180 minutes, meaning all 10 group by queries need to finish within the 180 minutes. Solutions running the 50GB join queries are killed after running for 360 minutes.

Link to result page

continue reading
2023-10-27 Pedro Holanda

DuckDB's CSV Sniffer: Automatic Detection of Types and Dialects

ducktetive

TL;DR: DuckDB is primarily focused on performance, leveraging the capabilities of modern file formats. At the same time, we also pay attention to flexible, non-performance-driven formats like CSV files. To create a nice and pleasant experience when reading from CSV files, DuckDB implements a CSV sniffer that automatically detects CSV dialect options, column types, and even skips dirty data. The sniffing process allows users to efficiently explore CSV files without needing to provide any input about the file format.

There are many different file formats that users can choose from when storing their data. For example, there are performance-oriented binary formats like Parquet, where data is stored in a columnar format, partitioned into row-groups, and heavily compressed. However, Parquet is known for its rigidity, requiring specialized systems to read and write these files.

On the other side of the spectrum, there are files with the CSV (comma-separated values) format, which I like to refer to as the ‘Woodstock of data’. CSV files offer the advantage of flexibility; they are structured as text files, allowing users to manipulate them with any text editor, and nearly any data system can read and execute queries on them.

However, this flexibility comes at a cost. Reading a CSV file is not a trivial task, as users need a significant amount of prior knowledge about the file. For instance, DuckDB’s CSV reader offers more than 25 configuration options. I’ve found that people tend to think I’m not working hard enough if I don’t introduce at least three new options with each release. Just kidding. These options include specifying the delimiter, quote and escape characters, determining the number of columns in the CSV file, and identifying whether a header is present while also defining column types. This can slow down an interactive data exploration process, and make analyzing new datasets a cumbersome and less enjoyable task.

Part of DuckDB's raison d'être is to be pleasant and easy to use, so we don't want our users to have to fiddle with CSV files and input options manually. Manual input should be reserved only for files with rather unusual choices for their CSV dialect (where a dialect comprises the combination of the delimiter, quote, escape, and newline values used to create that file) or for specifying column types.

Automatically detecting CSV options can be a daunting process. Not only are there many options to investigate, but their combinations can easily lead to a search space explosion. This is especially the case for CSV files that are not well-structured. Some might argue that CSV files have a specification, but the truth of the matter is that the “specification” changes as soon as a single system is capable of reading a flawed file. And, oh boy, I’ve encountered my fair share of semi-broken CSV files that people wanted DuckDB to read in the past few months.

DuckDB implements a multi-hypothesis CSV sniffer that automatically detects dialects, headers, date/time formats, column types, and identifies dirty rows to be skipped. Our ultimate goal is to automatically read anything resembling a CSV file, to never give up and never let you down! All of this is achieved without incurring a substantial initial cost when reading CSV files. In the bleeding edge version, the sniffer runs when reading a CSV file by default. Note that the sniffer will always prioritize any options set by the user (e.g., if the user sets , as the delimiter, the sniffer won’t try any other options and will assume that the user input is correct).

In this blog post, I will explain how the current implementation works, discuss its performance, and provide insights into what comes next!

DuckDB’s Automatic Detection

The process of parsing CSV files is depicted in the figure below. It currently consists of five different phases, which will be detailed in the next sections.

The CSV file used in the overview example is as follows:

Name, Height, Vegetarian, Birthday
"Pedro", 1.73, False, 30-07-92
... imagine 2048 consistent rows ...
"Mark", 1.72, N/A, 20-09-92

sniffing overview

In the first phase, we perform Dialect Detection, where we select the dialect candidates that generate the most per-row columns in the CSV file while maintaining consistency (i.e., not exhibiting significant variations in the number of columns throughout the file). In our example, we can observe that, after this phase, the sniffer successfully detects the necessary options for the delimiter, quotes, escapes, and new line delimiters.

The second phase, referred to as Type Detection, involves identifying the data types for each column in our CSV file. In our example, our sniffer recognizes four column types: VARCHAR, DOUBLE, BOOL, and DATE.

The third step, known as Header Detection, is employed to ascertain whether our file includes a header. If a header is present, we use it to set the column names; otherwise, we generate them automatically. In our example, there is a header, and each column gets its name defined in there.

Now that our columns have names, we move on to the fourth, optional phase: Type Replacement. DuckDB’s CSV reader provides users with the option to specify column types by name. If these types are specified, we replace the detected types with the user’s specifications.

Finally, we progress to our last phase, Type Refinement. In this phase, we analyze additional sections of the file to validate the accuracy of the types determined during the initial type detection phase. If necessary, we refine them. In our example, we can see that the Vegetarian column was initially categorized as BOOL. However, upon further examination, it was found to contain the string N/A, leading to an upgrade of the column type to VARCHAR to accommodate all possible values.

The automatic detection is only executed on a sequential sample of the CSV file. By default, the size of the sample is 20,480 tuples (i.e., 10 DuckDB execution chunks). This can be configured via the sample_size option, and can be set to -1 in case the user wants to sniff the complete file. Since the same data is repeatedly read with various options, and users can scan the entire file, all CSV buffers generated during sniffing are cached and efficiently managed to ensure high performance.
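
For instance, a minimal sketch (the filename here is hypothetical) of forcing the sniffer to consider the whole file rather than the default sample:

-- sniff the entire file instead of the first 20,480 tuples
SELECT * FROM read_csv('big_file.csv', sample_size = -1);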

Of course, running the CSV Sniffer on very large files will have a drastic impact on the overall performance (see our benchmark section below). In these cases, the sample size should be kept at a reasonable level.

In the next subsections, I will describe each phase in detail.

Dialect Detection

In the Dialect Detection, we identify the delimiter, quotes, escapes, and new line delimiters of a CSV file.

Our delimiter search space consists of the following delimiters: ,, |, ;, \t. If the file has a delimiter outside the search space, it must be provided by the user (e.g., delim='?'). Our quote search space is ", ' and \0, where \0 is a string terminator indicating no quote is present; again, users can provide custom characters outside the search space (e.g., quote='?'). The search space of escape values depends on the value of the quote option, but in summary, they are the same as quotes with the addition of \, and again, they can also be provided by the user (escape='?'). Finally, the last detected option is the new line delimiters; they can be \r, \n, \r\n, and a mix of everything (trust me, I’ve seen a real-world CSV file that used a mix).
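
As a rough sketch (the filename and option values are made up), a file with a dialect outside the search space can be read by providing the options directly, which the sniffer will then respect:

SELECT * FROM read_csv('unusual.csv', delim = '?', quote = '"', escape = '\');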

By default, the dialect detection runs on 24 different combinations of dialect configurations. To determine the most promising configuration, we calculate the number of columns each CSV tuple would produce under each of these configurations. The one that results in the most columns with the most consistent rows will be chosen.

The calculation of consistent rows depends on additional user-defined options. For example, the null_padding option will pad missing columns with NULL values. Therefore, rows with missing columns will have the missing columns padded with NULL.

If null_padding is set to true, CSV files with inconsistent rows will still be considered, but a preference will be given to configurations that minimize the occurrence of padded rows. If null_padding is set to false, the dialect detector will skip inconsistent rows at the beginning of the CSV file. As an example, consider the following CSV file.

I like my csv files to have notes to make dialect detection harder
I also like commas like this one : ,
A,B,C
1,2,3
4,5,6

Here the sniffer would detect that, with the delimiter set to ,, the first row has one column, the second has two, but the remaining rows have 3 columns. Hence, if null_padding is set to false, it would still select , as a delimiter candidate by assuming the top rows are dirty notes (believe me, CSV notes are a thing!), resulting in the following table:

A,B,C
1, 2, 3
4, 5, 6

If null_padding is set to true, all lines would be accepted, resulting in the following table:

'I like my csv files to have notes to make dialect detection harder', None, None
'I also like commas like this one : ', None, None
'A', 'B', 'C'
'1', '2', '3'
'4', '5', '6'

If the ignore_errors option is set, then the configuration that yields the most columns with the least inconsistent rows will be picked.
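
As a sketch of how these options would be set on the notes file above (the filename is hypothetical):

-- keep the note lines, padding missing columns with NULL
SELECT * FROM read_csv('notes.csv', null_padding = true);
-- or skip rows that do not match the detected dialect
SELECT * FROM read_csv('notes.csv', ignore_errors = true);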

Type Detection

After deciding the dialect that will be used, we detect the types of each column. Our Type Detection considers the following types: SQLNULL, BOOLEAN, BIGINT, DOUBLE, TIME, DATE, TIMESTAMP, VARCHAR. These types are ordered by specificity, which means we first check if a column is a SQLNULL; if not, whether it's a BOOLEAN, and so on, until it can only be a VARCHAR. DuckDB has more types than the ones used by default. Users can also define which types the sniffer should consider via the auto_type_candidates option.
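
For example, a sketch (the filename is hypothetical) that restricts the sniffer to a smaller set of candidate types:

SELECT * FROM read_csv('my_file.csv', auto_type_candidates = ['BIGINT', 'DOUBLE', 'DATE', 'VARCHAR']);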

At this phase, the type detection algorithm goes over the first chunk of data (i.e., 2048 tuples). The process starts on the second valid row (i.e., not a note) of the file. The first row is stored separately and is not used for type detection; whether it is a header is determined later. The type detection runs a per-column, per-value casting trial process to determine the column types. It starts off with a per-column array containing all candidate types. For each value, it attempts a cast to the most specific type remaining in that column's array; if the cast fails, that type is removed and the next candidate is tried. This process continues until the whole chunk is finished.

At this phase, we also determine the format of DATE and TIMESTAMP columns. The following formats are considered for DATE columns: %m-%d-%Y, %m-%d-%y, %d-%m-%Y, %d-%m-%y, %Y-%m-%d, %y-%m-%d, and the following for TIMESTAMP columns: %Y-%m-%dT%H:%M:%S.%f, %Y-%m-%d %H:%M:%S.%f, %m-%d-%Y %I:%M:%S %p, %m-%d-%y %I:%M:%S %p, %d-%m-%Y %H:%M:%S, %d-%m-%y %H:%M:%S, %Y-%m-%d %H:%M:%S, %y-%m-%d %H:%M:%S. Columns that use formats outside this search space must be configured with the dateformat and timestampformat options.
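
As a sketch (the filename and pattern are made up), a file whose dates look like 30/07/1992 could be read by supplying the format explicitly:

SELECT * FROM read_csv('custom_dates.csv', dateformat = '%d/%m/%Y');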

As an example, let’s consider the following CSV file.

Name, Age
,
Jack Black, 54
Kyle Gass, 63.2

The first row [Name, Age] will be stored separately for the header detection phase. The second row [NULL, NULL] will allow us to cast the first and second columns to SQLNULL. Therefore, their type candidate arrays will be the same: [SQLNULL, BOOLEAN, BIGINT, DOUBLE, TIME, DATE, TIMESTAMP, VARCHAR].

In the third row [Jack Black, 54], things become more interesting. With ‘Jack Black,’ the type candidate array for column 0 will exclude all values with higher specificity, as ‘Jack Black’ can only be converted to a VARCHAR. The second column cannot be converted to either SQLNULL or BOOLEAN, but it will succeed as a BIGINT. Hence, the type candidate for the second column will be [BIGINT, DOUBLE, TIME, DATE, TIMESTAMP, VARCHAR].

In the fourth row, we have [Kyle Gass, 63.2]. For the first column, there’s no problem since it’s also a valid VARCHAR. However, for the second column, a cast to BIGINT will fail, but a cast to DOUBLE will succeed. Hence, the new array of candidate types for the second column will be [DOUBLE, TIME, DATE, TIMESTAMP, VARCHAR].

Header Detection

The Header Detection phase simply obtains the first valid line of the CSV file and attempts to cast it to the candidate types in our columns. If there is a cast mismatch, we consider that row as the header; if not, we treat the first row as actual data and automatically generate a header.

In our previous example, the first row was [Name, Age], and the column candidate type arrays were [VARCHAR] and [DOUBLE, TIME, DATE, TIMESTAMP, VARCHAR]. Name is a string and can be converted to VARCHAR. Age is also a string, and attempting to cast it to DOUBLE will fail. Since the casting fails, the auto-detection algorithm considers the first row as a header, resulting in the first column being named Name and the second as Age.

If a header is not detected, column names will be automatically generated with the pattern column${x}, where x represents the column’s position (0-based index) in the CSV file.
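
If the header decision ever needs to be overridden, a sketch of forcing it (the filename is hypothetical):

-- treat the first row as data; columns will be named column0, column1, ...
SELECT * FROM read_csv('no_header.csv', header = false);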

Type Replacement

Now that the auto-detection algorithm has discovered the header names, if the user specifies column types, the types detected by the sniffer will be replaced with them in the Type Replacement phase. For example, we can replace the Age type with FLOAT by using:

SELECT * FROM read_csv('greatest_band_in_the_world.csv', types = {'Age': 'FLOAT'})

This phase is optional and will only be triggered if there are manually defined types.

Type Refinement

The Type Refinement phase performs the same tasks as type detection; the only difference is the granularity of the data on which the casting operator works, which is adjusted for performance reasons. During type detection, we conduct cast checks on a per-column, per-value basis.

In this phase, we transition to a more efficient vectorized casting algorithm. The validation process remains the same as in type detection, with types from type candidate arrays being eliminated if a cast fails.

How Fast is the Sniffing?

To analyze the impact of running DuckDB’s automatic detection, we execute the sniffer on the NYC taxi dataset. The file consists of 19 columns, 10,906,858 tuples and is 1.72 GB in size.

The cost of sniffing the dialect, column names, and types is approximately 4% of the total cost of loading the data.

Name Time (s)
Sniffing 0.11
Loading 2.43

Varying Sampling Size

Sometimes, CSV files can have dialect options or more refined types that appear only later in the CSV file. In those cases, the sample_size option becomes an important tool for users to ensure that the sniffer examines enough data to make the correct decision. However, increasing the sample_size also leads to an increase in the total runtime of the sniffer because it uses more data to detect all possible dialects and types.

Below, you can see how increasing the default sample size by a multiplier (see the x-axis) affects the sniffer's runtime on the NYC dataset. As expected, the total time spent on sniffing increases linearly with the total sample size.

sample benchmark

Varying Number of Columns

The other main characteristic of a CSV file that will affect the auto-detection is the number of columns the file has. Here, we test the sniffer against a varying number of INTEGER type columns in files with 10,906,858 tuples. The results are depicted in the figure below. We can see that from one column to two, we have a steeper increase in runtime. That’s because, for single columns, we have a simplified dialect detection due to the lack of delimiters. For the other columns, as expected, we have a more linear increase in runtime, depending on the number of columns.

sniffer benchmark

Conclusion & Future Work

If you have unusual CSV files and want to query, clean up, or normalize them, DuckDB is already one of the top solutions available. It is very easy to get started. To read a CSV file with the sniffer, you can simply:

SELECT * FROM 'path/to/csv_file.csv';

DuckDB’s CSV auto-detection algorithm is an important tool to facilitate the exploration of CSV files. With its default options, it has a low impact on the total cost of loading and reading CSV files. Its main goal is to always be capable of reading files, doing a best-effort job even on files that are ill-defined.

We have a list of points related to the sniffer that we would like to improve in the future.

  1. Advanced Header Detection. We currently determine if a CSV has a header by identifying a type mismatch between the first valid row and the remainder of the CSV file. However, this can generate false negatives if, for example, all the columns of a CSV are of a type VARCHAR. We plan on enhancing our Header Detection to perform matches with commonly used names for headers.
  2. Adding Accuracy and Speed Benchmarks. We currently implement many accuracy and regression tests; however, due to the CSV format's inherent flexibility, manually creating test cases is quite daunting. The plan moving forward is to implement a whole accuracy and regression test suite using the Pollock Benchmark.
  3. Improved Sampling. We currently execute the auto-detection algorithm on a sequential sample of data. However, it’s very common that new settings are only introduced later in the file (e.g., quotes might be used only in the last 10% of the file). Hence, being able to execute the sniffer in distinct parts of the file can improve accuracy.
  4. Multi-Table CSV File. Multiple tables can be present in the same CSV file, which is a common scenario when exporting spreadsheets to CSVs. Therefore, we would like to be able to identify and support these.
  5. Null-String Detection. We currently do not have an algorithm in place to identify the representation of null strings.
  6. Decimal Precision Detection. We also don’t automatically detect decimal precision yet. This is something that we aim to tackle in the future.
  7. Parallelization. Despite DuckDB’s CSV Reader being fully parallelized, the sniffer is still limited to a single thread. Parallelizing it in a similar fashion to what is done with the CSV Reader (description coming in a future blog post) would significantly enhance sniffing performance and enable full-file sniffing.
  8. Sniffer as a stand-alone function. Currently, users can utilize the DESCRIBE query to acquire information from the sniffer, but it only returns column names and types. We aim to expose the sniffing algorithm as a stand-alone function that provides the complete results from the sniffer. This will allow users to easily configure files using the exact same options without the need to rerun the sniffer.
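
As a small sketch of the current workaround mentioned in the last point, DESCRIBE can be run directly on a CSV file to see the column names and types detected by the sniffer:

DESCRIBE SELECT * FROM 'path/to/csv_file.csv';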
continue reading
2023-10-06 Mark Raasveldt, Hannes Mühleisen, Gabor Szarnyas

DuckCon #4 in Amsterdam

We are excited to hold the next “DuckCon” DuckDB user group meeting for the first time in the birthplace of DuckDB, Amsterdam, the Netherlands. The meeting will take place on February 2, 2024 (Friday) in the OBA Congress Center’s Theater room, five minutes walking distance from Amsterdam Central Station.

Conveniently, the event is co-located in time and space with the FOSDEM conference that will start the following day in Brussels (February 3-4). Participants could for example choose to attend DuckCon, then take a direct train from Amsterdam Central Station to Brussels.

As is traditional in DuckCons, we will start with a talk from DuckDB’s creators Hannes Mühleisen and Mark Raasveldt about the current state of DuckDB and plans for releasing DuckDB version 1.0. This will be followed by two presentations by DuckDB users.

In addition, we will have several lightning talks from the DuckDB community.

Timetable

Time Title Presenter
2:00 PM First session  
2:00 PM Welcome to DuckCon!  
2:05 PM Introductions Hannes Mühleisen
2:10 PM State of the Duck Mark Raasveldt
2:35 PM Hugging Face ❤️ DuckDB Polina Kazakova (Hugging Face)
3:00 PM 🦆s in the Lakehouse Kamaljit Pati, Subash Roul (Fivetran)
3:25 PM Break  
3:40 PM Second session  
3:40 PM The Duck (DB) Feather in Your Parquet Cap Niger Little-Poole (Prequel)
4:05 PM Lightning talk block  
4:05 PM DuckDB Powering Interactive Notebooks Rik Bauwens (DataCamp)
4:10 PM DuckDB Applications in Information Retrieval Arjen de Vries (Radboud University)
4:15 PM How to Analyse a DDoS Quackly Remco Poortinga - van Wijnen (SURF)
4:20 PM How I Test New dbt-core Features with DuckDB Kshitij Aranke (dbt Labs)
4:25 PM Building Tecton’s Feature Engineering Platform on DuckDB Mike Eastham (Tecton AI)
4:30 PM Answering Questions about Football 60x Faster with DuckDB Ian Jenkins (Liverpool FC)
4:35 PM Closing thoughts  
4:40 PM Drink and snacks sponsored by Rill Data  
6:00 PM End of event  

Registration Process

Attendance is free. While supplies last, you can still get a ticket on Eventbrite. You will need to show this ticket at the entrance to attend. Please contact Gabor Szarnyas at [email protected] if you have any questions.

continue reading
2023-09-26 Mark Raasveldt and Hannes Mühleisen

Announcing DuckDB 0.9.0

Image of the Yellow Billed Duck

The DuckDB team is happy to announce the latest DuckDB release (0.9.0). This release is named Undulata after the Yellow-billed duck native to Africa.

To install the new version, please visit the installation guide. The full release notes can be found here.

continue reading
2023-09-15 Richard Wesley

DuckDB's AsOf Joins: Fuzzy Temporal Lookups

TL;DR: DuckDB supports AsOf Joins – a way to match nearby values. They are especially useful for searching event tables for temporal analytics.

Do you have time series data that you want to join, but the timestamps don’t quite match? Or do you want to look up a value that changes over time using the times in another table? And did you end up writing convoluted (and slow) inequality joins to get your results? Then this post is for you!
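
As a minimal sketch of the idea (the trades and prices tables and their columns are hypothetical), an AsOf join picks, for each trade, the most recent price at or before that trade's timestamp:

SELECT t.symbol, t.ts, p.price
FROM trades t
ASOF JOIN prices p
  ON t.symbol = p.symbol AND t.ts >= p.ts;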

continue reading
2023-08-23 Alex Monahan

Even Friendlier SQL with DuckDB

Looks like a Duck ready to boldly go where databases have not gone before

TL;DR: DuckDB continues to push the boundaries of SQL syntax to both simplify queries and make more advanced analyses possible. Highlights include dynamic column selection, queries that start with the FROM clause, function chaining, and list comprehensions. We boldly go where no SQL engine has gone before!

Who says that SQL should stay frozen in time, chained to a 1999 version of the specification? As a comparison, do folks remember what JavaScript felt like before Promises? Those didn’t launch until 2012! It’s clear that innovation at the programming syntax layer can have a profoundly positive impact on an entire language ecosystem.

We believe there are many valid reasons for innovation in the SQL language, among them opportunities to simplify basic queries and also to make more dynamic analyses possible. Many of these features arose from community suggestions! Please let us know your SQL pain points on Discord or GitHub and join us as we change what it feels like to write SQL!

If you have not had a chance to read the first installment in this series, please take a quick look here.

The future is now

The first few enhancements in this list were included in the “Ideas for the Future” section of the prior post.

Reusable column aliases

When working with incremental calculated expressions in a select statement, traditional SQL dialects force you to either write out the full expression for each column or create a common table expression (CTE) around each step of the calculation. Now, any column alias can be reused by subsequent columns within the same select statement. Not only that, but these aliases can be used in the where and order by clauses as well.

Old way 1: Repeat yourself

SELECT 
    'These are the voyages of the starship Enterprise...' AS intro,
    instr('These are the voyages of the starship Enterprise...', 'starship') AS starship_loc,
    substr('These are the voyages of the starship Enterprise...', instr('These are the voyages of the starship Enterprise...', 'starship') + len('starship') + 1) AS trimmed_intro;

Old way 2: All the CTEs

WITH intro_cte AS (
    SELECT
        'These are the voyages of the starship Enterprise...' AS intro
), starship_loc_cte AS (
    SELECT
        intro,
        instr(intro, 'starship') AS starship_loc
    FROM intro_cte
)
SELECT
    intro,
    starship_loc,
    substr(intro, starship_loc + len('starship') + 1) AS trimmed_intro
FROM starship_loc_cte;

New way

SELECT 
     'These are the voyages of the starship Enterprise...' AS intro,
     instr(intro, 'starship') AS starship_loc,
     substr(intro, starship_loc + len('starship') + 1) AS trimmed_intro;
intro starship_loc trimmed_intro
These are the voyages of the starship Enterprise… 30 Enterprise…

Dynamic column selection

Databases typically prefer strictness in column definitions and flexibility in the number of rows. This can help by enforcing data types and recording column level metadata. However, in data science workflows and elsewhere, it is very common to dynamically generate columns (for example during feature engineering).

No longer do you need to know all of your column names up front! DuckDB can select and even modify columns based on regular expression pattern matching, EXCLUDE or REPLACE modifiers, and even lambda functions (see the section on lambda functions below for details!).

Let’s take a look at some facts gathered about the first season of Star Trek. Using DuckDB’s httpfs extension, we can query a csv dataset directly from GitHub. It has several columns so let’s DESCRIBE it.

INSTALL httpfs;
LOAD httpfs;
CREATE TABLE trek_facts AS
    SELECT * FROM 'https://raw.githubusercontent.com/Alex-Monahan/example_datasets/main/Star_Trek-Season_1.csv';

DESCRIBE trek_facts;
column_name column_type null key default extra
season_num BIGINT YES NULL NULL NULL
episode_num BIGINT YES NULL NULL NULL
aired_date DATE YES NULL NULL NULL
cnt_kirk_hookups BIGINT YES NULL NULL NULL
cnt_downed_redshirts BIGINT YES NULL NULL NULL
bool_aliens_almost_took_over_planet BIGINT YES NULL NULL NULL
bool_aliens_almost_took_over_enterprise BIGINT YES NULL NULL NULL
cnt_vulcan_nerve_pinch BIGINT YES NULL NULL NULL
cnt_warp_speed_orders BIGINT YES NULL NULL NULL
highest_warp_speed_issued BIGINT YES NULL NULL NULL
bool_hand_phasers_fired BIGINT YES NULL NULL NULL
bool_ship_phasers_fired BIGINT YES NULL NULL NULL
bool_ship_photon_torpedos_fired BIGINT YES NULL NULL NULL
cnt_transporter_pax BIGINT YES NULL NULL NULL
cnt_damn_it_jim_quote BIGINT YES NULL NULL NULL
cnt_im_givin_her_all_shes_got_quote BIGINT YES NULL NULL NULL
cnt_highly_illogical_quote BIGINT YES NULL NULL NULL
bool_enterprise_saved_the_day BIGINT YES NULL NULL NULL

COLUMNS() with regular expressions

The COLUMNS expression can accept a string parameter that is a regular expression and will return all column names that match the pattern. How did warp change over the first season? Let’s examine any column name that contains the word warp.

SELECT
    episode_num,
    COLUMNS('.*warp.*')
FROM trek_facts;
episode_num cnt_warp_speed_orders highest_warp_speed_issued
0 1 1
1 0 0
2 1 1
3 1 0
27 1 1
28 0 0
29 2 8

The COLUMNS expression can also be wrapped by other functions to apply those functions to each selected column. Let’s simplify the above query to look at the maximum values across all episodes:

SELECT
    MAX(COLUMNS('.*warp.*'))
FROM trek_facts;
max(trek_facts.cnt_warp_speed_orders) max(trek_facts.highest_warp_speed_issued)
5 8

We can also create a WHERE clause that applies across multiple columns. All columns must match the filter criteria, which is equivalent to combining them with AND. Which episodes had at least 2 warp speed orders and at least a warp speed level of 2?

SELECT
    episode_num,
    COLUMNS('.*warp.*')
FROM trek_facts
WHERE
    COLUMNS('.*warp.*') >= 2;
    -- cnt_warp_speed_orders >= 2 
    -- AND 
    -- highest_warp_speed_issued >= 2
episode_num cnt_warp_speed_orders highest_warp_speed_issued
14 3 7
17 2 7
18 2 8
29 2 8

COLUMNS() with EXCLUDE and REPLACE

Individual columns can also be either excluded or replaced prior to applying calculations on them. For example, since our dataset only includes season 1, we do not need to find the MAX of that column. It would be highly illogical.

SELECT
    MAX(COLUMNS(* EXCLUDE season_num))
FROM trek_facts;
max(trek_facts.episode_num) max(trek_facts.aired_date) max(trek_facts.cnt_kirk_hookups) … max(trek_facts.bool_enterprise_saved_the_day)
29 1967-04-13 2 … 1

The REPLACE syntax is also useful when applied to a dynamic set of columns. In this example, we want to convert the dates into timestamps prior to finding the maximum value in each column. Previously this would have required an entire subquery or CTE to pre-process just that single column!

SELECT
    MAX(COLUMNS(* REPLACE aired_date::timestamp AS aired_date))
FROM trek_facts;
max(trek_facts.season_num) max(trek_facts.episode_num) max(aired_date := CAST(aired_date AS TIMESTAMP)) … max(trek_facts.bool_enterprise_saved_the_day)
1 29 1967-04-13 00:00:00 … 1

COLUMNS() with lambda functions

The most flexible way to query a dynamic set of columns is through a lambda function. This allows for any matching criteria to be applied to the names of the columns, not just regular expressions. See more details about lambda functions below.

For example, if using the LIKE syntax is more comfortable, we can select columns matching a LIKE pattern rather than with a regular expression.

SELECT
    episode_num,
    COLUMNS(col -> col LIKE '%warp%')
FROM trek_facts
WHERE
    COLUMNS(col -> col LIKE '%warp%') >= 2;
episode_num cnt_warp_speed_orders highest_warp_speed_issued
14 3 7
17 2 7
18 2 8
29 2 8

Automatic JSON to nested types conversion

The first installment in the series mentioned JSON dot notation references as future work. However, the team has gone even further! Instead of referring to JSON-typed columns using dot notation, JSON can now be automatically parsed into DuckDB’s native types for significantly faster performance, compression, as well as that friendly dot notation!

First, install and load the httpfs and json extensions if they don’t come bundled with the client you are using. Then query a remote JSON file directly as if it were a table!

INSTALL httpfs;
LOAD httpfs;
INSTALL json;
LOAD json;

SELECT 
     starfleet[10].model AS starship 
FROM 'https://raw.githubusercontent.com/vlad-saling/star-trek-ipsum/master/src/content/content.json';
starship
USS Farragut - NCC-1647 - Ship on which James Kirk served as a phaser station operator. Attacked by the Dikironium Cloud Creature, killing half the crew. ad.

Now for some new SQL capabilities beyond the ideas from the prior post!

FROM first in SELECT statements

When building a query, the first thing you need to know is where your data is coming FROM. Well then why is that the second clause in a SELECT statement?? No longer! DuckDB is building SQL as it should have always been - putting the FROM clause first! This addresses one of the longest standing complaints about SQL, and the DuckDB team implemented it in 2 days.

FROM my_table SELECT my_column;

Not only that, the SELECT statement can be completely removed and DuckDB will assume all columns should be SELECTed. Taking a look at a table is now as simple as:

FROM my_table;
-- SELECT * FROM my_table

Other statements like COPY are simplified as well.

COPY (FROM trek_facts) TO 'phaser_filled_facts.parquet';

This has an additional benefit beyond saving keystrokes and staying in a development flow state: autocomplete will have much more context when you begin to choose columns to query. Give the AI a helping hand!

Note that this syntax is completely optional, so your SELECT * FROM keyboard shortcuts are safe, even if they are obsolete… 🙂

Function chaining

Many SQL blogs advise the use of CTEs instead of subqueries. Among other benefits, they are much more readable. Operations are compartmentalized into discrete chunks and they can be read in order top to bottom instead of forcing the reader to work their way inside out.

DuckDB enables the same interpretability improvement for every scalar function! Use the dot operator to chain functions together, just like in Python. The prior expression in the chain is used as the first argument to the subsequent function.

SELECT 
     ('Make it so')
          .UPPER()
          .string_split(' ')
          .list_aggr('string_agg','.')
          .concat('.') AS im_not_messing_around_number_one;
im_not_messing_around_number_one
MAKE.IT.SO.

Now compare that with the old way…

SELECT 
     concat(
          list_aggr(
               string_split(
                    UPPER('Make it stop'),
               ' '),
          'string_agg','.'),
     '.') AS oof;
oof
MAKE.IT.STOP.

Union by name

DuckDB aims to blend the best of databases and dataframes. This new syntax is inspired by the concat function in Pandas. Rather than vertically stacking tables based on column position, columns are matched by name and stacked accordingly. Simply replace UNION with UNION BY NAME or UNION ALL with UNION ALL BY NAME.

For example, we had to add some new alien species proverbs in The Next Generation:

CREATE TABLE proverbs AS
     SELECT 
          'Revenge is a dish best served cold' AS klingon_proverb 
     UNION ALL BY NAME 
     SELECT 
          'You will be assimilated' AS borg_proverb,
          'If winning is not important, why keep score?' AS klingon_proverb;

FROM proverbs;
klingon_proverb borg_proverb
Revenge is a dish best served cold NULL
If winning is not important, why keep score? You will be assimilated

This approach has additional benefits. As seen above, not only can tables with different column orders be combined, but so can tables with different numbers of columns entirely. This is helpful as schemas migrate, and is particularly useful for DuckDB’s multi-file reading capabilities.
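
For instance, a sketch of reading several files with drifting schemas (the filenames are hypothetical) while lining the columns up by name:

SELECT * FROM read_csv(['season_1.csv', 'season_2.csv'], union_by_name = true);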

Insert by name

Another common situation where column order is strict in SQL is when inserting data into a table. Either the columns must match the order exactly, or all of the column names must be repeated in two locations within the query.

Instead, add the keywords BY NAME after the table name when inserting. Any subset of the columns in the table in any order can be inserted.

INSERT INTO proverbs BY NAME 
     SELECT 'Resistance is futile' AS borg_proverb;

SELECT * FROM proverbs;
klingon_proverb borg_proverb
Revenge is a dish best served cold NULL
If winning is not important, why keep score? You will be assimilated
NULL Resistance is futile

Dynamic PIVOT and UNPIVOT

Historically, databases are not well-suited for pivoting operations. However, DuckDB’s PIVOT and UNPIVOT clauses can create or stack dynamic column names for a truly flexible pivoting capability! In addition to that flexibility, DuckDB also provides both the SQL standard syntax and a friendlier shorthand.

For example, let’s take a look at some procurement forecast data just as the Earth-Romulan war was beginning:

CREATE TABLE purchases (item VARCHAR, year INT, count INT);

INSERT INTO purchases
    VALUES ('phasers', 2155, 1035), ('phasers', 2156, 25039), ('phasers', 2157, 95000),
           ('photon torpedoes', 2155, 255), ('photon torpedoes', 2156, 17899), ('photon torpedoes', 2157, 87492);

FROM purchases;
item year count
phasers 2155 1035
phasers 2156 25039
phasers 2157 95000
photon torpedoes 2155 255
photon torpedoes 2156 17899
photon torpedoes 2157 87492

It is easier to compare our phaser needs to our photon torpedo needs if each year’s data is visually close together. Let’s pivot this into a friendlier format! Each year should receive its own column (but each year shouldn’t need to be specified in the query!), we want to sum up the total count, and we still want to keep a separate group (row) for each item.

CREATE TABLE pivoted_purchases AS
     PIVOT purchases 
          ON year 
          USING SUM(count) 
          GROUP BY item;

FROM pivoted_purchases;
item 2155 2156 2157
phasers 1035 25039 95000
photon torpedoes 255 17899 87492

Looks like photon torpedoes went on sale…

Now imagine the reverse situation. Scotty in engineering has been visually analyzing and manually constructing his purchases forecast. He prefers things pivoted so it’s easier to read. Now you need to fit it back into the database! This war may go on for a bit, so you may need to do this again next year. Let’s write an UNPIVOT query to return to the original format that can handle any year.

The COLUMNS expression will use all columns except item. After stacking, the column containing the column names from pivoted_purchases should be renamed to year, and the values within those columns represent the count. The result is the same dataset as the original.

UNPIVOT pivoted_purchases
     ON COLUMNS(* EXCLUDE item)
     INTO
          NAME year
          VALUE count;
item year count
phasers 2155 1035
phasers 2156 25039
phasers 2157 95000
photon torpedoes 2155 255
photon torpedoes 2156 17899
photon torpedoes 2157 87492

More examples are included as a part of our DuckDB 0.8.0 announcement post, and the PIVOT and UNPIVOT documentation pages highlight more complex queries.

Stay tuned for a future post to cover what is happening behind the scenes!

List lambda functions

List lambdas allow for operations to be applied to each item in a list. These do not need to be pre-defined - they are created on the fly within the query.

In this example, a lambda function is used in combination with the list_transform function to shorten each official ship name.

SELECT 
     (['Enterprise NCC-1701', 'Voyager NCC-74656', 'Discovery NCC-1031'])
          .list_transform(x -> x.string_split(' ')[1]) AS short_name;
short_name
[Enterprise, Voyager, Discovery]

Lambdas can also be used to filter down the items in a list. The lambda returns a list of booleans, which is used by the list_filter function to select specific items. The contains function is using the function chaining described earlier.

SELECT 
     (['Enterprise NCC-1701', 'Voyager NCC-74656', 'Discovery NCC-1031'])
          .list_filter(x -> x.contains('1701')) AS the_original;
the_original
[Enterprise NCC-1701]

List comprehensions

What if there was a simple syntax to both modify and filter a list? DuckDB takes inspiration from Python’s approach to list comprehensions to dramatically simplify the above examples. List comprehensions are syntactic sugar - these queries are rewritten into lambda expressions behind the scenes!

Within brackets, first specify the transformation that is desired, then indicate which list should be iterated over, and finally include the filter criteria.

SELECT 
     [x.string_split(' ')[1] 
     FOR x IN ['Enterprise NCC-1701', 'Voyager NCC-74656', 'Discovery NCC-1031'] 
     IF x.contains('1701')] AS ready_to_boldly_go;
ready_to_boldly_go
[Enterprise]

Exploding struct.*

A struct in DuckDB is a set of key/value pairs. Behind the scenes, a struct is stored with a separate column for each key. As a result, it is computationally easy to explode a struct into separate columns, and now it is also syntactically simple as well! This is another example of allowing SQL to handle dynamic column names.

WITH damage_report AS (
     SELECT {'gold_casualties':5, 'blue_casualties':15, 'red_casualties': 10000} AS casualties
) 
FROM damage_report
SELECT 
     casualties.*;
gold_casualties blue_casualties red_casualties
5 15 10000

Automatic struct creation

DuckDB exposes an easy way to convert any table into a single-column struct. Instead of SELECTing column names, SELECT the table name itself.

WITH officers AS (
     SELECT 'Captain' AS rank, 'Jean-Luc Picard' AS name 
     UNION ALL 
     SELECT 'Lieutenant Commander', 'Data'
) 
FROM officers 
SELECT officers;
officers
{‘rank’: Captain, ‘name’: Jean-Luc Picard}
{‘rank’: Lieutenant Commander, ‘name’: Data}

Union data type

DuckDB utilizes strong typing to provide high performance and enforce data quality. However, DuckDB is also as forgiving as possible using approaches like implicit casting to avoid always having to cast between data types.

Another way DuckDB enables flexibility is the new UNION data type. A UNION data type allows for a single column to contain multiple types of values. This can be thought of as an “opt-in” to SQLite’s flexible data typing rules (the opposite direction of SQLite’s recently announced strict tables).

By default DuckDB will seek the common denominator of data types when combining tables together. The below query results in a VARCHAR column:

SELECT 'The Motion Picture' AS movie UNION ALL 
SELECT 2 UNION ALL 
SELECT 3 UNION ALL 
SELECT 4 UNION ALL 
SELECT 5 UNION ALL 
SELECT 6 UNION ALL 
SELECT 'First Contact';
movie
varchar
The Motion Picture
First Contact
6
5
4
3
2

However, if a UNION type is used, each individual row retains its original data type. A UNION is defined using key-value pairs with the key as a name and the value as the data type. This also allows the specific data types to be pulled out as individual columns:

CREATE TABLE movies (
     movie UNION(num INT, name VARCHAR)
);
INSERT INTO movies 
     VALUES ('The Motion Picture'), (2), (3), (4), (5), (6), ('First Contact');

FROM movies 
SELECT 
     movie,
     union_tag(movie) AS type,
     movie.name,
     movie.num;
movie type name num
union(num integer, name varchar) varchar varchar int32
The Motion Picture name The Motion Picture  
2 num   2
3 num   3
4 num   4
5 num   5
6 num   6
First Contact name First Contact  

Additional friendly features

Several other friendly features are worth mentioning and some are powerful enough to warrant their own blog posts.

DuckDB takes a nod from the describe function in Pandas and implements a SUMMARIZE keyword that will calculate a variety of statistics about each column in a dataset for a quick, high-level overview. Simply prepend SUMMARIZE to any table or SELECT statement.
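
For example, reusing the trek_facts table from earlier, the whole dataset can be profiled in one statement:

SUMMARIZE trek_facts;
-- or summarize the result of any query
SUMMARIZE SELECT episode_num, cnt_warp_speed_orders FROM trek_facts;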

Have a look at the correlated subqueries post to see how to use subqueries that refer to each others’ columns. DuckDB’s advanced optimizer improves correlated subquery performance by orders of magnitude, allowing for queries to be expressed as naturally as possible. What was once an anti-pattern for performance reasons can now be used freely!

DuckDB has added more ways to JOIN tables together that make expressing common calculations much easier. Some like LATERAL, ASOF, SEMI, and ANTI joins are present in other systems, but have high-performance implementations in DuckDB. DuckDB also adds a new POSITIONAL join that combines by the row numbers in each table to match the commonly used Pandas capability of joining on row number indexes. See the JOIN documentation for details, and look out for a blog post describing DuckDB’s state of the art ASOF joins!
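
As a minimal sketch with hypothetical tables tbl_a and tbl_b (which must have the same number of rows), a positional join simply pairs up rows by their position:

SELECT * FROM tbl_a POSITIONAL JOIN tbl_b;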

Summary and future work

DuckDB aims to be the easiest database to use. Fundamental architectural decisions to be in-process, have zero dependencies, and have strong typing contribute to this goal, but the friendliness of its SQL dialect has a strong impact as well. By extending the industry-standard PostgreSQL dialect, DuckDB aims to provide the simplest way to express the data transformations you need. These changes range from altering the ancient clause order of the SELECT statement to begin with FROM, to allowing a fundamentally new way to use functions with chaining, to advanced nested data type calculations like list comprehensions. Each of these features is available in the 0.8.1 release.

Future work for friendlier SQL includes:

  • Lambda functions with more than 1 argument, like list_zip
  • Underscores as digit separators (Ex: 1_000_000 instead of 1000000)
  • Extension user experience, including autoloading
  • Improvements to file globbing
  • Your suggestions!

Please let us know what areas of SQL can be improved! We welcome your feedback on Discord or GitHub.

Live long and prosper! 🖖

continue reading
2023-08-04 Pedro Holanda

DuckDB ADBC - Zero-Copy data transfer via Arrow Database Connectivity

DuckDB-Arrow

TL;DR: DuckDB has added support for Arrow Database Connectivity (ADBC), an API standard that enables efficient data ingestion and retrieval from database systems, similar to the Open Database Connectivity (ODBC) interface. However, unlike ODBC, ADBC specifically caters to the columnar storage model, facilitating fast data transfers between a columnar database and an external application.

Database interface standards allow developers to write application code that is independent of the underlying database management system (DBMS) being used. DuckDB has supported two standards that have gained popularity in the past few decades: the core interface of ODBC and Java Database Connectivity (JDBC). Both interfaces are designed to fully support database connectivity and management, with JDBC catering to the Java environment. With these APIs, developers can query DBMSs agnostically, retrieve query results, run prepared statements, and manage connections.

These interfaces were designed in the early 90s, when row-wise database systems reigned supreme. As a result, they were primarily intended for transferring data in a row-wise format. However, in the mid-2000s, columnar database systems started gaining a lot of traction due to their drastic performance advantages for data analysis (you can watch me give a brief demonstration of this difference at EuroPython). This means that these APIs offer no support for transferring data in a columnar format (or, in the case of ODBC, some support with a lot of added complexity). In practice, when analytical, column-wise systems like DuckDB make use of these APIs, converting the data between these representation formats becomes a major bottleneck.

The figure below depicts how a developer can use these APIs to query a DuckDB database. For example, developers can submit SQL queries via the API, which then uses a DuckDB driver to internally call the proper functions. A query result is then produced in DuckDB’s internal columnar representation, and the driver takes care of transforming it to the JDBC or ODBC row-wise result format. This transformation has significant costs for rearranging and copying the data, quickly becoming a major bottleneck.

DuckDB-JDBC-ODBC

To overcome this transformation cost, ADBC has been proposed, with a generic API to support database operations while using the Apache Arrow memory format to send data in and out of the DBMS. DuckDB now supports the ADBC specification. Due to DuckDB’s zero-copy integration with the Arrow format, using ADBC as an interface is rather efficient, since there is only a small constant cost to transform DuckDB query results to the Arrow format.

The figure below depicts the query execution flow when using ADBC. Note that the main difference from ODBC/JDBC is that the result does not need to be transformed to a row-wise format.

DuckDB-ADBC

Quick Tour

For our quick tour, we will illustrate an example of round-tripping data using DuckDB-ADBC via Python. Please note that DuckDB-ADBC can also be utilized with other programming languages. Specifically, you can find DuckDB-ADBC examples and tests, along with C++ usage examples, in the DuckDB GitHub repository. For convenience, you can also find a ready-to-run version of this tour in a Colab notebook. If you would like to see a more detailed explanation of the DuckDB-ADBC API or view a C++ example, please refer to our documentation page.

Setup

For this example, you must have a dynamic library from the latest bleeding-edge version of DuckDB, pyarrow, and the adbc-driver-manager. The ADBC driver manager is a Python package developed by Voltron Data. The driver manager is compliant with DB-API 2.0. It wraps ADBC, making its usage more straightforward. You can find the documentation of the ADBC Driver Manager here.

Note: While DuckDB is already DB-API compliant in Python, what sets ADBC apart is that you do not need a DuckDB module installed and loaded. Additionally, unlike the DB-API, it does not utilize row-wise as its data transfer format of choice.

pip install pyarrow
pip install adbc-driver-manager

Insert Data

First, we need to include the necessary libraries that will be used in this tour. Mainly, PyArrow and the DBAPI from the ADBC Driver Manager.

import pyarrow
from adbc_driver_manager import dbapi

Next, we can create a connection via ADBC with DuckDB. This connection simply requires the path to DuckDB’s driver and the entrypoint function name. DuckDB’s entrypoint is duckdb_adbc_init. By default, connections are established with an in-memory database. However, if desired, you have the option to specify the path variable and connect to a local duckdb instance, allowing you to store the data on disk. Note that these are the only variables in ADBC that are not DBMS agnostic; instead, they are set by the user, often through a configuration file.

 con = dbapi.connect(driver="path/to/duckdb.lib", entrypoint="duckdb_adbc_init", db_kwargs={"path": "test.db"})

To insert the data, we can simply call the adbc_ingest function with a cursor from our connection. It requires the name of the table we want to perform the ingestion to and the Arrow Python object we want to ingest. This function also has two modes: append, where data is appended to an existing table, and create, where the table does not exist yet and will be created with the input data. By default, it’s set to create, so we don’t need to define it here.

table = pyarrow.table(
     [
          ["Tenacious D", "Backstreet Boys", "Wu Tang Clan"],
          [4, 10, 7]

     ],
     names=["Name", "Albums"],
)

with con.cursor() as cursor:
     cursor.adbc_ingest("Bands", table)

After calling adbc_ingest, the table is created in the DuckDB connection and the data is fully inserted.

Read Data

To read data from DuckDB, one simply needs to use the execute function with a SQL query and then return the cursor’s result to the desired Arrow format, such as a PyArrow Table in this example.

with con.cursor() as cursor:
     cursor.execute("SELECT * FROM Bands")
     cursor.fetch_arrow_table()

Benchmark ADBC vs ODBC

In our benchmark section, we aim to evaluate the differences in data reading from DuckDB via ADBC and ODBC. This benchmark was executed on an Apple M1 Max with 32GB of RAM and involves outputting and inserting the Lineitem table of TPC-H SF 1. You can find the repository with the code used to run this benchmark here.

Name Time (s)
ODBC 28.149
ADBC 0.724

The time difference between ODBC and ADBC is 38x. This significant contrast results from the extra allocations and copies that exist in ODBC.
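
For reference, a rough sketch of the ADBC read path being measured could look like the snippet below. The database file tpch_sf1.db and the lineitem table name are illustrative assumptions, and the driver path is the same placeholder used earlier; this is not the exact benchmark harness.

import time

from adbc_driver_manager import dbapi

# Sketch: assumes a DuckDB database file that already contains TPC-H SF1.
con = dbapi.connect(
    driver="path/to/duckdb.lib",
    entrypoint="duckdb_adbc_init",
    db_kwargs={"path": "tpch_sf1.db"},
)

start = time.perf_counter()
with con.cursor() as cursor:
    cursor.execute("SELECT * FROM lineitem")
    table = cursor.fetch_arrow_table()
print(f"fetched {table.num_rows} rows in {time.perf_counter() - start:.3f}s")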

Conclusions

DuckDB now supports the ADBC standard for database connection. ADBC is particularly efficient when combined with DuckDB, thanks to its use of the Arrow zero-copy integration.

ADBC is particularly interesting because it can drastically decrease the cost of interactions between analytical systems compared to ODBC. For example, if software that already supports ODBC, such as MS Excel, were to implement ADBC, integrations with columnar systems like DuckDB could benefit from this significant difference in performance.

DuckDB-ADBC is currently supported via the C Interface and through the Python ADBC Driver Manager. We will add more extensive tutorials for other languages to our documentation webpage. Please feel free to let us know your preferred language for interacting with DuckDB via ADBC!

As always, we are happy to hear your thoughts! Feel free to drop us an email if you have any suggestions, comments or questions!

Last but not least, if you encounter any problems using ADBC, please open an issue in DuckDB’s issue tracker.

continue reading
2023-07-07Pedro Holanda, Thijs Bruineman and Phillip Cloud

From Waddle to Flying: Quickly expanding DuckDB's functionality with Scalar Python UDFs

DuckDB-Waddle-fly

TL;DR: DuckDB now supports vectorized Scalar Python User Defined Functions (UDFs). By implementing Python UDFs, users can easily expand the functionality of DuckDB while taking advantage of DuckDB’s fast execution model, SQL and data safety.

User Defined Functions (UDFs) enable users to extend the functionality of a Database Management System (DBMS) to perform domain-specific tasks that are not implemented as built-in functions. For instance, users who frequently need to export private data can benefit from an anonymization function that masks the local part of an email while preserving the domain. Ideally, this function would be executed directly in the DBMS. This approach offers several advantages:

1) Performance. The function can be executed using the same execution model as the DBMS (e.g., streaming results, beyond-memory/out-of-core execution), without any unnecessary data transformations.

2) Easy Use. UDFs can be seamlessly integrated into SQL queries, allowing users to leverage the power of SQL to call the functions. This eliminates the need for passing data through a separate database connector and executing external code. The functions can be utilized in various SQL contexts (e.g., subqueries, join conditions).

3) Safety. The sensitive data never leaves the DBMS process.

There are two main reasons users often refrain from implementing UDFs.

1) Security concerns. Since UDFs are custom code created by users and executed within the DBMS process, there is a potential risk of crashing the server. For DuckDB, an embedded database, this concern is mitigated, as each analyst runs their own DuckDB process separately; the impact on server stability is therefore not a significant worry.

2) Difficulty of implementation. High-performance UDFs are typically only supported in low-level languages, and UDFs in higher-level languages like Python incur significant performance costs. Consequently, many users cannot quickly implement their UDFs without investing a significant amount of time in learning a low-level language and understanding the internal details of the DBMS.

DuckDB followed a similar approach. As a DBMS tailored for analytical tasks, performance is a key consideration, leading to the implementation of its core in C++. Consequently, the initial focus of extensibility efforts was centered around C++. However, this duck is not limited to just waddling; it can also fly. So we are delighted to announce the recent addition of Scalar Python UDFs to DuckDB.

DuckDB provides support for two distinct types of Python UDFs, differing in the Python object used for communication between DuckDB’s native data types and the Python process. These communication layers include support for Python built-in types and PyArrow Tables.

The two approaches exhibit two key differences:

1) Zero-Copy. PyArrow Tables leverage our zero-copy integration with Arrow, enabling efficient translation of data types to Python-Land with zero-copy cost.

2) Vectorization. PyArrow Table functions operate on a chunk level, processing chunks of data containing up to 2048 rows. This approach maximizes cache locality and leverages vectorization. On the other hand, the built-in types UDF implementation operates on a per-row basis.
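
To make the chunk-level behavior concrete, here is a minimal sketch (the function and query are made up for illustration) that registers a PyArrow UDF which simply reports the size of each chunk it receives:

import duckdb
import pyarrow.compute as pc
from duckdb.typing import BIGINT

# Sketch: the UDF is called once per chunk (up to ~2048 values), not once per row.
def chunk_probe(x):
    print(f"received a chunk of {len(x)} values")
    return pc.add(x, 0)

con = duckdb.connect()
con.create_function('chunk_probe', chunk_probe, [BIGINT], BIGINT, type='arrow')
con.sql("SELECT count(chunk_probe(i)) FROM range(5000) tbl(i)").fetchall()
# prints a handful of chunk sizes instead of 5,000 per-row calls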

This blog post aims to demonstrate how you can extend DuckDB using Python UDFs, with a particular emphasis on PyArrow-powered UDFs. In our quick-tour section, we will provide examples using the PyArrow UDF types. For those interested in benchmarks, you can jump ahead to the benchmark section below. If you want to see a detailed description of the Python UDF API, please refer to our documentation.

Python UDFs

This section depicts several practical examples of using Python UDFs. Each example uses a different type of Python UDF.

Quick-Tour

To demonstrate the usage of Python UDFs in DuckDB, let’s consider the following example. We have a dictionary called world_cup_titles that maps countries to the number of World Cups they have won. We want to create a Python UDF that takes a country name as input, searches for the corresponding value in the dictionary, and returns the number of World Cups won by that country. If the country is not found in the dictionary, the UDF will return NULL.

Here’s an example implementation:

import duckdb
from duckdb.typing import *

con = duckdb.connect()

# Dictionary that maps countries and world cups they won
world_cup_titles = {
    "Brazil": 5,
    "Germany": 4,
    "Italy": 4,
    "Argentina": 2,
    "Uruguay": 2,
    "France": 2,
    "England": 1,
    "Spain": 1
}

# Function that will be registered as a UDF; it simply does a lookup in the Python dictionary
def world_cups(x):
     return world_cup_titles.get(x)

# We register the function
con.create_function("wc_titles", world_cups, [VARCHAR], INTEGER)

That’s it, the function is then registered and ready to be called through SQL.

# Let's create an example countries table with the countries we are interested in using
con.execute("CREATE TABLE countries(country VARCHAR)")
con.execute("INSERT INTO countries VALUES ('Brazil'), ('Germany'), ('Italy'), ('Argentina'), ('Uruguay'), ('France'), ('England'), ('Spain'), ('Netherlands')")
# We can simply call the function through SQL, and even use the function return to eliminate the countries that never won a world cup
con.sql("SELECT country, wc_titles(country) as world_cups from countries").fetchall()
# [('Brazil', 5), ('Germany', 4), ('Italy', 4), ('Argentina', 2), ('Uruguay', 2), ('France', 2), ('England', 1), ('Spain', 1), ('Netherlands', None)]

Generating Fake Data with Faker (Built-In Type UDF)

Here is an example that demonstrates the usage of the Faker library to create a scalar function in DuckDB that returns randomly generated dates. The function, named random_date, does not require any inputs and outputs a DATE column. Since Faker utilizes built-in Python types, the function returns them directly. One important thing to notice is that a function that is not deterministic based on its input must be marked as having side_effects.

import duckdb

# By importing duckdb.typing we can specify DuckDB Types directly without using strings
from duckdb.typing import *

from faker import Faker

# Our Python UDF generates a random date every time it's called
def random_date():
     fake = Faker()
     return fake.date_between()

We then have to register the Python function in DuckDB using create_function. Since our function doesn’t require any inputs, we can pass an empty list as the argument_type_list. As the function returns a date, we specify DATE from duckdb.typing as the return_type. Note that since our random_date() function returns a built-in Python type (datetime.date), we don’t need to specify the UDF type.

# To exemplify the effect of side-effect, let's first run the function without marking it.
duckdb.create_function('random_date', random_date, [], DATE)

# After registration, we can use the function directly via SQL
# Notice that without side_effect=True, it's not guaranteed that the function will be re-evaluated.
res = duckdb.sql('select random_date() from range (3)').fetchall()
# [(datetime.date(2003, 8, 3),), (datetime.date(2003, 8, 3),), (datetime.date(2003, 8, 3),)]

# Now let's re-add the function with side-effects marked as true.
duckdb.remove_function('random_date')
duckdb.create_function('random_date', random_date, [], DATE, side_effects=True)
res = duckdb.sql('select random_date() from range (3)').fetchall()
# [(datetime.date(2020, 11, 29),), (datetime.date(2009, 5, 18),), (datetime.date(2018, 5, 24),)]

Swap String Case (PyArrow Type UDF)

One issue with using built-in types is that you don’t benefit from zero-copy, vectorization and cache locality. Using PyArrow as a UDF type should be favored to leverage these optimizations.

To demonstrate a PyArrow function, let’s consider a simple example where we want to transform lowercase characters to uppercase and uppercase characters to lowercase. Fortunately, PyArrow already has a function for this in the compute engine, and it’s as simple as calling pc.utf8_swapcase(x).

import duckdb

# By importing duckdb.typing we can specify DuckDB Types directly without using strings
from duckdb.typing import *

import pyarrow as pa
import pyarrow.compute as pc

def swap_case(x):
     # Swap the case of the 'column' using utf8_swapcase and return the result
     return pc.utf8_swapcase(x)

con = duckdb.connect()
# To register the function, we must define its type to be 'arrow'
con.create_function('swap_case', swap_case, [VARCHAR], VARCHAR, type='arrow')

res = con.sql("select swap_case('PEDRO HOLANDA')").fetchall()
# [('pedro holanda',)]

Predicting Taxi Fare costs (Ibis + PyArrow UDF)

Python UDFs offer significant power as they enable users to leverage the extensive Python ecosystem and tools, including libraries like PyTorch and Tensorflow that efficiently implement machine learning operations.

Additionally, the Ibis project offers a DataFrame API with great DuckDB integration and supports both DuckDB’s native Python and PyArrow UDFs.

In this example, we demonstrate the usage of a pre-built PyTorch model to estimate taxi fare costs based on the traveled distance. You can find a complete example in this blog post by the Ibis team.

import torch
import pyarrow as pa
import ibis
import ibis.expr.datatypes as dt

from ibis.expr.operations import udf


# The code to generate the model is not specified in this snippet, please refer to the provided link for more information
model = ...

# Function that uses the model and a traveled distance input tensor to predict values, please refer to the provided link for more information
def predict_linear_regression(model, tensor: torch.Tensor) -> torch.Tensor:
    ...


# Indicate to ibis that this is a scalar user-defined function whose input format is pyarrow
@udf.scalar.pyarrow
def predict_fare(x: dt.float64) -> dt.float32:
    # `x` is a pyarrow.ChunkedArray; the `dt.float64` annotation indicates the element type of the ChunkedArray.

    # Transform the data from PyArrow to the required torch tensor format and dimension.
    tensor = torch.from_numpy(x.to_numpy()[:, None]).float()

    # Call the actual prediction function, which also returns a torch tensor.
    predicted = predict_linear_regression(model, tensor).ravel()
    return pa.array(predicted.numpy())


# Execute a query on the NYC Taxi parquet file to showcase our model's predictions, the actual fare amount, and the distance.
expr = (
    ibis.read_parquet('yellow_tripdata_2016-02.parquet')
    .mutate(
        "fare_amount",
        "trip_distance",
        predicted_fare=lambda t: predict_fare(t.trip_distance),
    )
)
df = expr.execute()

By utilizing Python UDFs in DuckDB with Ibis, you can seamlessly incorporate machine learning models and perform predictions directly within your Ibis code and SQL queries. The example demonstrates how to predict taxi fare costs based on distance using a PyTorch model, showcasing the integration of machine learning capabilities within DuckDB’s SQL environment driven by Ibis.

Benchmarks

In this section, we perform simple benchmark comparisons to demonstrate the performance differences between the two types of Python UDFs, and between Python UDFs and external functions. The benchmarks measure execution time and peak memory consumption. Each benchmark is executed 5 times, and the median value is reported. The benchmarks are conducted on an Apple M1 machine with 16GB of RAM.

Built-In Python Vs PyArrow

To benchmark these UDF types, we create UDFs that take an integer column as input, add one to each value, and return the result. The code used for this benchmark section can be found here.

import pyarrow.compute as pc
import duckdb
import pyarrow as pa

# Built-In UDF
def add_built_in_type(x):
     return x + 1

#Arrow UDF
def add_arrow_type(x):
     return pc.add(x,1)

con = duckdb.connect()

# Registration
con.create_function('add_built_in_type', add_built_in_type, ['BIGINT'], 'BIGINT', type='native')
con.create_function('add_arrow_type', add_arrow_type, ['BIGINT'], 'BIGINT', type='arrow')

# Integer View with 10,000,000 elements.
con.sql("""
     select
          i
          from range(10000000) tbl(i);
""").to_view("numbers")

# Calls for both UDFs
native_res = con.sql("select sum(add_built_in_type(i)) from numbers").fetchall()
arrow_res = con.sql("select sum(add_arrow_type(i)) from numbers").fetchall()

Name Time (s)
Built-In 5.37
PyArrow 0.35

We can observe a performance difference of more than one order of magnitude between the two UDFs. The difference in performance is primarily due to three factors:

1) In Python, object construction and general use are rather slow, due to several factors including automatic memory management, interpretation, and dynamic typing.

2) The PyArrow UDF does not require any data copying.

3) The PyArrow UDF is executed in a vectorized fashion, processing chunks of data instead of individual rows.
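
As a rough way to reproduce these numbers, a minimal timing sketch is shown below. It assumes the connection, the numbers view, and the functions registered in the snippet above, and uses simple wall-clock timing rather than the median-of-5 methodology reported in the table.

import time

# Sketch: reuses `con`, the `numbers` view, and the registered UDFs from above.
def time_query(query):
    start = time.perf_counter()
    con.sql(query).fetchall()
    return time.perf_counter() - start

print("built-in:", time_query("select sum(add_built_in_type(i)) from numbers"))
print("pyarrow: ", time_query("select sum(add_arrow_type(i)) from numbers"))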

Python UDFs Vs External Functions

Here we compare the usage of a Python UDF with an external function. In this case, we have a function that calculates the sum of the lengths of all strings in a column. You can find the code used for this benchmark section here.

import duckdb
import pyarrow as pa

# Function used in UDF
def string_length_arrow(x):
     tuples = len(x)
     values = [len(i.as_py()) if i.as_py() is not None else 0 for i in x]
     array = pa.array(values, type=pa.int32(), size=tuples)
     return array


# Same Function but external to the database
def exec_external(con):
     arrow_table = con.sql("select i from strings tbl(i)").arrow()
     arrow_column = arrow_table['i']
     tuples = len(arrow_column)
     values = [len(i.as_py()) if i.as_py() is not None else 0 for i in arrow_column]
     array = pa.array(values, type=pa.int32(), size=tuples)
     arrow_tbl = pa.Table.from_arrays([array], names=['i'])
     return con.sql("select sum(i) from arrow_tbl").fetchall()


con = duckdb.connect()
con.create_function('strlen_arrow', string_length_arrow, ['VARCHAR'], int, type='arrow')

con.sql("""
     select
          case when i != 0 and i % 42 = 0
          then
               NULL
          else
               repeat(chr((65 + (i % 26))::INTEGER), (4 + (i % 12))) end
          from range(10000000) tbl(i);
""").to_view("strings")

con.sql("select sum(strlen_arrow(i)) from strings tbl(i)").fetchall()

exec_external(con)

Name Time (s) Peak Memory Consumption (MB)
External 5.65 584.032
UDF 5.63 112.848

Here we can see that there is no significant regression in performance when utilizing UDFs. However, you still have the benefits of safer execution and the utilization of SQL. In our example, we can also notice that the external function materializes the entire query, resulting in a 5x higher peak memory consumption compared to the UDF approach.

Conclusions and Further Development

Scalar Python UDFs are now supported in DuckDB, marking a significant milestone in extending the functionality of the database. This enhancement empowers users to perform complex computations using a high-level language. Additionally, Python UDFs can leverage DuckDB’s zero-copy integration with Arrow, eliminating data transfer costs and ensuring efficient query execution.

While the introduction of Python UDFs is a major step forward, our work in this area is ongoing. Our roadmap includes the following focus areas:

  1. Aggregate/Table-Producing UDFs: Currently, users can create Scalar UDFs, but we are actively working on supporting Aggregation Functions (which perform calculations on a set of values and return a single result) and Table-Producing Functions (which return tables without limitations on the number of columns and rows).

  2. Types: Scalar Python UDFs currently support most DuckDB types, with the exception of ENUM types and BIT types. We are working towards expanding the type support to ensure comprehensive functionality.

As always, we are happy to hear your thoughts! Feel free to drop us an email if you have any suggestions, comments or questions.

Last but not least, if you encounter any problems using our Python UDFs, please open an issue in DuckDB’s issue tracker.

continue reading
2023-05-26Mark Raasveldt

Correlated Subqueries in SQL

Subqueries in SQL are a powerful abstraction that allow simple queries to be used as composable building blocks. They allow you to break down complex problems into smaller parts, and subsequently make it easier to write, understand and maintain large and complex queries.

DuckDB uses a state-of-the-art subquery decorrelation optimizer that allows subqueries to be executed very efficiently. As a result, users can freely use subqueries to create expressive queries without having to worry about manually rewriting subqueries into joins. For more information, skip to the Performance section.

Types of Subqueries

SQL subqueries exist in two main forms: subqueries as expressions and subqueries as tables. Subqueries that are used as expressions can be used in the SELECT or WHERE clauses. Subqueries that are used as tables can be used in the FROM clause. In this blog post we will focus on subqueries used as expressions. A future blog post will discuss subqueries as tables.

Subqueries as expressions exist in three forms.

  • Scalar subqueries
  • EXISTS
  • IN/ANY/ALL

All of the subqueries can be either correlated or uncorrelated. An uncorrelated subquery is a query that is independent from the outer query. A correlated subquery is a subquery that contains expressions from the outer query. Correlated subqueries can be seen as parameterized subqueries.

Uncorrelated Scalar Subqueries

Uncorrelated scalar subqueries can only return a single value. That constant value is then substituted and used in the query. As an example of why this is useful - imagine that we want to select all of the shortest flights in our dataset. We could run the following query to obtain the shortest flight distance:

SELECT MIN(distance)
FROM ontime;
min(distance)
31.0

We could manually take this distance and use it in the WHERE clause to obtain all flights on this route.

SELECT uniquecarrier, origincityname, destcityname, flightdate
FROM ontime
WHERE distance=31.0;
uniquecarrier origincityname destcityname flightdate
AS Petersburg, AK Wrangell, AK 2017-01-15
AS Wrangell, AK Petersburg, AK 2017-01-15
AS Petersburg, AK Wrangell, AK 2017-01-16

However - this requires us to hardcode the constant inside the query. By using the first query as a subquery we can compute the minimum distance as part of the query.

SELECT uniquecarrier, origincityname, destcityname, flightdate
FROM ontime
WHERE distance=(
     SELECT MIN(distance)
     FROM ontime
);

Correlated Scalar Subqueries

While uncorrelated subqueries are powerful, they come with a hard restriction: only a single value can be returned. Often, what we want to do is parameterize the query, so that we can return different values per row.

For example, suppose that we want to find all of the shortest flights for each carrier. We can find the shortest flight for a specific carrier using the following parameterized query:

PREPARE min_distance_per_carrier AS
SELECT MIN(distance)
FROM ontime
WHERE uniquecarrier=?;

We can execute this prepared statement to obtain the minimum distance for a specific carrier.

EXECUTE min_distance_per_carrier('UA');
min(distance)
67.0

If we want to use this parameterized query as a subquery, we need to use a correlated subquery. Correlated subqueries allow us to use parameterized queries as scalar subqueries by referencing columns from the outer query. We can obtain the set of shortest flights per carrier using the following query:

SELECT uniquecarrier, origincityname, destcityname, flightdate, distance
FROM ontime AS ontime_outer
WHERE distance=(
     SELECT MIN(distance)
     FROM ontime
     WHERE uniquecarrier=ontime_outer.uniquecarrier
);
uniquecarrier origincityname destcityname flightdate distance
AS Wrangell, AK Petersburg, AK 2017-01-01 31.0
NK Fort Lauderdale, FL Orlando, FL 2017-01-01 177.0
VX Las Vegas, NV Los Angeles, CA 2017-01-01 236.0

Notice how the column from the outer relation (ontime_outer) is used inside the query. This is what turns the subquery into a correlated subquery. The column from the outer relation (ontime_outer.uniquecarrier) is a parameter for the subquery. Logically the subquery is executed once for every row that is present in ontime, where the value for the column at that row is substituted as a parameter.

In order to make it more clear that the correlated subquery is in essence a parameterized query, we can create a scalar macro that contains the query using DuckDB’s macros.

CREATE MACRO min_distance_per_carrier(param) AS (
     SELECT MIN(distance)
     FROM ontime
     WHERE uniquecarrier=param
);

We can then use the macro in our original query as if it is a function.

SELECT uniquecarrier, origincityname, destcityname, flightdate, distance
FROM ontime AS ontime_outer
WHERE distance=min_distance_per_carrier(ontime_outer.uniquecarrier);

This gives us the same result as placing the correlated subquery inside of the query, but is cleaner as we can decompose the query into multiple segments more effectively.

EXISTS

EXISTS can be used to check if a given subquery has any results. This is powerful when used as a correlated subquery. For example, we can use EXISTS if we want to obtain the last flight that has been flown on each route.

We can obtain a list of all flights on a given route past a certain date using the following query:

PREPARE flights_after_date AS
SELECT uniquecarrier, origincityname, destcityname, flightdate, distance
FROM ontime
WHERE origin=? AND dest=? AND flightdate>?;
EXECUTE flights_after_date('LAX', 'JFK', DATE '2017-05-01');
uniquecarrier origincityname destcityname flightdate distance
AA Los Angeles, CA New York, NY 2017-08-01 2475.0
AA Los Angeles, CA New York, NY 2017-08-02 2475.0
AA Los Angeles, CA New York, NY 2017-08-03 2475.0

Now in order to obtain the last flight on a route, we need to find flights for which no later flight exists.

SELECT uniquecarrier, origincityname, destcityname, flightdate, distance
FROM ontime AS ontime_outer
WHERE NOT EXISTS (
     SELECT uniquecarrier, origincityname, destcityname, flightdate, distance
     FROM ontime
     WHERE origin=ontime_outer.origin AND dest=ontime_outer.dest AND flightdate>ontime_outer.flightdate
);
uniquecarrier origincityname destcityname flightdate distance  
AA Daytona Beach, FL Charlotte, NC 2017-02-27 416.0  
EV Abilene, TX Dallas/Fort Worth, TX 2017-02-15 158.0  
EV Dallas/Fort Worth, TX Durango, CO 2017-02-13 674.0  

IN / ANY / ALL

IN can be used to check if a given value exists within the result returned by the subquery. For example, we can obtain a list of all carriers that have performed more than 250 000 flights in the dataset using the following query:

SELECT uniquecarrier
FROM ontime
GROUP BY uniquecarrier
HAVING COUNT(*) > 250000;

We can then use an IN clause to obtain all flights performed by those carriers.

SELECT *
FROM ontime
WHERE uniquecarrier IN (
     SELECT uniquecarrier
     FROM ontime
     GROUP BY uniquecarrier
     HAVING COUNT(*) > 250000
);

A correlated subquery can be useful here if we want to count not the total number of flights performed by each carrier, but the number of flights each carrier performed on the given route. We can select all flights performed by carriers that have performed at least 1000 flights on a given route using the following query.

SELECT *
FROM ontime AS ontime_outer
WHERE uniquecarrier IN (
     SELECT uniquecarrier
     FROM ontime
     WHERE ontime.origin=ontime_outer.origin AND ontime.dest=ontime_outer.dest
     GROUP BY uniquecarrier
     HAVING COUNT(*) > 1000
);

ANY and ALL are generalizations of IN. IN checks if the value is present in the set returned by the subquery. This is equivalent to = ANY(...). The ANY and ALL operators can be used to perform other comparison operators (such as >, <, <>). The above query can be rewritten to ANY in the following form.

SELECT *
FROM ontime AS ontime_outer
WHERE uniquecarrier = ANY (
     SELECT uniquecarrier
     FROM ontime
     WHERE ontime.origin=ontime_outer.origin AND ontime.dest=ontime_outer.dest
     GROUP BY uniquecarrier
     HAVING COUNT(*) > 1000
);

Performance

Whereas scalar subqueries are logically executed once, correlated subqueries are logically executed once per row. As such, it is natural to think that correlated subqueries are very expensive and should be avoided for performance reasons.

While that is true in many SQL systems, it is not the case in DuckDB. In DuckDB, subqueries are always decorrelated. DuckDB uses a state-of-the-art subquery decorrelation algorithm as described in the Unnesting Arbitrary Queries paper. This allows all subqueries to be decorrelated and executed as a single, much more efficient, query.

In DuckDB, correlation does not imply performance degradation.

If we look at the query plan for the correlated scalar subquery using EXPLAIN, we can see that the query has been transformed into a hash aggregate followed by a hash join. This allows the query to be executed very efficiently.

EXPLAIN SELECT uniquecarrier, origincityname, destcityname, flightdate, distance
FROM ontime AS ontime_outer
WHERE distance=(
     SELECT MIN(distance)
     FROM ontime
     WHERE uniquecarrier=ontime_outer.uniquecarrier
);
┌───────────────────────────┐
│         HASH_JOIN         │ 
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │ 
│      uniquecarrier =      │ 
│       uniquecarrier       ├──────────────┐
└─────────────┬─────────────┘              │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         SEQ_SCAN          ││       HASH_GROUP_BY       │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           ontime          ││       uniquecarrier       │
└───────────────────────────┘│       min(distance)       │
                             └─────────────┬─────────────┘
                             ┌─────────────┴─────────────┐
                             │         SEQ_SCAN          │
                             │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
                             │           ontime          │
                             └───────────────────────────┘
                             

We can see the drastic performance difference that subquery decorrelation has when we compare the run-time of this query in DuckDB with the run-time in Postgres and SQLite. When running the above query on the ontime dataset for 2017 with roughly 4 million rows, we get the following performance results:

DuckDB Postgres SQLite
0.06s >48 Hours >48 Hours

As Postgres and SQLite do not de-correlate the subquery, the query is not just logically, but actually executed once for every row. As a result, the subquery is executed 4 million times in those systems, which takes an immense amount of time.

In this case, it is possible to manually decorrelate the query and generate the following SQL:

SELECT ontime.uniquecarrier, origincityname, destcityname, flightdate, distance
FROM ontime
JOIN (
     SELECT uniquecarrier, MIN(distance) AS min_distance
     FROM ontime
     GROUP BY uniquecarrier
) AS subquery 
ON (ontime.uniquecarrier=subquery.uniquecarrier AND distance=min_distance);

By performing the de-correlation manually, the performance of SQLite and Postgres improves significantly. However, both systems remain over 30x slower than DuckDB.

DuckDB Postgres SQLite
0.06s 1.98s 2.81s

Note that while it is possible to manually decorrelate certain subqueries by rewriting the SQL, it is not always possible to do so. As described in the Unnesting Arbitrary Queries paper, special join types that are not present in SQL are necessary to decorrelate arbitrary queries.

In DuckDB, these special join types will be automatically generated by the system to decorrelate all subqueries. In fact, DuckDB does not have support for executing subqueries that are not decorrelated. All subqueries will be decorrelated before DuckDB executes them.

Conclusion

Subqueries are a very powerful tool that allow you to take arbitrary queries and convert them into ad-hoc functions. When used in combination with DuckDB’s powerful subquery decorrelation, they can be executed extremely efficiently, making previously intractable queries not only possible, but fast.

continue reading
2023-05-17Mark Raasveldt and Hannes Mühleisen

Announcing DuckDB 0.8.0

Image of the Mottled Duck

The DuckDB team is happy to announce the latest DuckDB release (0.8.0). This release is named “Fulvigula” after the Mottled Duck (Anas Fulvigula) native to the Gulf of Mexico.

To install the new version, please visit the installation guide. The full release notes can be found here.

continue reading
2023-05-12Mark Raasveldt and Hannes Mühleisen

10 000 Stars on GitHub

Today, DuckDB reached 10 000 stars on GitHub. We would like to pause for a second to express our gratitude to everyone who contributed to DuckDB and of course all its users. When we started working on DuckDB back in 2018, we would have never dreamt of getting this kind of adoption in such a short time.

From those brave souls who were early adopters of DuckDB back in 2019 to the many today, we are happy you’re part of our community. Thank you for your feedback, feature requests and for your enthusiasm in adopting new features and integrations. Thank you for helping each other on our Discord server or in GitHub discussions. Thank you for spreading the word, too.

We also would like to extend special thanks to the DuckDB foundation supporters, who through their generous donations keep DuckDB independent.

For us, the maintainers of DuckDB, the past few years have also been quite eventful: We spun off from the research group where DuckDB originated to a successful company with close to 20 employees and many excellent partnerships.

We are very much looking forward to what the future will hold for DuckDB. Things are looking bright!

Wilbur the duck approves of all those stars

continue reading
2023-04-28Max Gabrielsson

PostGEESE? Introducing The DuckDB Spatial Extension

TL;DR: DuckDB now has an official Spatial extension to enable geospatial processing.

Geospatial data has become increasingly important and prevalent in modern-day applications and data engineering workflows, with use-cases ranging from location-based services to environmental monitoring.

While there are many great and specialized tools for working with geospatial data, integrating geospatial capabilities directly into DuckDB has multiple advantages. For one, you get to operate, transform and join your geospatial data alongside your regular, unstructured or time-series data using DuckDB’s rich type system and extensions like JSON and ICU. Secondly, spatial queries involving geometric predicates and relations translate surprisingly well to SQL, which is all about expressing relations after all! Not to mention all the other benefits provided by DuckDB such as transactional semantics, high performance multi-threaded vectorized execution and larger-than-memory data processing.

Therefore, we’re very excited to announce that DuckDB now has a Spatial extension packed with features easily installable from the DuckDB CLI and other DuckDB clients. Simply execute INSTALL spatial; LOAD spatial; from within DuckDB and you’re good to go!

No, we’re not calling it GeoDuck either, that’s just gross

What’s in it?

The core of the extension is a GEOMETRY type based on the “Simple Features” geometry model and accompanying functions such as ST_Area, ST_Intersects. It also provides methods for reading and writing geospatial data formats and converting between coordinate reference systems (details later in the post!). While we’re not ready to commit to full compliance with the OGC Simple Feature Access and SQL/MM Standards yet, if you’ve worked with geospatial functionality in other database systems such as PostGIS or SpatiaLite, you should feel right at home.
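
For readers using the Python client, a minimal sketch of loading the extension and calling two of these functions might look as follows (the geometries are made-up examples, and the install step requires network access):

import duckdb

con = duckdb.connect()
con.install_extension("spatial")
con.load_extension("spatial")

# Sketch: compute the area of a 2x2 square and test whether a point lies inside it.
con.sql("""
    SELECT
        ST_Area(ST_GeomFromText('POLYGON((0 0, 0 2, 2 2, 2 0, 0 0))')) AS area,
        ST_Intersects(
            ST_GeomFromText('POINT(1 1)'),
            ST_GeomFromText('POLYGON((0 0, 0 2, 2 2, 2 0, 0 0))')
        ) AS contains_point
""").show()
# expected: area = 4.0, contains_point = true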

Most of the implemented functions are based on the trifecta of foundational geospatial libraries, GEOS, GDAL and PROJ, which provide algorithms, format conversions and coordinate reference system transformations respectively. In particular, we leverage GDAL to provide a set of table and copy functions that enable import and export of tables from and to 50+ different geospatial data formats (so far!), including the most common ones such as Shapefiles, GeoJSON, GeoPackage, KML, GML, WKT, WKB, etc.

Check for yourself by running:

SELECT * FROM st_drivers();
short_name long_name can_create can_copy can_open help_url
ESRI Shapefile ESRI Shapefile true false true https://gdal.org/drivers/vector/shapefile.html
MapInfo File MapInfo File true false true https://gdal.org/drivers/vector/mitab.html
UK .NTF UK .NTF false false true https://gdal.org/drivers/vector/ntf.html
LVBAG Kadaster LV BAG Extract 2.0 false false true https://gdal.org/drivers/vector/lvbag.html
S57 IHO S-57 (ENC) true false true https://gdal.org/drivers/vector/s57.html
DGN Microstation DGN true false true https://gdal.org/drivers/vector/dgn.html
OGR_VRT VRT - Virtual Datasource false false true https://gdal.org/drivers/vector/vrt.html
Memory Memory true false true  
CSV Comma Separated Value (.csv) true false true https://gdal.org/drivers/vector/csv.html
GML Geography Markup Language (GML) true false true https://gdal.org/drivers/vector/gml.html
GPX GPX true false true https://gdal.org/drivers/vector/gpx.html
KML Keyhole Markup Language (KML) true false true https://gdal.org/drivers/vector/kml.html
GeoJSON GeoJSON true false true https://gdal.org/drivers/vector/geojson.html
GeoJSONSeq GeoJSON Sequence true false true https://gdal.org/drivers/vector/geojsonseq.html
ESRIJSON ESRIJSON false false true https://gdal.org/drivers/vector/esrijson.html
TopoJSON TopoJSON false false true https://gdal.org/drivers/vector/topojson.html
OGR_GMT GMT ASCII Vectors (.gmt) true false true https://gdal.org/drivers/vector/gmt.html
GPKG GeoPackage true true true https://gdal.org/drivers/vector/gpkg.html
SQLite SQLite / Spatialite true false true https://gdal.org/drivers/vector/sqlite.html
WAsP WAsP .map format true false true https://gdal.org/drivers/vector/wasp.html
OpenFileGDB ESRI FileGDB true false true https://gdal.org/drivers/vector/openfilegdb.html
DXF AutoCAD DXF true false true https://gdal.org/drivers/vector/dxf.html
CAD AutoCAD Driver false false true https://gdal.org/drivers/vector/cad.html
FlatGeobuf FlatGeobuf true false true https://gdal.org/drivers/vector/flatgeobuf.html
Geoconcept Geoconcept true false true  
GeoRSS GeoRSS true false true https://gdal.org/drivers/vector/georss.html
VFK Czech Cadastral Exchange Data Format false false true https://gdal.org/drivers/vector/vfk.html
PGDUMP PostgreSQL SQL dump true false false https://gdal.org/drivers/vector/pgdump.html
OSM OpenStreetMap XML and PBF false false true https://gdal.org/drivers/vector/osm.html
GPSBabel GPSBabel true false true https://gdal.org/drivers/vector/gpsbabel.html
WFS OGC WFS (Web Feature Service) false false true https://gdal.org/drivers/vector/wfs.html
OAPIF OGC API - Features false false true https://gdal.org/drivers/vector/oapif.html
EDIGEO French EDIGEO exchange format false false true https://gdal.org/drivers/vector/edigeo.html
SVG Scalable Vector Graphics false false true https://gdal.org/drivers/vector/svg.html
ODS Open Document/ LibreOffice / OpenOffice Spreadsheet true false true https://gdal.org/drivers/vector/ods.html
XLSX MS Office Open XML spreadsheet true false true https://gdal.org/drivers/vector/xlsx.html
Elasticsearch Elastic Search true false true https://gdal.org/drivers/vector/elasticsearch.html
Carto Carto true false true https://gdal.org/drivers/vector/carto.html
AmigoCloud AmigoCloud true false true https://gdal.org/drivers/vector/amigocloud.html
SXF Storage and eXchange Format false false true https://gdal.org/drivers/vector/sxf.html
Selafin Selafin true false true https://gdal.org/drivers/vector/selafin.html
JML OpenJUMP JML true false true https://gdal.org/drivers/vector/jml.html
PLSCENES Planet Labs Scenes API false false true https://gdal.org/drivers/vector/plscenes.html
CSW OGC CSW (Catalog Service for the Web) false false true https://gdal.org/drivers/vector/csw.html
VDV VDV-451/VDV-452/INTREST Data Format true false true https://gdal.org/drivers/vector/vdv.html
MVT Mapbox Vector Tiles true false true https://gdal.org/drivers/vector/mvt.html
NGW NextGIS Web true true true https://gdal.org/drivers/vector/ngw.html
MapML MapML true false true https://gdal.org/drivers/vector/mapml.html
TIGER U.S. Census TIGER/Line false false true https://gdal.org/drivers/vector/tiger.html
AVCBin Arc/Info Binary Coverage false false true https://gdal.org/drivers/vector/avcbin.html
AVCE00 Arc/Info E00 (ASCII) Coverage false false true https://gdal.org/drivers/vector/avce00.html

Initially we have prioritized providing a breadth of capabilities by wrapping existing libraries. We’re planning to implement more of the core functions and algorithms natively in the future to enable faster performance and more efficient memory management.

As an initial step in this direction, we provide a set of non-standard, specialized, columnar DuckDB-native geometry types such as POINT_2D, BOX_2D, etc. that should provide better compression and faster execution in exchange for some flexibility, but work on these is still very much experimental.

Example Usage

The following demonstrates how you can use the spatial extension to read and export multiple geospatial data formats, transform geometries between different coordinate reference systems and work with spatial property and predicate functions. While this example may be slightly contrived, we want to showcase the power of the currently available features. You can find the datasets used in this example in the spatial extension repository.

Let’s start by installing and loading the spatial extension and the parquet extension. Now we can import the NYC taxi ride data provided in parquet format, as well as the accompanying taxi zone data from a shapefile, using the ST_Read table function provided by the spatial extension. These taxi zones break NYC into polygons that represent regions, for example the Newark Airport. We then create a table for the rides and a table for the zones. Note that ST_Read produces a table with a wkb_geometry column that contains the geometry data encoded as a WKB (Well-Known Binary) blob, which we then convert to the GEOMETRY type using the ST_GeomFromWKB function.

This may all seem a bit much if you are not familiar with the geospatial ecosystem, but rest assured this is all you really need to get started. In short:

  • Shapefile (.shp, .shx, .dbf) is a common format for storing geometry vector data and auxiliary metadata such as indexes and attributes.
  • WKB (Well Known Binary), while not really a file format in itself, is a common binary encoding of vector geometry data, used in e.g. GeoParquet. Comes in multiple flavors, but we’re only concerned with “standard” WKB for now.
  • GEOMETRY is a DuckDB type that represents a Simple Features geometry object, which is based on a set of standards modeling vector geometry data as points, linestrings, polygons or collections of such. This is the core data type used by the spatial extension, and what most of the provided functions take and return.
INSTALL spatial;
INSTALL parquet;
LOAD spatial;
LOAD parquet;

CREATE TABLE rides AS SELECT * 
FROM './spatial/test/data/nyc_taxi/yellow_tripdata_2010-01-limit1mil.parquet';

-- Load the NYC taxi zone data from a shapefile using the gdal-based ST_Read function
CREATE TABLE zones AS SELECT zone, LocationId, borough, ST_GeomFromWKB(wkb_geometry) AS geom 
FROM ST_Read('./spatial/test/data/nyc_taxi/taxi_zones/taxi_zones.shx');

Let’s compare the trip distance to the linear distance between the pickup and dropoff points to figure out how efficient the taxi drivers are (or how dirty the data is, since some diffs seem to be negative). We transform the coordinates from “WGS84” (given by the identifier EPSG:4326), also commonly known simply as latitude/longitude, to the “NAD83 / New York Long Island ftUS” (identified as ESRI:102718) coordinate reference system, which is a projection with minimal distortion around New York. We then calculate the distance using the ST_Distance function. In this case we get the distance in feet, since we’ve converted the coordinates to NAD83, but we can easily convert it to miles (5280 ft/mile), which is the unit used in the rides dataset, so we can compare them correctly.

Wait, what’s all this about coordinate reference systems and projections?

The earth is not flat, but sometimes it is useful to pretend it is for the sake of simplicity by “projecting” the coordinates onto a flat surface. The “parameters” of a projection - e.g. where the “origin” is located, what unit coordinates are in, or how the earth’s shape is approximated - are encapsulated by a “Spatial Reference System” or “Coordinate Reference System” (CRS), which is usually referenced by a shorthand identifier composed of an authority and a code, e.g. “EPSG:4326” or “ESRI:102718”. Projections are always lossy, so it’s important to use a CRS that is well suited for the “area of interest” your data is in. The spatial extension uses the PROJ library to handle coordinate reference systems and projections.
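
As a small, self-contained sketch of this kind of transformation (assuming the spatial extension is installed, and using a made-up point given in latitude/longitude order, as in the query below):

import duckdb

con = duckdb.connect()
con.install_extension("spatial")
con.load_extension("spatial")

# Sketch: project a single lat/lon point (EPSG:4326) to
# NAD83 / New York Long Island ftUS (ESRI:102718).
con.sql("""
    SELECT ST_Transform(
        ST_Point(40.7128, -74.0060),
        'EPSG:4326', 'ESRI:102718'
    ) AS point_in_feet
""").show()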

Trips with a distance shorter than the aerial distance are likely to be erroneous, so we use this query to filter out some bad data. The query below takes advantage of DuckDB’s ability to refer to column aliases defined within the same select statement. This is a small example of how DuckDB’s rich SQL dialect can simplify geospatial analysis.

CREATE TABLE cleaned_rides AS SELECT 
    ST_Point(pickup_latitude, pickup_longitude) AS pickup_point,
    ST_Point(dropoff_latitude, dropoff_longitude) AS dropoff_point,
    dropoff_datetime::TIMESTAMP - pickup_datetime::TIMESTAMP AS time,
    trip_distance,
    ST_Distance(
        ST_Transform(pickup_point, 'EPSG:4326', 'ESRI:102718'), 
        ST_Transform(dropoff_point, 'EPSG:4326', 'ESRI:102718')) / 5280 
    AS aerial_distance, 
    trip_distance - aerial_distance AS diff 
FROM rides 
WHERE diff > 0
ORDER BY diff DESC;
SELECT * FROM rides LIMIT 10;
vendor_id pickup_datetime dropoff_datetime passenger_count trip_distance pickup_longitude pickup_latitude rate_code store_and_fwd_flag dropoff_longitude dropoff_latitude payment_type fare_amount surcharge mta_tax tip_amount tolls_amount total_amount
VTS 2010-01-01 00:00:17 2010-01-01 00:00:17 3 0.0 -73.87105699999998 40.773522 1   -73.871048 40.773545 CAS 45.0 0.0 0.5 0.0 0.0 45.5
VTS 2010-01-01 00:00:20 2010-01-01 00:00:20 1 0.05 -73.97512999999998 40.789973 1   -73.97498799999998 40.790598 CAS 2.5 0.5 0.5 0.0 0.0 3.5
CMT 2010-01-01 00:00:23 2010-01-01 00:00:25 1 0.0 -73.999431 40.71216 1 0 -73.99915799999998 40.712421 No 2.5 0.5 0.5 0.0 0.0 3.5
CMT 2010-01-01 00:00:33 2010-01-01 00:00:55 1 0.0 -73.97721699999998 40.749633 1 0 -73.97732899999998 40.749629 Cas 2.5 0.5 0.5 0.0 0.0 3.5
VTS 2010-01-01 00:01:00 2010-01-01 00:01:00 1 0.0 -73.942313 40.784332 1   -73.942313 40.784332 Cre 10.0 0.0 0.5 2.0 0.0 12.5
VTS 2010-01-01 00:01:06 2010-01-01 00:01:06 2 0.38 -73.97463 40.756687 1   -73.979872 40.759143 CAS 3.7 0.5 0.5 0.0 0.0 4.7
VTS 2010-01-01 00:01:07 2010-01-01 00:01:07 2 0.23 -73.987358 40.718475 1   -73.98518 40.720468 CAS 2.9 0.5 0.5 0.0 0.0 3.9
CMT 2010-01-01 00:00:02 2010-01-01 00:01:08 1 0.1 -73.992807 40.741418 1 0 -73.995799 40.742596 No 2.9 0.5 0.5 0.0 0.0 3.9
VTS 2010-01-01 00:01:23 2010-01-01 00:01:23 1 0.6099999999999999 -73.98003799999998 40.74306 1   -73.974862 40.750387 CAS 3.7 0.5 0.5 0.0 0.0 4.7
VTS 2010-01-01 00:01:34 2010-01-01 00:01:34 1 0.02 -73.954122 40.801173 1   -73.95431499999998 40.800897 CAS 45.0 0.0 0.5 0.0 0.0 45.5
SELECT * FROM zones LIMIT 10;
zone LocationID borough geom
Newark Airport 1 EWR POLYGON (…)
Jamaica Bay 2 Queens MULTIPOLYGON (…)
Allerton/Pelham Gardens 3 Bronx POLYGON (…)
Alphabet City 4 Manhattan POLYGON (…)
Arden Heights 5 Staten Island POLYGON (…)
Arrochar/Fort Wadsworth 6 Staten Island POLYGON (…)
Astoria 7 Queens POLYGON (…)
Astoria Park 8 Queens POLYGON (…)
Auburndale 9 Queens POLYGON (…)
Baisley Park 10 Queens POLYGON (…)

It should be noted that this is not entirely accurate since the ST_Distance function we use does not take into account the curvature of the earth. However, we’ll accept it as a good enough approximation for our purposes. Spherical and geodesic distance calculations are on the roadmap!

Now let’s join the taxi rides with the taxi zones to get the start and end zone for each ride. We use the ST_Within function as our join condition to check if a pickup or dropoff point is within a taxi zone polygon. Again we need to transform the coordinates from WGS84 to the NAD83 since the taxi zone data also use that projection. Spatial joins like these are the bread and butter of geospatial data processing, but we don’t currently have any optimizations in place (such as spatial indexes) to speed up these queries, which is why we only use a subset of the data for the following step.

-- Since we don't have spatial indexes yet, use a smaller dataset for the join.
DELETE FROM cleaned_rides WHERE rowid > 5000;

CREATE TABLE joined AS 
SELECT 
    pickup_point,
    dropoff_point,
    start_zone.zone AS start_zone,
    end_zone.zone AS end_zone, 
    trip_distance,
    time,
FROM cleaned_rides 
JOIN zones AS start_zone 
ON ST_Within(ST_Transform(pickup_point, 'EPSG:4326', 'ESRI:102718'), start_zone.geom) 
JOIN zones AS end_zone 
ON ST_Within(ST_Transform(dropoff_point, 'EPSG:4326', 'ESRI:102718'), end_zone.geom);
SELECT * FROM joined USING SAMPLE 10 ROWS;
pickup_point dropoff_point start_zone end_zone trip_distance time
POINT (40.722223 -73.98385299999998) POINT (40.715507 -73.992438) East Village Lower East Side 10.3 00:19:16
POINT (40.648687 -73.783522) POINT (40.649567 -74.005812) JFK Airport Sunset Park West 23.57 00:28:00
POINT (40.761603 -73.96661299999998) POINT (40.760232 -73.96344499999998) Upper East Side South Sutton Place/Turtle Bay North 17.6 00:27:05
POINT (40.697212 -73.937495) POINT (40.652377 -73.93983299999998) Stuyvesant Heights East Flatbush/Farragut 13.55 00:24:00
POINT (40.721462 -73.993583) POINT (40.774205 -73.90441699999998) Lower East Side Steinway 28.75 01:03:00
POINT (40.716955 -74.004328) POINT (40.754688 -73.991612) TriBeCa/Civic Center Garment District 18.4 00:46:12
POINT (40.740052 -73.994918) POINT (40.75439 -73.98587499999998) Flatiron Garment District 24.2 00:35:25
POINT (40.763017 -73.95949199999998) POINT (40.763615 -73.959182) Lenox Hill East Lenox Hill West 18.4 00:33:46
POINT (40.865663 -73.927458) POINT (40.86537 -73.927352) Washington Heights North Washington Heights North 10.47 00:27:00
POINT (40.738408 -73.980345) POINT (40.696038 -73.955493) Gramercy Bedford 16.4 00:21:47

We can export the joined table to a GeoJSONSeq file using the GDAL copy function, passing in a GDAL layer creation option. Since GeoJSON only supports a single GEOMETRY per record, we use the ST_MakeLine function to combine the pickup and dropoff points into a single line geometry. The default coordinate reference system for GeoJSON is WGS84, but the coordinate pairs are expected to be in longitude/latitude, so we need to flip the geometry using the ST_FlipCoordinates function.

COPY (
    SELECT 
        ST_AsWKB(ST_FlipCoordinates(ST_MakeLine(pickup_point, dropoff_point))) 
        AS wkb_geometry,
        start_zone,
        end_zone,
        time::VARCHAR AS trip_time 
    FROM joined) 
TO 'joined.geojsonseq' 
WITH (FORMAT GDAL, DRIVER 'GeoJSONSeq', LAYER_CREATION_OPTIONS 'WRITE_BBOX=YES');
head -n 10 joined.geojsonseq
{ "type": "Feature", "properties": { "start_zone": "JFK Airport", "end_zone": "Park Slope", "trip_time": "00:52:00" }, "geometry": { "type": "LineString", "coordinates": [ [ -73.789923, 40.643515 ], [ -73.97608, 40.680395 ] ] } }
{ "type": "Feature", "properties": { "start_zone": "JFK Airport", "end_zone": "Park Slope", "trip_time": "00:35:00" }, "geometry": { "type": "LineString", "coordinates": [ [ -73.776445, 40.645422 ], [ -73.98427, 40.670782 ] ] } }
{ "type": "Feature", "properties": { "start_zone": "JFK Airport", "end_zone": "Park Slope", "trip_time": "00:45:42" }, "geometry": { "type": "LineString", "coordinates": [ [ -73.776878, 40.645065 ], [ -73.992153, 40.662571 ] ] } }
{ "type": "Feature", "properties": { "start_zone": "JFK Airport", "end_zone": "Park Slope", "trip_time": "00:36:00" }, "geometry": { "type": "LineString", "coordinates": [ [ -73.788028, 40.641508 ], [ -73.97584, 40.670927 ] ] } }
{ "type": "Feature", "properties": { "start_zone": "JFK Airport", "end_zone": "Park Slope", "trip_time": "00:47:58" }, "geometry": { "type": "LineString", "coordinates": [ [ -73.781855, 40.644749 ], [ -73.980129, 40.663663 ] ] } }
{ "type": "Feature", "properties": { "start_zone": "JFK Airport", "end_zone": "Park Slope", "trip_time": "00:32:10" }, "geometry": { "type": "LineString", "coordinates": [ [ -73.787494, 40.641559 ], [ -73.974694, 40.673479 ] ] } }
{ "type": "Feature", "properties": { "start_zone": "JFK Airport", "end_zone": "Park Slope", "trip_time": "00:36:59" }, "geometry": { "type": "LineString", "coordinates": [ [ -73.790138, 40.643342 ], [ -73.982721, 40.662379 ] ] } }
{ "type": "Feature", "properties": { "start_zone": "JFK Airport", "end_zone": "Park Slope", "trip_time": "00:32:00" }, "geometry": { "type": "LineString", "coordinates": [ [ -73.786952, 40.641248 ], [ -73.97421, 40.676237 ] ] } }
{ "type": "Feature", "properties": { "start_zone": "JFK Airport", "end_zone": "Park Slope", "trip_time": "00:33:21" }, "geometry": { "type": "LineString", "coordinates": [ [ -73.783892, 40.648514 ], [ -73.979283, 40.669721 ] ] } }
{ "type": "Feature", "properties": { "start_zone": "JFK Airport", "end_zone": "Park Slope", "trip_time": "00:35:45" }, "geometry": { "type": "LineString", "coordinates": [ [ -73.776643, 40.645272 ], [ -73.978873, 40.66723 ] ] } }

And there we have it! We pulled tabular data from parquet, combined it with geospatial data in a shapefile, cleaned and analyzed that combined data, and output it to a human-readable geospatial format. The full set of currently supported functions and their implementation status can be found in this table in the docs.

What’s next?

While it’s probably going to take a while for us to catch up to the full set of functions provided by e.g. PostGIS, we believe that DuckDB’s vectorized execution model and columnar storage format will enable a whole new class of optimizations for geospatial processing that we’ve just begun exploring. Improving the performance of spatial joins and predicates is therefore high on our list of priorities.

There are also some limitations with our GEOMETRY type that we would eventually like to tackle, such as the fact that we don’t support additional Z and M dimensions, or don’t support the full range of geometry sub-types that are mandated by the OGC standard, like curves or polyhedral surfaces.

We’re also interested in supporting spherical and ellipsoidal calculations in the near future, perhaps in the form of a dedicated GEOGRAPHY type.

WASM builds are also just around the corner!

Please take a look at the GitHub repository for the full roadmap and to see what we’re currently working on. If you would like to help build this capability, please reach out on GitHub!

Conclusion

The DuckDB Spatial extension is another step towards making DuckDB a swiss army knife for data engineering and analytics. This extension provides a flexible and familiar GEOMETRY type, reprojectable between thousands of coordinate reference systems, coupled with the capability to export and import geospatial data between more than 50 different data sources. All embedded into a single extension with minimal runtime dependencies. This enables DuckDB to fit seamlessly into your existing GIS workflows regardless of which geospatial data formats or projections you’re working with.

We are excited to hear what you make of the DuckDB spatial extension. It’s still early days but we hope to have a lot more to share in the future as we continue making progress! If you have any questions, suggestions, ideas or issues, please don’t hesitate to reach out to us on Discord or GitHub!

continue reading
2023-04-28Hannes Mühleisen

DuckCon #3 in San Francisco

We are excited to hold our first “DuckCon” DuckDB user group meeting outside of Europe in San Francisco. The meeting will take place in the afternoon of June 29th. The meeting will be held at the San Francisco Museum of Modern Art (SFMOMA), in the Phyllis Wattis Theater.

In DuckCon #3, we (again) have a talk from DuckDB creators Hannes Mühleisen and Mark Raasveldt about the current state of DuckDB and future plans. Side note: Hannes will also speak at the Data + AI Summit earlier that day.

We are also excited to have talks by Lloyd Tabb, the creator of Google’s Malloy project and Josh Wills, the creator of the dbt-duckdb package.

In addition, we will have several lightning talks from DuckDB users.

Timetable

Time Title Presenter
4 PM Welcome to DuckCon! – Setting Up  
4:10 PM Introductions Hannes Mühleisen
4:15 PM State of the Duck Mark Raasveldt
4:40 PM ‘Data is Rectangular’ and Other Common Misconceptions Lloyd Tabb
5:05 PM DuckDBT: Not a database or a dbt adapter but a secret third thing Josh Wills
5:30 PM Watershed: Get to zero carbon, fast Jessica Zhu
5:35 PM Building a SQL editor around “fast” Hamilton Ulmer
5:40 PM KalDB: Log analytics with DuckDB Suman Karumuri
5:45 PM Pandas on DuckDB with Ponder Aditya Parameswaran
5:50 PM Bringing AI to DuckDB with Lance columnar format for multi-modal AI Chang She
5:55 PM Closing thoughts  
6:00 PM End of DuckCon  
From 6 PM MotherDuck Party, see below  

Lightning Talks

We will have a 20-minute slot for five-minute lightning talks, where our users are invited to show and tell anything DuckDB-related. Please submit your lightning talk proposal in the pre-registration form below.

Registration Process

Attendance is free. While supplies last, you can still get a ticket here. You will need to show this ticket at the entrance to attend. Please contact [email protected] if you have any questions.

MotherDuck Party at 111 Minna

Following DuckCon, MotherDuck will host a party celebrating ducks at 111 Minna (located very close to SFMOMA). DuckCon attendees are cordially invited to attend to eat, drink, and play games. MotherDuck’s Chief Duck Herder will also demo the latest work bringing DuckDB to the cloud. Please note that you will have to separately register for the MotherDuck event.

continue reading
2023-04-21Tristan Celder

Introducing DuckDB for Swift

TL;DR: DuckDB now has a native Swift API. DuckDB on mobile here we go!

Today we’re excited to announce the DuckDB API for Swift. It enables developers on Swift platforms to harness the full power of DuckDB using a native Swift interface with support for great Swift features such as strong typing and concurrency. The API is available not only on Apple platforms, but on Linux too, opening up new opportunities for the growing Swift on Server ecosystem.

What’s included

DuckDB is designed to be fast, reliable and easy to use, and it’s this philosophy that also guided the creation of our new Swift API.

This initial release supports many of the great features of DuckDB right out of the box, including:

  • Queries via DuckDB’s enhanced SQL dialect: In addition to basic SQL, DuckDB supports arbitrary and nested correlated subqueries, window functions, collations, complex types (Swift arrays and structs), and more.
  • Import and export of JSON, CSV, and Parquet files: Beyond its built-in and super-efficient native file format, DuckDB supports reading in, and exporting out to, JSON, CSV, and Parquet files.
  • Strongly typed result sets: DuckDB’s strongly typed result sets are a natural fit for Swift. It’s simple to cast DuckDB columns to their native Swift equivalents, ready for presentation using SwiftUI or as part of an existing TabularData workflow.
  • Swift concurrency support: by virtue of their Sendable conformance, many of DuckDB’s core underlying types can be safely passed across concurrency contexts, easing the process of designing parallel processing workflows and ensuring responsive UIs.

Usage

To demonstrate just how well DuckDB works together with Swift, we’ve created an example project that uses raw data from NASA’s Exoplanet Archive loaded directly into DuckDB.

You’ll see how to:

  • Instantiate a DuckDB in-memory Database and Connection
  • Populate a DuckDB table with the contents of a remote CSV
  • Query a DuckDB database and prepare the results for presentation

Finally, we’ll present our analysis with the help of Apple’s TabularData Framework and Swift Charts.

Instantiating DuckDB

DuckDB supports both file-based and in-memory databases. In this example, as we don’t intend to persist the results of our Exoplanet analysis to disk, we’ll opt for an in-memory Database.

let database = try Database(store: .inMemory)

However, we can’t issue queries just yet. Much like other RDBMSs, queries must be issued through a database connection. DuckDB supports multiple connections per database. This can be useful to support parallel processing, for example. In our project, we’ll need just the one connection, which we’ll eventually access asynchronously.

let connection = try database.connect()

Finally, we’ll create an app-specific type that we’ll use to house our database and connection and through which we’ll eventually define our app-specific queries.

import DuckDB

final class ExoplanetStore {
  
  let database: Database
  let connection: Connection
  
  init(database: Database, connection: Connection) {
    self.database = database
    self.connection = connection
  }
}

Populating DuckDB with a remote CSV file

One problem with our current ExoplanetStore type is that it doesn’t yet contain any data to query. To fix that, we’ll load it with the data of every Exoplanet discovered to date from NASA’s Exoplanet Archive.

There are hundreds of configuration options for this incredible resource, but today we want each exoplanet’s name and its discovery year packaged as a CSV. Checking the docs gives us the following endpoint:

https://exoplanetarchive.ipac.caltech.edu/TAP/sync?query=select+pl_name+,+disc_year+from+pscomppars&format=csv

Once we have our CSV downloaded locally, we can use the following SQL command to load it as a new table within our DuckDB in-memory database. DuckDB’s read_csv_auto command automatically infers our table schema and the data is immediately available for analysis.

CREATE TABLE exoplanets AS (
	SELECT * FROM read_csv_auto('downloaded_exoplanets.csv')
); 
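If you’d like to double-check the schema that read_csv_auto inferred before building on top of it, you can describe the result of the table function directly. A quick sketch, assuming the CSV was saved locally under the same hypothetical file name:

-- Inspect the inferred column names and types for the downloaded CSV
DESCRIBE SELECT * FROM read_csv_auto('downloaded_exoplanets.csv');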

Let’s package this up as a new asynchronous factory method on our ExoplanetStore type:

import DuckDB
import Foundation

final class ExoplanetStore {

  // Factory method to create and prepare a new ExoplanetStore
  static func create() async throws -> ExoplanetStore {
	
	// Create our database and connection as described above
    let database = try Database(store: .inMemory)
    let connection = try database.connect()

	// Download the CSV from the exoplanet archive
	let (csvFileURL, _) = try await URLSession.shared.download(
		from: URL(string: "https://exoplanetarchive.ipac.caltech.edu/TAP/sync?query=select+pl_name+,+disc_year+from+pscomppars&format=csv")!)	
	
	// Issue our first query to DuckDB
    try connection.execute("""
	CREATE TABLE exoplanets AS (
		SELECT * FROM read_csv_auto('\(csvFileURL.path)')
	);
	""")
	
	// Create our pre-populated ExoplanetStore instance
    return ExoplanetStore(
	  database: database,
  	  connection: connection
	)
  }

  // Let's make the initializer we defined previously 
  // private. This prevents anyone accidentally instantiating
  // the store without having pre-loaded our Exoplanet CSV
  // into the database
  private init(database: Database, connection: Connection) {
	...
  }
}

Querying the Database

Now that the database is populated with data, it’s ready to be analyzed. Let’s create a query which we can use to plot a chart of the number of exoplanets discovered by year.

SELECT disc_year, COUNT(disc_year) AS Count
	FROM exoplanets
	GROUP BY disc_year
	ORDER BY disc_year

Issuing the query to DuckDB from within Swift is simple. We’ll again make use of an async function from which to issue our query. This means the caller won’t be blocked while the query is executing. We’ll then cast the result columns to Swift native types using DuckDB’s ResultSet cast(to:) family of methods, before finally wrapping them up in a DataFrame from the TabularData framework, ready for presentation in the UI.

...

import TabularData

extension ExoplanetStore {

  // Retrieves the number of exoplanets discovered by year
  func groupedByDiscoveryYear() async throws -> DataFrame {

    // Issue the query we described above
    let result = try connection.query("""
      SELECT disc_year, COUNT(disc_year) AS Count
        FROM exoplanets
        GROUP BY disc_year
        ORDER BY disc_year
      """)

    // Cast our DuckDB columns to their native Swift
    // equivalent types
    let discoveryYearColumn = result[0].cast(to: Int.self)
    let countColumn = result[1].cast(to: Int.self)

    // Use our DuckDB columns to instantiate TabularData
    // columns and populate a TabularData DataFrame
    return DataFrame(columns: [
      TabularData.Column(discoveryYearColumn)
        .eraseToAnyColumn(),
      TabularData.Column(countColumn)
        .eraseToAnyColumn(),
    ])
  }
}

Visualizing the Results

In just a few lines of code, our database has been created, populated and analyzed – all that’s left to do now is present the results.

And I have a feeling that we’re just getting started…

For the complete example project – including the SwiftUI views and Chart definitions used to visualize the results – clone the DuckDB Swift repo and open up the runnable app project located in Examples/SwiftUI/ExoplanetExplorer.xcodeproj.

We encourage you to modify the code, explore the Exoplanet Archive and DuckDB, and make some discoveries of your own – interplanetary or otherwise!

Conclusion

In this article we’ve introduced the brand new Swift API for DuckDB and demonstrated how quickly you can get up and running analyzing data.

With DuckDB’s incredible performance and analysis capabilities and Swift’s vibrant ecosystem and platform support, there’s never been a better time to begin exploring analytical datasets in Swift.

We can’t wait to see what you do with it. Feel free to reach out on our Discord if you have any questions!


The Swift API for DuckDB is packaged using Swift Package Manager and lives in a new top-level repository available at https://github.com/duckdb/duckdb-swift.

continue reading
2023-04-14Tom Ebergen

The Return of the H2O.ai Database-like Ops Benchmark

TL;DR: We’ve resurrected the H2O.ai database-like ops benchmark with up to date libraries and plan to keep re-running it.

Skip directly to the results

The H2O.ai Database-like Ops Benchmark is a well-known benchmark in the data analytics and R community. The benchmark measures the groupby and join performance of various analytical tools like data.table, polars, dplyr, clickhouse, duckdb and more. Since July 2nd 2021, the benchmark has been dormant, with no result updates or maintenance. Many of the analytical systems measured in the benchmark have since undergone substantial improvements, leaving many of the maintainers curious as to where their analytical tool ranks on the benchmark.

DuckDB has decided to give the H2O.ai benchmark new life and maintain it for the foreseeable future. One reason the DuckDB project has decided to maintain the benchmark is that DuckDB has had 10 new minor releases since the most recent published results on July 2nd, 2021. After managing to run parts of the benchmark on an r3-8xlarge AWS box, DuckDB ranked as a top performer on the benchmark. Additionally, the DuckDB project wants to demonstrate its commitment to performance by consistently comparing DuckDB with other analytical systems. While DuckDB delivers differentiated ease of use, raw performance and scalability are critically important for solving tough problems fast. Plus, just like many of our fellow data folks, we have a need for speed. Therefore, the decision was made to fork the benchmark, modernize the underlying dependencies, and run the benchmark on the latest versions of the included systems. You can find the repository here.

The results of the new benchmark are very interesting, but first a quick summary of the benchmark and what updates took place.

The H2O.ai Database-like Ops Benchmark

There are 5 basic grouping tests and 5 advanced grouping tests. The 10 grouping queries all focus on a combination of the following:

  • Low cardinality (a few big groups)
  • High cardinality (lots of very small groups)
  • Grouping integer types
  • Grouping string types

Each query is run only twice with both results being reported. This way we can see the performance of a cold run and any effects data caching may have. The idea is to avoid reporting any potential “best” results on a hot system. Data analysts only need to run a query once to get their answer. No one drives to the store a second time to get another litre of milk faster.

The time reported is the sum of the time it takes to run all 5 queries twice.

More information about the specific queries can be found below.

The Data and Queries

The queries have not changed since the benchmark went dormant. The data is generated in a rather simple manner. Inspecting the datagen files you can see that the columns are generated with small, medium, and large groups of char and int values. Similar generation logic applies to the join data generation.

Query | SQL | Objective
groupby #1 | SELECT id1, sum(v1) AS v1 FROM tbl GROUP BY id1 | Sum over large cardinality groups, grouped by varchar
groupby #2 | SELECT id1, id2, sum(v1) AS v1 FROM tbl GROUP BY id1, id2 | Sum over medium cardinality groups, grouped by varchars
groupby #3 | SELECT id3, sum(v1) AS v1, mean(v3) AS v3 FROM tbl GROUP BY id3 | Sum and mean over many small cardinality groups, grouped by varchar
groupby #4 | SELECT id4, mean(v1) AS v1, mean(v2) AS v2, mean(v3) AS v3 FROM tbl GROUP BY id4 | Mean over many large cardinality groups, grouped by integer
groupby #5 | SELECT id6, sum(v1) AS v1, sum(v2) AS v2, sum(v3) AS v3 FROM tbl GROUP BY id6 | Sum over many small groups, grouped by integer
advanced groupby #1 | SELECT id4, id5, quantile_cont(v3, 0.5) AS median_v3, stddev(v3) AS sd_v3 FROM tbl GROUP BY id4, id5 | quantile_cont over medium cardinality group, grouped by integers
advanced groupby #2 | SELECT id3, max(v1)-min(v2) AS range_v1_v2 FROM tbl GROUP BY id3 | Range selection over small cardinality groups, grouped by integer
advanced groupby #3 | SELECT id6, v3 AS largest2_v3 FROM (SELECT id6, v3, row_number() OVER (PARTITION BY id6 ORDER BY v3 DESC) AS order_v3 FROM x WHERE v3 IS NOT NULL) sub_query WHERE order_v3 <= 2 | Advanced group by query
advanced groupby #4 | SELECT id2, id4, pow(corr(v1, v2), 2) AS r2 FROM tbl GROUP BY id2, id4 | Arithmetic over medium sized groups, grouped by varchar, integer
advanced groupby #5 | SELECT id1, id2, id3, id4, id5, id6, sum(v3) AS v3, count(*) AS count FROM tbl GROUP BY id1, id2, id3, id4, id5, id6 | Many many small groups, the number of groups is the cardinality of the dataset
join #1 | SELECT x.*, small.id4 AS small_id4, v2 FROM x JOIN small USING (id1) | Joining a large table (x) with a small-sized table on integer type
join #2 | SELECT x.*, medium.id1 AS medium_id1, medium.id4 AS medium_id4, medium.id5 AS medium_id5, v2 FROM x JOIN medium USING (id2) | Joining a large table (x) with a medium-sized table on integer type
join #3 | SELECT x.*, medium.id1 AS medium_id1, medium.id4 AS medium_id4, medium.id5 AS medium_id5, v2 FROM x LEFT JOIN medium USING (id2) | Left join a large table (x) with a medium-sized table on integer type
join #4 | SELECT x.*, medium.id1 AS medium_id1, medium.id2 AS medium_id2, medium.id4 AS medium_id4, v2 FROM x JOIN medium USING (id5) | Join a large table (x) with a medium table on varchar type
join #5 | SELECT x.*, big.id1 AS big_id1, big.id2 AS big_id2, big.id4 AS big_id4, big.id5 AS big_id5, big.id6 AS big_id6, v2 FROM x JOIN big USING (id3) | Join a large table (x) with a large table on integer type

You can find more information about the queries in the Efficiency of Data Processing slides.

Modifications to the Benchmark & Hardware

No modifications have been made to the queries or the data generation. Some scripts required minor modifications so that the current version of the library could be run. The hardware used is slightly different as the exact AWS offering the benchmark previously used is no longer available. Base libraries have been updated as well. GPU libraries were not tested.

The AWS EC2 instance used is an m4.10xlarge:

  • CPU model: Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
  • CPU cores: 40
  • RAM model: Unknown
  • Memory: 160GB
  • NO GPU specifications
  • R upgraded, 4.0.0 -> 4.2.2
  • Python upgraded 3.[6|7] -> 3.10

Changes made to install scripts of other systems

Pandas, Polars, Dask, and Clickhouse required changes to their setup/install scripts. The changes were relatively minor, consisting mostly of syntax updates and data ingestion updates. Data ingestion did not affect the reported timing results.

Results

You can also look at the results here. DuckDB’s timings have improved significantly since v0.2.7 (released over two years ago). Major contributors to this improvement are parallel grouped aggregation, merged in March 2022, and parallel result set materialization. In addition, DuckDB now supports enum types, which makes DuckDB’s group by aggregation even faster. Improvements to the out-of-core hash join were merged as well, further improving the performance of our joins.

Questions about certain results?

Some solutions may report internal errors for some queries. Feel free to investigate the errors by using the _utils/repro.sh script and file a GitHub issue to resolve any confusion. In addition, there are many areas in the code where certain query results are automatically nullified. If you believe that is the case for a query for your system, or if you have any other questions, you can create a GitHub issue to discuss.

Maintenance plan

DuckDB will continue to maintain this benchmark for the foreseeable future. The process for re-running the benchmarks with updated library versions still needs to be decided.

Do you have any other questions? Would you like to have your system added to the benchmark? Please feel free to read the ReadMe.md in the repository, and if you still have questions, you can reach out to me at [email protected] or on our Discord!

continue reading
2023-03-03Laurens Kuiper

Shredding Deeply Nested JSON, One Vector at a Time

TL;DR: We’ve recently improved DuckDB’s JSON extension so JSON files can be directly queried as if they were tables.

JSON is not scary anymore! Jason IS scary though, even as a duck.

DuckDB has a JSON extension that can be installed and loaded through SQL:

INSTALL 'json';
LOAD 'json';

The JSON extension supports various functions to create, read, and manipulate JSON strings. These functions are similar to the JSON functionality provided by other databases such as PostgreSQL and MySQL. DuckDB uses yyjson internally to parse JSON, a high-performance JSON library written in ANSI C. Many thanks to the yyjson authors and contributors!
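To give a quick taste of these scalar functions, here is a minimal sketch (the JSON literal is just an illustration):

-- json_extract returns JSON (note the quotes in the result),
-- while json_extract_string returns a plain VARCHAR
SELECT json_extract('{"duck": {"name": "Jason"}}', '$.duck.name') AS as_json;
SELECT json_extract_string('{"duck": {"name": "Jason"}}', '$.duck.name') AS as_varchar;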

Besides these functions, DuckDB is now able to read JSON directly! This is done by automatically detecting the types and column names, then converting the values within the JSON to DuckDB’s vectors. The automated schema detection dramatically simplifies working with JSON data and subsequent queries on DuckDB’s vectors are significantly faster!

Reading JSON Automatically with DuckDB

Since the 0.7.0 update, DuckDB has added JSON table functions. To demonstrate these, we will read todos.json, a fake TODO list containing 200 fake TODO items (only the first two items are shown):

[
  {
    "userId": 1,
    "id": 1,
    "title": "delectus aut autem",
    "completed": false
  },
  {
    "userId": 1,
    "id": 2,
    "title": "quis ut nam facilis et officia qui",
    "completed": false
  },
]

Each TODO item is an entry in the JSON array, but in DuckDB, we’d like a table where each entry is a row. This is now (since DuckDB’s 0.7.0 release in February 2023) as easy as:

SELECT * FROM 'todos.json';
userId id title completed
1 1 delectus aut autem false
1 2 quis ut nam facilis et officia qui false
1 3 fugiat veniam minus false
1 4 et porro tempora true
1 5 laboriosam mollitia et enim quasi adipisci quia provident illum false

(Note: Only 5 rows shown)

Now, finding out which user completed the most TODO items is as simple as:

SELECT userId, sum(completed::int) total_completed
FROM 'todos.json'
GROUP BY userId
ORDER BY total_completed DESC
LIMIT 1;
userId total_completed
5 12

Under the hood, DuckDB recognizes the .json file extension in 'todos.json', and calls read_json_auto('todos.json') instead. This function is similar to our read_csv_auto function, which automatically infers column names and types for CSV files.
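In other words, writing FROM 'todos.json' is equivalent to calling the table function explicitly:

SELECT * FROM read_json_auto('todos.json');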

Like our other table functions, read_json_auto supports reading multiple files by passing a list, e.g., read_json_auto(['file1.json', 'file2.json']), but also globbing, e.g., read_json_auto('file*.json'). DuckDB will read multiple files in parallel.
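For example, spelled out as full queries:

-- Read an explicit list of files
SELECT * FROM read_json_auto(['file1.json', 'file2.json']);
-- Read all files matching a glob pattern
SELECT * FROM read_json_auto('file*.json');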

Newline Delimited JSON

Not all JSON adheres to the format used in todos.json, which is an array of ‘records’. Newline-delimited JSON, or NDJSON, stores each row on a new line. DuckDB also supports reading (and writing!) this format. First, let’s write our TODO list as NDJSON:

COPY (SELECT * FROM 'todos.json') to 'todos2.json';

Again, DuckDB recognizes the .json suffix in the output file and automatically infers that we mean to use (FORMAT JSON). The created file looks like this (only the first two records are shown):

{"userId":1,"id":1,"title":"delectus aut autem","completed":false}
{"userId":1,"id":2,"title":"quis ut nam facilis et officia qui","completed":false}

DuckDB can read this file in precisely the same way as the original one:

SELECT * FROM 'todos2.json';

If your JSON file is newline-delimited, DuckDB can parallelize reading. This is specified either by using one of the read_ndjson functions (note the nd in the name) or by setting the lines parameter:

SELECT * FROM read_ndjson_auto('todos2.json');
SELECT * FROM read_json_auto('todos2.json', lines='true');

You can also set lines='auto' to auto-detect whether the JSON file is newline-delimited.
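For example:

SELECT * FROM read_json_auto('todos2.json', lines='auto');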

Other JSON Formats

If using the read_json function directly, the format of the JSON can be specified using the json_format parameter. This parameter defaults to 'auto', which tells DuckDB to infer what kind of JSON we are dealing with. Our original todos.json uses the 'array_of_records' format, while the newline-delimited todos2.json uses 'records'. This can be specified like so:

SELECT * FROM read_json('todos.json', auto_detect=true, json_format='array_of_records');
SELECT * FROM read_json('todos2.json', auto_detect=true, json_format='records');

Other supported formats are 'values' and 'array_of_values', which are similar to 'records' and 'array_of_records'. However, with these formats, each ‘record’ is not required to be a JSON object but can also be a JSON array, string, or anything supported in JSON.
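For illustration, reading a hypothetical values.json, where each line holds a bare JSON value rather than an object, would look like this:

SELECT * FROM read_json('values.json', auto_detect=true, json_format='values');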

Manual Schemas

What you may also have noticed is the auto_detect parameter. This parameter tells DuckDB to infer the schema, i.e., determine the names and types of the returned columns. These can manually be specified like so:

SELECT * FROM read_json('todos.json',
                        columns={userId: 'INT', id: 'INT', title: 'VARCHAR', completed: 'BOOLEAN'},
                        json_format='array_of_records');

You don’t have to specify all fields, just the ones you’re interested in:

SELECT * FROM read_json('todos.json',
                        columns={userId: 'INT', completed: 'BOOLEAN'},
                        json_format='array_of_records');

Now that we know how to use the new DuckDB JSON table functions, let’s dive into some analytics!

GitHub Archive Examples

GH Archive is a project to record the public GitHub timeline, archive it, and make it easily accessible for further analysis. Every hour, a GZIP compressed, newline-delimited JSON file containing all public events on GitHub is uploaded. I’ve downloaded a whole day (2023-02-08) of activity using wget and stored the 24 files in a directory called gharchive_gz.

wget https://data.gharchive.org/2023-02-08-0.json.gz
wget https://data.gharchive.org/2023-02-08-1.json.gz
...
wget https://data.gharchive.org/2023-02-08-23.json.gz

Keep in mind that the data is compressed:

$ du -sh gharchive_gz
2.3G  gharchive_gz
$ gunzip -dc gharchive_gz/* | wc -c
 18396198934

One day of GitHub activity amounts to more than 18GB of JSON, which compresses to 2.3GB with GZIP.

To get a feel of what the data looks like, we run the following query:

SELECT json_group_structure(json)
FROM (
  SELECT *
  FROM read_ndjson_objects('gharchive_gz/*.json.gz')
  LIMIT 2048
);

Here, we use our read_ndjson_objects function, which reads the JSON objects in the file as raw JSON, i.e., as strings. The query reads the first 2048 records of JSON from the JSON files in the gharchive_gz directory and describes their structure. You can also directly query the JSON files from GH Archive using DuckDB’s httpfs extension, but since we will be querying the files multiple times, it is better to download them in this case.
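For completeness, a direct remote read would look roughly like this – a sketch that assumes the httpfs extension is installed and uses one of the hourly files mentioned above:

INSTALL httpfs;
LOAD httpfs;
-- Count the events in a single remote hourly file
SELECT count(*)
FROM read_ndjson_objects('https://data.gharchive.org/2023-02-08-0.json.gz');

For the rest of this post, however, we’ll stick with the downloaded files and the json_group_structure query above.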

I’ve formatted the result using an online JSON formatter & validator:

{
   "id":"VARCHAR",
   "type":"VARCHAR",
   "actor":{
      "id":"UBIGINT",
      "login":"VARCHAR",
      "display_login":"VARCHAR",
      "gravatar_id":"VARCHAR",
      "url":"VARCHAR",
      "avatar_url":"VARCHAR"
   },
   "repo":{
      "id":"UBIGINT",
      "name":"VARCHAR",
      "url":"VARCHAR"
   },
   "payload":{"..."},
   "public":"BOOLEAN",
   "created_at":"VARCHAR",
   "org":{
      "id":"UBIGINT",
      "login":"VARCHAR",
      "gravatar_id":"VARCHAR",
      "url":"VARCHAR",
      "avatar_url":"VARCHAR"
   }
}

I’ve left "payload" out because it consists of deeply nested JSON, and its formatted structure takes up more than 1000 lines!

So, how many records are we dealing with exactly? Let’s count it using DuckDB:

SELECT count(*) count FROM 'gharchive_gz/*.json.gz';
count
4434953

That’s around 4.4M daily events, which amounts to almost 200K events per hour. This query takes around 7.3 seconds on my laptop, a 2020 MacBook Pro with an M1 chip and 16GB of memory. This is the time it takes to decompress the GZIP compression and parse every JSON record.

To see how much time is spent decompressing GZIP in the query, I’ve also created a gharchive directory containing the same data but uncompressed. Running the same query on the uncompressed data takes around 5.4s, almost 2 seconds faster. So we got faster, but we also read more than 18GB of data from storage, as opposed to 2.3GB when it was compressed. So, this comparison really depends on the speed of your storage. I prefer to keep the data compressed.

As a side note, the speed of this query really shows how fast yyjson is!

So, what kind of events are in the GitHub data?

SELECT type, count(*) count
FROM 'gharchive_gz/*.json.gz'
GROUP BY type
ORDER BY count DESC;
type count
PushEvent 2359096
CreateEvent 624062
PullRequestEvent 366090
IssueCommentEvent 238660
WatchEvent 231486
DeleteEvent 154383
PullRequestReviewEvent 131107
IssuesEvent 88917
PullRequestReviewCommentEvent 79540
ForkEvent 64233
CommitCommentEvent 36823
ReleaseEvent 23004
MemberEvent 14872
PublicEvent 14500
GollumEvent 8180

This query takes around 7.4s, not much more than the count(*) query. So as we can see, data analysis is very fast once everything has been decompressed and parsed.

Unsurprisingly, the most common event type is the PushEvent, taking up more than half of all events: people pushing their committed code to GitHub. The least common event type is the GollumEvent, the creation or update of a wiki page, which makes up less than 1% of all events.

If we want to analyze the same data multiple times, decompressing and parsing every time is redundant. Instead, we can create a DuckDB table like so:

CREATE TABLE events AS
SELECT * EXCLUDE (payload)
FROM 'gharchive_gz/*.json.gz';

This takes around 9s if you’re using an in-memory database. If you’re using an on-disk database, it takes around 13s and results in a database size of 444MB. When using an on-disk database, DuckDB ensures the table is persistent and performs all kinds of compression. Note that we have temporarily ignored the payload field using the convenient EXCLUDE clause.
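If you’re curious about the on-disk footprint at any point, you can ask DuckDB for it directly. A small sketch, assuming you’re connected to the on-disk database:

-- Report the database size and related storage statistics
PRAGMA database_size;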

To get a feel of what we read, we can ask DuckDB to describe the table:

DESCRIBE SELECT * FROM events;

This gives us the following:

cid name type notnull dflt_value pk
0 id BIGINT false   false
1 type VARCHAR false   false
2 actor STRUCT(id UBIGINT, login VARCHAR, display_login VARCHAR, gravatar_id VARCHAR, url VARCHAR, avatar_url VARCHAR) false   false
3 repo STRUCT(id UBIGINT, name VARCHAR, url VARCHAR) false   false
4 public BOOLEAN false   false
5 created_at TIMESTAMP false   false
6 org STRUCT(id UBIGINT, login VARCHAR, gravatar_id VARCHAR, url VARCHAR, avatar_url VARCHAR) false   false

As we can see, the "actor", "repo" and "org" fields, which are JSON objects, have been converted to DuckDB structs. The "id" column was a string in the original JSON but has been converted to a BIGINT by DuckDB’s automatic type detection. DuckDB can also detect a few different DATE/TIMESTAMP formats within JSON strings, as well as TIME and UUID.

Now that we’ve created the table, we can analyze it like any other DuckDB table! Let’s see how much activity there was in the duckdb/duckdb GitHub repository on this specific day:

SELECT type, count(*) count
FROM events
WHERE repo.name = 'duckdb/duckdb'
GROUP BY type
ORDER BY count DESC;
type count
PullRequestEvent 35
IssueCommentEvent 30
WatchEvent 29
PushEvent 15
PullRequestReviewEvent 14
IssuesEvent 9
PullRequestReviewCommentEvent 7
ForkEvent 3

That’s a lot of pull request activity! Note that this doesn’t mean that 35 pull requests were opened on this day, activity within a pull request is also counted. If we search through the pull requests for that day, we see that there are only 15. This is more activity than normal because most of the DuckDB developers were busy fixing bugs for the 0.7.0 release.

Now, let’s see who was the most active:

SELECT actor.login, count(*) count
FROM events
WHERE repo.name = 'duckdb/duckdb'
  AND type = 'PullRequestEvent'
GROUP BY actor.login
ORDER BY count desc
LIMIT 5;
login count
Mytherin 19
Mause 4
carlopi 3
Tmonster 2
lnkuiper 2

As expected, Mark (Mytherin, co-founder of DuckDB Labs) was the most active! My activity (lnkuiper, software engineer at DuckDB Labs) also shows up.

Handling inconsistent JSON schemas

So far, we have ignored the "payload" of the events. We’ve ignored it because the contents of this field are different based on the type of event. We can see how they differ with the following query:

SELECT json_group_structure(payload) structure
FROM (SELECT *
  FROM read_json(
    'gharchive_gz/*.json.gz',
    columns={
      id: 'BIGINT',
      type: 'VARCHAR',
      actor: 'STRUCT(id UBIGINT,
                     login VARCHAR,
                     display_login VARCHAR,
                     gravatar_id VARCHAR,
                     url VARCHAR,
                     avatar_url VARCHAR)',
      repo: 'STRUCT(id UBIGINT, name VARCHAR, url VARCHAR)',
      payload: 'JSON',
      public: 'BOOLEAN',
      created_at: 'TIMESTAMP',
      org: 'STRUCT(id UBIGINT, login VARCHAR, gravatar_id VARCHAR, url VARCHAR, avatar_url VARCHAR)'
    },
    lines='true'
  )
  WHERE type = 'WatchEvent'
  LIMIT 2048
);
structure
{"action":"VARCHAR"}

The "payload" field is simple for events of type WatchEvent. However, if we change the type to PullRequestEvent, we get a JSON structure of more than 500 lines when formatted with a JSON formatter. We don’t want to look through all those fields, so we cannot use our automatic schema detection, which will try to get them all. Instead, we can manually supply the structure of the fields we’re interested in. DuckDB will skip reading the other fields. Another approach is to store the "payload" field as DuckDB’s JSON data type and parse it at query time (see the example later in this post!).

I’ve stripped down the JSON structure for the "payload" of events with the type PullRequestEvent to the things I’m actually interested in:

{
   "action":"VARCHAR",
   "number":"UBIGINT",
   "pull_request":{
      "url":"VARCHAR",
      "id":"UBIGINT",
      "title":"VARCHAR",
      "user":{
         "login":"VARCHAR",
         "id":"UBIGINT",
      },
      "body":"VARCHAR",
      "created_at":"TIMESTAMP",
      "updated_at":"TIMESTAMP",
      "assignee":{
         "login":"VARCHAR",
         "id":"UBIGINT",
      },
      "assignees":[
         {
            "login":"VARCHAR",
            "id":"UBIGINT",
         }
      ],
  }
}

This is technically not valid JSON because there are trailing commas. However, we try to allow trailing commas wherever possible in DuckDB, including JSON!

We can now plug this into the columns parameter of read_json, but we need to convert it to a DuckDB type first. I’m lazy, so I prefer to let DuckDB do this for me:

SELECT typeof(json_transform('{}', '{
   "action":"VARCHAR",
   "number":"UBIGINT",
   "pull_request":{
      "url":"VARCHAR",
      "id":"UBIGINT",
      "title":"VARCHAR",
      "user":{
         "login":"VARCHAR",
         "id":"UBIGINT",
      },
      "body":"VARCHAR",
      "created_at":"TIMESTAMP",
      "updated_at":"TIMESTAMP",
      "assignee":{
         "login":"VARCHAR",
         "id":"UBIGINT",
      },
      "assignees":[
         {
            "login":"VARCHAR",
            "id":"UBIGINT",
         }
      ],
  }
}'));

This gives us back a DuckDB type that we can plug into our function! Note that because we are not auto-detecting the schema, we have to supply timestampformat to be able to parse the timestamps correctly. The key "user" must be surrounded by quotes because it is a reserved keyword in SQL:

CREATE TABLE pr_events as
SELECT *
FROM read_json(
  'gharchive_gz/*.json.gz',
  columns={
    id: 'BIGINT',
    type: 'VARCHAR',
    actor: 'STRUCT(id UBIGINT,
                   login VARCHAR,
                   display_login VARCHAR,
                   gravatar_id VARCHAR,
                   url VARCHAR,
                   avatar_url VARCHAR)',
    repo: 'STRUCT(id UBIGINT, name VARCHAR, url VARCHAR)',
    payload: 'STRUCT(
                action VARCHAR,
                number UBIGINT,
                pull_request STRUCT(
                  url VARCHAR,
                  id UBIGINT,
                  title VARCHAR,
                  "user" STRUCT(
                    login VARCHAR,
                    id UBIGINT
                  ),
                  body VARCHAR,
                  created_at TIMESTAMP,
                  updated_at TIMESTAMP,
                  assignee STRUCT(login VARCHAR, id UBIGINT),
                  assignees STRUCT(login VARCHAR, id UBIGINT)[]
                )
              )',
    public: 'BOOLEAN',
    created_at: 'TIMESTAMP',
    org: 'STRUCT(id UBIGINT, login VARCHAR, gravatar_id VARCHAR, url VARCHAR, avatar_url VARCHAR)'
  },
  json_format='records',
  lines='true',
  timestampformat='%Y-%m-%dT%H:%M:%SZ'
)
WHERE type = 'PullRequestEvent';

This query completes in around 36s with an on-disk database (resulting size is 478MB) and 9s with an in-memory database. If you don’t care about preserving insertion order, you can speed the query up with this setting:

SET preserve_insertion_order=false;

With this setting, the query completes in around 27s with an on-disk database and 8.5s with an in-memory database. The difference between the on-disk and in-memory case is quite substantial here because DuckDB has to compress and persist much more data.

Now we can analyze pull request events! Let’s see what the maximum number of assignees is:

SELECT max(length(payload.pull_request.assignees)) max_assignees
FROM pr_events;
max_assignees
10

That’s a lot of people reviewing a single pull request!

We can check who was assigned the most:

WITH assignees AS (
  SELECT payload.pull_request.assignee.login assignee
  FROM pr_events
  UNION ALL
  SELECT unnest(payload.pull_request.assignees).login assignee
  FROM pr_events
)
SELECT assignee, count(*) count
FROM assignees
WHERE assignee NOT NULL
GROUP BY assignee
ORDER BY count DESC
LIMIT 5;
assignee count
poad 494
vinayakkulkarni 268
tmtmtmtm 198
fisker 98
icemac 84

That’s a lot of assignments! Although I suspect there are duplicates in here.

Storing as JSON to parse at query time

Specifying the JSON schema of the "payload" field was helpful because it allowed us to directly analyze what is there, and subsequent queries are much faster. Still, it can also be quite cumbersome if the schema is complex. If you don’t want to specify the schema of a field, you can set the type as 'JSON':

CREATE TABLE pr_events AS
SELECT *
FROM read_json(
  'gharchive_gz/*.json.gz',
  columns={
    id: 'BIGINT',
    type: 'VARCHAR',
    actor: 'STRUCT(id UBIGINT,
                   login VARCHAR,
                   display_login VARCHAR,
                   gravatar_id VARCHAR,
                   url VARCHAR,
                   avatar_url VARCHAR)',
    repo: 'STRUCT(id UBIGINT, name VARCHAR, url VARCHAR)',
    payload: 'JSON',
    public: 'BOOLEAN',
    created_at: 'TIMESTAMP',
    org: 'STRUCT(id UBIGINT, login VARCHAR, gravatar_id VARCHAR, url VARCHAR, avatar_url VARCHAR)'
  },
  json_format='records',
  lines='true',
  timestampformat='%Y-%m-%dT%H:%M:%SZ'
)
WHERE type = 'PullRequestEvent';

This will load the "payload" field as a JSON string, and we can use DuckDB’s JSON functions to analyze it when querying. For example:

SELECT DISTINCT payload->>'action' AS action, count(*) count
FROM pr_events
GROUP BY action
ORDER BY count DESC;

The ->> arrow is short-hand for our json_extract_string function. Creating the entire "payload" field as a column with type JSON is not the most efficient way to get just the "action" field, but this example is just to show the flexibility of read_json. The query results in the following table:

action count
opened 189096
closed 174914
reopened 2080

As we can see, only a few pull requests have been reopened.
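For reference, the same extraction can be written with the full function name instead of the arrow shorthand:

SELECT json_extract_string(payload, '$.action') AS action, count(*) count
FROM pr_events
GROUP BY action
ORDER BY count DESC;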

Conclusion

DuckDB tries to be an easy-to-use tool that can read all kinds of data formats. In the 0.7.0 release, we have added support for reading JSON. JSON comes in many formats and all kinds of schemas. DuckDB’s rich support for nested types (LIST, STRUCT) allows it to fully ‘shred’ the JSON to a columnar format for more efficient analysis.

We are excited to hear what you think about our new JSON functionality. If you have any questions or suggestions, please reach out to us on Discord or GitHub!

continue reading
2023-02-24Guest post by Eduardo Blancas

JupySQL Plotting with DuckDB

TL;DR: JupySQL provides a seamless SQL experience in Jupyter and uses DuckDB to visualize larger than memory datasets in matplotlib.

continue reading
2023-02-13Mark Raasveldt

Announcing DuckDB 0.7.0

Image of the labrador duck

The DuckDB team is happy to announce the latest DuckDB version (0.7.0) has been released. This release of DuckDB is named “Labradorius” after the Labrador Duck (Camptorhynchus labradorius) that was native to North America.

To install the new version, please visit the installation guide. The full release notes can be found here.

continue reading
2022-11-25Pedro Holanda

DuckCon 2023 - 2nd edition

The DuckDB team is excited to invite you all for our second DuckCon user group meeting. It will take place the day before FOSDEM in Brussels on Feb 3rd, 2023, at the Hilton Hotel.

In this edition, we will have the DuckDB creators Hannes Mühleisen and Mark Raasveldt talking about the current state of DuckDB and future plans. We will also have one talk on building out-of-tree extensions from DuckDB contributors Pedro Holanda and Sam Ansmink, and one invited talk from MotherDuck’s Founding Engineers, Boaz Leskes and Yves Le Maout. In addition, we will have a session with lightning talks from our users. We will close the event with a networking session with food and drinks, where users are invited to interact directly with the DuckDB team.

continue reading
2022-11-14Mark Raasveldt

Announcing DuckDB 0.6.0

Image of white-headed duck

The DuckDB team is happy to announce the latest DuckDB version (0.6.0) has been released. This release of DuckDB is named “Oxyura” after the White-headed duck (Oxyura leucocephala) which is an endangered species native to Eurasia.

To install the new version, please visit the installation guide. Note that the release is still being rolled out, so not all artifacts may be published yet. The full release notes can be found here.

continue reading
2022-10-28Mark Raasveldt

Lightweight Compression in DuckDB

Matroshka Ducks (ducks going from big to small)

TL;DR: DuckDB supports efficient lightweight compression that is automatically used to keep data size down without incurring high costs for compression and decompression.

When working with large amounts of data, compression is critical for reducing storage size and egress costs. Compression algorithms typically reduce data set size by 75-95%, depending on how compressible the data is. Compression not only reduces the storage footprint of a data set, but also often improves performance as less data has to be read from disk or over a network connection.

continue reading
2022-10-12Guest post by Jacob Matson

Modern Data Stack in a Box with DuckDB

Duck on a box

TL;DR: A fast, free, and open-source Modern Data Stack (MDS) can now be fully deployed on your laptop or to a single machine using the combination of DuckDB, Meltano, dbt, and Apache Superset.

This post is a collaboration with Jacob Matson and cross-posted on dataduel.co.

Summary

There is a large volume of literature (1, 2, 3) about scaling data pipelines. “Use Kafka! Build a lake house! Don’t build a lake house, use Snowflake! Don’t use Snowflake, use XYZ!” However, with advances in hardware and the rapid maturation of data software, there is a simpler approach. This article will light up the path to highly performant single node analytics with an MDS-in-a-box open source stack: Meltano, DuckDB, dbt, & Apache Superset on Windows using Windows Subsystem for Linux (WSL). There are many options within the MDS, so if you are using another stack to build an MDS-in-a-box, please share it with the community on the DuckDB Twitter, GitHub, or Discord, or the dbt slack! Or just stop by for a friendly debate about our choice of tools!

continue reading
2022-09-30Hannes Mühleisen

Querying Postgres Tables Directly From DuckDB

TL;DR: DuckDB can now directly query tables stored in PostgreSQL and speed up complex analytical queries without duplicating data.

continue reading
2022-07-27Pedro Holanda

Persistent Storage of Adaptive Radix Trees in DuckDB

DuckDB ART

TL;DR: DuckDB uses Adaptive Radix Tree (ART) Indexes to enforce constraints and to speed up query filters. Up to this point, indexes were not persisted, causing issues like loss of indexing information and high reload times for tables with data constraints. We now persist ART Indexes to disk, drastically diminishing database loading times (up to orders of magnitude), and we no longer lose track of existing indexes. This blog post contains a deep dive into the implementation of ART storage, benchmarks, and future work. Finally, to better understand how our indexes are used, I’m asking you to answer the following survey. It will guide us when defining our future roadmap.

continue reading
2022-05-27Richard Wesley

Range Joins in DuckDB

TL;DR: DuckDB has fully parallelised range joins that can efficiently join millions of range predicates.

Range intersection joins are an important operation in areas such as temporal analytics, and occur when two inequality conditions are present in a join predicate. Database implementations often rely on slow O(N^2) algorithms that compare every pair of rows for these operations. Instead, DuckDB leverages its fast sorting logic to implement two highly optimized parallel join operators for these kinds of range predicates, resulting in 20-30x faster queries. With these operators, DuckDB can be used effectively in more time-series-oriented use cases.

continue reading
2022-05-04Alex Monahan

Friendlier SQL with DuckDB

Chewbacca_the_duck

An elegant user experience is a key design goal of DuckDB. This goal guides much of DuckDB’s architecture: it is simple to install, seamless to integrate with other data structures like Pandas, Arrow, and R Dataframes, and requires no dependencies. Parallelization occurs automatically, and if a computation exceeds available memory, data is gracefully buffered out to disk. And of course, DuckDB’s processing speed makes it easier to get more work accomplished.

However, SQL is not famous for being user-friendly. DuckDB aims to change that! DuckDB includes both a Relational API for dataframe-style computation, and a highly Postgres-compatible version of SQL. If you prefer dataframe-style computation, we would love your feedback on our roadmap. If you are a SQL fan, read on to see how DuckDB is bringing together both innovation and pragmatism to make it easier to write SQL in DuckDB than anywhere else. Please reach out on GitHub or Discord and let us know what other features would simplify your SQL workflows. Join us as we teach an old dog new tricks!

continue reading
2022-03-07Hannes Mühleisen and Mark Raasveldt

Parallel Grouped Aggregation in DuckDB

TL;DR: DuckDB has a fully parallelized aggregate hash table that can efficiently aggregate over millions of groups.

Grouped aggregations are a core data analysis command. It is particularly important for large-scale data analysis (“OLAP”) because it is useful for computing statistical summaries of huge tables. DuckDB contains a highly optimized parallel aggregation capability for fast and scalable summarization.

Jump straight to the benchmarks?

continue reading
2022-01-06Richard Wesley

DuckDB Time Zones: Supporting Calendar Extensions

TL;DR: The DuckDB ICU extension now provides time zone support.

Time zone support is a common request for temporal analytics, but the rules are complex and somewhat arbitrary. The most well supported library for locale-specific operations is the International Components for Unicode (ICU). DuckDB already provided collated string comparisons using ICU via an extension (to avoid dependencies), and we have now connected the existing ICU calendar and time zone functions to the main code via the new TIMESTAMP WITH TIME ZONE (or TIMESTAMPTZ for short) data type. The ICU extension is pre-bundled in DuckDB’s Python client and can be optionally installed in the remaining clients.

In this post, we will describe how time works in DuckDB and what time zone functionality has been added.

continue reading
2021-12-03Pedro Holanda and Jonathan Keane

DuckDB quacks Arrow: A zero-copy data integration between Apache Arrow and DuckDB

TL;DR: The zero-copy integration between DuckDB and Apache Arrow allows for rapid analysis of larger than memory datasets in Python and R using either SQL or relational APIs.

This post is a collaboration with and cross-posted on the Arrow blog.

continue reading
2021-11-26Pedro Holanda

DuckDB - The Lord of Enums:
The Fellowship of the Categorical and Factors.

dict-enc

String types are one of the most commonly used types. However, string columns often have a limited number of distinct values. For example, a country column will never have more than a few hundred unique entries. Storing such a column as plain strings wastes storage and compromises query performance. A better solution is to dictionary encode these columns. In dictionary encoding, the data is split into two parts: the category and the values. The category stores the actual strings, and the values store references to those strings. This encoding is depicted below.
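In DuckDB, this idea surfaces as the ENUM type. A minimal sketch, with a hypothetical table and value set:

-- Define a dictionary-encoded type with a fixed set of categories
CREATE TYPE country AS ENUM ('NL', 'US', 'DE');
-- The column stores compact references to the categories rather than full strings
CREATE TABLE people (name VARCHAR, home_country country);
INSERT INTO people VALUES ('Alice', 'NL'), ('Bob', 'US');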

continue reading
2021-11-12Richard Wesley

Fast Moving Holistic Aggregates

TL;DR: DuckDB, a free and Open-Source analytical data management system, has a windowing API that can compute complex moving aggregates like interquartile ranges and median absolute deviation much faster than the conventional approaches.

In a previous post, we described the DuckDB windowing architecture and mentioned the support for some advanced moving aggregates. In this post, we will compare the performance of various possible implementations of these moving aggregate functions and explain how DuckDB’s performant implementations work.

continue reading
2021-10-29André Kohn and Dominik Moritz

DuckDB-Wasm: Efficient Analytical SQL in the Browser

TL;DR: DuckDB-Wasm is an in-process analytical SQL database for the browser. It is powered by WebAssembly, speaks Arrow fluently, reads Parquet, CSV and JSON files backed by Filesystem APIs or HTTP requests and has been tested with Chrome, Firefox, Safari and Node.js. You can try it in your browser at shell.duckdb.org or on Observable.

continue reading
2021-10-13Richard Wesley

Windowing in DuckDB

TL;DR: DuckDB, a free and Open-Source analytical data management system, has a state-of-the-art windowing engine that can compute complex moving aggregates like inter-quartile ranges as well as simpler moving averages.

Window functions (those using the OVER clause) are important tools for analysing data series, but they can be slow if not implemented carefully. In this post, we will take a look at how DuckDB implements windowing. We will also see how DuckDB can leverage its aggregate function architecture to compute useful moving aggregates such as moving inter-quartile ranges (IQRs).

continue reading
2021-08-27Laurens Kuiper

Fastest table sort in the West - Redesigning DuckDB’s sort

TL;DR: DuckDB, a free and Open-Source analytical data management system, has a new highly efficient parallel sorting implementation that can sort much more data than fits in main memory.

Database systems use sorting for many purposes, the most obvious purpose being when a user adds an ORDER BY clause to their query. Sorting is also used within operators, such as window functions. DuckDB recently improved its sorting implementation, which is now able to sort data in parallel and sort more data than fits in memory. In this post, we will take a look at how DuckDB sorts, and how this compares to other data management systems.

continue reading
2021-06-25Hannes Mühleisen and Mark Raasveldt

Querying Parquet with Precision using DuckDB

TL;DR: DuckDB, a free and open source analytical data management system, can run SQL queries directly on Parquet files and automatically take advantage of the advanced features of the Parquet format.

Apache Parquet is the most common “Big Data” storage format for analytics. In Parquet files, data is stored in a columnar-compressed binary format. Each Parquet file stores a single table. The table is partitioned into row groups, which each contain a subset of the rows of the table. Within a row group, the table data is stored in a columnar fashion.

continue reading
2021-05-14Mark Raasveldt and Hannes Mühleisen

Efficient SQL on Pandas with DuckDB

TL;DR: DuckDB, a free and open source analytical data management system, can efficiently run SQL queries directly on Pandas DataFrames.

Recently, an article was published advocating for using SQL for Data Analysis. Here at team DuckDB, we are huge fans of SQL. It is a versatile and flexible language that allows the user to efficiently perform a wide variety of data transformations, without having to care about how the data is physically represented or how to do these data transformations in the most optimal way.

continue reading
2021-01-25Laurens Kuiper

Testing out DuckDB's Full Text Search Extension

TL;DR: DuckDB now has full-text search functionality, similar to the FTS5 extension in SQLite. The main difference is that our FTS extension is fully formulated in SQL. We tested it out on TREC disks 4 and 5.

Searching through textual data stored in a database can be cumbersome, as SQL does not provide a good way of formulating questions such as “Give me all the documents about Mallard Ducks”: string patterns with LIKE will only get you so far. Despite SQL’s shortcomings here, storing textual data in a database is commonplace. Consider the table products (id INT, name VARCHAR, description VARCHAR) - it would be useful to search through the name and description columns for a website that sells these products.
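To give a flavor of what the extension enables, here is a minimal sketch against that products table, assuming the current fts extension API and a hypothetical search string:

INSTALL fts;
LOAD fts;
-- Build a full-text index over the name and description columns, keyed by id
PRAGMA create_fts_index('products', 'id', 'name', 'description');
-- Rank products by BM25 relevance for the search terms
SELECT id, name, score
FROM (
    SELECT *, fts_main_products.match_bm25(id, 'Mallard Ducks') AS score
    FROM products
) sq
WHERE score IS NOT NULL
ORDER BY score DESC;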

continue reading