2024-03-26 Hannes Mühleisen

42.parquet – A Zip Bomb for the Big Data Age

A 42 kB Parquet file can contain over 4 PB of data.
2024-03-22 Sam Ansmink

Dependency Management in DuckDB Extensions

While core DuckDB has zero external dependencies, building extensions with dependencies is now very simple, with built-in support for vcpkg, an open-source package manager with support for over 2000 C/C++ packages. Interested in building your own? Check out the [extension template](https://github.com/duckdb/extension-template).
2024-03-01 Alex Monahan

SQL Gymnastics: Bending SQL into flexible new shapes

Combining multiple features of DuckDB’s friendly SQL allows for highly flexible queries that can be reused across tables.
2024-02-13 Mark Raasveldt and Hannes Mühleisen

Announcing DuckDB 0.10.0

The DuckDB team is happy to announce the latest DuckDB release (0.10.0). This release is named Fusca after the [Velvet scoter](https://en.wikipedia.org/wiki/Velvet_scoter) native to Europe.
2024-01-26 Mark Raasveldt

Multi-Database Support in DuckDB

DuckDB can attach MySQL, Postgres, and SQLite databases in addition to databases stored in its own format. This allows data to be read into DuckDB and moved between these systems in a convenient manner.
2023-12-18 Carlo Piovesan

Extensions for DuckDB-Wasm

DuckDB-Wasm users can now load DuckDB extensions, allowing them to run extensions in the browser.
2023-11-03 Tom Ebergen

Updates to the H2O.ai db-benchmark!

The H2O.ai db-benchmark has been updated with new results. In addition, the AWS EC2 instance used for benchmarking has been changed to a c6id.metal for improved repeatability and fairness across libraries. DuckDB is the fastest library for both join and group by queries at almost every data size.
2023-10-27 Pedro Holanda

DuckDB's CSV Sniffer: Automatic Detection of Types and Dialects

DuckDB is primarily focused on performance, leveraging the capabilities of modern file formats. At the same time, we also pay attention to flexible, non-performance-driven formats like CSV files. To create a pleasant experience when reading from CSV files, DuckDB implements a CSV sniffer that automatically detects CSV dialect options and column types, and even skips dirty data. The sniffing process allows users to efficiently explore CSV files without needing to provide any input about the file format.
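To make the idea of sniffing concrete, here is a minimal sketch in plain Python. It is not DuckDB's sniffer: it leans on the standard library's `csv.Sniffer` for dialect detection, and the helper names `infer_type` and `sniff_csv`, the candidate delimiter list, and the assumption that the first row is a header are all made up for illustration.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest of int, float, str that fits every value."""
    for name, cast in (("int", int), ("float", float)):
        try:
            for value in values:
                cast(value)
            return name
        except ValueError:
            continue
    return "str"


def sniff_csv(text, candidates=";,|\t"):
    """Guess the delimiter and per-column types of a small CSV sample.

    Assumption for this sketch: the first row is a header row.
    """
    dialect = csv.Sniffer().sniff(text, delimiters=candidates)
    rows = list(csv.reader(io.StringIO(text), dialect))
    header, body = rows[0], rows[1:]
    columns = list(zip(*body))  # transpose rows into columns
    types = {name: infer_type(col) for name, col in zip(header, columns)}
    return dialect.delimiter, types


sample = "id;price;name\n1;2.5;apple\n2;3.0;pear\n"
delimiter, column_types = sniff_csv(sample)
print(delimiter, column_types)
```

A real sniffer also has to cope with quoting, escapes, and malformed rows, which this sketch ignores.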
2023-10-06 Mark Raasveldt, Hannes Mühleisen, Gabor Szarnyas

DuckCon #4 in Amsterdam
2023-09-26 Mark Raasveldt and Hannes Mühleisen

Announcing DuckDB 0.9.0
2023-09-15 Richard Wesley

DuckDB's AsOf Joins: Fuzzy Temporal Lookups

DuckDB supports AsOf Joins – a way to match nearby values. They are especially useful for searching event tables for temporal analytics.
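As a rough illustration of what "matching nearby values" means, the sketch below pairs each lookup time with the most recent event at or before it, using a binary search over sorted timestamps. This is plain Python meant to show the semantics only, not how DuckDB implements the join; the function name `asof_lookup` and the sample data are hypothetical.

```python
from bisect import bisect_right


def asof_lookup(events, probe_times):
    """For each probe time, return the latest (time, value) event whose
    time is <= the probe time, or None if no such event exists."""
    events = sorted(events, key=lambda event: event[0])
    times = [time for time, _ in events]
    matches = []
    for probe in probe_times:
        # Index of the first event strictly after the probe time.
        position = bisect_right(times, probe)
        matches.append(events[position - 1] if position else None)
    return matches


# Hypothetical price events as (timestamp, price) pairs.
prices = [(1, 10.0), (5, 11.5), (9, 12.0)]
matches = asof_lookup(prices, [0, 1, 4, 9, 20])
print(matches)
```

A probe before the first event has nothing to match, so it yields `None`; every later probe picks up the last known value.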
2023-08-23 Alex Monahan

Even Friendlier SQL with DuckDB

DuckDB continues to push the boundaries of SQL syntax to both simplify queries and make more advanced analyses possible. Highlights include dynamic column selection, queries that start with the FROM clause, function chaining, and list comprehensions. We boldly go where no SQL engine has gone before!
2023-08-04 Pedro Holanda

DuckDB ADBC - Zero-Copy data transfer via Arrow Database Connectivity

DuckDB has added support for [Arrow Database Connectivity (ADBC)](https://arrow.apache.org/adbc/0.5.1/index.html), an API standard that enables efficient data ingestion and retrieval from database systems, similar to the [Open Database Connectivity (ODBC)](https://learn.microsoft.com/en-us/sql/odbc/microsoft-open-database-connectivity-odbc?view=sql-server-ver16) interface. However, unlike ODBC, ADBC specifically caters to the columnar storage model, facilitating fast data transfers between a columnar database and an external application.
2023-07-07 Pedro Holanda, Thijs Bruineman and Phillip Cloud

From Waddle to Flying: Quickly expanding DuckDB's functionality with Scalar Python UDFs

DuckDB now supports vectorized Scalar Python User Defined Functions (UDFs). By implementing Python UDFs, users can easily expand the functionality of DuckDB while taking advantage of DuckDB's fast execution model, SQL and data safety.
2023-05-17 Mark Raasveldt and Hannes Mühleisen

Announcing DuckDB 0.8.0
2023-05-12 Mark Raasveldt and Hannes Mühleisen

10 000 Stars on GitHub
2023-04-28 Max Gabrielsson

PostGEESE? Introducing The DuckDB Spatial Extension

DuckDB now has an official [Spatial extension](https://github.com/duckdb/duckdb_spatial) to enable geospatial processing.
2023-04-28 Hannes Mühleisen

DuckCon #3 in San Francisco
2023-04-21 Tristan Celder

Introducing DuckDB for Swift

DuckDB now has a native Swift API. DuckDB on mobile, here we go!
2023-04-14 Tom Ebergen

The Return of the H2O.ai Database-like Ops Benchmark

We've resurrected the H2O.ai database-like ops benchmark with up-to-date libraries and plan to keep re-running it.
2023-03-03 Laurens Kuiper

Shredding Deeply Nested JSON, One Vector at a Time

We've recently improved DuckDB's JSON extension so JSON files can be directly queried as if they were tables.
2023-02-24 Guest post by Eduardo Blancas

JupySQL Plotting with DuckDB

[JupySQL](https://github.com/ploomber/jupysql) provides a seamless SQL experience in Jupyter and uses DuckDB to visualize larger-than-memory datasets in matplotlib.
2023-02-13 Mark Raasveldt

Announcing DuckDB 0.7.0
2022-11-14 Mark Raasveldt

Announcing DuckDB 0.6.0
2022-10-28 Mark Raasveldt

Lightweight Compression in DuckDB

DuckDB supports efficient lightweight compression that is automatically used to keep data size down without incurring high costs for compression and decompression.
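For a feel of what "lightweight" compression looks like, here is a small sketch of run-length encoding in plain Python. RLE is used purely as a well-known example of a cheap scheme; the summary above does not say which schemes DuckDB applies, and the function names and sample column are illustrative.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, length in runs for _ in range(length)]


# A hypothetical column with long runs of repeated values.
column = ["NL"] * 4 + ["US"] * 3 + ["NL"]
encoded = rle_encode(column)
print(encoded)
assert rle_decode(encoded) == column
```

Both directions are a single linear pass with no dictionaries or entropy coding, which is what keeps the cost of compressing and decompressing low.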
2022-10-12 Guest post by Jacob Matson

Modern Data Stack in a Box with DuckDB

A fast, free, and open-source Modern Data Stack (MDS) can now be fully deployed on your laptop or to a single machine using the combination of [DuckDB](https://duckdb.org/), [Meltano](https://meltano.com/), [dbt](https://www.getdbt.com/), and [Apache Superset](https://superset.apache.org/).
2022-09-30 Hannes Mühleisen

Querying Postgres Tables Directly From DuckDB

DuckDB can now directly query tables stored in PostgreSQL and speed up complex analytical queries without duplicating data.
2022-07-27 Pedro Holanda

Persistent Storage of Adaptive Radix Trees in DuckDB

DuckDB uses Adaptive Radix Tree (ART) Indexes to enforce constraints and to speed up query filters. Up to this point, indexes were not persisted, causing issues like loss of indexing information and high reload times for tables with data constraints. We now persist ART Indexes to disk, drastically reducing database loading times (by up to orders of magnitude), and we no longer lose track of existing indexes. This blog post contains a deep dive into the implementation of ART storage, benchmarks, and future work. Finally, to better understand how our indexes are used, I'm asking you to answer the following [survey](https://forms.gle/eSboTEp9qpP7ybz98). It will guide us when defining our future roadmap.
2022-05-27 Richard Wesley

Range Joins in DuckDB

DuckDB has fully parallelised range joins that can efficiently join millions of range predicates.
2022-03-07 Hannes Mühleisen and Mark Raasveldt

Parallel Grouped Aggregation in DuckDB

DuckDB has a fully parallelized aggregate hash table that can efficiently aggregate over millions of groups.
2022-01-06 Richard Wesley

DuckDB Time Zones: Supporting Calendar Extensions

The DuckDB ICU extension now provides time zone support.
2021-12-03 Pedro Holanda and Jonathan Keane

DuckDB quacks Arrow: A zero-copy data integration between Apache Arrow and DuckDB

The zero-copy integration between DuckDB and Apache Arrow allows for rapid analysis of larger-than-memory datasets in Python and R using either SQL or relational APIs.
2021-11-12 Richard Wesley

Fast Moving Holistic Aggregates

DuckDB, a free and Open-Source analytical data management system, has a windowing API that can compute complex moving aggregates like interquartile ranges and median absolute deviation much faster than the conventional approaches.
2021-10-29 André Kohn and Dominik Moritz

DuckDB-Wasm: Efficient Analytical SQL in the Browser

[DuckDB-Wasm](https://github.com/duckdb/duckdb-wasm) is an in-process analytical SQL database for the browser. It is powered by WebAssembly, speaks Arrow fluently, reads Parquet, CSV and JSON files backed by Filesystem APIs or HTTP requests, and has been tested with Chrome, Firefox, Safari and Node.js. You can try it in your browser at [shell.duckdb.org](https://shell.duckdb.org) or on [Observable](https://observablehq.com/@cmudig/duckdb).
2021-10-13 Richard Wesley

Windowing in DuckDB

DuckDB, a free and Open-Source analytical data management system, has a state-of-the-art windowing engine that can compute complex moving aggregates like inter-quartile ranges as well as simpler moving averages.
2021-08-27 Laurens Kuiper

Fastest table sort in the West - Redesigning DuckDB’s sort

DuckDB, a free and Open-Source analytical data management system, has a new highly efficient parallel sorting implementation that can sort much more data than fits in main memory.
2021-06-25 Hannes Mühleisen and Mark Raasveldt

Querying Parquet with Precision using DuckDB

DuckDB, a free and open source analytical data management system, can run SQL queries directly on Parquet files and automatically take advantage of the advanced features of the Parquet format.
2021-05-14 Mark Raasveldt and Hannes Mühleisen

Efficient SQL on Pandas with DuckDB

DuckDB, a free and open source analytical data management system, can efficiently run SQL queries directly on Pandas DataFrames.
2021-01-25 Laurens Kuiper

Testing out DuckDB's Full Text Search Extension

DuckDB now has full-text search functionality, similar to the FTS5 extension in SQLite. The main difference is that our FTS extension is fully formulated in SQL. We tested it out on TREC disks 4 and 5.