2024-03-26 Hannes Mühleisen
A 42 kB Parquet file can contain over 4 PB of data.
2024-03-22 Sam Ansmink
While core DuckDB has zero external dependencies, building extensions with dependencies is now very simple, with built-in support for vcpkg, an open-source package manager with support for over 2000 C/C++ packages. Interested in building your own? Check out the [extension template](https://github.com/duckdb/extension-template).
2024-03-01 Alex Monahan
Combining multiple features of DuckDB’s friendly SQL allows for highly flexible queries that can be reused across tables.
2024-02-13 Mark Raasveldt and Hannes Mühleisen
The DuckDB team is happy to announce the latest DuckDB release (0.10.0). This release is named Fusca after the [Velvet scoter](https://en.wikipedia.org/wiki/Velvet_scoter) native to Europe.
2024-01-26 Mark Raasveldt
DuckDB can attach MySQL, Postgres, and SQLite databases in addition to databases stored in its own format. This allows data to be read into DuckDB and moved between these systems in a convenient manner.
2023-12-18 Carlo Piovesan
DuckDB-Wasm users can now load DuckDB extensions, allowing them to run extensions in the browser.
2023-11-03 Tom Ebergen
The H2O.ai db-benchmark has been updated with new results. In addition, the AWS EC2 instance used for benchmarking has been changed to a c6id.metal for improved repeatability and fairness across libraries. DuckDB is the fastest library for both join and group by queries at almost every data size.
2023-10-27 Pedro Holanda
DuckDB is primarily focused on performance, leveraging the capabilities of modern file formats. At the same time, we also pay attention to flexible, non-performance-driven formats like CSV files. To make reading CSV files a pleasant experience, DuckDB implements a CSV sniffer that automatically detects CSV dialect options and column types, and even skips dirty data. The sniffing process allows users to efficiently explore CSV files without needing to provide any input about the file format.
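To give a feel for what dialect and type sniffing involves, here is a minimal sketch built on Python's standard-library `csv.Sniffer`. It is a stand-in for illustration, not DuckDB's sniffer: the sample data, the helper names `sniff_dialect`, `infer_type`, and `sniff`, and the mapping to the type names `BIGINT`, `DOUBLE`, and `VARCHAR` are all assumptions made for this example, and dirty-data skipping is not covered.

```python
import csv
import io

# Hypothetical sample; the semicolon delimiter is not declared anywhere.
SAMPLE = "id;name;score\n1;duck;4.5\n2;goose;3.0\n3;swan;5.0\n"


def sniff_dialect(text):
    """Guess the delimiter and whether the first row is a header."""
    sniffer = csv.Sniffer()
    dialect = sniffer.sniff(text, delimiters=",;\t|")
    return dialect.delimiter, sniffer.has_header(text)


def infer_type(values):
    """Pick the narrowest of integer, float, or string that fits every value."""
    for caster, name in ((int, "BIGINT"), (float, "DOUBLE")):
        try:
            for value in values:
                caster(value)
            return name
        except ValueError:
            continue  # at least one value did not fit; try the next wider type
    return "VARCHAR"


def sniff(text):
    """Return (delimiter, has_header, column types) for a CSV sample."""
    delimiter, has_header = sniff_dialect(text)
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    body = rows[1:] if has_header else rows
    columns = list(zip(*body))  # transpose rows into columns
    return delimiter, has_header, [infer_type(col) for col in columns]
```

Calling `sniff(SAMPLE)` reports the delimiter, the header guess, and one inferred type per column, all without the caller describing the file format up front.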
2023-10-06 Mark Raasveldt, Hannes Mühleisen, Gabor Szarnyas
2023-09-26 Mark Raasveldt and Hannes Mühleisen
2023-09-15 Richard Wesley
DuckDB supports AsOf Joins – a way to match nearby values. They are especially useful for searching event tables for temporal analytics.
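As a rough illustration of the "match nearby values" idea, the sketch below pairs each event time with the most recent earlier-or-equal row from a sorted table, using Python's standard-library `bisect`. The function name `asof_join` and the sample data are hypothetical, and this does not reflect how DuckDB executes AsOf joins internally.

```python
from bisect import bisect_right


def asof_join(events, prices):
    """For each event time, attach the latest price at or before it.

    `prices` is a list of (timestamp, value) pairs sorted by timestamp.
    Events earlier than the first price are paired with None.
    """
    times = [t for t, _ in prices]
    joined = []
    for when in events:
        # Number of price rows whose timestamp is <= the event time.
        i = bisect_right(times, when)
        joined.append((when, prices[i - 1][1] if i else None))
    return joined


# Hypothetical data: price changes at t=1, 5, 9 and events in between.
prices = [(1, 10.0), (5, 12.5), (9, 11.0)]
events = [0, 1, 4, 5, 8, 10]
result = asof_join(events, prices)
```

Each event is matched to exactly one price row (or none), which is what distinguishes this from an ordinary inequality join that would return every earlier row.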
2023-08-23 Alex Monahan
DuckDB continues to push the boundaries of SQL syntax to both simplify queries and make more advanced analyses possible. Highlights include dynamic column selection, queries that start with the FROM clause, function chaining, and list comprehensions. We boldly go where no SQL engine has gone before!
2023-08-04 Pedro Holanda
DuckDB has added support for [Arrow Database Connectivity (ADBC)](https://arrow.apache.org/adbc/0.5.1/index.html), an API standard that enables efficient data ingestion and retrieval from database systems, similar to the [Open Database Connectivity (ODBC)](https://learn.microsoft.com/en-us/sql/odbc/microsoft-open-database-connectivity-odbc?view=sql-server-ver16) interface. However, unlike ODBC, ADBC specifically caters to the columnar storage model, facilitating fast data transfers between a columnar database and an external application.
2023-07-07 Pedro Holanda, Thijs Bruineman and Phillip Cloud
DuckDB now supports vectorized Scalar Python User Defined Functions (UDFs). By implementing Python UDFs, users can easily expand the functionality of DuckDB while taking advantage of DuckDB's fast execution model, SQL and data safety.
2023-05-26 Mark Raasveldt
2023-05-17 Mark Raasveldt and Hannes Mühleisen
2023-05-12 Mark Raasveldt and Hannes Mühleisen
2023-04-28 Max Gabrielsson
DuckDB now has an official [Spatial extension](https://github.com/duckdb/duckdb_spatial) to enable geospatial processing.
2023-04-28 Hannes Mühleisen
2023-04-21 Tristan Celder
DuckDB now has a native Swift API. DuckDB on mobile, here we go!
2023-04-14 Tom Ebergen
We've resurrected the H2O.ai database-like ops benchmark with up-to-date libraries and plan to keep re-running it.
2023-03-03 Laurens Kuiper
We've recently improved DuckDB's JSON extension so JSON files can be directly queried as if they were tables.
2023-02-24 Guest post by Eduardo Blancas
[JupySQL](https://github.com/ploomber/jupysql) provides a seamless SQL experience in Jupyter and uses DuckDB to visualize larger-than-memory datasets in matplotlib.
2023-02-13 Mark Raasveldt
2022-11-25 Pedro Holanda
2022-11-14 Mark Raasveldt
2022-10-28 Mark Raasveldt
DuckDB supports efficient lightweight compression that is automatically used to keep data size down without incurring high costs for compression and decompression.
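Run-length encoding is one classic example of a lightweight compression scheme: it is cheap to apply and cheap to undo. The sketch below shows the idea in plain Python. The function names and sample column are hypothetical, and it makes no claim about which schemes DuckDB selects or how its storage layer implements them.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse each run of equal adjacent values into a (value, count) pair."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, count) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical column with long runs of repeated values.
column = ["NL"] * 4 + ["DE"] * 2 + ["NL"]
encoded = rle_encode(column)
```

Columns with long runs of repeated values shrink considerably under this scheme, while decoding remains a simple linear pass.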
2022-10-12 Guest post by Jacob Matson
A fast, free, and open-source Modern Data Stack (MDS) can now be fully deployed on your laptop or to a single machine using the combination of [DuckDB](https://duckdb.org/), [Meltano](https://meltano.com/), [dbt](https://www.getdbt.com/), and [Apache Superset](https://superset.apache.org/).
2022-09-30 Hannes Mühleisen
DuckDB can now directly query tables stored in PostgreSQL and speed up complex analytical queries without duplicating data.
2022-07-27 Pedro Holanda
DuckDB uses Adaptive Radix Tree (ART) Indexes to enforce constraints and to speed up query filters. Up to this point, indexes were not persisted, causing issues like loss of indexing information and high reload times for tables with data constraints. We now persist ART Indexes to disk, drastically reducing database loading times (by up to orders of magnitude), and we no longer lose track of existing indexes. This blog post contains a deep dive into the implementation of ART storage, benchmarks, and future work. Finally, to better understand how our indexes are used, I'm asking you to answer the following [survey](https://forms.gle/eSboTEp9qpP7ybz98). It will guide us when defining our future roadmap.
2022-05-27 Richard Wesley
DuckDB has fully parallelised range joins that can efficiently join millions of range predicates.
2022-05-04 Alex Monahan
2022-03-07 Hannes Mühleisen and Mark Raasveldt
DuckDB has a fully parallelized aggregate hash table that can efficiently aggregate over millions of groups.
2022-01-06 Richard Wesley
The DuckDB ICU extension now provides time zone support.
2021-12-03 Pedro Holanda and Jonathan Keane
The zero-copy integration between DuckDB and Apache Arrow allows for rapid analysis of larger-than-memory datasets in Python and R using either SQL or relational APIs.
2021-11-26 Pedro Holanda
2021-11-12 Richard Wesley
DuckDB, a free and open-source analytical data management system, has a windowing API that can compute complex moving aggregates like interquartile ranges and median absolute deviation much faster than conventional approaches.
2021-10-29 André Kohn and Dominik Moritz
[DuckDB-Wasm](https://github.com/duckdb/duckdb-wasm) is an in-process analytical SQL database for the browser. It is powered by WebAssembly, speaks Arrow fluently, reads Parquet, CSV and JSON files backed by Filesystem APIs or HTTP requests and has been tested with Chrome, Firefox, Safari and Node.js. You can try it in your browser at [shell.duckdb.org](https://shell.duckdb.org) or on [Observable](https://observablehq.com/@cmudig/duckdb).
2021-10-13 Richard Wesley
DuckDB, a free and open-source analytical data management system, has a state-of-the-art windowing engine that can compute complex moving aggregates like interquartile ranges as well as simpler moving averages.
2021-08-27 Laurens Kuiper
DuckDB, a free and open-source analytical data management system, has a new highly efficient parallel sorting implementation that can sort much more data than fits in main memory.
2021-06-25 Hannes Mühleisen and Mark Raasveldt
DuckDB, a free and open-source analytical data management system, can run SQL queries directly on Parquet files and automatically take advantage of the advanced features of the Parquet format.
2021-05-14 Mark Raasveldt and Hannes Mühleisen
DuckDB, a free and open-source analytical data management system, can efficiently run SQL queries directly on Pandas DataFrames.
2021-01-25 Laurens Kuiper
DuckDB now has full-text search functionality, similar to the FTS5 extension in SQLite. The main difference is that our FTS extension is fully formulated in SQL. We tested it out on TREC disks 4 and 5.