Arrow IPC over HTTP client with array functions and bloom filters
Maintainer(s):
dazzleduck-web
Installing and Loading
INSTALL dazzleduck FROM community;
LOAD dazzleduck;
Example
For documentation and examples, visit: https://dazzleduck-web.github.io/dazzleduck-website/
About dazzleduck
DazzleDuck is a DuckDB extension that provides a client to execute queries remotely on DazzleDuck SQL Server via Arrow IPC over HTTP protocol. It also includes array utility functions and bloom filters.
Features
Arrow IPC over HTTP Client
dd_read_arrow(url, query): Execute queries on remote DuckDB servers via Arrow IPC over HTTPdd_login(url, user, pass): Authenticate and get auth tokendd_splits(url, table): Get table splits for parallel processing (returns endpoints, query_id, query, producer_id, split_size, query_checksum)- Filter Pushdown: WHERE clause filters are automatically sent to the remote server
- Projection Pushdown: Only selected columns are fetched from the server
- Aggregation Pushdown: Automatically detects aggregation queries and pushes them to the remote server, transferring only aggregated results instead of all rows
- Split Mode: Enable parallel execution with
split := trueparameter
Supported aggregations for pushdown:
| Aggregate | Non-split mode | Split mode |
|---|---|---|
COUNT(*) |
Yes | Yes |
COUNT(col) |
Yes | Yes |
SUM(col) |
Yes | Yes |
MIN(col) |
Yes | Yes |
MAX(col) |
Yes | Yes |
AVG(col) |
Yes | No (local fallback) |
COUNT(DISTINCT col) |
Yes | No (local fallback) |
In split mode, only split-safe aggregations (those whose partial results can be merged across splits) are pushed down. Non-split-safe aggregations like AVG and COUNT(DISTINCT ...) fall back to local execution automatically. If any aggregate in a query is not supported for pushdown, the entire query falls back to local execution.
Array Functions
dd_array_contains_all(haystack, needle): Check if all needle elements exist in haystackdd_array_contains_all(haystack, needle, bloom_filter): Optimized version with pre-computed bloom filter
Bloom Filter Functions
dd_bloom_filter_create(array): Create bloom filter from string arraydd_bloom_filter_contains(filter, value): Check single value membershipdd_bloom_filter_contains_all(filter, array): Check multiple values membership
Usage Examples
-- Check array containment
SELECT dd_array_contains_all(['a','b','c','d'], ['a','c']); -- true
-- Create and use bloom filter for large datasets
WITH products AS (SELECT dd_bloom_filter_create(list(name)) AS bf FROM catalog)
SELECT * FROM search_terms
WHERE dd_bloom_filter_contains(products.bf, term);
-- Query remote DuckDB server
SELECT * FROM dd_read_arrow('https://server:8081', sql := 'SELECT * FROM sales');
-- With filter pushdown (filters sent to server)
SELECT * FROM dd_read_arrow('https://server:8081', sql := 'SELECT * FROM orders')
WHERE status = 'pending' AND amount > 100;
-- With projection pushdown (only selected columns fetched)
SELECT name, email FROM dd_read_arrow('https://server:8081', source_table := 'users');
-- With parallel execution (split mode)
SELECT * FROM dd_read_arrow('https://server:8081', source_table := 'mydb.mytable', split := true);
-- Aggregation pushdown (aggregations executed on server)
SELECT region, count(*), sum(amount)
FROM dd_read_arrow('https://server:8081', source_table := 'orders')
WHERE status = 'pending'
GROUP BY region;
-- Inspect table splits
SELECT * FROM dd_splits('https://server:8081', source_table := 'mydb.mytable');
Added Functions
| function_name | function_type | description | comment | examples |
|---|---|---|---|---|
| dd_array_contains_all | scalar | NULL | NULL | |
| dd_bloom_filter_contains | scalar | NULL | NULL | |
| dd_bloom_filter_contains_all | scalar | NULL | NULL | |
| dd_bloom_filter_create | scalar | NULL | NULL | |
| dd_login | scalar | NULL | NULL | |
| dd_read_arrow | table | NULL | NULL | |
| dd_search | scalar | NULL | NULL | |
| dd_splits | table | NULL | NULL | |
| dd_version | scalar | NULL | NULL |