Search Shortcut cmd + k | ctrl + k
dazzleduck

Arrow IPC over HTTP client with array functions and bloom filters

Maintainer(s): dazzleduck-web

Installing and Loading

INSTALL dazzleduck FROM community;
LOAD dazzleduck;

Example

For documentation and examples, visit: https://dazzleduck-web.github.io/dazzleduck-website/

About dazzleduck

DazzleDuck is a DuckDB extension that provides a client to execute queries remotely on DazzleDuck SQL Server via Arrow IPC over HTTP protocol. It also includes array utility functions and bloom filters.

Features

Arrow IPC over HTTP Client

  • dd_read_arrow(url, query): Execute queries on remote DuckDB servers via Arrow IPC over HTTP
  • dd_login(url, user, pass): Authenticate and get auth token
  • dd_splits(url, table): Get table splits for parallel processing (returns endpoints, query_id, query, producer_id, split_size, query_checksum)
  • Filter Pushdown: WHERE clause filters are automatically sent to the remote server
  • Projection Pushdown: Only selected columns are fetched from the server
  • Aggregation Pushdown: Automatically detects aggregation queries and pushes them to the remote server, transferring only aggregated results instead of all rows
  • Split Mode: Enable parallel execution with split := true parameter

Supported aggregations for pushdown:

Aggregate Non-split mode Split mode
COUNT(*) Yes Yes
COUNT(col) Yes Yes
SUM(col) Yes Yes
MIN(col) Yes Yes
MAX(col) Yes Yes
AVG(col) Yes No (local fallback)
COUNT(DISTINCT col) Yes No (local fallback)

In split mode, only split-safe aggregations (those whose partial results can be merged across splits) are pushed down. Non-split-safe aggregations like AVG and COUNT(DISTINCT ...) fall back to local execution automatically. If any aggregate in a query is not supported for pushdown, the entire query falls back to local execution.

Array Functions

  • dd_array_contains_all(haystack, needle): Check if all needle elements exist in haystack
  • dd_array_contains_all(haystack, needle, bloom_filter): Optimized version with pre-computed bloom filter

Bloom Filter Functions

  • dd_bloom_filter_create(array): Create bloom filter from string array
  • dd_bloom_filter_contains(filter, value): Check single value membership
  • dd_bloom_filter_contains_all(filter, array): Check multiple values membership

Usage Examples

-- Check array containment
SELECT dd_array_contains_all(['a','b','c','d'], ['a','c']);  -- true

-- Create and use bloom filter for large datasets
WITH products AS (SELECT dd_bloom_filter_create(list(name)) AS bf FROM catalog)
SELECT * FROM search_terms
WHERE dd_bloom_filter_contains(products.bf, term);

-- Query remote DuckDB server
SELECT * FROM dd_read_arrow('https://server:8081', sql := 'SELECT * FROM sales');

-- With filter pushdown (filters sent to server)
SELECT * FROM dd_read_arrow('https://server:8081', sql := 'SELECT * FROM orders')
WHERE status = 'pending' AND amount > 100;

-- With projection pushdown (only selected columns fetched)
SELECT name, email FROM dd_read_arrow('https://server:8081', source_table := 'users');

-- With parallel execution (split mode)
SELECT * FROM dd_read_arrow('https://server:8081', source_table := 'mydb.mytable', split := true);

-- Aggregation pushdown (aggregations executed on server)
SELECT region, count(*), sum(amount)
FROM dd_read_arrow('https://server:8081', source_table := 'orders')
WHERE status = 'pending'
GROUP BY region;

-- Inspect table splits
SELECT * FROM dd_splits('https://server:8081', source_table := 'mydb.mytable');

Added Functions

function_name function_type description comment examples
dd_array_contains_all scalar NULL NULL  
dd_bloom_filter_contains scalar NULL NULL  
dd_bloom_filter_contains_all scalar NULL NULL  
dd_bloom_filter_create scalar NULL NULL  
dd_login scalar NULL NULL  
dd_read_arrow table NULL NULL  
dd_search scalar NULL NULL  
dd_splits table NULL NULL  
dd_version scalar NULL NULL