rawduck

Schema-less JSON and OTEL ingestion + analytics for DuckDB

Maintainer(s): lmangani

Installing and Loading

INSTALL rawduck FROM community;
LOAD rawduck;

Example

-- Attach a rawduck db/catalog to your session
ATTACH 'rawduck:store.db' AS raw;

-- no table 'events' exists yet
INSERT INTO raw.ingest.events VALUES
    ('{"id": 1, "action": "click", "ts": "2024-01-15T10:30:00", "user": {"name": "alice"}}'),
    ('{"id": 2, "action": "view",  "ts": "2024-01-15T10:31:00", "user": {"name": "bob", "plan": "pro"}}');

DESCRIBE raw.events;
-- id BIGINT, action VARCHAR, ts TIMESTAMP, user.name VARCHAR, user.plan VARCHAR

SELECT "user.name", count(*) FROM raw.events GROUP BY 1;

About rawduck

RawDuck brings the RawMergeTree "ingest first, schema later" model to DuckDB. Point raw JSON, NDJSON files, or OTLP telemetry at tables that don't exist yet, and RawDuck creates them, infers their column types, flattens nested objects into real columns, applies transforms, and evolves the schema as the data changes.
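To make the "types them, flattens nested objects, evolves the schema" steps concrete, here is a minimal Python sketch of the idea, applied to the two records from the example above. It is an illustration under stated assumptions, not RawDuck's implementation: the dotted column names mirror the DESCRIBE output shown earlier, while the helper names (flatten, infer_type, widen, evolve_schema) and the widening rule (BIGINT plus DOUBLE becomes DOUBLE, any other conflict falls back to VARCHAR) are hypothetical.

```python
import json
from datetime import datetime


def flatten(obj, prefix=""):
    """Flatten nested objects into dotted column names, e.g. user.name."""
    out = {}
    for key, value in obj.items():
        name = prefix + key
        if isinstance(value, dict):
            out.update(flatten(value, name + "."))
        else:
            out[name] = value
    return out


def infer_type(value):
    """Map a JSON value to a SQL type name; None means a null (no evidence)."""
    if value is None:
        return None
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        try:
            datetime.fromisoformat(value)
            return "TIMESTAMP"
        except ValueError:
            return "VARCHAR"
    return "VARCHAR"  # arrays and anything else fall back to VARCHAR here


def widen(current, new):
    """Hypothetical rule for a column that has been seen with two types."""
    if current == new:
        return current
    if {current, new} == {"BIGINT", "DOUBLE"}:
        return "DOUBLE"
    return "VARCHAR"


def evolve_schema(schema, record):
    """Add unseen columns (in arrival order) and widen existing ones."""
    for column, value in flatten(record).items():
        new_type = infer_type(value)
        if new_type is None:
            continue  # in this sketch a null neither adds nor changes a column
        schema[column] = widen(schema[column], new_type) if column in schema else new_type
    return schema


rows = [
    '{"id": 1, "action": "click", "ts": "2024-01-15T10:30:00", "user": {"name": "alice"}}',
    '{"id": 2, "action": "view",  "ts": "2024-01-15T10:31:00", "user": {"name": "bob", "plan": "pro"}}',
]
schema = {}
for line in rows:
    evolve_schema(schema, json.loads(line))
print(schema)
# {'id': 'BIGINT', 'action': 'VARCHAR', 'ts': 'TIMESTAMP', 'user.name': 'VARCHAR', 'user.plan': 'VARCHAR'}
```

Because the second record introduces user.plan, the schema gains a column instead of rejecting the row; in a table, the first record would simply hold NULL there.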

RawDuck ships with an API for ingesting and querying data, including data in OpenTelemetry (OTLP) formats.

Consult the RawDuck README for more examples and details.

Added Functions

function_name         function_type  description  comment  examples
raw_flush             table          NULL         NULL
raw_infer             scalar         NULL         NULL
raw_ingest            table          NULL         NULL
raw_ingest_file       table          NULL         NULL
raw_optimize          table          NULL         NULL
raw_project           table          NULL         NULL
raw_projections       table          NULL         NULL
raw_records           table          NULL         NULL
raw_serve             table          NULL         NULL
raw_serve_grpc        table          NULL         NULL
raw_serve_grpc_stop   table          NULL         NULL
raw_serve_stop        table          NULL         NULL
raw_stats             table          NULL         NULL
raw_stats_load        table          NULL         NULL
raw_stats_save        table          NULL         NULL
raw_transform_define  scalar         NULL         NULL
raw_transforms        table          NULL         NULL
raw_type              scalar         NULL         NULL

Overloaded Functions

This extension does not add any function overloads.

Added Types

This extension does not add any types.

Added Settings

Each entry reads name (input type, scope): description. None of these settings has aliases.

rawduck_async_busy_timeout_ms (BIGINT, GLOBAL): Age threshold, in milliseconds, for flushing the async insert buffer.
rawduck_async_insert (BOOLEAN, GLOBAL): Buffer ingestion calls and flush them asynchronously.
rawduck_async_max_data_size (BIGINT, GLOBAL): Size threshold, in bytes, for flushing the async insert buffer.
rawduck_insert_transform (VARCHAR, GLOBAL): Transform name or explode path applied by INSERTs into ingest tables.
rawduck_overlap_flush (BOOLEAN, GLOBAL): Flush completed row groups during a large multi-threaded ingest while the schema is stable, overlapping parsing with compression and I/O. Faster on large stable-schema imports, at the cost of higher peak memory. Off by default (drain-free).
rawduck_use_projections (BOOLEAN, GLOBAL): Rewrite eligible count(*) aggregations to read from fresh materialized projections (for append-only workloads).
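In DuckDB, extension settings like these are changed with the SET statement, for example SET rawduck_async_insert = true;. The settings list does not spell out how the two async thresholds interact, so the Python sketch below models one plausible reading: the buffer is flushed as soon as either the oldest buffered payload reaches the age threshold (rawduck_async_busy_timeout_ms) or the buffered bytes reach the size threshold (rawduck_async_max_data_size). The AsyncBuffer class, its add/tick/flush methods, and the either-threshold rule are hypothetical, not RawDuck's code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AsyncBuffer:
    """Hypothetical async insert buffer with an age trigger and a size trigger."""

    busy_timeout_ms: int  # plays the role of rawduck_async_busy_timeout_ms
    max_data_size: int    # plays the role of rawduck_async_max_data_size (bytes)
    pending: List[bytes] = field(default_factory=list)
    size: int = 0
    oldest_ms: Optional[int] = None  # arrival time of the oldest buffered payload
    flushed: List[bytes] = field(default_factory=list)

    def add(self, payload: bytes, now_ms: int) -> None:
        """Buffer one ingestion call; flush at once if the size threshold is hit."""
        if not self.pending:
            self.oldest_ms = now_ms
        self.pending.append(payload)
        self.size += len(payload)
        if self.size >= self.max_data_size:
            self.flush()

    def tick(self, now_ms: int) -> None:
        """Periodic check: flush if the oldest payload has waited long enough."""
        if self.pending and now_ms - self.oldest_ms >= self.busy_timeout_ms:
            self.flush()

    def flush(self) -> None:
        """Stand-in for writing one batch; here it only records the batch."""
        self.flushed.append(b"".join(self.pending))
        self.pending.clear()
        self.size = 0
        self.oldest_ms = None


buf = AsyncBuffer(busy_timeout_ms=200, max_data_size=64)
buf.add(b'{"id": 1}', now_ms=0)  # 9 bytes buffered, below both thresholds
buf.tick(now_ms=150)             # oldest payload is 150 ms old: keep buffering
buf.tick(now_ms=200)             # age threshold reached: first flush
buf.add(b"x" * 70, now_ms=300)   # 70 bytes >= 64: second flush, by size
print(len(buf.flushed))          # 2
```

The age trigger bounds how long a small trickle of inserts can wait, while the size trigger bounds memory when inserts arrive in bursts; a larger threshold in either direction trades latency for fewer, bigger writes.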