Schema-less JSON and OTEL ingestion + analytics for DuckDB
Maintainer(s):
lmangani
Installing and Loading
INSTALL rawduck FROM community;
LOAD rawduck;
Example
-- Attach a rawduck db/catalog to your session
ATTACH 'rawduck:store.db' AS raw;
-- no table 'events' exists yet
INSERT INTO raw.ingest.events VALUES
('{"id": 1, "action": "click", "ts": "2024-01-15T10:30:00", "user": {"name": "alice"}}'),
('{"id": 2, "action": "view", "ts": "2024-01-15T10:31:00", "user": {"name": "bob", "plan": "pro"}}');
DESCRIBE raw.events;
-- id BIGINT, action VARCHAR, ts TIMESTAMP, user.name VARCHAR, user.plan VARCHAR
SELECT "user.name", count(*) FROM raw.events GROUP BY 1;
About rawduck
RawDuck brings the RawMergeTree "ingest first, schema later" model to DuckDB. Point raw JSON, NDJSON files, or OTLP telemetry at tables that don't exist yet: RawDuck creates them, types them, flattens nested objects into real columns, and transforms and evolves the schema as the data changes.
RawDuck also ships with a full-featured API for ingesting and querying data, including data in OpenTelemetry (OTLP) formats.
Consult the RawDuck README for more examples and details.
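To make the "evolves the schema as the data changes" behavior concrete, here is a minimal sketch that continues the Example section. It assumes the same session (rawduck loaded, `rawduck:store.db` attached as `raw`). The third record and its new keys (`country`, `user.age`) are made up for illustration, and the resulting column names and types should be checked with `DESCRIBE` rather than taken from this sketch.

```sql
-- Assumes the session from the Example section above.
-- Illustrative record: "country" and "user.age" are keys not seen before.
INSERT INTO raw.ingest.events VALUES
('{"id": 3, "action": "purchase", "ts": "2024-01-15T10:32:00", "country": "IT", "user": {"name": "carol", "age": 31}}');

-- Per the description above, new keys should surface as additional columns.
-- Inspect the result instead of assuming exact names or types.
DESCRIBE raw.events;

-- Columns that existed before the change remain queryable as usual.
SELECT id, "user.name" FROM raw.events ORDER BY id;
```

How earlier rows appear in the newly added columns (presumably as NULL) is an assumption here, so verify it against the README.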
Added Functions
| function_name | function_type | description | comment | examples |
|---|---|---|---|---|
| raw_flush | table | NULL | NULL | |
| raw_infer | scalar | NULL | NULL | |
| raw_ingest | table | NULL | NULL | |
| raw_ingest_file | table | NULL | NULL | |
| raw_optimize | table | NULL | NULL | |
| raw_project | table | NULL | NULL | |
| raw_projections | table | NULL | NULL | |
| raw_records | table | NULL | NULL | |
| raw_serve | table | NULL | NULL | |
| raw_serve_grpc | table | NULL | NULL | |
| raw_serve_grpc_stop | table | NULL | NULL | |
| raw_serve_stop | table | NULL | NULL | |
| raw_stats | table | NULL | NULL | |
| raw_stats_load | table | NULL | NULL | |
| raw_stats_save | table | NULL | NULL | |
| raw_transform_define | scalar | NULL | NULL | |
| raw_transforms | table | NULL | NULL | |
| raw_type | scalar | NULL | NULL | |
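Because the table above lists no descriptions or signatures, one way to discover what each function expects is DuckDB's built-in `duckdb_functions()` catalog function. The following is a sketch, assuming the extension is already installed and loaded and that all of its functions share the `raw_` prefix shown above.

```sql
-- List the functions registered by the extension with their parameters.
-- starts_with avoids the LIKE wildcard meaning of '_' in the prefix.
SELECT function_name,
       function_type,
       parameters,
       parameter_types,
       return_type
FROM duckdb_functions()
WHERE starts_with(function_name, 'raw_')
ORDER BY function_name;
```

The filter may also match unrelated functions from other extensions that happen to use the same prefix.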
Overloaded Functions
This extension does not add any function overloads.
Added Types
This extension does not add any types.
Added Settings
| name | description | input_type | scope | aliases |
|---|---|---|---|---|
| rawduck_async_busy_timeout_ms | Async insert buffer flush age threshold | BIGINT | GLOBAL | [] |
| rawduck_async_insert | Buffer ingestion calls and flush asynchronously | BOOLEAN | GLOBAL | [] |
| rawduck_async_max_data_size | Async insert buffer flush threshold in bytes | BIGINT | GLOBAL | [] |
| rawduck_insert_transform | Transform name or explode path applied by INSERTs into ingest tables | VARCHAR | GLOBAL | [] |
| rawduck_overlap_flush | Flush completed row groups during a large multi-threaded ingest while the schema is stable, overlapping parse with compression/IO. Faster on large stable-schema imports at the cost of higher peak memory; off by default (drain-free) | BOOLEAN | GLOBAL | [] |
| rawduck_use_projections | Rewrite eligible count(*) aggregations onto fresh materialized projections (append-only workloads) | BOOLEAN | GLOBAL | [] |
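These settings can be changed with DuckDB's standard `SET` statement. The sketch below turns on async inserts and adjusts the two flush thresholds. The values are illustrative only, since the table does not document defaults, and the comments reflect one reading of the setting descriptions.

```sql
-- Illustrative values only; the table above does not state the defaults.
SET rawduck_async_insert = true;             -- buffer ingestion calls
SET rawduck_async_max_data_size = 16777216;  -- size threshold in bytes (16 MiB here)
SET rawduck_async_busy_timeout_ms = 500;     -- age threshold in milliseconds

-- Inspect the current values of all rawduck settings.
SELECT name, value, description
FROM duckdb_settings()
WHERE starts_with(name, 'rawduck_');
```

`RESET rawduck_async_insert;` restores a setting to its default.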