Read YAML files into DuckDB with native YAML type support, comprehensive extraction functions, and seamless JSON interoperability
Maintainer(s):
teaguesterling
Installing and Loading
INSTALL yaml FROM community;
LOAD yaml;
Example
-- Load the extension
LOAD yaml;
-- Query YAML files directly
SELECT * FROM 'config.yaml';
SELECT * FROM 'data/*.yml' WHERE active = true;
-- Create tables with YAML columns
CREATE TABLE configs(id INTEGER, config YAML);
INSERT INTO configs VALUES (1, E'server: production\nport: 8080\nfeatures: [logging, metrics]');
-- Extract data using YAML functions
SELECT
yaml_extract_string(config, '$.server') AS environment,
yaml_extract(config, '$.port') AS port,
yaml_extract(config, '$.features[0]') AS first_feature
FROM configs;
-- Convert between YAML and JSON
SELECT yaml_to_json(config) AS json_config FROM configs;
SELECT value_to_yaml({name: 'John', age: 30}) AS yaml_person;
-- Write query results to YAML
COPY (SELECT * FROM users) TO 'output.yaml' (FORMAT yaml, STYLE block);
About yaml
The YAML extension brings comprehensive YAML support to DuckDB, enabling seamless integration of YAML data within SQL queries.
Key Features:
- Native YAML Type: Full YAML type support with automatic casting between YAML, JSON, and VARCHAR
- File Reading: Read YAML files with
read_yaml()
andread_yaml_objects()
functions supporting multi-document files, top-level sequences, and robust error handling - Direct File Querying: Query YAML files directly using
FROM 'file.yaml'
syntax - Extraction Functions: Query YAML data with
yaml_extract()
,yaml_type()
,yaml_exists()
, and path-based extraction - Type Detection: Comprehensive automatic type detection for temporal types (DATE, TIME, TIMESTAMP), optimal numeric types, and boolean values
- Column Type Specification: Explicitly define column types when reading YAML files for schema consistency
- YAML Output: Write query results to YAML files using
COPY TO
with configurable formatting styles - Multi-Document Support: Handle files with multiple YAML documents separated by
---
- Error Recovery: Continue processing valid documents even when some contain errors
- JSON Interoperability: Seamless conversion between YAML and JSON formats
Example Use Cases:
- Configuration file management and querying
- Log file analysis and processing
- Data migration between YAML and relational formats
- Integration with YAML-based CI/CD pipelines
- Processing Kubernetes manifests and Helm charts
The extension is built using yaml-cpp and follows DuckDB's extension development best practices, ensuring reliable performance and cross-platform compatibility.
Note: This extension was written primarily using Claude and Claude Code as an exercise in AI-driven development.
Added Functions
function_name | function_type | description | comment | examples |
---|---|---|---|---|
copy_format_yaml | scalar | NULL | NULL | |
format_yaml | scalar | NULL | NULL | |
read_yaml | table | NULL | NULL | |
read_yaml_objects | table | NULL | NULL | |
value_to_yaml | scalar | NULL | NULL | |
yaml_exists | scalar | NULL | NULL | |
yaml_extract | scalar | NULL | NULL | |
yaml_extract_string | scalar | NULL | NULL | |
yaml_get_default_style | scalar | NULL | NULL | |
yaml_set_default_style | scalar | NULL | NULL | |
yaml_to_json | scalar | NULL | NULL | |
yaml_type | scalar | NULL | NULL | |
yaml_valid | scalar | NULL | NULL |
Added Types
type_name | type_size | logical_type | type_category | internal |
---|---|---|---|---|
yaml | 16 | VARCHAR | STRING | true |