Build, transform, validate, and extract content from structured documents using the duck_block type
Maintainer(s):
teaguesterling
Installing and Loading
INSTALL duck_block_utils FROM community;
LOAD duck_block_utils;
Example
-- Build a document programmatically
SELECT db_assemble([
db_heading(1, 'Hello World'),
db_paragraph([
db_text('This is '),
db_bold('important'),
db_text(' content.')
]),
db_code('sql', 'SELECT * FROM documents')
]);
-- Works with markdown extension for reading/writing files
LOAD markdown;
LOAD duck_block_utils;
-- Extract table of contents from markdown
SELECT * FROM db_blocks_toc(
(SELECT list(b) FROM read_markdown_blocks('README.md') b)
);
-- Convert blocks to plain text
SELECT db_blocks_to_text(
(SELECT list(b) FROM read_markdown_blocks('doc.md') b)
);
-- Validate document structure
SELECT db_blocks_validate(
(SELECT list(b) FROM read_markdown_blocks('doc.md') b)
);
About duck_block_utils
The Duck Block Utils extension provides tools for building, transforming, validating, and
extracting content from structured documents. All functions use the duck_block type - a
unified representation for document elements that works across formats.
Documentation: duck-block-utils.readthedocs.io
The duck_block Type
A unified struct for both block-level and inline elements:
STRUCT(
kind VARCHAR, -- 'block' or 'inline'
element_type VARCHAR, -- 'heading', 'paragraph', 'bold', 'link', etc.
content VARCHAR, -- Text content (NULL if element has children)
level INTEGER, -- Structural depth (1=block, 2+=inline children)
encoding VARCHAR, -- 'text', 'json', 'html', etc.
attributes MAP(VARCHAR, VARCHAR), -- Element-specific metadata
element_order INTEGER -- Position in document
)
V2 API Design
All builder functions return LIST(duck_block) for uniform composition:
-- Builders can be nested naturally
SELECT db_paragraph([
db_text('Click '),
db_link('https://example.com', 'here'),
db_text(' for more.')
]);
Block Builders
| Function | Description |
|---|---|
db_heading(level, content) |
Create heading (h1-h6) |
db_paragraph(content) |
Create paragraph |
db_code(language, content) |
Create fenced code block |
db_blockquote(content) |
Create block quote |
db_list_block(ordered, items[]) |
Create ordered/unordered list |
db_hr() |
Create horizontal rule |
db_image(src, alt, title) |
Create image block |
db_metadata(content) |
Create YAML frontmatter |
db_raw(format, content) |
Create raw content block |
Inline Builders
| Function | Description |
|---|---|
db_text(content) |
Plain text |
db_bold(content) |
Bold/strong text |
db_italic(content) |
Italic/emphasis text |
db_link(href, content) |
Hyperlink |
db_inline_code(content) |
Inline code |
db_math(content) |
Math expression |
db_strikethrough(content) |
Strikethrough text |
db_superscript(content) |
Superscript |
db_subscript(content) |
Subscript |
Assembly Functions
| Function | Description |
|---|---|
db_assemble(blocks[]) |
Combine blocks into document |
db_document(blocks[]) |
Alias for db_assemble |
db_section(level, title, children[]) |
Create section with heading |
db_concat(blocks1, blocks2) |
Concatenate block lists |
db_rebase_levels(blocks, offset) |
Adjust heading levels |
Extraction Functions
| Function | Description |
|---|---|
db_blocks_to_text(blocks) |
Extract plain text |
db_blocks_headings(blocks) |
Extract heading hierarchy |
db_blocks_toc(blocks) |
Generate table of contents |
Validation
| Function | Description |
|---|---|
db_blocks_validate(blocks) |
Validate document structure |
Pandoc Integration
Convert between Pandoc JSON AST and duck_block without requiring Pandoc:
| Function | Description |
|---|---|
pandoc_inlines_to_db_inlines(json) |
Parse Pandoc inline AST |
db_inlines_to_pandoc(blocks) |
Convert to Pandoc inline AST |
Ecosystem Integration
Duck Block Utils works seamlessly with other DuckDB document extensions:
- duckdb_markdown - Read markdown files with
read_markdown_blocks(), write withCOPY ... TO ... (FORMAT markdown, MARKDOWN_MODE duck_block) - duckdb_webbed - HTML parsing and generation
-- Read markdown, manipulate with duck_block_utils, write back
LOAD markdown;
LOAD duck_block_utils;
-- Add a disclaimer to all documents
COPY (
SELECT block.* FROM (
SELECT unnest(db_assemble([
(SELECT list(b) FROM read_markdown_blocks('doc.md') b),
db_hr(),
db_paragraph('Generated by DuckDB')
])) as block
)
) TO 'output.md' (FORMAT markdown, MARKDOWN_MODE duck_block);
Added Functions
| function_name | function_type | description | comment | examples |
|---|---|---|---|---|
| db_assemble | scalar | NULL | NULL | |
| db_blockquote | scalar | NULL | NULL | |
| db_blocks_code_blocks | scalar | NULL | NULL | |
| db_blocks_exclude | scalar | NULL | NULL | |
| db_blocks_filter | scalar | NULL | NULL | |
| db_blocks_headings | scalar | NULL | NULL | |
| db_blocks_links | scalar | NULL | NULL | |
| db_blocks_lint | scalar | NULL | NULL | |
| db_blocks_merge | scalar | NULL | NULL | |
| db_blocks_reorder | scalar | NULL | NULL | |
| db_blocks_slice | scalar | NULL | NULL | |
| db_blocks_stats | scalar | NULL | NULL | |
| db_blocks_structure | scalar | NULL | NULL | |
| db_blocks_to_text | scalar | NULL | NULL | |
| db_blocks_toc | scalar | NULL | NULL | |
| db_blocks_validate | scalar | NULL | NULL | |
| db_bold | scalar | NULL | NULL | |
| db_cite | scalar | NULL | NULL | |
| db_code | scalar | NULL | NULL | |
| db_concat | scalar | NULL | NULL | |
| db_document | scalar | NULL | NULL | |
| db_heading | scalar | NULL | NULL | |
| db_hr | scalar | NULL | NULL | |
| db_image | scalar | NULL | NULL | |
| db_inline_code | scalar | NULL | NULL | |
| db_inline_image | scalar | NULL | NULL | |
| db_inlines_to_pandoc | scalar | NULL | NULL | |
| db_italic | scalar | NULL | NULL | |
| db_linebreak | scalar | NULL | NULL | |
| db_link | scalar | NULL | NULL | |
| db_list_block | scalar | NULL | NULL | |
| db_list_item | scalar | NULL | NULL | |
| db_math | scalar | NULL | NULL | |
| db_metadata | scalar | NULL | NULL | |
| db_note | scalar | NULL | NULL | |
| db_paragraph | scalar | NULL | NULL | |
| db_quoted | scalar | NULL | NULL | |
| db_raw | scalar | NULL | NULL | |
| db_raw_inline | scalar | NULL | NULL | |
| db_rebase_levels | scalar | NULL | NULL | |
| db_section | scalar | NULL | NULL | |
| db_smallcaps | scalar | NULL | NULL | |
| db_softbreak | scalar | NULL | NULL | |
| db_space | scalar | NULL | NULL | |
| db_span | scalar | NULL | NULL | |
| db_strikethrough | scalar | NULL | NULL | |
| db_subscript | scalar | NULL | NULL | |
| db_superscript | scalar | NULL | NULL | |
| db_text | scalar | NULL | NULL | |
| db_underline | scalar | NULL | NULL | |
| duck_block | scalar | NULL | NULL | |
| duck_block_attr | scalar | NULL | NULL | |
| duck_block_content | scalar | NULL | NULL | |
| duck_block_encoding | scalar | NULL | NULL | |
| duck_block_level | scalar | NULL | NULL | |
| duck_block_order | scalar | NULL | NULL | |
| duck_block_set_content | scalar | NULL | NULL | |
| duck_block_set_level | scalar | NULL | NULL | |
| duck_block_set_order | scalar | NULL | NULL | |
| duck_block_type | scalar | NULL | NULL | |
| duck_block_valid | scalar | NULL | NULL | |
| pandoc_ast_to_blocks | scalar | NULL | NULL | |
| pandoc_blocks_to_ast | scalar | NULL | NULL | |
| pandoc_inlines_to_db_inlines | scalar | NULL | NULL | |
| pandoc_inlines_to_text | scalar | NULL | NULL | |
| to_duck_block | scalar | NULL | NULL |
Added Types
| type_name | type_size | logical_type | type_category | internal |
|---|---|---|---|---|
| duck_block | 0 | STRUCT | COMPOSITE | true |
| duck_block_ext | 0 | STRUCT | COMPOSITE | true |