rdf

A DuckDB extension to read and write RDF

Maintainer(s): nonodename

Installing and Loading

INSTALL rdf FROM community;
LOAD rdf;

Example

-- 0. Assuming the extension is already installed and loaded

-- 1. Get number of ntriples in a directory
SELECT COUNT(*) FROM read_rdf('data/shards/*.nt');

-- 2. Get subjects and predicates of a turtle file
SELECT subject, predicate FROM read_rdf('test/rdf/tests.ttl');

-- 3. Write a query result to N-Triples RDF, using an R2RML mapping
COPY (SELECT empno, ename, deptno FROM emp)
TO 'output.nt'
(FORMAT r2rml, mapping 'mapping.ttl');

-- 4. Execute a full R2RML mapping (with embedded queries) to write RDF
COPY (SELECT 1) TO 'output.nt' (FORMAT r2rml, mapping 'mapping.ttl');

-- 5. Check if an R2RML mapping is valid
SELECT is_valid_r2rml('mapping.ttl');

About rdf

The duck_rdf extension enables DuckDB to read and write RDF (Resource Description Framework) data directly, using the SERD library for parsing and serialization.

Supported Formats

Read: Turtle (.ttl), NTriples (.nt), NQuads (.nq), TriG (.trig), and RDF/XML (.rdf/.xml, experimental — read-only).

Write: NTriples, Turtle, NQuads (via R2RML mapping).

Reading RDF

read_rdf() returns six columns: subject, predicate, object (always populated), and graph, language_tag, datatype (nullable). It accepts a file path or glob pattern; multiple matched files are scanned in parallel.

SELECT subject, predicate FROM read_rdf('data.ttl');
SELECT COUNT(*) FROM read_rdf('data/shards/*.nt');
SELECT * FROM read_rdf('data/*.dat', file_type = 'ttl', strict_parsing = false);
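
As a sketch of how the nullable columns might be used, the query below counts literals per language tag. It assumes that language_tag is NULL for URIs and for literals without a tag, which the column description suggests but does not state outright; data.ttl is the same placeholder path used above.

```sql
-- Assumption: language_tag is NULL when a triple's object carries no tag.
SELECT language_tag, COUNT(*) AS n
FROM read_rdf('data.ttl')
WHERE language_tag IS NOT NULL
GROUP BY language_tag
ORDER BY n DESC;
```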

Optional parameters:

| Parameter | Default | Description |
|-----------|---------|-------------|
| strict_parsing | true | Set to false to allow malformed URIs |
| prefix_expansion | false | Expand CURIE-form URIs to full URIs (Turtle/TriG only) |
| file_type | auto-detected | Override format: ttl, nt, nq, trig, rdf/xml |
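
A minimal sketch of prefix expansion, assuming prefix_expansion is passed as a named argument in the same way as file_type and strict_parsing in the example above (data.ttl is a placeholder path):

```sql
-- With prefix_expansion = true, prefixed names in the Turtle source
-- should come back as full URIs rather than in CURIE form.
SELECT subject, predicate, object
FROM read_rdf('data.ttl', prefix_expansion = true);
```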

The experimental read_sparql(endpoint, query) sends a SPARQL SELECT query to a remote endpoint and returns the result set as a DuckDB table. Column names are derived from the SPARQL variable names; all columns are VARCHAR. Unbound variables are returned as empty strings.

-- Count the number of humans in Wikidata
SELECT * FROM read_sparql(
    'https://query.wikidata.org/sparql',
    'SELECT (COUNT(*) AS ?count) WHERE { ?item wdt:P31 wd:Q5 . }'
);
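
Because every column is VARCHAR and unbound variables arrive as empty strings, results usually need converting before use. The sketch below shows one way to do that with DuckDB's built-in NULLIF and TRY_CAST. It assumes the column is named count (the SPARQL variable without its leading ?), which the description above implies but does not spell out; human_count is just an illustrative alias.

```sql
-- Assumption: the ?count variable surfaces as a column named "count".
-- NULLIF maps the empty string (unbound) to NULL; TRY_CAST yields NULL
-- instead of raising an error if the value is not a valid integer.
SELECT TRY_CAST(NULLIF("count", '') AS BIGINT) AS human_count
FROM read_sparql(
    'https://query.wikidata.org/sparql',
    'SELECT (COUNT(*) AS ?count) WHERE { ?item wdt:P31 wd:Q5 . }'
);
```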

Writing RDF (R2RML)

Write RDF using R2RML mapping files with DuckDB's COPY TO syntax. Two modes are supported:

Inside-out mode — DuckDB drives the query; the mapping has no rr:logicalTable:

COPY (SELECT empno, ename, deptno FROM emp)
TO 'output.nt' (FORMAT r2rml, mapping 'mapping.ttl');

Full R2RML mode — the mapping defines its own queries:

COPY (SELECT 1) TO 'output.nt' (FORMAT r2rml, mapping 'mapping.ttl');

Write options:

| Option | Required | Default | Description |
|--------|----------|---------|-------------|
| mapping | Yes | | Path to R2RML mapping file (.ttl) |
| rdf_format | No | ntriples | Output format: ntriples, turtle, or nquads |
| ignore_non_fatal_errors | No | true | Raise an exception on the first parse error when false |
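
A sketch of the optional settings in use, assuming they go in the same parenthesised option list as mapping. The output path output.ttl is a placeholder; emp and mapping.ttl are the same as in the earlier examples.

```sql
-- Write Turtle instead of the default N-Triples, and stop at the first
-- error rather than skipping non-fatal ones.
COPY (SELECT empno, ename, deptno FROM emp)
TO 'output.ttl'
(FORMAT r2rml, mapping 'mapping.ttl', rdf_format 'turtle', ignore_non_fatal_errors false);
```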

Validation Helpers

SELECT is_valid_r2rml('mapping.ttl');      -- validate an R2RML mapping file
SELECT can_call_inside_out('mapping.ttl'); -- check if inside-out mode is supported

Added Functions

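| function_name | function_type | description | comment | examples |
|---------------|---------------|-------------|---------|----------|
| can_call_inside_out | scalar | NULL | NULL | |
| is_valid_r2rml | scalar | NULL | NULL | |
| read_rdf | table | NULL | NULL | |
| read_sparql | table | NULL | NULL | |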

Overloaded Functions

| function_name | function_type | description | comment | examples |
|---------------|---------------|-------------|---------|----------|

Added Types

| type_name | type_size | logical_type | type_category | internal |
|-----------|----------:|--------------|---------------|----------|

Added Settings

| name | description | input_type | scope | aliases |
|------|-------------|------------|-------|---------|