DuckLake has been released in May 2025. Read the announcement blog post.
The ducklake
extension add support for attaching to databases stored in the DuckLake format:
Installing and Loading
To install ducklake
, run:
INSTALL ducklake;
The ducklake
extension will be transparently autoloaded on first use in an ATTACH
clause.
If you would like to load it manually, run:
LOAD ducklake;
Usage
ATTACH 'ducklake:metadata.ducklake' AS my_ducklake (DATA_PATH 'data_files');
USE my_ducklake;
Tables
In DuckDB, the ducklake
extension stores the catalog tables for a DuckLake named my_ducklake
in the __ducklake_metadata_my_ducklake
catalog.
Functions
Note that DuckLake registers several functions. These should be called with the catalog name as the first argument, e.g.:
FROM ducklake_snapshots('my_ducklake');
┌─────────────┬────────────────────────────┬────────────────┬──────────────────────────┐
│ snapshot_id │ snapshot_time │ schema_version │ changes │
│ int64 │ timestamp with time zone │ int64 │ map(varchar, varchar[]) │
├─────────────┼────────────────────────────┼────────────────┼──────────────────────────┤
│ 0 │ 2025-05-26 11:41:10.838+02 │ 0 │ {schemas_created=[main]} │
└─────────────┴────────────────────────────┴────────────────┴──────────────────────────┘
ducklake_snapshots
Returns the snapshots stored in the DuckLake catalog name catalog
.
Parameter name | Parameter type | Named parameter | Description |
---|---|---|---|
catalog |
VARCHAR |
no |
The information is encoded into a table with the following schema:
Column name | Column type |
---|---|
snapshot_id |
BIGINT |
snapshot_time |
TIMESTAMP WITH TIME ZONE |
schema_version |
BIGINT |
changes |
MAP(VARCHAR, VARCHAR[]) |
ducklake_table_info
The ducklake_table_info
function returns information on the tables stored in the DuckLake catalog named catalog
.
Parameter name | Parameter type | Named parameter | Description |
---|---|---|---|
catalog |
VARCHAR |
no |
The information is encoded into a table with the following schema:
Column name | Column type |
---|---|
table_name |
VARCHAR |
schema_id |
BIGINT |
table_id |
BIGINT |
table_uuid |
UUID |
file_count |
BIGINT |
file_size_bytes |
BIGINT |
delete_file_count |
BIGINT |
delete_file_size_bytes |
BIGINT |
ducklake_table_insertions
The ducklake_table_insertions
function returns the rows inserted in a given table between snapshots of given versions or timestamps.
The function has two variants, depending on whether start_snapshot
and end_snapshot
have types BIGINT
or TIMESTAMP WITH TIME ZONE
.
Parameter name | Parameter type | Named parameter | Description |
---|---|---|---|
catalog |
VARCHAR |
no | |
schema_name |
VARCHAR |
no | |
table_name |
VARCHAR |
no | |
start_snapshot |
BIGINT / TIMESTAMP WITH TIME ZONE |
no | |
end_snapshot |
BIGINT / TIMESTAMP WITH TIME ZONE |
no |
The schema of the table returned by the function is equivalent to that of the table table_name
.
ducklake_table_deletions
The ducklake_table_deletions
function returns the rows deleted from a given table between snapshots of given versions or timestamps.
The function has two variants, depending on whether start_snapshot
and end_snapshot
have types BIGINT
or TIMESTAMP WITH TIME ZONE
.
Parameter name | Parameter type | Named parameter | Description |
---|---|---|---|
catalog |
VARCHAR |
no | |
schema_name |
VARCHAR |
no | |
table_name |
VARCHAR |
no | |
start_snapshot |
BIGINT / TIMESTAMP WITH TIME ZONE |
no | |
end_snapshot |
BIGINT / TIMESTAMP WITH TIME ZONE |
no |
The schema of the table returned by the function is equivalent to that of the table table_name
.
ducklake_table_changes
The ducklake_table_changes
function returns the rows changed in a given table between snapshots of given versions or timestamps.
The function has two variants, depending on whether start_snapshot
and end_snapshot
have types BIGINT
or TIMESTAMP WITH TIME ZONE
.
Parameter name | Parameter type | Named parameter | Description |
---|---|---|---|
catalog |
VARCHAR |
no | |
schema_name |
VARCHAR |
no | |
table_name |
VARCHAR |
no | |
start_snapshot |
BIGINT / TIMESTAMP WITH TIME ZONE |
no | |
end_snapshot |
BIGINT / TIMESTAMP WITH TIME ZONE |
no |
The schema of the table returned by the function contains the following three columns plus the schema of the table table_name
.
Column name | Column type | Description |
---|---|---|
snapshot_id |
BIGINT |
|
rowid |
BIGINT |
|
change_type |
VARCHAR |
The type of change: insert or delete |
Commands
ducklake_cleanup_old_files
The ducklake_cleanup_old_files
function cleans up old files in the DuckLake denoted by catalog
.
Upon success, it returns a table with a single column (Success
) and 0 rows.
Parameter name | Parameter type | Named parameter | Description |
---|---|---|---|
catalog |
VARCHAR |
no | |
cleanup_all |
BOOLEAN |
yes | |
dry_run |
BOOLEAN |
yes | |
older_than |
TIMESTAMP WITH TIME ZONE |
yes |
ducklake_expire_snapshots
The ducklake_expire_snapshots
function expires snapshots with the versions specified by the versions
parameter or the ones older than the older_than
parameter.
Upon success, it returns a table with a single column (Success
) and 0 rows.
Parameter name | Parameter type | Named parameter | Description |
---|---|---|---|
catalog |
VARCHAR |
no | |
versions |
UBIGINT[] |
yes | |
older_than |
TIMESTAMP WITH TIME ZONE |
yes |
ducklake_merge_adjacent_files
The ducklake_merge_adjacent_files
function merges adjacent files in the storage.
Upon success, it returns a table with a single column (Success
) and 0 rows.
Parameter name | Parameter type | Named parameter | Description |
---|---|---|---|
catalog |
VARCHAR |
no |