⌘+k ctrl+k
1.2 (stable)
Search Shortcut cmd + k | ctrl + k
Iceberg Extension

The iceberg extension implements support for the Apache Iceberg open table format.

Installing and Loading

To install the iceberg extension, run:

INSTALL iceberg;

Note that the iceberg extension is not autoloadable. Therefore, you need to load it before using it:

LOAD iceberg;

Updating the Extension

The iceberg extension often receives updates between DuckDB releases. To make sure that you have the latest version, update your extensions:

UPDATE EXTENSIONS;

Usage

To test the examples, download the iceberg_data.zip file and unzip it.

Common Parameters

Parameter Type Default Description
allow_moved_paths BOOLEAN false Allows scanning Iceberg tables that are moved
metadata_compression_codec VARCHAR '' Treats metadata files as when set to 'gzip'
version VARCHAR '?' Provides an explicit version string, hint file or guessing
version_name_format VARCHAR 'v%s%s.metadata.json,%s%s.metadata.json' Controls how versions are converted to metadata file names

Querying Individual Tables

SELECT count(*)
FROM iceberg_scan('data/iceberg/lineitem_iceberg', allow_moved_paths = true);
count_star()
51793

The allow_moved_paths option ensures that some path resolution is performed, which allows scanning Iceberg tables that are moved.

You can also address specify the current manifest directly in the query, this may be resolved from the catalog prior to the query, in this example the manifest version is a UUID. To do so, navigate to the data/iceberg directory and run:

SELECT count(*)
FROM iceberg_scan('lineitem_iceberg/metadata/v1.metadata.json');
count_star()
60175

The iceberg works together with the httpfs extension to access Iceberg tables in object stores such as S3.

SELECT count(*)
FROM iceberg_scan('s3://bucketname/lineitem_iceberg/metadata/v1.metadata.json');

Access Iceberg Metadata

To access Iceberg Metadata, you can use the iceberg_metadata function:

SELECT *
FROM iceberg_metadata('data/iceberg/lineitem_iceberg', allow_moved_paths = true);
manifest_path manifest_sequence_number manifest_content status content file_path file_format record_count
lineitem_iceberg/metadata/10eaca8a-1e1c-421e-ad6d-b232e5ee23d3-m1.avro 2 DATA ADDED EXISTING lineitem_iceberg/data/00041-414-f3c73457-bbd6-4b92-9c15-17b241171b16-00001.parquet PARQUET 51793
lineitem_iceberg/metadata/10eaca8a-1e1c-421e-ad6d-b232e5ee23d3-m0.avro 2 DATA DELETED EXISTING lineitem_iceberg/data/00000-411-0792dcfe-4e25-4ca3-8ada-175286069a47-00001.parquet PARQUET 60175

Visualizing Snapshots

To visualize the snapshots in an Iceberg table, use the iceberg_snapshots function:

SELECT *
FROM iceberg_snapshots('data/iceberg/lineitem_iceberg');
sequence_number snapshot_id timestamp_ms manifest_list
1 3776207205136740581 2023-02-15 15:07:54.504 lineitem_iceberg/metadata/snap-3776207205136740581-1-cf3d0be5-cf70-453d-ad8f-48fdc412e608.avro
2 7635660646343998149 2023-02-15 15:08:14.73 lineitem_iceberg/metadata/snap-7635660646343998149-1-10eaca8a-1e1c-421e-ad6d-b232e5ee23d3.avro

Selecting Metadata Versions

By default, the iceberg extension will look for a version-hint.text file to identify the proper metadata version to use. This can be overridden by explicitly supplying a version number via the version parameter to the functions of the iceberg extension:

SELECT *
FROM iceberg_snapshots(
    'data/iceberg/lineitem_iceberg',
    version = '1',
    allow_moved_paths = true
);

By default, iceberg functions will look for both v{version}.metadata.json and {version}.metadata.json files, or v{version}.gz.metadata.json and {version}.gz.metadata.json when metadata_compression_codec = 'gzip' is specified. Other compression codecs are not supported.

If any text file is provided through the version parameter, it is opened and treated as a version hint file:

SELECT *
FROM iceberg_snapshots(
    'data/iceberg/lineitem_iceberg',
    version = 'version-hint.txt',
    allow_moved_paths = true
);

The iceberg extension will open this file and use the entire content of the file as a provided version number. Note that the entire content of the version-hint.txt file will be treated as a literal version name, with no encoding, escaping or trimming. This includes any whitespace, or unsafe characters which will be explicitly passed formatted into filenames in the logic described below.

Working with Alternative Metadata Naming Conventions

The iceberg extension can handle different metadata naming conventions by specifying them as a comma-delimited list of format strings via the version_name_format parameter. Each format string must contain two %s parameters. The first is the location of the version number in the metadata filename and the second is the location of the filename extension specified by the metadata_compression_codec. The behavior described above is provided by the default value of "v%s%s.metadata.gz,%s%smetadata.gz. If you had an alternatively named metadata file, e.g., rev-2.metadata.json.gz, the table can be read via the follow statement:

SELECT *
FROM iceberg_snapshots(
    'data/iceberg/alternative_metadata_gz_naming',
    version = '2',
    version_name_format = 'rev-%s.metadata.json%s',
    metadata_compression_codec = 'gzip',
    allow_moved_paths = true
);

“Guessing” Metadata Versions

By default, either a table version number or a version-hint.text must be provided for the iceberg extension to read a table. This is typically provided by an external data catalog. In the event neither is present, the iceberg extension can attempt to guess the latest version by passing ? as the version parameter:

SELECT count(*)
FROM iceberg_scan(
    'data/iceberg/lineitem_iceberg_no_hint',
    version = '?',
    allow_moved_paths = true
);

The “latest” version is assumed to be the filename that is lexicographically largest when sorting the filenames. Collations are not considered. This behavior is not enabled by default as it may potentially violate ACID constraints. It can be enabled by setting unsafe_enable_version_guessing to true. When this is set, iceberg functions will attempt to guess the latest version by default before failing.

SET unsafe_enable_version_guessing = true;
SELECT count(*)
FROM iceberg_scan('data/iceberg/lineitem_iceberg_no_hint', allow_moved_paths = true);

Limitations

Writing (i.e., exporting to) Iceberg files is currently not supported.

Pages in This Section