h5db

Read HDF5 datasets and attributes

Maintainer(s): jokasimr

Installing and Loading

INSTALL h5db FROM community;
LOAD h5db;

Example

FROM h5_read('file.h5', '/some/dataset', '/another');

About h5db

This extension provides functions for reading data and metadata from HDF5 files.

Features include:

  • h5_read() table function to read datasets.
  • h5_tree() table function to list groups and datasets in a file, optionally including projected attributes.
  • h5_attributes() table function to read attributes.
  • h5_ls() table and scalar functions to list entries in groups.
  • Multiple datasets: Read multiple datasets into separate columns.
  • Multi-dimensional arrays: Support for 1D to 4D datasets.
  • Projection pushdown: Read only the datasets that are actually needed.
  • Index column: Optionally adds an index column that supports predicate pushdown of constant range filters (>, <=, BETWEEN, etc.) for efficient selective reads.
  • Reads datasets that are larger than memory.
  • Remote reads over HTTPS, S3, etc., via the DuckDB httpfs extension.
  • Remote reads over SFTP via a built-in SFTP client.
  • Globbing: Combine datasets from multiple files vertically (UNION ALL).
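Several of these features can be combined in a single query. The following is a minimal sketch, not taken from the h5db documentation: the file names, dataset paths, and URL are hypothetical, and it assumes that the column produced by h5_index() is named "index" and that a glob pattern can be passed directly as the first argument of h5_read().

```sql
INSTALL h5db FROM community;
LOAD h5db;

-- Index column + range filter: a constant range predicate on the index
-- column can be pushed down, so only the matching rows are read.
-- ASSUMPTION: the virtual index column is exposed as "index".
-- h5_alias() gives the dataset a predictable column name.
SELECT "index", temperature
FROM h5_read(
    'data.h5',
    h5_index(),
    h5_alias('temperature', '/entry/temp')
)
WHERE "index" BETWEEN 1000 AND 1999;

-- Globbing: rows from every matching file are stacked vertically,
-- as with UNION ALL. ASSUMPTION: all files contain '/measurements'.
FROM h5_read('runs/run*.h5', '/measurements');

-- Remote read over HTTPS via the httpfs extension (hypothetical URL).
INSTALL httpfs;
LOAD httpfs;
FROM h5_read('https://example.com/data.h5', '/measurements');
```

Because of projection pushdown, selecting only some of the columns listed in h5_read() should avoid reading the datasets that are not referenced.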

For full documentation, see the h5db repository.

Added Functions

| function_name | function_type | description | comment | examples |
|---|---|---|---|---|
| h5_alias | scalar | Renames a column definition. | NULL | [FROM h5_read('data.h5', h5_alias('temperature', '/entry/temp'))] |
| h5_attr | scalar | Creates a projected HDF5 attribute definition for h5_tree() and h5_ls(). | NULL | [FROM h5_tree('data.h5', h5_attr('NX_class'))] |
| h5_attributes | table | Reads all attributes from an HDF5 object or file root. | NULL | [FROM h5_attributes('data.h5', '/measurements')] |
| h5_first_file | scalar | Returns the first concrete HDF5 filename from an exact path, glob, or list for planning-time use with h5_read(). | NULL | [FROM h5_read(h5_first_file('runs/run*.h5'), '/detector_geometry')] |
| h5_index | scalar | Creates a virtual row-index column definition for h5_read(). | NULL | [FROM h5_read('data.h5', h5_index(), '/measurements')] |
| h5_ls | scalar | Lists entries immediately under an HDF5 group as a MAP. | NULL | [SELECT h5_ls('data.h5', '/entry')] |
| h5_ls | table | Lists entries immediately under an HDF5 group as rows. | NULL | [FROM h5_ls('data.h5', '/entry')] |
| h5_ls_swmr | scalar | Lists entries immediately under an HDF5 group as a MAP using SWMR read mode. | NULL | [SELECT h5_ls_swmr('data.h5', '/entry')] |
| h5_read | table | Reads one or more HDF5 datasets as DuckDB columns. | NULL | [FROM h5_read('data.h5', '/measurements')] |
| h5_rse | scalar | Creates a run-start encoded column definition for h5_read(). | NULL | [FROM h5_read('data.h5', '/time', h5_rse('/state_run_starts', '/state_values'))] |
| h5_tree | table | Recursively lists entries in an HDF5 file. | NULL | [FROM h5_tree('data.h5')] |
| h5db_version | scalar | Returns the linked HDF5 library version used by h5db. | NULL | [SELECT h5db_version('h5db')] |
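When the layout of a file is unknown, the listing functions can be used to explore it before reading any data. The sketch below strings together the example calls from the table above. The file name and paths are the same placeholders used there, and the output columns of each function are not described on this page, so the queries select everything rather than naming columns.

```sql
LOAD h5db;

-- 1. Walk the whole file, projecting the NX_class attribute of each entry.
FROM h5_tree('data.h5', h5_attr('NX_class'));

-- 2. List only the direct children of one group.
FROM h5_ls('data.h5', '/entry');

-- 3. Inspect the attributes attached to a dataset or group.
FROM h5_attributes('data.h5', '/measurements');

-- 4. Read the dataset once its path is known.
FROM h5_read('data.h5', '/measurements');

-- With a glob, h5_first_file() resolves one concrete file at planning
-- time, which suits data that is identical across all matching files.
FROM h5_read(h5_first_file('runs/run*.h5'), '/detector_geometry');
```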

Overloaded Functions

This extension does not add any function overloads.

Added Types

This extension does not add any types.

Added Settings

| name | description | input_type | scope | aliases |
|---|---|---|---|---|
| h5db_batch_size | Target batch size for h5_read chunk caching (e.g. 1MB, 8MB) | VARCHAR | GLOBAL | [] |
| h5db_swmr_default | Default to SWMR read mode for h5db table functions | BOOLEAN | GLOBAL | [] |
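These settings are changed with DuckDB's standard SET statement once the extension is loaded. The values below are illustrative only, not recommendations. The '8MB' string follows the format shown in the description above.

```sql
LOAD h5db;

-- Raise the target batch size used for h5_read chunk caching.
SET GLOBAL h5db_batch_size = '8MB';

-- Make h5db table functions open files in SWMR read mode by default.
SET GLOBAL h5db_swmr_default = true;

-- Check the values currently in effect.
SELECT
    current_setting('h5db_batch_size')   AS batch_size,
    current_setting('h5db_swmr_default') AS swmr_default;

-- Return a setting to its default value.
RESET h5db_batch_size;
```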