Read HDF5 datasets and attributes
Maintainer(s): jokasimr
Installing and Loading
INSTALL h5db FROM community;
LOAD h5db;
Example
FROM h5_read('file.h5', '/some/dataset', '/another');
About h5db
This extension provides functions for reading data and metadata from HDF5 files.
Features include:
- h5_read() table function to read datasets.
- h5_tree() table function to list groups and datasets in a file, optionally including projected attributes.
- h5_attributes() table function to read attributes.
- h5_ls() table and scalar functions to list entries in groups.
- Multiple datasets: Read multiple datasets into separate columns.
- Multi-dimensional arrays: Support for 1D to 4D datasets.
- Projection pushdown: Read only the datasets that are actually needed.
- Index column: Optionally adds an index column that supports predicate pushdown of constant range filters (>, <=, BETWEEN, etc.) for efficient selective reads.
- Reads datasets that are larger than memory.
- Remote reads over HTTPS, S3, etc., via the DuckDB httpfs extension.
- Remote reads over SFTP via a built-in SFTP client.
- Globbing: Combine datasets from multiple files vertically (UNION ALL).
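As a rough illustration of the index column and globbing features listed above, the queries below sketch a selective read and a multi-file read. The file names and dataset paths are placeholders, and the column name index is an assumption based on the feature description; check the h5db repository for the exact behavior.

```sql
-- Assumption: h5_index() exposes a row-index column named "index".
-- A constant range filter on it should allow h5db to read only the matching rows.
SELECT *
FROM h5_read('data.h5', h5_index(), '/measurements')
WHERE "index" BETWEEN 1000 AND 1999;

-- Globbing: rows from every matching file are combined vertically (UNION ALL).
-- 'runs/run*.h5' is a placeholder pattern.
SELECT count(*)
FROM h5_read('runs/run*.h5', '/measurements');
```

Quoting "index" is a defensive choice in case the bare word is treated as a keyword in some contexts.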
For full documentation, see the h5db repository.
Added Functions
| function_name | function_type | description | comment | examples |
|---|---|---|---|---|
| h5_alias | scalar | Renames a column definition. | NULL | [FROM h5_read('data.h5', h5_alias('temperature', '/entry/temp'))] |
| h5_attr | scalar | Creates a projected HDF5 attribute definition for h5_tree() and h5_ls(). | NULL | [FROM h5_tree('data.h5', h5_attr('NX_class'))] |
| h5_attributes | table | Reads all attributes from an HDF5 object or file root. | NULL | [FROM h5_attributes('data.h5', '/measurements')] |
| h5_first_file | scalar | Returns the first concrete HDF5 filename from an exact path, glob, or list for planning-time use with h5_read(). | NULL | [FROM h5_read(h5_first_file('runs/run*.h5'), '/detector_geometry')] |
| h5_index | scalar | Creates a virtual row-index column definition for h5_read(). | NULL | [FROM h5_read('data.h5', h5_index(), '/measurements')] |
| h5_ls | scalar | Lists entries immediately under an HDF5 group as a MAP. | NULL | [SELECT h5_ls('data.h5', '/entry')] |
| h5_ls | table | Lists entries immediately under an HDF5 group as rows. | NULL | [FROM h5_ls('data.h5', '/entry')] |
| h5_ls_swmr | scalar | Lists entries immediately under an HDF5 group as a MAP using SWMR read mode. | NULL | [SELECT h5_ls_swmr('data.h5', '/entry')] |
| h5_read | table | Reads one or more HDF5 datasets as DuckDB columns. | NULL | [FROM h5_read('data.h5', '/measurements')] |
| h5_rse | scalar | Creates a run-start encoded column definition for h5_read(). | NULL | [FROM h5_read('data.h5', '/time', h5_rse('/state_run_starts', '/state_values'))] |
| h5_tree | table | Recursively lists entries in an HDF5 file. | NULL | [FROM h5_tree('data.h5')] |
| h5db_version | scalar | Returns the linked HDF5 library version used by h5db. | NULL | [SELECT h5db_version('h5db')] |
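The functions in the table can be combined into a simple explore-then-read workflow. The sketch below reuses the calls from the examples column; data.h5 and the object paths are placeholders, and the assumption that h5_alias yields a column named temperature is inferred from its description rather than confirmed.

```sql
-- 1. Recursively list the groups and datasets in the file.
FROM h5_tree('data.h5');

-- 2. List only the entries directly under one group.
FROM h5_ls('data.h5', '/entry');

-- 3. Inspect the attributes attached to one object.
FROM h5_attributes('data.h5', '/measurements');

-- 4. Read a dataset under a friendlier column name
--    (assumed to appear as "temperature" in the result).
SELECT temperature
FROM h5_read('data.h5', h5_alias('temperature', '/entry/temp'));
```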
Overloaded Functions
This extension does not add any function overloads.
Added Types
This extension does not add any types.
Added Settings
| name | description | input_type | scope | aliases |
|---|---|---|---|---|
| h5db_batch_size | Target batch size for h5_read chunk caching (e.g. 1MB, 8MB) | VARCHAR | GLOBAL | [] |
| h5db_swmr_default | Default to SWMR read mode for h5db table functions | BOOLEAN | GLOBAL | [] |
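Both settings can be changed with DuckDB's standard SET statement. The values below are illustrative only: 8MB comes from the setting's own description, and enabling SWMR by default is shown purely as an example, not a recommendation.

```sql
-- Target batch size for h5_read chunk caching (VARCHAR, e.g. '1MB' or '8MB').
SET h5db_batch_size = '8MB';

-- Make h5db table functions default to SWMR read mode.
SET h5db_swmr_default = true;

-- Restore the defaults.
RESET h5db_batch_size;
RESET h5db_swmr_default;
```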