cache_httpfs

Read-cache filesystem for httpfs

Maintainer(s): dentiny, DouEnergy

Installing and Loading

INSTALL cache_httpfs FROM community;
LOAD cache_httpfs;

Example

SELECT cache_httpfs_get_cache_size();

About cache_httpfs

This extension adds a read cache filesystem to DuckDB that acts as a wrapper around the httpfs extension. Its key features are:

  • Supports both metadata cache and data block cache
  • Supports both on-disk cache and in-memory cache, with tunable block size and cache mode
  • Supports disk cache file eviction based on access timestamp
  • Supports parallel IO requests, with tunable request size
  • Supports profiling of IO latency and cache hit / miss ratio, which gives insight into workload characteristics
  • Exposes functions to get the cache size and to clear the cache
  • Provides an option to enable / disable the cache; with the cache disabled, the extension acts as a drop-in replacement for httpfs
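The features above are controlled through the settings listed further down. A minimal sketch of choosing the cache mode, directory, and block size before reading a remote file, assuming the extension is already installed; the directory path, block size value, and URL are hypothetical placeholders, not defaults or recommendations:

```sql
-- Load the extension (assumes INSTALL cache_httpfs FROM community; has been run).
LOAD cache_httpfs;

-- Use the on-disk cache (the default mode) and keep cached data in a custom directory.
SET cache_httpfs_type = 'on_disk';
SET cache_httpfs_cache_directory = '/tmp/duckdb_cache_httpfs';  -- hypothetical path

-- Use 1 MiB cache blocks (illustrative value). Changing the block size
-- invalidates existing on-disk cache files.
SET cache_httpfs_cache_block_size = 1048576;

-- Keep the metadata cache enabled (it is enabled by default).
SET cache_httpfs_enable_metadata_cache = true;

-- Read a remote file through the cached filesystem; later reads of the
-- same blocks can be served from the cache instead of the network.
SELECT count(*) FROM read_parquet('https://example.com/data.parquet');  -- hypothetical URL
```

Running the same query a second time is the simplest way to observe the cache at work, since the blocks fetched by the first read are then available locally.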

Added Functions

function_name | function_type
cache_httpfs_clear_cache | scalar
cache_httpfs_clear_cache_for_file | scalar
cache_httpfs_clear_profile | scalar
cache_httpfs_get_cache_size | scalar
cache_httpfs_get_profile | scalar
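A short sketch of how these scalar functions might be combined to inspect and reset the cache. It assumes that cache_httpfs_clear_cache_for_file takes the path of the file whose cached data should be dropped (the page does not document its arguments), and the URL is a hypothetical placeholder. The exact output format of each function is not stated here, so no results are shown:

```sql
-- Record profiling data for the latest IO operation ('temp' profile type).
SET cache_httpfs_profile_type = 'temp';

SELECT count(*) FROM read_parquet('https://example.com/data.parquet');  -- hypothetical URL

-- Inspect IO latency and cache hit / miss statistics, then reset them.
SELECT cache_httpfs_get_profile();
SELECT cache_httpfs_clear_profile();

-- Check how much space the cache currently uses.
SELECT cache_httpfs_get_cache_size();

-- Drop cached data for a single file (assumed: the argument is the file path).
SELECT cache_httpfs_clear_cache_for_file('https://example.com/data.parquet');

-- Drop all cached data.
SELECT cache_httpfs_clear_cache();
```

Setting the profile type to duckdb instead of temp stores the profiling results in a DuckDB table, which avoids the concurrent-update caveat of temp and allows the results to be analyzed with SQL.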

Added Settings

name | description | input_type | scope
ca_cert_file | Path to a custom certificate file for self-signed certificates. | VARCHAR | GLOBAL
cache_httpfs_cache_block_size | Block size for the cache; applies to both the in-memory and the on-disk cache filesystem. For the on-disk cache filesystem, all existing cache files are invalidated after this setting is changed. | UBIGINT | GLOBAL
cache_httpfs_cache_directory | Directory in which the on-disk cache stores cached data. | VARCHAR | GLOBAL
cache_httpfs_enable_metadata_cache | Whether the metadata cache is enabled for the cache filesystem. Enabled by default. | BOOLEAN | GLOBAL
cache_httpfs_ignore_sigpipe | Whether to ignore SIGPIPE for the extension. Not ignored by default. Once ignored, this cannot be reverted. | BOOLEAN | GLOBAL
cache_httpfs_max_fanout_subrequest | The cached httpfs performs reads in parallel by splitting each read into smaller subrequests, whose size is set by cache_httpfs_cache_block_size. This setting limits the maximum number of subrequests issued for a single filesystem read request. 0 means no limit, which is the default. | BIGINT | GLOBAL
cache_httpfs_max_in_mem_cache_block_count | Maximum number of blocks held by the in-memory cache filesystem. It should be set only once, before any filesystem access; otherwise it has no effect. | UBIGINT | GLOBAL
cache_httpfs_profile_type | Profiling type for the cache filesystem. Three options are available: noop, temp, and duckdb. temp stores the profiling result of the latest IO operation, which may be affected by concurrent updates; duckdb stores IO profiling results in a DuckDB table, which enables advanced analysis. | VARCHAR | GLOBAL
cache_httpfs_type | Type of cache filesystem. Two cache types are available, in_mem and on_disk; on_disk is the default. Set to noop to disable caching, in which case the extension behaves exactly like the httpfs extension. | VARCHAR | GLOBAL
enable_server_cert_verification | Enable server-side certificate verification. | BOOLEAN | GLOBAL
force_download | Forces upfront download of the file. | BOOLEAN | GLOBAL
hf_max_per_page | Debug option to limit the number of items returned in list requests. | UBIGINT | GLOBAL
http_keep_alive | Keep connections alive. Setting this to false can help when running into connection failures. | BOOLEAN | GLOBAL
http_retries | Number of HTTP retries on I/O error. | UBIGINT | GLOBAL
http_retry_backoff | Backoff factor for exponentially increasing the wait time between retries. | FLOAT | GLOBAL
http_retry_wait_ms | Time between retries, in milliseconds. | UBIGINT | GLOBAL
http_timeout | HTTP timeout for read/write/connection/retry, in seconds. | UBIGINT | GLOBAL
s3_access_key_id | S3 access key ID. | VARCHAR | GLOBAL
s3_endpoint | S3 endpoint. | VARCHAR | GLOBAL
s3_region | S3 region. | VARCHAR | GLOBAL
s3_secret_access_key | S3 secret access key. | VARCHAR | GLOBAL
s3_session_token | S3 session token. | VARCHAR | GLOBAL
s3_uploader_max_filesize | S3 uploader maximum file size (between 50GB and 5TB). | VARCHAR | GLOBAL
s3_uploader_max_parts_per_file | S3 uploader maximum parts per file (between 1 and 10000). | UBIGINT | GLOBAL
s3_uploader_thread_limit | S3 uploader global thread limit. | UBIGINT | GLOBAL
s3_url_compatibility_mode | Disable globs and query parameters on S3 URLs. | BOOLEAN | GLOBAL
s3_url_style | S3 URL style. | VARCHAR | GLOBAL
s3_use_ssl | Whether to use SSL for S3 connections. | BOOLEAN | GLOBAL
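As an illustration of the tuning settings, here is a sketch that switches to the in-memory cache, caps its size, limits request fan-out, and finally disables caching. All numeric values are illustrative assumptions rather than recommended settings. Per the table, the in-memory block count must be set before any filesystem access, so this sequence is meant for a fresh session:

```sql
LOAD cache_httpfs;

-- Cap the in-memory cache at 256 blocks (illustrative value). This only
-- takes effect if set once, before any filesystem access.
SET cache_httpfs_max_in_mem_cache_block_count = 256;

-- Use the in-memory cache instead of the default on-disk cache.
SET cache_httpfs_type = 'in_mem';

-- Issue at most 8 parallel subrequests per filesystem read
-- (0, the default, means no limit).
SET cache_httpfs_max_fanout_subrequest = 8;

-- Retry failed HTTP requests up to 3 times, waiting 100 ms between retries.
SET http_retries = 3;
SET http_retry_wait_ms = 100;

-- Disable caching entirely; the extension then behaves exactly like httpfs.
SET cache_httpfs_type = 'noop';
```

Because noop makes the extension behave like httpfs, toggling cache_httpfs_type is a convenient way to compare query latency with and without the cache on the same workload.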