Prewarm data blocks into DuckDB's buffer pool or OS page cache for faster queries
Installing and Loading
INSTALL cache_prewarm FROM community;
LOAD cache_prewarm;
Example
-- Prewarm a table into the buffer pool
SELECT prewarm('table_name');
-- With explicit mode
SELECT prewarm('table_name', 'buffer');
-- With size limit (human-readable format)
SELECT prewarm('table_name', 'buffer', '1GB');
-- With qualified table name
SELECT prewarm('my_schema.table_name');
-- Prewarm a remote file into the on-disk cache
SELECT prewarm_remote('https://example.com/data/file.parquet');
-- Prewarm with a glob pattern (multiple files)
SELECT prewarm_remote('https://example.com/data/*.parquet');
About cache_prewarm
A DuckDB extension that preloads table data blocks into the buffer pool or OS page cache, inspired by PostgreSQL's pg_prewarm extension.
Prewarm Modes:
buffer(default): Load blocks into buffer pool with pin/unpinread: Warm OS page cache without using buffer poolprefetch: Batch prefetch using OS hints
Remote Prewarm:
prewarm_remotepreloads remote files into the on-disk cache, requirescache_httpfsto be configured.- Supports glob patterns and size limits.
Use cases include cold start optimization and predictable query latency.
Added Functions
| function_name | function_type | description | comment | examples |
|---|---|---|---|---|
| cache_httpfs_add_exclusion_regex | scalar | NULL | NULL | |
| cache_httpfs_cache_access_info_query | table | NULL | NULL | |
| cache_httpfs_cache_status_query | table | NULL | NULL | |
| cache_httpfs_cleanup_dead_temp | scalar | NULL | NULL | |
| cache_httpfs_clear_cache | scalar | NULL | NULL | |
| cache_httpfs_clear_cache_for_file | scalar | NULL | NULL | |
| cache_httpfs_clear_profile | scalar | NULL | NULL | |
| cache_httpfs_get_cache_config | table | NULL | NULL | |
| cache_httpfs_get_cache_filesystems | table | NULL | NULL | |
| cache_httpfs_get_cache_type | table | NULL | NULL | |
| cache_httpfs_get_data_cache_config | table | NULL | NULL | |
| cache_httpfs_get_file_handle_cache_config | table | NULL | NULL | |
| cache_httpfs_get_glob_cache_config | table | NULL | NULL | |
| cache_httpfs_get_metadata_cache_config | table | NULL | NULL | |
| cache_httpfs_get_ondisk_data_cache_size | scalar | NULL | NULL | |
| cache_httpfs_get_profile | scalar | NULL | NULL | |
| cache_httpfs_list_exclusion_regex | table | NULL | NULL | |
| cache_httpfs_list_registered_filesystems | table | NULL | NULL | |
| cache_httpfs_reset_exclusion_regex | scalar | NULL | NULL | |
| cache_httpfs_wrap_cache_filesystem | scalar | NULL | NULL | |
| prewarm | scalar | NULL | NULL | |
| prewarm_remote | scalar | NULL | NULL |
Overloaded Functions
| function_name | function_type | description | comment | examples | |—————|—————|————-|———|———-|
Added Types
| type_name | type_size | logical_type | type_category | internal | |———–|———-:|————–|—————|———-|
Added Settings
| name | description | input_type | scope | aliases |
|---|---|---|---|---|
| cache_httpfs_cache_block_size | Block size for cache, applies to both in-memory cache filesystem and on-disk cache filesystem. It's worth noting for on-disk filesystem, all existing cache files are invalidated after config update. | UBIGINT | GLOBAL | [] |
| cache_httpfs_cache_directories_config | Advanced configuration for on-disk cache. It supports multiple directories, separated by semicolons (';'). Cache blocks will be evenly distributed under different directories deterministically.Between different runs, it's expected to provide same cache directories, otherwise it's not guaranteed cache files still exist and accessible.Overrides 'cache_httpfs_cache_directory' if set. | VARCHAR | GLOBAL | [] |
| cache_httpfs_cache_directory | The disk cache directory that stores cached data | VARCHAR | GLOBAL | [] |
| cache_httpfs_clear_cache_on_write | Whether to clear cache entries on write operations. When enabled, write operations will invalidate cached metadata, file handles, and glob entries for the modified file, which could be expensive.Disabling this can improve write performance when many cache entries exist, but may lead to stale cache reads. By default disabled. | BOOLEAN | GLOBAL | [] |
| cache_httpfs_disk_cache_reader_enable_memory_cache | Whether enable process-wise read-through/write-through cache for disk cache reader. When enabled, local cache file will be accessed with direct IO. | BOOLEAN | GLOBAL | [] |
| cache_httpfs_disk_cache_reader_mem_cache_block_count | Max number of cache blocks for the read-through/write-through cache for disk cache reader. | UBIGINT | GLOBAL | [] |
| cache_httpfs_disk_cache_reader_mem_cache_timeout_millisec | Timeout in milliseconds for the read-through/write-through cache for disk cache reader. | UBIGINT | GLOBAL | [] |
| cache_httpfs_enable_cache_validation | Whether to enable cache validation using version tag and last modification timestamp. When enabled, cache entries are validated against the current file version tag and modification timestamp to ensure cache consistency. By default disabled. | BOOLEAN | GLOBAL | [] |
| cache_httpfs_enable_file_handle_cache | Whether file handle cache is enable for cache filesystem. By default enabled. | BOOLEAN | GLOBAL | [] |
| cache_httpfs_enable_glob_cache | Whether glob cache is enable for cache filesystem. By default enabled. | BOOLEAN | GLOBAL | [] |
| cache_httpfs_enable_metadata_cache | Whether metadata cache is enable for cache filesystem. By default enabled. | BOOLEAN | GLOBAL | [] |
| cache_httpfs_evict_policy | Eviction policy for on-disk cache cache blocks. By default it's creation timestamp based ('creation_timestamp'), which deletes all cache blocks created earlier than threshold. Other supported policy include 'lru_single_proc' (LRU forsingle process access), which performs LRU-based eviction, mainly made single processusage. | VARCHAR | GLOBAL | [] |
| cache_httpfs_file_handle_cache_entry_size | Max cache size for file handle cache. | UBIGINT | GLOBAL | [] |
| cache_httpfs_file_handle_cache_entry_timeout_millisec | Cache entry timeout in milliseconds for file handle cache. | UBIGINT | GLOBAL | [] |
| cache_httpfs_glob_cache_entry_size | Max cache size for glob cache. | UBIGINT | GLOBAL | [] |
| cache_httpfs_glob_cache_entry_timeout_millisec | Cache entry timeout in milliseconds for glob cache. | UBIGINT | GLOBAL | [] |
| cache_httpfs_ignore_sigpipe | Whether to ignore SIGPIPE for the extension. By default not ignored. Once ignored, it cannot be reverted. | BOOLEAN | GLOBAL | [] |
| cache_httpfs_in_mem_cache_block_timeout_millisec | Data block cache entry timeout in milliseconds. | UBIGINT | GLOBAL | [] |
| cache_httpfs_max_fanout_subrequest | Cached httpfs performs parallel request by splittng them into small request, with request size decided by config [cache_httpfs_cache_block_size]. The setting limits the maximum request to issue for a single filesystem read request. 0 means no limit, by default we set no limit. | BIGINT | GLOBAL | [] |
| cache_httpfs_max_in_mem_cache_block_count | Max in-memory cache block count for in-memory caches for all cache filesystems, so users are able to configure the maximum memory consumption. It's worth noting it should be set only once before all filesystem access, otherwise there's no affect. | UBIGINT | GLOBAL | [] |
| cache_httpfs_metadata_cache_entry_size | Max cache size for metadata LRU cache. | UBIGINT | GLOBAL | [] |
| cache_httpfs_metadata_cache_entry_timeout_millisec | Cache entry timeout in milliseconds for metadata LRU cache. | UBIGINT | GLOBAL | [] |
| cache_httpfs_min_disk_bytes_for_cache | Min number of bytes on disk for the cache filesystem to enable on-disk cache; if left bytes is less than the threshold, LRU based cache file eviction will be performed.By default, 5% disk space will be reserved for other usage. When min disk bytes specified with a positive value, the default value will be overriden. | UBIGINT | GLOBAL | [] |
| cache_httpfs_profile_type | Profiling type for cached filesystem. There're three options available: noop, temp, and duckdb. temp option stores the latest IO operation profiling result, which potentially suffers concurrent updates; duckdb stores the IO operation profiling results into duckdb table, which unblocks advanced analysis. |
VARCHAR | GLOBAL | [] |
| cache_httpfs_type | Type for cached filesystem. Currently there're two types available, one is in_mem, another is on_disk. By default we use on-disk cache. Set to noop to disable, which behaves exactly same as httpfs extension. |
VARCHAR | GLOBAL | [] |