Intelligent query result caching between DuckDB and Snowflake with TTL and smart refresh
Maintainer(s):
danjsiegel
Installing and Loading
INSTALL ducksync FROM community;
LOAD ducksync;
Example
-- Install and load the extension
INSTALL ducksync FROM community;
LOAD ducksync;
-- Setup storage (requires PostgreSQL for DuckLake catalog)
SELECT * FROM ducksync_setup_storage(
'host=localhost port=5432 dbname=ducklake user=postgres password=secret',
'/data/ducksync'
);
-- Register Snowflake source (requires existing DuckDB secret)
SELECT * FROM ducksync_add_source('prod', 'snowflake', 'my_snowflake_secret');
-- Create a cache with 1-hour TTL
SELECT * FROM ducksync_create_cache(
'customers_cache',
'prod',
'SELECT * FROM PROD.PUBLIC.CUSTOMERS',
['PROD.PUBLIC.CUSTOMERS'],
3600
);
-- Refresh the cache
SELECT * FROM ducksync_refresh('customers_cache');
-- Query via smart routing (cache hit or passthrough)
SELECT * FROM ducksync_query(
'SELECT * FROM PROD.PUBLIC.CUSTOMERS WHERE region = ''US''',
'prod'
);
-- Or query the DuckLake table directly
SELECT * FROM ducksync.prod.customers_cache;
About ducksync
DuckSync provides intelligent query result caching between DuckDB and Snowflake. It uses DuckLake for storage (PostgreSQL catalog + Parquet files) and features transparent query routing, TTL-based expiration, and smart refresh based on source table metadata.
Features:
- Smart Query Routing:
ducksync_query()automatically routes to cache or Snowflake - Named Queries: Cache complex queries under friendly names
- TTL Support: Configurable cache expiration with automatic refresh
- Smart Refresh: Only refreshes when source tables have changed (checks
last_altered) - Direct Access: Query cached data as standard DuckLake tables
- AST-Based Rewriting: Safe query transformation that only modifies table references
Prerequisites:
- PostgreSQL database (for DuckLake catalog)
- Snowflake account with a configured DuckDB secret
- ADBC Snowflake driver (see duckdb-snowflake setup)
Query Routing Logic:
- Parses SQL using DuckDB's parser to extract table references
- Checks if each table is cached (by cache name or monitored table)
- If ALL tables cached → rewrites query to use local DuckLake tables
- If ANY table not cached → passes entire query to Snowflake
For complete documentation, visit the extension repository.
Added Functions
| function_name | function_type | description | comment | examples |
|---|---|---|---|---|
| ducksync_init | table | Initialize DuckSync with an existing DuckLake catalog. Use this if you already have DuckLake attached. | Recommended approach for existing DuckLake users | [SELECT * FROM ducksync_init('my_ducklake');] |
| ducksync_setup_storage | table | Full setup - attaches DuckLake and initializes DuckSync. Use if you don't have DuckLake configured yet. | Creates DuckLake catalog with PostgreSQL backend | [SELECT * FROM ducksync_setup_storage('host=localhost dbname=ducklake user=postgres', '/data/ducksync');] |
| ducksync_add_source | table | Register a Snowflake data source using an existing DuckDB secret. | Currently only supports Snowflake | [SELECT * FROM ducksync_add_source('prod', 'snowflake', 'my_snowflake_secret');] |
| ducksync_create_cache | table | Define a cached query result with optional TTL and monitor tables for smart refresh. | monitor_tables triggers refresh when source changes | [SELECT * FROM ducksync_create_cache('customers', 'prod', 'SELECT * FROM CUSTOMERS', ['PROD.PUBLIC.CUSTOMERS'], 3600);] |
| ducksync_refresh | table | Refresh a cache. Uses smart check (skips if unchanged) unless force=true. | Returns SKIPPED, REFRESHED, or ERROR | [SELECT * FROM ducksync_refresh('customers');] |
| ducksync_query | table | Smart query routing - returns data from cache if available, otherwise passes through to Snowflake. Checks TTL and auto-refreshes expired caches. | Main query interface | [SELECT * FROM ducksync_query('SELECT * FROM PROD.PUBLIC.CUSTOMERS WHERE region = ''US''', 'prod');] |