Intelligent query result caching between DuckDB and Snowflake with TTL and smart refresh
Maintainer(s):
danjsiegel
Installing and Loading
INSTALL ducksync FROM community;
LOAD ducksync;
Example
-- Install and load the extension
INSTALL ducksync FROM community;
LOAD ducksync;
-- Option A: Use an existing DuckLake catalog
ATTACH 'ducklake:postgres:host=localhost dbname=ducklake user=postgres password=secret' AS my_lake (DATA_PATH '/data');
SELECT * FROM ducksync_init('my_lake');
-- Option B: Full setup (attaches DuckLake automatically)
SELECT * FROM ducksync_setup_storage(
'host=localhost port=5432 dbname=ducklake user=postgres password=secret',
'/data/ducksync'
);
-- Register Snowflake source (requires existing DuckDB secret)
SELECT * FROM ducksync_add_source('prod', 'snowflake', 'my_snowflake_secret');
-- Create a cache with 1-hour TTL
SELECT * FROM ducksync_create_cache(
'customers_cache',
'prod',
'SELECT * FROM PROD.PUBLIC.CUSTOMERS',
['PROD.PUBLIC.CUSTOMERS'],
3600
);
-- Query via smart routing (cache hit or passthrough)
SELECT * FROM ducksync_refresh('customers_cache');
-- Query via smart routing: cache hit → DuckLake, miss → Snowflake passthrough
SELECT * FROM ducksync_query(
'SELECT * FROM PROD.PUBLIC.CUSTOMERS WHERE region = ''US''',
'prod'
);
-- Or query the DuckLake table directly
SELECT * FROM ducksync.prod.customers_cache;
About ducksync
DuckSync provides intelligent query result caching between DuckDB and Snowflake. It uses DuckLake for storage (PostgreSQL catalog + Parquet files) and features transparent query routing, TTL-based expiration, and smart refresh based on source table metadata.
Core workflow:
- Register a Snowflake source with
ducksync_add_source - Define a cache with
ducksync_create_cache(SQL query + monitor tables + TTL) - Populate with
ducksync_refresh(fetches from Snowflake, stores in DuckLake) - Query with
ducksync_query— automatically routes to cache or Snowflake
Features:
- Smart Query Routing:
ducksync_query()automatically routes to cache or Snowflake - Named Queries: Cache complex queries under friendly names
- TTL Support: Configurable cache expiration with automatic refresh
- Smart Refresh: Only refreshes when source tables have changed (checks
last_altered) - Direct Access: Query cached data as standard DuckLake tables
- AST-Based Rewriting: Safe query transformation that only modifies table references
- Configurable Metadata Schema: Pass an optional
schema_nametoducksync_initorducksync_setup_storagefor multi-tenant environments
Prerequisites:
- PostgreSQL database (for DuckLake catalog)
- Snowflake account with a configured DuckDB secret
- ADBC Snowflake driver (see duckdb-snowflake setup)
Query Routing Logic:
- Parses SQL using DuckDB's parser to extract table references
- Checks if each table is cached (by cache name or monitored table)
- If ALL tables cached → rewrites query to use local DuckLake tables
- If ANY table not cached → passes entire query to Snowflake
For complete documentation, visit the extension repository.
Added Functions
| function_name | function_type | description | comment | examples |
|---|---|---|---|---|
| ducksync_init | table | Initialize DuckSync with an existing DuckLake catalog. Use when you already have DuckLake attached. | Recommended for existing DuckLake users. Optional second arg schema_name (default 'ducksync') for multi-tenant environments. | [SELECT * FROM ducksync_init('my_lake');] |
| ducksync_init | table | Initialize DuckSync with an existing DuckLake catalog and a custom metadata schema name. | Use a custom schema_name to isolate DuckSync metadata in shared DuckLake environments (e.g. GizmoSQL). | [SELECT * FROM ducksync_init('my_lake', 'my_ducksync_schema');] |
| ducksync_setup_storage | table | Full setup — attaches DuckLake with a PostgreSQL catalog and initializes DuckSync. | Use when you don't have DuckLake configured yet. Optional third arg schema_name (default 'ducksync'). | [SELECT * FROM ducksync_setup_storage('host=localhost dbname=ducklake user=postgres', '/data/ducksync');] |
| ducksync_setup_storage | table | Full setup with a custom metadata schema name for multi-tenant environments. | Data tables always live in catalog.source_name.cache_name regardless of schema_name. | [SELECT * FROM ducksync_setup_storage('host=localhost dbname=ducklake user=postgres', '/data/ducksync', 'my_schema');] |
| ducksync_add_source | table | Register a Snowflake data source using an existing DuckDB secret. | Currently supports Snowflake. Optional passthrough_enabled=true to allow direct Snowflake passthrough. | [SELECT * FROM ducksync_add_source('prod', 'snowflake', 'my_snowflake_secret');] |
| ducksync_create_cache | table | Define a cached query result with a source query, monitor tables for smart refresh, and optional TTL in seconds. | monitor_tables are checked for last_altered changes to determine if refresh is needed. TTL=0 means no expiry. | [SELECT * FROM ducksync_create_cache('customers', 'prod', 'SELECT * FROM PROD.PUBLIC.CUSTOMERS', ['PROD.PUBLIC.CUSTOMERS'], 3600);] |
| ducksync_refresh | table | Refresh a cache by fetching from Snowflake and storing in DuckLake. Skips if source data unchanged (smart refresh). | Returns result=REFRESHED, SKIPPED, or ERROR with rows_refreshed and duration_ms. Use force=true to bypass smart check. | [SELECT * FROM ducksync_refresh('customers');] |
| ducksync_query | table | Smart query routing — rewrites SQL to use cached DuckLake tables when all referenced tables are cached; passes through to Snowflake otherwise. Returns actual query results. | Main query interface. Automatically refreshes expired caches before executing. Supports SELECT with JOINs, UNIONs, subqueries. | [SELECT * FROM ducksync_query('SELECT * FROM PROD.PUBLIC.CUSTOMERS WHERE region = ''US''', 'prod');] |