Search Shortcut cmd + k | ctrl + k
ducksync

Intelligent query result caching between DuckDB and Snowflake with TTL and smart refresh

Maintainer(s): danjsiegel

Installing and Loading

INSTALL ducksync FROM community;
LOAD ducksync;

Example

-- Install and load the extension
INSTALL ducksync FROM community;
LOAD ducksync;

-- Setup storage (requires PostgreSQL for DuckLake catalog)
SELECT * FROM ducksync_setup_storage(
    'host=localhost port=5432 dbname=ducklake user=postgres password=secret',
    '/data/ducksync'
);

-- Register Snowflake source (requires existing DuckDB secret)
SELECT * FROM ducksync_add_source('prod', 'snowflake', 'my_snowflake_secret');

-- Create a cache with 1-hour TTL
SELECT * FROM ducksync_create_cache(
    'customers_cache',
    'prod',
    'SELECT * FROM PROD.PUBLIC.CUSTOMERS',
    ['PROD.PUBLIC.CUSTOMERS'],
    3600
);

-- Refresh the cache
SELECT * FROM ducksync_refresh('customers_cache');

-- Query via smart routing (cache hit or passthrough)
SELECT * FROM ducksync_query(
    'SELECT * FROM PROD.PUBLIC.CUSTOMERS WHERE region = ''US''',
    'prod'
);

-- Or query the DuckLake table directly
SELECT * FROM ducksync.prod.customers_cache;

About ducksync

DuckSync provides intelligent query result caching between DuckDB and Snowflake. It uses DuckLake for storage (PostgreSQL catalog + Parquet files) and features transparent query routing, TTL-based expiration, and smart refresh based on source table metadata.

Features:

  • Smart Query Routing: ducksync_query() automatically routes to cache or Snowflake
  • Named Queries: Cache complex queries under friendly names
  • TTL Support: Configurable cache expiration with automatic refresh
  • Smart Refresh: Only refreshes when source tables have changed (checks last_altered)
  • Direct Access: Query cached data as standard DuckLake tables
  • AST-Based Rewriting: Safe query transformation that only modifies table references

Prerequisites:

  • PostgreSQL database (for DuckLake catalog)
  • Snowflake account with a configured DuckDB secret
  • ADBC Snowflake driver (see duckdb-snowflake setup)

Query Routing Logic:

  1. Parses SQL using DuckDB's parser to extract table references
  2. Checks if each table is cached (by cache name or monitored table)
  3. If ALL tables cached → rewrites query to use local DuckLake tables
  4. If ANY table not cached → passes entire query to Snowflake

For complete documentation, visit the extension repository.

Added Functions

function_name function_type description comment examples
ducksync_init table Initialize DuckSync with an existing DuckLake catalog. Use this if you already have DuckLake attached. Recommended approach for existing DuckLake users [SELECT * FROM ducksync_init('my_ducklake');]
ducksync_setup_storage table Full setup - attaches DuckLake and initializes DuckSync. Use if you don't have DuckLake configured yet. Creates DuckLake catalog with PostgreSQL backend [SELECT * FROM ducksync_setup_storage('host=localhost dbname=ducklake user=postgres', '/data/ducksync');]
ducksync_add_source table Register a Snowflake data source using an existing DuckDB secret. Currently only supports Snowflake [SELECT * FROM ducksync_add_source('prod', 'snowflake', 'my_snowflake_secret');]
ducksync_create_cache table Define a cached query result with optional TTL and monitor tables for smart refresh. monitor_tables triggers refresh when source changes [SELECT * FROM ducksync_create_cache('customers', 'prod', 'SELECT * FROM CUSTOMERS', ['PROD.PUBLIC.CUSTOMERS'], 3600);]
ducksync_refresh table Refresh a cache. Uses smart check (skips if unchanged) unless force=true. Returns SKIPPED, REFRESHED, or ERROR [SELECT * FROM ducksync_refresh('customers');]
ducksync_query table Smart query routing - returns data from cache if available, otherwise passes through to Snowflake. Checks TTL and auto-refreshes expired caches. Main query interface [SELECT * FROM ducksync_query('SELECT * FROM PROD.PUBLIC.CUSTOMERS WHERE region = ''US''', 'prod');]