Search Shortcut cmd + k | ctrl + k
pac

Automatic query privatization using the PAC Privacy framework

Maintainer(s): ila

Installing and Loading

INSTALL pac FROM community;
LOAD pac;

Example

-- Generate TPC-H benchmark data
INSTALL tpch;
LOAD tpch;
CALL dbgen(sf=1);

-- Mark customer as the privacy unit
ALTER TABLE customer ADD PAC_KEY (c_custkey);
ALTER TABLE customer SET PU;

-- Protect sensitive customer columns
ALTER PU TABLE customer ADD PROTECTED (c_custkey);
ALTER PU TABLE customer ADD PROTECTED (c_name);
ALTER PU TABLE customer ADD PROTECTED (c_address);
ALTER PU TABLE customer ADD PROTECTED (c_acctbal);

-- Define join chain: lineitem -> orders -> customer
ALTER TABLE orders ADD PAC_LINK (o_custkey) REFERENCES customer(c_custkey);
ALTER TABLE lineitem ADD PAC_LINK (l_orderkey) REFERENCES orders(o_orderkey);

-- Aggregates on linked tables are automatically noised
SELECT l_returnflag, l_linestatus, SUM(l_extendedprice)
FROM lineitem GROUP BY ALL;

About pac

PAC (Probably Approximately Correct) is a DuckDB extension that automatically privatizes SQL queries, protecting against Membership Inference Attacks by adding calibrated noise to aggregate results. Unlike Differential Privacy, PAC works transparently — no per-query privacy analysis by a specialist is needed.

How It Works:

  1. Declare a privacy unit table with CREATE PU TABLE or ALTER TABLE SET PU, specifying a PAC_KEY (unique identifier) and PROTECTED columns
  2. Protected columns cannot be projected directly — they can only appear inside aggregates
  3. The extension automatically rewrites query plans to inject calibrated noise
  4. Join chains propagate privacy via PAC_LINK foreign key declarations

Supported Aggregates: SUM, COUNT, AVG, MIN, MAX, COUNT(DISTINCT)

Key Settings:

  • pac_mi — Mutual information bound controlling the privacy-utility tradeoff (default: 1/128). Lower values = more noise = more privacy. Set to 0 for deterministic (no noise) mode.
  • pac_seed — RNG seed for reproducible noised results

Privacy Propagation: PAC automatically follows foreign key chains declared with PAC_LINK to find the privacy unit. For example, querying lineitem with a link chain lineitem -> orders -> customer will automatically join back to customer to hash the privacy unit key, without any changes to your query.

More information: See the PAC repository for full documentation, examples, and benchmarks.

Requirements:

  • DuckDB 1.5.0+

Added Functions

function_name function_type description comment examples
clear_pac_metadata pragma NULL NULL  
load_pac_metadata pragma NULL NULL  
pac_avg aggregate [INTERNAL] Returns 64 PAC subsample counters as LIST for categorical queries. NULL  
pac_coalesce scalar [INTERNAL] Replaces NULL counter list with 64 zeros. NULL  
pac_count aggregate [INTERNAL] Returns 64 PAC subsample counters as LIST for categorical queries. NULL  
pac_div scalar [INTERNAL] Element-wise division of two counter lists. NULL  
pac_filter scalar [INTERNAL] Probabilistic filter from boolean list for categorical queries. NULL  
pac_filter_eq scalar [INTERNAL] Categorical comparison + probabilistic filter for PAC queries. NULL  
pac_filter_gt scalar [INTERNAL] Categorical comparison + probabilistic filter for PAC queries. NULL  
pac_filter_gte scalar [INTERNAL] Categorical comparison + probabilistic filter for PAC queries. NULL  
pac_filter_lt scalar [INTERNAL] Categorical comparison + probabilistic filter for PAC queries. NULL  
pac_filter_lte scalar [INTERNAL] Categorical comparison + probabilistic filter for PAC queries. NULL  
pac_filter_neq scalar [INTERNAL] Categorical comparison + probabilistic filter for PAC queries. NULL  
pac_hash scalar [INTERNAL] Hashes a privacy unit key for PAC subsample assignment. NULL  
pac_max aggregate [INTERNAL] Returns 64 PAC subsample counters as LIST for categorical queries. NULL  
pac_mean scalar [INTERNAL] Computes the mean of 64 PAC subsample counters (true aggregate before noise). NULL  
pac_min aggregate [INTERNAL] Returns 64 PAC subsample counters as LIST for categorical queries. NULL  
pac_noised scalar [INTERNAL] Applies PAC noise to 64-element counter list, returns scalar. NULL  
pac_noised_avg aggregate Privacy-preserving AVG. Automatically injected by PAC for protected columns. NULL [SELECT c_mktsegment, pac_noised_avg(pac_hash(hash(c_custkey)), c_acctbal) FROM customer GROUP BY c_mktsegment]
pac_noised_count aggregate Privacy-preserving COUNT. Automatically injected by PAC for protected columns. NULL [SELECT c_mktsegment, pac_noised_count(pac_hash(hash(c_custkey))) FROM customer GROUP BY c_mktsegment]
pac_noised_div scalar [INTERNAL] Fused counter-list division + noise for AVG. NULL  
pac_noised_max aggregate Privacy-preserving MAX. Automatically injected by PAC for protected columns. NULL [SELECT c_mktsegment, pac_noised_max(pac_hash(hash(c_custkey)), c_acctbal) FROM customer GROUP BY c_mktsegment]
pac_noised_min aggregate Privacy-preserving MIN. Automatically injected by PAC for protected columns. NULL [SELECT c_mktsegment, pac_noised_min(pac_hash(hash(c_custkey)), c_acctbal) FROM customer GROUP BY c_mktsegment]
pac_noised_sum aggregate Privacy-preserving SUM. Automatically injected by PAC for protected columns. NULL [SELECT c_mktsegment, pac_noised_sum(pac_hash(hash(c_custkey)), c_acctbal) FROM customer GROUP BY c_mktsegment]
pac_select scalar [INTERNAL] Combines boolean mask with hash for categorical PAC queries. NULL  
pac_select_eq scalar [INTERNAL] Categorical comparison + mask application for PAC queries. NULL  
pac_select_gt scalar [INTERNAL] Categorical comparison + mask application for PAC queries. NULL  
pac_select_gte scalar [INTERNAL] Categorical comparison + mask application for PAC queries. NULL  
pac_select_lt scalar [INTERNAL] Categorical comparison + mask application for PAC queries. NULL  
pac_select_lte scalar [INTERNAL] Categorical comparison + mask application for PAC queries. NULL  
pac_select_neq scalar [INTERNAL] Categorical comparison + mask application for PAC queries. NULL  
pac_sum aggregate [INTERNAL] Returns 64 PAC subsample counters as LIST for categorical queries. NULL  
save_pac_metadata pragma NULL NULL  

Overloaded Functions

This extension does not add any function overloads.

Added Types

This extension does not add any types.

Added Settings

name description input_type scope aliases
enforce_m_values [INTERNAL] Enforce per-sample array length equals pac_m BOOLEAN GLOBAL []
pac_categorical [INTERNAL] Enable categorical query rewrites BOOLEAN GLOBAL []
pac_check [INTERNAL] Enforce protected column access restrictions BOOLEAN GLOBAL []
pac_compiled_path [INTERNAL] Path to write compiled PAC artifacts VARCHAR GLOBAL []
pac_conservative_mode [INTERNAL] Throw errors for unsupported operators (when false, skip PAC compilation) BOOLEAN GLOBAL []
pac_correction Correction factor multiplied into aggregate results (default: 1.0) DOUBLE GLOBAL []
pac_diffcols Measure utility: specify number of key columns and optional output path (e.g. '2:out.csv') VARCHAR GLOBAL []
pac_hash_repair [INTERNAL] pac_hash() repairs hash to exactly 32 bits set BOOLEAN GLOBAL []
pac_join_elimination [INTERNAL] Eliminate final join to PU table BOOLEAN GLOBAL []
pac_m [INTERNAL] Number of per-sample subsets INTEGER GLOBAL []
pac_mi Mutual information bound controlling privacy-utility tradeoff (default: 1/128). Lower values = more noise = more privacy. Set to 0 for deterministic (no noise) mode. DOUBLE GLOBAL []
pac_noise Enable/disable PAC noise application (set to false for debugging) BOOLEAN GLOBAL []
pac_ptracking [INTERNAL] Enable persistent secret p-tracking for query-level MIA BOOLEAN GLOBAL []
pac_pushdown_topk [INTERNAL] Apply top-k before noise instead of after BOOLEAN GLOBAL []
pac_rewrite [INTERNAL] Enable PAC query plan rewriting BOOLEAN GLOBAL []
pac_seed RNG seed for reproducible noised results BIGINT GLOBAL []
pac_select [INTERNAL] Use pac_select for categorical filters below pac aggregates BOOLEAN GLOBAL []
pac_topk_expansion [INTERNAL] Expansion factor for top-k superset (1 = no expansion) DOUBLE GLOBAL []