Automatic query privatization using the PAC Privacy framework
Installing and Loading
INSTALL pac FROM community;
LOAD pac;
Example
-- Generate TPC-H benchmark data
INSTALL tpch;
LOAD tpch;
CALL dbgen(sf=1);
-- Mark customer as the privacy unit
ALTER TABLE customer ADD PAC_KEY (c_custkey);
ALTER TABLE customer SET PU;
-- Protect sensitive customer columns
ALTER PU TABLE customer ADD PROTECTED (c_custkey);
ALTER PU TABLE customer ADD PROTECTED (c_name);
ALTER PU TABLE customer ADD PROTECTED (c_address);
ALTER PU TABLE customer ADD PROTECTED (c_acctbal);
-- Define join chain: lineitem -> orders -> customer
ALTER TABLE orders ADD PAC_LINK (o_custkey) REFERENCES customer(c_custkey);
ALTER TABLE lineitem ADD PAC_LINK (l_orderkey) REFERENCES orders(o_orderkey);
-- Aggregates on linked tables are automatically noised
SELECT l_returnflag, l_linestatus, SUM(l_extendedprice)
FROM lineitem GROUP BY ALL;
About pac
PAC (Probably Approximately Correct) is a DuckDB extension that automatically privatizes SQL queries, protecting against Membership Inference Attacks by adding calibrated noise to aggregate results. Unlike Differential Privacy, PAC works transparently — no per-query privacy analysis by a specialist is needed.
How It Works:
- Declare a privacy unit table with
CREATE PU TABLEorALTER TABLE SET PU, specifying aPAC_KEY(unique identifier) andPROTECTEDcolumns - Protected columns cannot be projected directly — they can only appear inside aggregates
- The extension automatically rewrites query plans to inject calibrated noise
- Join chains propagate privacy via
PAC_LINKforeign key declarations
Supported Aggregates: SUM, COUNT, AVG, MIN, MAX, COUNT(DISTINCT)
Key Settings:
pac_mi— Mutual information bound controlling the privacy-utility tradeoff (default:1/128). Lower values = more noise = more privacy. Set to0for deterministic (no noise) mode.pac_seed— RNG seed for reproducible noised results
Privacy Propagation: PAC automatically follows foreign key chains declared
with PAC_LINK to find the privacy unit. For example, querying lineitem with
a link chain lineitem -> orders -> customer will automatically join back to
customer to hash the privacy unit key, without any changes to your query.
More information: See the PAC repository for full documentation, examples, and benchmarks.
Requirements:
- DuckDB 1.5.0+
Added Functions
| function_name | function_type | description | comment | examples |
|---|---|---|---|---|
| clear_pac_metadata | pragma | NULL | NULL | |
| load_pac_metadata | pragma | NULL | NULL | |
| pac_avg | aggregate | [INTERNAL] Returns 64 PAC subsample counters as LIST for categorical queries. | NULL | |
| pac_coalesce | scalar | [INTERNAL] Replaces NULL counter list with 64 zeros. | NULL | |
| pac_count | aggregate | [INTERNAL] Returns 64 PAC subsample counters as LIST for categorical queries. | NULL | |
| pac_div | scalar | [INTERNAL] Element-wise division of two counter lists. | NULL | |
| pac_filter | scalar | [INTERNAL] Probabilistic filter from boolean list for categorical queries. | NULL | |
| pac_filter_eq | scalar | [INTERNAL] Categorical comparison + probabilistic filter for PAC queries. | NULL | |
| pac_filter_gt | scalar | [INTERNAL] Categorical comparison + probabilistic filter for PAC queries. | NULL | |
| pac_filter_gte | scalar | [INTERNAL] Categorical comparison + probabilistic filter for PAC queries. | NULL | |
| pac_filter_lt | scalar | [INTERNAL] Categorical comparison + probabilistic filter for PAC queries. | NULL | |
| pac_filter_lte | scalar | [INTERNAL] Categorical comparison + probabilistic filter for PAC queries. | NULL | |
| pac_filter_neq | scalar | [INTERNAL] Categorical comparison + probabilistic filter for PAC queries. | NULL | |
| pac_hash | scalar | [INTERNAL] Hashes a privacy unit key for PAC subsample assignment. | NULL | |
| pac_max | aggregate | [INTERNAL] Returns 64 PAC subsample counters as LIST for categorical queries. | NULL | |
| pac_mean | scalar | [INTERNAL] Computes the mean of 64 PAC subsample counters (true aggregate before noise). | NULL | |
| pac_min | aggregate | [INTERNAL] Returns 64 PAC subsample counters as LIST for categorical queries. | NULL | |
| pac_noised | scalar | [INTERNAL] Applies PAC noise to 64-element counter list, returns scalar. | NULL | |
| pac_noised_avg | aggregate | Privacy-preserving AVG. Automatically injected by PAC for protected columns. | NULL | [SELECT c_mktsegment, pac_noised_avg(pac_hash(hash(c_custkey)), c_acctbal) FROM customer GROUP BY c_mktsegment] |
| pac_noised_count | aggregate | Privacy-preserving COUNT. Automatically injected by PAC for protected columns. | NULL | [SELECT c_mktsegment, pac_noised_count(pac_hash(hash(c_custkey))) FROM customer GROUP BY c_mktsegment] |
| pac_noised_div | scalar | [INTERNAL] Fused counter-list division + noise for AVG. | NULL | |
| pac_noised_max | aggregate | Privacy-preserving MAX. Automatically injected by PAC for protected columns. | NULL | [SELECT c_mktsegment, pac_noised_max(pac_hash(hash(c_custkey)), c_acctbal) FROM customer GROUP BY c_mktsegment] |
| pac_noised_min | aggregate | Privacy-preserving MIN. Automatically injected by PAC for protected columns. | NULL | [SELECT c_mktsegment, pac_noised_min(pac_hash(hash(c_custkey)), c_acctbal) FROM customer GROUP BY c_mktsegment] |
| pac_noised_sum | aggregate | Privacy-preserving SUM. Automatically injected by PAC for protected columns. | NULL | [SELECT c_mktsegment, pac_noised_sum(pac_hash(hash(c_custkey)), c_acctbal) FROM customer GROUP BY c_mktsegment] |
| pac_select | scalar | [INTERNAL] Combines boolean mask with hash for categorical PAC queries. | NULL | |
| pac_select_eq | scalar | [INTERNAL] Categorical comparison + mask application for PAC queries. | NULL | |
| pac_select_gt | scalar | [INTERNAL] Categorical comparison + mask application for PAC queries. | NULL | |
| pac_select_gte | scalar | [INTERNAL] Categorical comparison + mask application for PAC queries. | NULL | |
| pac_select_lt | scalar | [INTERNAL] Categorical comparison + mask application for PAC queries. | NULL | |
| pac_select_lte | scalar | [INTERNAL] Categorical comparison + mask application for PAC queries. | NULL | |
| pac_select_neq | scalar | [INTERNAL] Categorical comparison + mask application for PAC queries. | NULL | |
| pac_sum | aggregate | [INTERNAL] Returns 64 PAC subsample counters as LIST for categorical queries. | NULL | |
| save_pac_metadata | pragma | NULL | NULL |
Overloaded Functions
This extension does not add any function overloads.
Added Types
This extension does not add any types.
Added Settings
| name | description | input_type | scope | aliases |
|---|---|---|---|---|
| enforce_m_values | [INTERNAL] Enforce per-sample array length equals pac_m | BOOLEAN | GLOBAL | [] |
| pac_categorical | [INTERNAL] Enable categorical query rewrites | BOOLEAN | GLOBAL | [] |
| pac_check | [INTERNAL] Enforce protected column access restrictions | BOOLEAN | GLOBAL | [] |
| pac_compiled_path | [INTERNAL] Path to write compiled PAC artifacts | VARCHAR | GLOBAL | [] |
| pac_conservative_mode | [INTERNAL] Throw errors for unsupported operators (when false, skip PAC compilation) | BOOLEAN | GLOBAL | [] |
| pac_correction | Correction factor multiplied into aggregate results (default: 1.0) | DOUBLE | GLOBAL | [] |
| pac_diffcols | Measure utility: specify number of key columns and optional output path (e.g. '2:out.csv') | VARCHAR | GLOBAL | [] |
| pac_hash_repair | [INTERNAL] pac_hash() repairs hash to exactly 32 bits set | BOOLEAN | GLOBAL | [] |
| pac_join_elimination | [INTERNAL] Eliminate final join to PU table | BOOLEAN | GLOBAL | [] |
| pac_m | [INTERNAL] Number of per-sample subsets | INTEGER | GLOBAL | [] |
| pac_mi | Mutual information bound controlling privacy-utility tradeoff (default: 1/128). Lower values = more noise = more privacy. Set to 0 for deterministic (no noise) mode. | DOUBLE | GLOBAL | [] |
| pac_noise | Enable/disable PAC noise application (set to false for debugging) | BOOLEAN | GLOBAL | [] |
| pac_ptracking | [INTERNAL] Enable persistent secret p-tracking for query-level MIA | BOOLEAN | GLOBAL | [] |
| pac_pushdown_topk | [INTERNAL] Apply top-k before noise instead of after | BOOLEAN | GLOBAL | [] |
| pac_rewrite | [INTERNAL] Enable PAC query plan rewriting | BOOLEAN | GLOBAL | [] |
| pac_seed | RNG seed for reproducible noised results | BIGINT | GLOBAL | [] |
| pac_select | [INTERNAL] Use pac_select for categorical filters below pac aggregates | BOOLEAN | GLOBAL | [] |
| pac_topk_expansion | [INTERNAL] Expansion factor for top-k superset (1 = no expansion) | DOUBLE | GLOBAL | [] |