Automatic query privatization using the PAC Privacy framework
Maintainer(s):
ila
Installing and Loading
INSTALL pac FROM community;
LOAD pac;
Example
-- Declare a privacy unit table with protected columns
CREATE PU TABLE employees (
id INTEGER, department VARCHAR, salary DECIMAL(10,2),
PAC_KEY (id),
PROTECTED (salary)
);
INSERT INTO employees VALUES
(1, 'Engineering', 95000), (2, 'Engineering', 110000),
(3, 'Sales', 80000), (4, 'Sales', 72000),
(5, 'Marketing', 85000), (6, 'Marketing', 90000);
-- Aggregates on protected columns are automatically noised
SELECT department, AVG(salary)::INTEGER AS avg_salary, COUNT(*) AS headcount
FROM employees GROUP BY department;
-- ┌─────────────┬────────────┬───────────┐
-- │ department │ avg_salary │ headcount │
-- │ varchar │ int32 │ int64 │
-- ├─────────────┼────────────┼───────────┤
-- │ Marketing │ 95601 │ 15 │
-- │ Engineering │ 109039 │ 16 │
-- │ Sales │ 81145 │ 17 │
-- └─────────────┴────────────┴───────────┘
About pac
PAC (Privacy-preserving Automatic Queries) is a DuckDB extension that transparently adds noise to aggregate query results to protect against Membership Inference Attacks. Unlike Differential Privacy, PAC requires no per-query privacy analysis — just declare which columns are protected and queries are privatized automatically.
Key Features:
CREATE PU TABLEsyntax to declare privacy units and protected columnsPAC_LINKfor privacy propagation through join chains- Supports COUNT, SUM, AVG, MIN, MAX, COUNT(DISTINCT)
- Tunable privacy-utility tradeoff via
pac_miparameter
How It Works:
- Mark tables with
CREATE PU TABLE, specifying aPAC_KEY(privacy unit) andPROTECTEDcolumns - Protected columns cannot be projected directly — they can only appear inside aggregates
- The extension automatically injects calibrated noise into aggregate results
- Join chains propagate privacy constraints via
PAC_LINKforeign key declarations
Configuration Settings:
pac_mi: Mutual information bound controlling the privacy-utility tradeoff (default:1/128)
Requirements:
- DuckDB 1.5.0+
Added Functions
| function_name | function_type | description | comment | examples |
|---|---|---|---|---|
| clear_pac_metadata | pragma | NULL | NULL | |
| load_pac_metadata | pragma | NULL | NULL | |
| pac_avg | aggregate | NULL | NULL | |
| pac_coalesce | scalar | NULL | NULL | |
| pac_count | aggregate | NULL | NULL | |
| pac_div | scalar | NULL | NULL | |
| pac_filter | scalar | NULL | NULL | |
| pac_filter_eq | scalar | NULL | NULL | |
| pac_filter_gt | scalar | NULL | NULL | |
| pac_filter_gte | scalar | NULL | NULL | |
| pac_filter_lt | scalar | NULL | NULL | |
| pac_filter_lte | scalar | NULL | NULL | |
| pac_filter_neq | scalar | NULL | NULL | |
| pac_hash | scalar | NULL | NULL | |
| pac_max | aggregate | NULL | NULL | |
| pac_mean | scalar | NULL | NULL | |
| pac_min | aggregate | NULL | NULL | |
| pac_noised | scalar | NULL | NULL | |
| pac_noised_avg | aggregate | NULL | NULL | |
| pac_noised_count | aggregate | NULL | NULL | |
| pac_noised_div | scalar | NULL | NULL | |
| pac_noised_max | aggregate | NULL | NULL | |
| pac_noised_min | aggregate | NULL | NULL | |
| pac_noised_sum | aggregate | NULL | NULL | |
| pac_select | scalar | NULL | NULL | |
| pac_select_eq | scalar | NULL | NULL | |
| pac_select_gt | scalar | NULL | NULL | |
| pac_select_gte | scalar | NULL | NULL | |
| pac_select_lt | scalar | NULL | NULL | |
| pac_select_lte | scalar | NULL | NULL | |
| pac_select_neq | scalar | NULL | NULL | |
| pac_sum | aggregate | NULL | NULL | |
| save_pac_metadata | pragma | NULL | NULL |
Overloaded Functions
| function_name | function_type | description | comment | examples | |—————|—————|————-|———|———-|
Added Types
| type_name | type_size | logical_type | type_category | internal | |———–|———-:|————–|—————|———-|
Added Settings
| name | description | input_type | scope | aliases |
|---|---|---|---|---|
| enforce_m_values | enforce per-sample arrays length equals pac_m | BOOLEAN | GLOBAL | [] |
| pac_categorical | enable categorical query rewrites | BOOLEAN | GLOBAL | [] |
| pac_compiled_path | path to write compiled PAC artifacts | VARCHAR | GLOBAL | [] |
| pac_conservative_mode | throw errors for unsupported operators (when false, skip PAC compilation) | BOOLEAN | GLOBAL | [] |
| pac_correction | correction factor for PAC aggregates | DOUBLE | GLOBAL | [] |
| pac_diffcols | key columns and optional output path for utility diff | VARCHAR | GLOBAL | [] |
| pac_hash_repair | pac_hash() repairs hash to exactly 32 bits set | BOOLEAN | GLOBAL | [] |
| pac_join_elimination | eliminate final join to PU table | BOOLEAN | GLOBAL | [] |
| pac_m | number of per-sample subsets (m) | INTEGER | GLOBAL | [] |
| pac_mi | mutual information parameter for PAC aggregates | DOUBLE | GLOBAL | [] |
| pac_noise | apply PAC noise | BOOLEAN | GLOBAL | [] |
| pac_ptracking | enable persistent secret p-tracking for query-level MIA | BOOLEAN | GLOBAL | [] |
| pac_pushdown_topk | apply top-k before noise instead of after | BOOLEAN | GLOBAL | [] |
| pac_seed | deterministic RNG seed for PAC functions | BIGINT | GLOBAL | [] |
| pac_select | use pac_select for categorical filters below pac aggregates | BOOLEAN | GLOBAL | [] |
| pac_topk_expansion | expansion factor for top-k superset (1 = no expansion) | DOUBLE | GLOBAL | [] |