Search Shortcut cmd + k | ctrl + k
pac

Automatic query privatization using the PAC Privacy framework

Maintainer(s): ila

Installing and Loading

INSTALL pac FROM community;
LOAD pac;

Example

-- Declare a privacy unit table with protected columns
CREATE PU TABLE employees (
    id INTEGER, department VARCHAR, salary DECIMAL(10,2),
    PAC_KEY (id),
    PROTECTED (salary)
);

INSERT INTO employees VALUES
    (1, 'Engineering', 95000), (2, 'Engineering', 110000),
    (3, 'Sales', 80000),      (4, 'Sales', 72000),
    (5, 'Marketing', 85000),  (6, 'Marketing', 90000);

-- Aggregates on protected columns are automatically noised
SELECT department, AVG(salary)::INTEGER AS avg_salary, COUNT(*) AS headcount
FROM employees GROUP BY department;
-- ┌─────────────┬────────────┬───────────┐
-- │ department  │ avg_salary │ headcount │
-- │   varchar   │   int32    │   int64   │
-- ├─────────────┼────────────┼───────────┤
-- │ Marketing   │      95601 │        15 │
-- │ Engineering │     109039 │        16 │
-- │ Sales       │      81145 │        17 │
-- └─────────────┴────────────┴───────────┘

About pac

PAC (Privacy-preserving Automatic Queries) is a DuckDB extension that transparently adds noise to aggregate query results to protect against Membership Inference Attacks. Unlike Differential Privacy, PAC requires no per-query privacy analysis — just declare which columns are protected and queries are privatized automatically.

Key Features:

  • CREATE PU TABLE syntax to declare privacy units and protected columns
  • PAC_LINK for privacy propagation through join chains
  • Supports COUNT, SUM, AVG, MIN, MAX, COUNT(DISTINCT)
  • Tunable privacy-utility tradeoff via pac_mi parameter

How It Works:

  1. Mark tables with CREATE PU TABLE, specifying a PAC_KEY (privacy unit) and PROTECTED columns
  2. Protected columns cannot be projected directly — they can only appear inside aggregates
  3. The extension automatically injects calibrated noise into aggregate results
  4. Join chains propagate privacy constraints via PAC_LINK foreign key declarations

Configuration Settings:

  • pac_mi: Mutual information bound controlling the privacy-utility tradeoff (default: 1/128)

Requirements:

  • DuckDB 1.5.0+

Added Functions

function_name function_type description comment examples
clear_pac_metadata pragma NULL NULL  
load_pac_metadata pragma NULL NULL  
pac_avg aggregate NULL NULL  
pac_coalesce scalar NULL NULL  
pac_count aggregate NULL NULL  
pac_div scalar NULL NULL  
pac_filter scalar NULL NULL  
pac_filter_eq scalar NULL NULL  
pac_filter_gt scalar NULL NULL  
pac_filter_gte scalar NULL NULL  
pac_filter_lt scalar NULL NULL  
pac_filter_lte scalar NULL NULL  
pac_filter_neq scalar NULL NULL  
pac_hash scalar NULL NULL  
pac_max aggregate NULL NULL  
pac_mean scalar NULL NULL  
pac_min aggregate NULL NULL  
pac_noised scalar NULL NULL  
pac_noised_avg aggregate NULL NULL  
pac_noised_count aggregate NULL NULL  
pac_noised_div scalar NULL NULL  
pac_noised_max aggregate NULL NULL  
pac_noised_min aggregate NULL NULL  
pac_noised_sum aggregate NULL NULL  
pac_select scalar NULL NULL  
pac_select_eq scalar NULL NULL  
pac_select_gt scalar NULL NULL  
pac_select_gte scalar NULL NULL  
pac_select_lt scalar NULL NULL  
pac_select_lte scalar NULL NULL  
pac_select_neq scalar NULL NULL  
pac_sum aggregate NULL NULL  
save_pac_metadata pragma NULL NULL  

Overloaded Functions

| function_name | function_type | description | comment | examples | |—————|—————|————-|———|———-|

Added Types

| type_name | type_size | logical_type | type_category | internal | |———–|———-:|————–|—————|———-|

Added Settings

name description input_type scope aliases
enforce_m_values enforce per-sample arrays length equals pac_m BOOLEAN GLOBAL []
pac_categorical enable categorical query rewrites BOOLEAN GLOBAL []
pac_compiled_path path to write compiled PAC artifacts VARCHAR GLOBAL []
pac_conservative_mode throw errors for unsupported operators (when false, skip PAC compilation) BOOLEAN GLOBAL []
pac_correction correction factor for PAC aggregates DOUBLE GLOBAL []
pac_diffcols key columns and optional output path for utility diff VARCHAR GLOBAL []
pac_hash_repair pac_hash() repairs hash to exactly 32 bits set BOOLEAN GLOBAL []
pac_join_elimination eliminate final join to PU table BOOLEAN GLOBAL []
pac_m number of per-sample subsets (m) INTEGER GLOBAL []
pac_mi mutual information parameter for PAC aggregates DOUBLE GLOBAL []
pac_noise apply PAC noise BOOLEAN GLOBAL []
pac_ptracking enable persistent secret p-tracking for query-level MIA BOOLEAN GLOBAL []
pac_pushdown_topk apply top-k before noise instead of after BOOLEAN GLOBAL []
pac_seed deterministic RNG seed for PAC functions BIGINT GLOBAL []
pac_select use pac_select for categorical filters below pac aggregates BOOLEAN GLOBAL []
pac_topk_expansion expansion factor for top-k superset (1 = no expansion) DOUBLE GLOBAL []