Search Shortcut cmd + k | ctrl + k
plinking_duck

Read PLINK 2 genomics file formats and run common genetic analyses directly in SQL

Maintainer(s): teaguesterling

Installing and Loading

INSTALL plinking_duck FROM community;
LOAD plinking_duck;

Example

-- Read variant metadata from a .pvar file
SELECT CHROM, POS, ID, REF, ALT
FROM read_pvar('cohort.pvar')
WHERE CHROM = '22';

-- Read genotypes in tidy format (one row per variant x sample)
SELECT chrom, pos, iid, genotype
FROM read_pfile('cohort', tidy := true)
LIMIT 10;

-- Compute allele frequencies
SELECT * FROM plink_freq('cohort.pgen')
WHERE ALT_FREQ > 0.01;

-- Hardy-Weinberg equilibrium QC filter
SELECT * FROM plink_hardy('cohort.pgen')
WHERE P_HWE > 1e-6;

About plinking_duck

PlinkingDuck brings PLINK 2 genotype, variant, and sample data into DuckDB, letting you query genomics datasets with standard SQL instead of format-specific command-line tools.

File readers:

  • read_pvar(path) — variant metadata (.pvar/.bim)
  • read_psam(path) — sample metadata (.psam/.fam)
  • read_pgen(path) — binary genotypes (.pgen)
  • read_pfile(prefix) — unified fileset reader with tidy mode, sample subsetting, region and variant filtering

Analysis functions:

  • plink_freq — per-variant allele frequencies via fast genotype counting
  • plink_hardy — Hardy-Weinberg equilibrium exact test
  • plink_missing — per-variant or per-sample missingness rates
  • plink_ld — pairwise linkage disequilibrium (r², D, D')
  • plink_score — polygenic risk scoring with mean imputation

All functions support projection pushdown (skip genotype decompression for metadata-only queries), parallel scanning, sample subsetting, and region filtering. Legacy PLINK 1 formats (.bim/.fam) are auto-detected.

Built on pgenlib for efficient access to the compressed .pgen binary format without full decompression.

For full documentation, see plinking-duck.readthedocs.io.

Added Functions

function_name function_type description comment examples
plink_freq table NULL NULL  
plink_hardy table NULL NULL  
plink_ld table NULL NULL  
plink_missing table NULL NULL  
plink_score table NULL NULL  
read_pfile table NULL NULL  
read_pgen table NULL NULL  
read_psam table NULL NULL  
read_pvar table NULL NULL