Read PLINK 2 genomics file formats and run common genetic analyses directly in SQL
Maintainer(s):
teaguesterling
Installing and Loading
INSTALL plinking_duck FROM community;
LOAD plinking_duck;
Example
-- Read variant metadata from a .pvar file
SELECT CHROM, POS, ID, REF, ALT
FROM read_pvar('cohort.pvar')
WHERE CHROM = '22';
-- Read genotypes in tidy format (one row per variant x sample)
SELECT chrom, pos, iid, genotype
FROM read_pfile('cohort', tidy := true)
LIMIT 10;
-- Compute allele frequencies
SELECT * FROM plink_freq('cohort.pgen')
WHERE ALT_FREQ > 0.01;
-- Hardy-Weinberg equilibrium QC filter
SELECT * FROM plink_hardy('cohort.pgen')
WHERE P_HWE > 1e-6;
About plinking_duck
PlinkingDuck brings PLINK 2 genotype, variant, and sample data into DuckDB, letting you query genomics datasets with standard SQL instead of format-specific command-line tools.
File readers:
read_pvar(path)— variant metadata (.pvar/.bim)read_psam(path)— sample metadata (.psam/.fam)read_pgen(path)— binary genotypes (.pgen)read_pfile(prefix)— unified fileset reader with tidy mode, sample subsetting, region and variant filtering
Analysis functions:
plink_freq— per-variant allele frequencies via fast genotype countingplink_hardy— Hardy-Weinberg equilibrium exact testplink_missing— per-variant or per-sample missingness ratesplink_ld— pairwise linkage disequilibrium (r², D, D')plink_score— polygenic risk scoring with mean imputation
All functions support projection pushdown (skip genotype decompression for metadata-only queries), parallel scanning, sample subsetting, and region filtering. Legacy PLINK 1 formats (.bim/.fam) are auto-detected.
Built on pgenlib for efficient access to the compressed .pgen binary format without full decompression.
For full documentation, see plinking-duck.readthedocs.io.
Added Functions
| function_name | function_type | description | comment | examples |
|---|---|---|---|---|
| plink_freq | table | NULL | NULL | |
| plink_hardy | table | NULL | NULL | |
| plink_ld | table | NULL | NULL | |
| plink_missing | table | NULL | NULL | |
| plink_score | table | NULL | NULL | |
| read_pfile | table | NULL | NULL | |
| read_pgen | table | NULL | NULL | |
| read_psam | table | NULL | NULL | |
| read_pvar | table | NULL | NULL |