Read PLINK 2 genomics file formats and run common genetic analyses directly in SQL
Installing and Loading
INSTALL plinking_duck FROM community;
LOAD plinking_duck;
Example
-- Read variant metadata from a .pvar file
SELECT CHROM, POS, ID, REF, ALT
FROM read_pvar('cohort.pvar')
WHERE CHROM = '22';
-- Read genotypes in tidy format (one row per variant x sample)
SELECT chrom, pos, iid, genotype
FROM read_pfile('cohort', orient := 'genotype')
LIMIT 10;
-- Compute allele frequencies
SELECT * FROM plink_freq('cohort.pgen')
WHERE ALT_FREQ > 0.01;
-- Run a GWAS association test
SELECT * FROM plink_glm('cohort')
WHERE p_value < 5e-8;
About plinking_duck
PlinkingDuck brings PLINK 2 genotype, variant, and sample data into DuckDB, letting you query genomics datasets with standard SQL instead of format-specific command-line tools.
File readers:
read_pvar(path)— variant metadata (.pvar/.bim)read_psam(path)— sample metadata (.psam/.fam)read_pgen(path)— binary genotypes (.pgen)read_pfile(prefix)— unified fileset reader with orient modes (variant/genotype/sample), sample subsetting, region and variant filtering
Analysis functions:
plink_freq— per-variant allele frequencies via fast genotype countingplink_hardy— Hardy-Weinberg equilibrium exact testplink_missing— per-variant or per-sample missingness ratesplink_ld— pairwise linkage disequilibrium (r², D, D')plink_score— polygenic risk scoring with mean imputationplink_glm— per-variant GWAS regression (linear, logistic, Firth)
Filter pushdown:
af_range/ac_range— filter variants by allele frequency or countgenotype_range— filter individual genotype values
All functions support projection pushdown (skip genotype decompression for metadata-only queries), parallel scanning, sample subsetting, and region filtering. Legacy PLINK 1 formats (.bim/.fam) are auto-detected.
Built on pgenlib for efficient access to the compressed .pgen binary format without full decompression.
For full documentation, see plinking-duck.readthedocs.io.
Added Functions
| function_name | function_type | description | comment | examples |
|---|---|---|---|---|
| plink_freq | table | NULL | NULL | |
| plink_glm | table | NULL | NULL | |
| plink_hardy | table | NULL | NULL | |
| plink_ld | table | NULL | NULL | |
| plink_missing | table | NULL | NULL | |
| plink_score | table | NULL | NULL | |
| read_pfile | table | NULL | NULL | |
| read_pgen | table | NULL | NULL | |
| read_psam | table | NULL | NULL | |
| read_pvar | table | NULL | NULL |
Overloaded Functions
| function_name | function_type | description | comment | examples | |—————|—————|————-|———|———-|
Added Types
| type_name | type_size | logical_type | type_category | internal | |———–|———-:|————–|—————|———-|
Added Settings
| name | description | input_type | scope | aliases |
|---|---|---|---|---|
| plinking_max_matrix_elements | Maximum genotype matrix elements for orient := 'sample' pre-read (variants x samples). Default 16 billion (~16 GB of int8). | BIGINT | GLOBAL | [] |