Search Shortcut cmd + k | ctrl + k
plinking_duck

Read PLINK 2 genomics file formats and run common genetic analyses directly in SQL

Maintainer(s): teaguesterling

Installing and Loading

INSTALL plinking_duck FROM community;
LOAD plinking_duck;

Example

-- Read variant metadata from a .pvar file
SELECT CHROM, POS, ID, REF, ALT
FROM read_pvar('cohort.pvar')
WHERE CHROM = '22';

-- Read genotypes in tidy format (one row per variant x sample)
SELECT chrom, pos, iid, genotype
FROM read_pfile('cohort', orient := 'genotype')
LIMIT 10;

-- Compute allele frequencies
SELECT * FROM plink_freq('cohort.pgen')
WHERE ALT_FREQ > 0.01;

-- Run a GWAS association test
SELECT * FROM plink_glm('cohort')
WHERE p_value < 5e-8;

About plinking_duck

PlinkingDuck brings PLINK 2 genotype, variant, and sample data into DuckDB, letting you query genomics datasets with standard SQL instead of format-specific command-line tools.

File readers:

  • read_pvar(path) — variant metadata (.pvar/.bim)
  • read_psam(path) — sample metadata (.psam/.fam)
  • read_pgen(path) — binary genotypes (.pgen)
  • read_pfile(prefix) — unified fileset reader with orient modes (variant/genotype/sample), sample subsetting, region and variant filtering

Analysis functions:

  • plink_freq — per-variant allele frequencies via fast genotype counting
  • plink_hardy — Hardy-Weinberg equilibrium exact test
  • plink_missing — per-variant or per-sample missingness rates
  • plink_ld — pairwise linkage disequilibrium (r², D, D')
  • plink_score — polygenic risk scoring with mean imputation
  • plink_glm — per-variant GWAS regression (linear, logistic, Firth)

Filter pushdown:

  • af_range / ac_range — filter variants by allele frequency or count
  • genotype_range — filter individual genotype values

All functions support projection pushdown (skip genotype decompression for metadata-only queries), parallel scanning, sample subsetting, and region filtering. Legacy PLINK 1 formats (.bim/.fam) are auto-detected.

Built on pgenlib for efficient access to the compressed .pgen binary format without full decompression.

For full documentation, see plinking-duck.readthedocs.io.

Added Functions

function_name function_type description comment examples
plink_freq table NULL NULL  
plink_glm table NULL NULL  
plink_hardy table NULL NULL  
plink_ld table NULL NULL  
plink_missing table NULL NULL  
plink_score table NULL NULL  
read_pfile table NULL NULL  
read_pgen table NULL NULL  
read_psam table NULL NULL  
read_pvar table NULL NULL  

Overloaded Functions

| function_name | function_type | description | comment | examples | |—————|—————|————-|———|———-|

Added Types

| type_name | type_size | logical_type | type_category | internal | |———–|———-:|————–|—————|———-|

Added Settings

name description input_type scope aliases
plinking_max_matrix_elements Maximum genotype matrix elements for orient := 'sample' pre-read (variants x samples). Default 16 billion (~16 GB of int8). BIGINT GLOBAL []