duckhts

Read HTS (VCF/BCF/BAM/CRAM/FASTA/FASTQ/GTF/GFF) files in DuckDB via htslib

Maintainer(s): Sounkou Mahamane Toure

Installing and Loading

INSTALL duckhts FROM community;
LOAD duckhts;

Example

-- Load the extension
LOAD duckhts;

-- Read a VCF/BCF file (tidy FORMAT columns)
SELECT CHROM, POS, REF, ALT, FORMAT_GT
FROM read_bcf('test/data/formatcols.vcf.gz', tidy_format := true)
LIMIT 5;

-- Read a BAM/SAM file
SELECT QNAME, RNAME, POS, READ_GROUP_ID, SAMPLE_ID
FROM read_bam('test/data/rg.sam.gz')
LIMIT 5;

About duckhts

DuckHTS provides table functions for common high-throughput sequencing (HTS) formats using htslib. Query VCF/BCF/BAM/CRAM/FASTA/FASTQ/GTF/GFF and tabix-indexed files directly in SQL.

Functions include:

  • read_bcf(path, [region, tidy_format])
  • read_bam(path, [region, reference, standard_tags, auxiliary_tags])
  • read_fasta(path)
  • read_fastq(path, [mate_path, interleaved])
  • read_gff(path, [region, attributes_map])
  • read_gtf(path, [region, attributes_map])
  • read_tabix(path, [region, header, header_names, auto_detect, column_types])
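
Several of the signatures above accept an optional region argument. The sketch below shows one way such a query might look. It assumes that region is passed as a named parameter holding an htslib-style region string (contig:start-end), that the file has an accompanying index, and that the contig name and coordinates shown exist in the file. None of these specifics are confirmed by this page; the file path is reused from the example above.

```sql
LOAD duckhts;

-- Assumption: region takes an htslib-style 'contig:start-end' string.
-- The contig name '1' and the coordinates are hypothetical, and
-- region queries would normally require a tabix/CSI index next to the file.
SELECT CHROM, POS, REF, ALT
FROM read_bcf('test/data/formatcols.vcf.gz', region := '1:1-100000')
LIMIT 5;
```

If the contig naming in a file differs (for example 'chr1' rather than '1'), the region string would need to match it.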

Paired FASTQ is supported via mate_path or interleaved := true. CRAM is supported when an explicit reference file is provided (the reference parameter). For GTF/GFF, attributes can be returned as a parsed MAP using attributes_map := true. Optional columns for standard SAM tags and a map of auxiliary tags are available via standard_tags and auxiliary_tags. Tabix files can use header/header_names, with column types either inferred via auto_detect or given explicitly via column_types.
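
The options described above might be combined as in the following sketch. All file paths are hypothetical. The queries also assume that read_bam is the entry point for CRAM, that reference takes a path to a FASTA file, and that header and auto_detect are boolean flags; only interleaved := true and attributes_map := true appear verbatim in the text above. SELECT * is used because the output column names are not listed on this page.

```sql
LOAD duckhts;

-- Paired-end FASTQ stored in two files (hypothetical paths).
SELECT * FROM read_fastq('reads_R1.fastq.gz', mate_path := 'reads_R2.fastq.gz') LIMIT 5;

-- Paired-end FASTQ stored in a single interleaved file.
SELECT * FROM read_fastq('reads_interleaved.fastq.gz', interleaved := true) LIMIT 5;

-- CRAM with an explicit reference.
-- Assumption: read_bam also reads CRAM, and reference takes a FASTA path.
SELECT * FROM read_bam('sample.cram', reference := 'genome.fa') LIMIT 5;

-- GTF with the attributes column parsed into a MAP.
SELECT * FROM read_gtf('annotation.gtf.gz', attributes_map := true) LIMIT 5;

-- Tabix-indexed text file with a header line and inferred column types.
-- Assumption: header and auto_detect are boolean flags.
SELECT * FROM read_tabix('regions.bed.gz', header := true, auto_detect := true) LIMIT 5;
```

Running DESCRIBE on any of these queries is a quick way to check which columns and types a given function actually returns before writing more specific SELECT lists.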

MSVC builds (windows_amd64/windows_arm64) are not supported. MinGW/RTools is supported on Windows.

Added Functions

function_name  function_type  description  comment  examples
read_bam       table          NULL         NULL
read_bcf       table          NULL         NULL
read_fasta     table          NULL         NULL
read_fastq     table          NULL         NULL
read_gff       table          NULL         NULL
read_gtf       table          NULL         NULL
read_tabix     table          NULL         NULL