Search Shortcut cmd + k | ctrl + k
stats_duck

A statistical computing toolkit for DuckDB — descriptive statistics, hypothesis tests, R-style distribution functions (d/p/q/r), OLS regression with HC and cluster-robust standard errors, grammar-of-graphics charts (VISUALIZE → Vega-Lite), and first-class readers/writers for SAS, SPSS, and Stata files, all directly in SQL.

Maintainer(s): caerbannogwhite

Installing and Loading

INSTALL stats_duck FROM community;
LOAD stats_duck;

Example

-- R-style distribution functions (d/p/q/r for 15+ families):
D SELECT pnorm(1.96) AS p, qt(0.975, 30) AS t_crit, dpois(3, 2.5) AS pmf;
┌────────────────────┬────────────────────┬─────────────────────┐
         p                 t_crit                pmf         
       double              double              double        
├────────────────────┼────────────────────┼─────────────────────┤
 0.9750021048517796  2.0422724563012373  0.21376301724973648 
└────────────────────┴────────────────────┴─────────────────────┘

-- OLS regression as an aggregate (one model per GROUP BY), with robust SEs.
-- vcov: 'const' | 'HC0'..'HC3' | 'CR0'/'CR1' (cluster-robust, + a cluster key).
D SELECT (c).term, (c).estimate, (c).std_error
  FROM (
    SELECT unnest((lm_fit(y, [x1, x2], 'HC1')).coefficients) AS c
    FROM my_data
  );

-- A clinical "Table 1" over mixed variable types, with between-group tests:
D SELECT * FROM table_one('patients', variables := ['age', 'sex', 'bmi'], by := ['arm']);

-- Grammar-of-graphics: turn a query into a Vega-Lite spec (+ the SQL to feed it):
D VISUALIZE wt AS x, mpg AS y FROM cars DRAW point;

About stats_duck

The Stats Duck

A statistical computing toolkit that brings common statistical workflows into DuckDB SQL. Everything runs as streaming aggregates and scalar functions, so it scales to large datasets — and also runs in DuckDB-WASM in the browser.

What's included

  • Descriptive statisticssummary_stats, table_one (the clinical "Table 1"), corr_matrix, meta.
  • Hypothesis tests — one/two-sample and paired t-tests, one-way ANOVA, chi-square, Mann–Whitney, Wilcoxon signed-rank, Kolmogorov–Smirnov, Shapiro–Wilk, Anderson–Darling, Jarque–Bera, and correlation tests (Pearson / Spearman / Kendall).
  • Distribution functions — R-style d/p/q/r quartets for 15+ families (normal, t, chi-square, F, gamma, beta, exponential, Weibull, log-normal, Poisson, negative binomial, hypergeometric, …).
  • Regressionlm / lm_summary table functions and the lm_fit aggregate (one model per GROUP BY) with classical, HC0–HC3 heteroskedasticity-consistent, and CR0/CR1 cluster-robust standard errors.
  • Grammar of graphicsVISUALIZE … DRAW <mark> returns a Vega-Lite v5 spec plus the SQL needed to feed it.
  • Statistical file formats — first-class readers and writers for SAS (.sas7bdat, .xpt), SPSS (.sav, .por), and Stata files, via ReadStat.
  • Resampling & correctionbootstrap, adjust_p (multiple-testing).

Full function reference: https://github.com/KoliStat/the-stats-duck.

Added Functions

function_name function_type description comment examples
adjust_p scalar NULL NULL  
anderson_darling aggregate NULL NULL  
anova_oneway aggregate NULL NULL  
bin_edges aggregate NULL NULL  
bin_label scalar NULL NULL  
bootstrap aggregate NULL NULL  
chisq_goodness_of_fit aggregate NULL NULL  
chisq_independence aggregate NULL NULL  
corr_matrix table NULL NULL  
dbeta scalar NULL NULL  
dchisq scalar NULL NULL  
dexp scalar NULL NULL  
df scalar NULL NULL  
dgamma scalar NULL NULL  
dhyper scalar NULL NULL  
dlnorm scalar NULL NULL  
dnbinom scalar NULL NULL  
dnorm scalar NULL NULL  
dpois scalar NULL NULL  
dt scalar NULL NULL  
dweibull scalar NULL NULL  
jarque_bera aggregate NULL NULL  
kendall_test aggregate NULL NULL  
ks_test_1samp aggregate NULL NULL  
ks_test_2samp aggregate NULL NULL  
lm table NULL NULL  
lm_fit aggregate NULL NULL  
lm_summary table NULL NULL  
mann_whitney_u aggregate NULL NULL  
meta table NULL NULL  
pbeta scalar NULL NULL  
pchisq scalar NULL NULL  
pearson_test aggregate NULL NULL  
pexp scalar NULL NULL  
pf scalar NULL NULL  
pgamma scalar NULL NULL  
phyper scalar NULL NULL  
plnorm scalar NULL NULL  
pnbinom scalar NULL NULL  
pnorm scalar NULL NULL  
poibin_cdf scalar NULL NULL  
ppois scalar NULL NULL  
pt scalar NULL NULL  
pweibull scalar NULL NULL  
qbeta scalar NULL NULL  
qchisq scalar NULL NULL  
qexp scalar NULL NULL  
qf scalar NULL NULL  
qgamma scalar NULL NULL  
qhyper scalar NULL NULL  
qlnorm scalar NULL NULL  
qnbinom scalar NULL NULL  
qnorm scalar NULL NULL  
qpois scalar NULL NULL  
qt scalar NULL NULL  
qweibull scalar NULL NULL  
rbeta scalar NULL NULL  
rchisq scalar NULL NULL  
read_stat table NULL NULL  
read_stat_metadata table NULL NULL  
rexp scalar NULL NULL  
rf scalar NULL NULL  
rgamma scalar NULL NULL  
rhyper scalar NULL NULL  
rlnorm scalar NULL NULL  
rnbinom scalar NULL NULL  
rnorm scalar NULL NULL  
rpois scalar NULL NULL  
rt scalar NULL NULL  
rweibull scalar NULL NULL  
shapiro_wilk aggregate NULL NULL  
sign_test_1samp aggregate NULL NULL  
sign_test_paired aggregate NULL NULL  
spearman_test aggregate NULL NULL  
summary_stats aggregate NULL NULL  
table_one table NULL NULL  
ttest_1samp aggregate NULL NULL  
ttest_2samp aggregate NULL NULL  
ttest_paired aggregate NULL NULL  
visualize_mark_v1_area scalar NULL NULL  
visualize_mark_v1_bar scalar NULL NULL  
visualize_mark_v1_boxplot scalar NULL NULL  
visualize_mark_v1_density scalar NULL NULL  
visualize_mark_v1_errorband scalar NULL NULL  
visualize_mark_v1_errorbar scalar NULL NULL  
visualize_mark_v1_heatmap scalar NULL NULL  
visualize_mark_v1_histogram scalar NULL NULL  
visualize_mark_v1_line scalar NULL NULL  
visualize_mark_v1_point scalar NULL NULL  
visualize_mark_v1_regression scalar NULL NULL  
visualize_mark_v1_rule scalar NULL NULL  
visualize_mark_v1_text scalar NULL NULL  
visualize_mark_v1_tick scalar NULL NULL  
visualize_mark_v1_violin scalar NULL NULL  
wilcoxon_signed_rank aggregate NULL NULL  

Overloaded Functions

This extension does not add any function overloads.

Added Types

This extension does not add any types.

Added Settings

This extension does not add any settings.