A statistical computing toolkit for DuckDB — descriptive statistics, hypothesis tests, R-style distribution functions (d/p/q/r), OLS regression with HC and cluster-robust standard errors, grammar-of-graphics charts (VISUALIZE → Vega-Lite), and first-class readers/writers for SAS, SPSS, and Stata files, all directly in SQL.
Installing and Loading
INSTALL stats_duck FROM community;
LOAD stats_duck;
Example
-- R-style distribution functions (d/p/q/r for 15+ families):
D SELECT pnorm(1.96) AS p, qt(0.975, 30) AS t_crit, dpois(3, 2.5) AS pmf;
┌────────────────────┬────────────────────┬─────────────────────┐
│ p │ t_crit │ pmf │
│ double │ double │ double │
├────────────────────┼────────────────────┼─────────────────────┤
│ 0.9750021048517796 │ 2.0422724563012373 │ 0.21376301724973648 │
└────────────────────┴────────────────────┴─────────────────────┘
-- OLS regression as an aggregate (one model per GROUP BY), with robust SEs.
-- vcov: 'const' | 'HC0'..'HC3' | 'CR0'/'CR1' (cluster-robust, + a cluster key).
D SELECT (c).term, (c).estimate, (c).std_error
FROM (
SELECT unnest((lm_fit(y, [x1, x2], 'HC1')).coefficients) AS c
FROM my_data
);
-- A clinical "Table 1" over mixed variable types, with between-group tests:
D SELECT * FROM table_one('patients', variables := ['age', 'sex', 'bmi'], by := ['arm']);
-- Grammar-of-graphics: turn a query into a Vega-Lite spec (+ the SQL to feed it):
D VISUALIZE wt AS x, mpg AS y FROM cars DRAW point;
About stats_duck
The Stats Duck
A statistical computing toolkit that brings common statistical workflows into DuckDB SQL. Everything runs as streaming aggregates and scalar functions, so it scales to large datasets — and also runs in DuckDB-WASM in the browser.
What's included
- Descriptive statistics —
summary_stats,table_one(the clinical "Table 1"),corr_matrix,meta. - Hypothesis tests — one/two-sample and paired t-tests, one-way ANOVA, chi-square, Mann–Whitney, Wilcoxon signed-rank, Kolmogorov–Smirnov, Shapiro–Wilk, Anderson–Darling, Jarque–Bera, and correlation tests (Pearson / Spearman / Kendall).
- Distribution functions — R-style
d/p/q/rquartets for 15+ families (normal, t, chi-square, F, gamma, beta, exponential, Weibull, log-normal, Poisson, negative binomial, hypergeometric, …). - Regression —
lm/lm_summarytable functions and thelm_fitaggregate (one model perGROUP BY) with classical, HC0–HC3 heteroskedasticity-consistent, and CR0/CR1 cluster-robust standard errors. - Grammar of graphics —
VISUALIZE … DRAW <mark>returns a Vega-Lite v5 spec plus the SQL needed to feed it. - Statistical file formats — first-class readers and writers for SAS
(
.sas7bdat,.xpt), SPSS (.sav,.por), and Stata files, via ReadStat. - Resampling & correction —
bootstrap,adjust_p(multiple-testing).
Full function reference: https://github.com/KoliStat/the-stats-duck.
Added Functions
| function_name | function_type | description | comment | examples |
|---|---|---|---|---|
| adjust_p | scalar | NULL | NULL | |
| anderson_darling | aggregate | NULL | NULL | |
| anova_oneway | aggregate | NULL | NULL | |
| bin_edges | aggregate | NULL | NULL | |
| bin_label | scalar | NULL | NULL | |
| bootstrap | aggregate | NULL | NULL | |
| chisq_goodness_of_fit | aggregate | NULL | NULL | |
| chisq_independence | aggregate | NULL | NULL | |
| corr_matrix | table | NULL | NULL | |
| dbeta | scalar | NULL | NULL | |
| dchisq | scalar | NULL | NULL | |
| dexp | scalar | NULL | NULL | |
| df | scalar | NULL | NULL | |
| dgamma | scalar | NULL | NULL | |
| dhyper | scalar | NULL | NULL | |
| dlnorm | scalar | NULL | NULL | |
| dnbinom | scalar | NULL | NULL | |
| dnorm | scalar | NULL | NULL | |
| dpois | scalar | NULL | NULL | |
| dt | scalar | NULL | NULL | |
| dweibull | scalar | NULL | NULL | |
| jarque_bera | aggregate | NULL | NULL | |
| kendall_test | aggregate | NULL | NULL | |
| ks_test_1samp | aggregate | NULL | NULL | |
| ks_test_2samp | aggregate | NULL | NULL | |
| lm | table | NULL | NULL | |
| lm_fit | aggregate | NULL | NULL | |
| lm_summary | table | NULL | NULL | |
| mann_whitney_u | aggregate | NULL | NULL | |
| meta | table | NULL | NULL | |
| pbeta | scalar | NULL | NULL | |
| pchisq | scalar | NULL | NULL | |
| pearson_test | aggregate | NULL | NULL | |
| pexp | scalar | NULL | NULL | |
| pf | scalar | NULL | NULL | |
| pgamma | scalar | NULL | NULL | |
| phyper | scalar | NULL | NULL | |
| plnorm | scalar | NULL | NULL | |
| pnbinom | scalar | NULL | NULL | |
| pnorm | scalar | NULL | NULL | |
| poibin_cdf | scalar | NULL | NULL | |
| ppois | scalar | NULL | NULL | |
| pt | scalar | NULL | NULL | |
| pweibull | scalar | NULL | NULL | |
| qbeta | scalar | NULL | NULL | |
| qchisq | scalar | NULL | NULL | |
| qexp | scalar | NULL | NULL | |
| qf | scalar | NULL | NULL | |
| qgamma | scalar | NULL | NULL | |
| qhyper | scalar | NULL | NULL | |
| qlnorm | scalar | NULL | NULL | |
| qnbinom | scalar | NULL | NULL | |
| qnorm | scalar | NULL | NULL | |
| qpois | scalar | NULL | NULL | |
| qt | scalar | NULL | NULL | |
| qweibull | scalar | NULL | NULL | |
| rbeta | scalar | NULL | NULL | |
| rchisq | scalar | NULL | NULL | |
| read_stat | table | NULL | NULL | |
| read_stat_metadata | table | NULL | NULL | |
| rexp | scalar | NULL | NULL | |
| rf | scalar | NULL | NULL | |
| rgamma | scalar | NULL | NULL | |
| rhyper | scalar | NULL | NULL | |
| rlnorm | scalar | NULL | NULL | |
| rnbinom | scalar | NULL | NULL | |
| rnorm | scalar | NULL | NULL | |
| rpois | scalar | NULL | NULL | |
| rt | scalar | NULL | NULL | |
| rweibull | scalar | NULL | NULL | |
| shapiro_wilk | aggregate | NULL | NULL | |
| sign_test_1samp | aggregate | NULL | NULL | |
| sign_test_paired | aggregate | NULL | NULL | |
| spearman_test | aggregate | NULL | NULL | |
| summary_stats | aggregate | NULL | NULL | |
| table_one | table | NULL | NULL | |
| ttest_1samp | aggregate | NULL | NULL | |
| ttest_2samp | aggregate | NULL | NULL | |
| ttest_paired | aggregate | NULL | NULL | |
| visualize_mark_v1_area | scalar | NULL | NULL | |
| visualize_mark_v1_bar | scalar | NULL | NULL | |
| visualize_mark_v1_boxplot | scalar | NULL | NULL | |
| visualize_mark_v1_density | scalar | NULL | NULL | |
| visualize_mark_v1_errorband | scalar | NULL | NULL | |
| visualize_mark_v1_errorbar | scalar | NULL | NULL | |
| visualize_mark_v1_heatmap | scalar | NULL | NULL | |
| visualize_mark_v1_histogram | scalar | NULL | NULL | |
| visualize_mark_v1_line | scalar | NULL | NULL | |
| visualize_mark_v1_point | scalar | NULL | NULL | |
| visualize_mark_v1_regression | scalar | NULL | NULL | |
| visualize_mark_v1_rule | scalar | NULL | NULL | |
| visualize_mark_v1_text | scalar | NULL | NULL | |
| visualize_mark_v1_tick | scalar | NULL | NULL | |
| visualize_mark_v1_violin | scalar | NULL | NULL | |
| wilcoxon_signed_rank | aggregate | NULL | NULL |
Overloaded Functions
This extension does not add any function overloads.
Added Types
This extension does not add any types.
Added Settings
This extension does not add any settings.