⌘+k ctrl+k
1.3 (stable)
Search Shortcut cmd + k | ctrl + k
Summarize

The SUMMARIZE command can be used to easily compute a number of aggregates over a table or a query. The SUMMARIZE command launches a query that computes a number of aggregates over all columns (min, max, approx_unique, avg, std, q25, q50, q75, count), and return these along the column name, column type, and the percentage of NULL values in the column. Note that the quantiles and percentiles are approximate values.

Usage

In order to summarize the contents of a table, use SUMMARIZE followed by the table name.

SUMMARIZE tbl;

In order to summarize a query, prepend SUMMARIZE to a query.

SUMMARIZE SELECT * FROM tbl;

Example

Below is an example of SUMMARIZE on the lineitem table of TPC-H SF1 table, generated using the tpch extension.

INSTALL tpch;
LOAD tpch;
CALL dbgen(sf = 1);
SUMMARIZE lineitem;
column_name column_type min max approx_unique avg std q25 q50 q75 count null_percentage
l_orderkey INTEGER 1 6000000 1508227 3000279.604204982 1732187.8734803519 1509447 2989869 4485232 6001215 0.0%
l_partkey INTEGER 1 200000 202598 100017.98932999402 57735.69082650496 49913 99992 150039 6001215 0.0%
l_suppkey INTEGER 1 10000 10061 5000.602606138924 2886.9619987306114 2501 4999 7500 6001215 0.0%
l_linenumber INTEGER 1 7 7 3.0005757167506912 1.7324314036519328 2 3 4 6001215 0.0%
l_quantity DECIMAL(15,2) 1.00 50.00 50 25.507967136654827 14.426262537016918 13 26 38 6001215 0.0%
l_extendedprice DECIMAL(15,2) 901.00 104949.50 923139 38255.138484656854 23300.43871096221 18756 36724 55159 6001215 0.0%
l_discount DECIMAL(15,2) 0.00 0.10 11 0.04999943011540163 0.03161985510812596 0 0 0 6001215 0.0%
l_tax DECIMAL(15,2) 0.00 0.08 9 0.04001350893110812 0.025816551798842728 0 0 0 6001215 0.0%
l_returnflag VARCHAR A R 3 NULL NULL NULL NULL NULL 6001215 0.0%
l_linestatus VARCHAR F O 2 NULL NULL NULL NULL NULL 6001215 0.0%
l_shipdate DATE 1992-01-02 1998-12-01 2516 NULL NULL NULL NULL NULL 6001215 0.0%
l_commitdate DATE 1992-01-31 1998-10-31 2460 NULL NULL NULL NULL NULL 6001215 0.0%
l_receiptdate DATE 1992-01-04 1998-12-31 2549 NULL NULL NULL NULL NULL 6001215 0.0%
l_shipinstruct VARCHAR COLLECT COD TAKE BACK RETURN 4 NULL NULL NULL NULL NULL 6001215 0.0%
l_shipmode VARCHAR AIR TRUCK 7 NULL NULL NULL NULL NULL 6001215 0.0%
l_comment VARCHAR Tiresias zzle? furiously iro 3558599 NULL NULL NULL NULL NULL 6001215 0.0%

Using SUMMARIZE in a Subquery

SUMMARIZE can be used a subquery. This allows creating a table from the summary, for example:

CREATE TABLE tbl_summary AS SELECT * FROM (SUMMARIZE tbl);

Summarizing Remote Tables

It is possible to summarize remote tables via the httpfs extension using the SUMMARIZE TABLE statement. For example:

SUMMARIZE TABLE 'https://blobs.duckdb.org/data/Star_Trek-Season_1.csv';