Profiling is important to help understand why certain queries exhibit specific performance characteristics. DuckDB contains several built-in features to enable query profiling that will be explained on this page.
EXPLAIN
Statement
The first step to profiling a query can include examining the query plan. The EXPLAIN
statement allows you to peek into the query plan and see what is going on under the hood.
EXPLAIN ANALYZE
Statement
The query plan helps understand the performance characteristics of the system. However, often it is also necessary to look at the performance numbers of individual operators and the cardinalities that pass through them. This is where the EXPLAIN_ANALYZE
statement comes in, which pretty-prints the query plan and also executes the query, providing the actual run-time performance numbers.
Pragmas
DuckDB supports several pragmas that can be used to enable and disable profiling, as well as to control the level of detail in the profiling output.
The following pragmas are available, and can be set using either PRAGMA
or SET
.
They can also be reset using RESET
, followed by the setting name.
For more information on the profiling pragmas and their usage,
see the “Profiling Queries”
section of the pragmas page.
Setting | Description | Default | Options |
---|---|---|---|
enable_profiling , enable_profile |
Turn on profiling | query_tree |
query_tree , json , query_tree_optimizer , no_output |
disable_profiling , disable_profile |
Turn off profiling | ||
profiling_mode |
Toggle additional optimizer, planner, and physical planner metrics | standard |
standard , detailed |
profiling_output |
Set a file to output the profiling to | Console | A path to a file |
custom_profiling_settings |
Enable or disable specific metrics. | All metrics except those activated by detailed profiling. | A JSON object that matches the following: {"METRIC_NAME": "boolean", ...} . See the metrics section below |
Metrics
There are two types of nodes in the query tree: the QUERY_ROOT
, and OPERATOR
nodes. The QUERY_ROOT
refers exclusively to the top level node and the metrics it contains are measured over the entire query. The OPERATOR
nodes refer to the individual operators in the query plan. Some metrics are only available for QUERY_ROOT
nodes, while others are only available for OPERATOR
nodes. The table below describes each metric, as well as which nodes they are available for.
Other than QUERY_NAME
and OPERATOR_TYPE
, all metrics can be turned on or off.
Metric | Return Type | Query | Operator | Description |
---|---|---|---|---|
BLOCKED_THREAD_TIME |
double |
✅ | The total time threads are blocked | |
EXTRA_INFO |
string |
✅ | ✅ | Each operator also has unique metrics, which can be accessed here. |
OPERATOR_CARDINALITY |
uint64 |
✅ | ✅ ️ | The cardinality of each operator, i.e., the number of rows it returns to its parent. |
OPERATOR_ROWS_SCANNED |
uint64 |
✅ | ✅ | The total rows scanned by each operator |
OPERATOR_TIMING |
uint64 |
✅ | ✅ | The time taken by each operator |
OPERATOR_TYPE |
string |
✅ | The name of each operator | |
QUERY_NAME |
string |
✅ | The input query | |
RESULT_SET_SIZE |
uint64 |
✅ | ✅ | The size of the result in bytes |
Cumulative Metrics
DuckDB also supports several cumulative metrics, available in all nodes. In the QUERY_ROOT
node, these metrics represent the sum of the corresponding metrics across all operators in the query. In the OPERATOR
nodes, they represent the sum of the operator's specific metric along with those of all its children recursively.
These cumulative metrics can be enabled independently, even if the underlying specific metrics are disabled. The table below shows the cumulative metrics and the specific metrics they are calculated from.
Metric | Metric Calculated Cumulatively |
---|---|
CPU_TIME |
OPERATOR_TIMING |
CUMULATIVE_CARDINALITY |
OPERATOR_CARDINALITY |
CUMULATIVE_ROWS_SCANNED |
OPERATOR_ROWS_SCANNED |
Detailed Profiling
As explained above, when the profiling_mode
is set to detailed
, an extra set of timing metrics are enabled, which are only available at the QUERY_ROOT
level. These include OPTIMIZERS
, PLANNER
, and PHYSICAL_PLANNER
metrics, which are all measured in milliseconds, and returned as a double
.
These metrics are automatically enabled, and added to the enabled settings, when the profiling_mode
is set to detailed
, however, they can also be toggled individually.
Optimizers
At the QUERY_ROOT
level, there are also metrics that measure the time taken by each optimizer. These metrics are only available when the specific optimizer is enabled. The available optimizations can be queried using the duckdb_optimizers() table function
.
Each optimizer has a corresponding metric that follows the template: OPTIMIZER_<OPTIMIZER_NAME>
. For example, the OPTIMIZER_JOIN_ORDER
metric corresponds to the JOIN_ORDER
optimizer.
Additionally, the following metrics are available to support the optimizer metrics:
ALL_OPTIMIZERS
- Turns on all optimizer metrics, and measures the time taken by the optimizer parent node.CUMMULATIVE_OPTIMIZER_TIMING
- The cumulative sum of all optimizer metrics, can be used without turning on all optimizer metrics.
Planner
The PLANNER
is responsible for generating the logical plan. Currently, two metrics are measured in the PLANNER
:
PLANNER
- The time taken to generate the logical plan from the parsed SQL nodes.PLANNER_BINDING
- The time taken to bind the logical plan.
Physical Planner
The PHYSICAL_PLANNER
is responsible for generating the physical plan from the logical plan.
The following are the metrics supported in the PHYSICAL_PLANNER
:
PHYSICAL_PLANNER
- The time taken to generate the physical plan.PHYSICAL_PLANNER_COLUMN_BINDING
- The time taken to bind the columns in the logical plan to physical columns.PHYSICAL_PLANNER_RESOLVE_TYPES
- The time taken to resolve the types in the logical plan to physical types.PHYSICAL_PLANNER_CREATE_PLAN
- The time taken to create the physical plan.
Setting Custom Metrics Examples
The following examples demonstrate how to enable profiling and set the output format to json
.
In the first example, profiling is enabled, the output is set to a file, and only the EXTRA_INFO
, OPERATOR_CARDINALITY
, and OPERATOR_TIMING
metrics are enabled.
CREATE TABLE students (name VARCHAR, sid INTEGER);
CREATE TABLE exams (eid INTEGER, subject VARCHAR, sid INTEGER);
INSERT INTO students VALUES ('Mark', 1), ('Joe', 2), ('Matthew', 3);
INSERT INTO exams VALUES (10, 'Physics', 1), (20, 'Chemistry', 2), (30, 'Literature', 3);
PRAGMA enable_profiling = 'json';
PRAGMA profiling_output = '/path/to/file.json';
PRAGMA custom_profiling_settings = '{"CPU_TIME": "false", "EXTRA_INFO": "true", "OPERATOR_CARDINALITY": "true", "OPERATOR_TIMING": "true"}';
SELECT name
FROM students
JOIN exams USING (sid)
WHERE name LIKE 'Ma%';
The contents of the outputted file:
"operator_timing": 0.000372,
"operator_cardinality": 0,
"extra_info": {},
"query_name": "SELECT name\nFROM students\nJOIN exams USING (sid)\nWHERE name LIKE 'Ma%';",
"children": [
{
"operator_timing": 0.000001,
"operator_cardinality": 2,
"operator_type": "PROJECTION",
"extra_info": {
"Projections": "name",
"Estimated Cardinality": "1"
},
"children": [
{
"operator_timing": 0.000031,
"operator_cardinality": 2,
"operator_type": "HASH_JOIN",
"extra_info": {
"Join Type": "INNER",
"Conditions": "sid = sid",
"Build Min": "1",
"Build Max": "3",
"Estimated Cardinality": "1"
},
"children": [
...
}
The second example adds the detailed metrics to the output.
PRAGMA profiling_mode = 'detailed';
SELECT name
FROM students
JOIN exams USING (sid)
WHERE name LIKE 'Ma%';
The contents of the outputted file:
{
"all_optimizers": 0.00014299999999999998,
"cumulative_optimizer_timing": 0.00014299999999999998,
"planner": 0.000187,
"planner_binding": 0.000185,
"physical_planner": 0.000034,
"physical_planner_column_binding": 0.000002,
"physical_planner_resolve_types": 0.0,
"physical_planner_create_plan": 0.000031,
"optimizer_expression_rewriter": 0.000012,
"optimizer_filter_pullup": 0.000001,
"optimizer_filter_pushdown": 0.000035,
...
"operator_cardinality": 0,
"optimizer_compressed_materialization": 0.0,
"optimizer_deliminator": 0.0,
"extra_info": {},
"query_name": "SELECT name\nFROM students\nJOIN exams USING (sid)\nWHERE name LIKE 'Ma%';",
"children": [
{
"operator_timing": 0.0,
"operator_cardinality": 2,
"operator_type": "PROJECTION",
"extra_info": {
"Projections": "name",
"Estimated Cardinality": "1"
},
"children": [
{
"operator_timing": 0.00010100000000000002,
"operator_cardinality": 2,
"operator_type": "HASH_JOIN",
"extra_info": {
"Join Type": "INNER",
"Conditions": "sid = sid",
"Build Min": "1",
"Build Max": "3",
"Estimated Cardinality": "1"
},
"children": [
...
}
Query Graphs
It is also possible to render the profiling output as a query graph. The query graph is a visual representation of the query plan, showing the operators and their relationships. The query plan must be output in the json
format and stored in a file.
After a profiling output is written to its designated file it can then be rendered as a query graph using the Python script, provided the duckdb
python module is installed. This script will generate an HTML file and open it in your web browser.
python -m duckdb.query_graph /path/to/file.json
Notation in Query Plans
In query plans, the hash join operators adhere to the following convention: the probe side of the join is the left operand, while the build side is the right operand.
Join operators in the query plan show the join type used:
- Inner joins are denoted as
INNER
. - Left outer joins and right outer joins are denoted as
LEFT
andRIGHT
, respectively. - Full outer joins are denoted as
FULL
.