The standard way of verifying results of queries is using the query
statement, followed by the letter I
times the number of columns that are expected in the result. After the query, four dashes (----
) are expected followed by the result values separated by tabs. For example,
query II
SELECT 42, 84 UNION ALL SELECT 10, 20;
----
42 84
10 20
For legacy reasons the letters R
and T
are also accepted to denote columns.
Deprecated DuckDB deprecated the usage of types in the sqllogictest. The DuckDB test runner does not use or need them internally – therefore, only
I
should be used to denote columns.
NULL Values and Empty Strings
Empty lines have special significance for the SQLLogic test runner: they signify an end of the current statement or query. For that reason, empty strings and NULL values have special syntax that must be used in result verification. NULL values should use the string NULL
, and empty strings should use the string (empty)
, e.g.:
query II
SELECT NULL, ''
----
NULL
(empty)
Error Verification
In order to signify that an error is expected, the statement error
indicator can be used. The statement error
also takes an optional expected result – which is interpreted as the expected error message. Similar to query
, the expected error should be placed after the four dashes (----
) following the query. The test passes if the error message contains the text under statement error
– the entire error message does not need to be provided. It is recommended that you only use a subset of the error message, so that the test does not break unnecessarily if the formatting of error messages is changed.
statement error
SELECT * FROM non_existent_table;
----
Table with name non_existent_table does not exist!
Regex
In certain cases result values might be very large or complex, and we might only be interested in whether or not the result contains a snippet of text. In that case, we can use the <REGEX>:
modifier followed by a certain regex. If the result value matches the regex the test is passed. This is primarily used for query plan analysis.
query II
EXPLAIN SELECT tbl.a FROM "data/parquet-testing/arrow/alltypes_plain.parquet" tbl(a) WHERE a = 1 OR a = 2
----
physical_plan <REGEX>:.*PARQUET_SCAN.*Filters: a=1 OR a=2.*
If we instead want the result not to contain a snippet of text, we can use the <!REGEX>:
modifier.
File
As results can grow quite large, and we might want to re-use results over multiple files, it is also possible to read expected results from files using the <FILE>
command. The expected result is read from the given file. As convention the file path should be provided as relative to the root of the GitHub repository.
query I
PRAGMA tpch(1)
----
<FILE>:extension/tpch/dbgen/answers/sf1/q01.csv
Row-Wise vs. Value-Wise Result Ordering
The result values of a query can be either supplied in row-wise order, with the individual values separated by tabs, or in value-wise order. In value wise order the individual values of the query must appear in row, column order each on an individual line. Consider the following example in both row-wise and value-wise order:
# row-wise
query II
SELECT 42, 84 UNION ALL SELECT 10, 20;
----
42 84
10 20
# value-wise
query II
SELECT 42, 84 UNION ALL SELECT 10, 20;
----
42
84
10
20
Hashes and Outputting Values
Besides direct result verification, the sqllogic test suite also has the option of using MD5 hashes for value comparisons. A test using hashes for result verification looks like this:
query I
SELECT g, string_agg(x,',') FROM strings GROUP BY g
----
200 values hashing to b8126ea73f21372cdb3f2dc483106a12
This approach is useful for reducing the size of tests when results have many output rows. However, it should be used sparingly, as hash values make the tests more difficult to debug if they do break.
After it is ensured that the system outputs the correct result, hashes of the queries in a test file can be computed by adding mode output_hash
to the test file. For example:
mode output_hash
query II
SELECT 42, 84 UNION ALL SELECT 10, 20;
----
42 84
10 20
The expected output hashes for every query in the test file will then be printed to the terminal, as follows:
================================================================================
SQL Query
SELECT 42, 84 UNION ALL SELECT 10, 20;
================================================================================
4 values hashing to 498c69da8f30c24da3bd5b322a2fd455
================================================================================
In a similar manner, mode output_result
can be used in order to force the program to print the result to the terminal for every query run in the test file.
Result Sorting
Queries can have an optional field that indicates that the result should be sorted in a specific manner. This field goes in the same location as the connection label. Because of that, connection labels and result sorting cannot be mixed.
The possible values of this field are nosort
, rowsort
and valuesort
. An example of how this might be used is given below:
query I rowsort
SELECT 'world' UNION ALL SELECT 'hello'
----
hello
world
In general, we prefer not to use this field and rely on ORDER BY
in the query to generate deterministic query answers. However, existing sqllogictests use this field extensively, hence it is important to know of its existence.
Query Labels
Another feature that can be used for result verification are query labels
. These can be used to verify that different queries provide the same result. This is useful for comparing queries that are logically equivalent, but formulated differently. Query labels are provided after the connection label or sorting specifier.
Queries that have a query label do not need to have a result provided. Instead, the results of each of the queries with the same label are compared to each other. For example, the following script verifies that the queries SELECT 42+1
and SELECT 44-1
provide the same result:
query I nosort r43
SELECT 42+1;
----
query I nosort r43
SELECT 44-1;
----