The standard way of verifying results of queries is using the
query statement, followed by the letter
I times the number of columns that are expected in the result. After the query, four dashes (
----) are expected followed by the result values separated by tabs. For example,
query II SELECT 42, 84 UNION ALL SELECT 10, 20; ---- 42 84 10 20
For legacy reasons the letters
T are also accepted to denote columns.
NULL Values and empty strings
Empty lines have special significance for the SQLLogic test runner: they signify an end of the current statement or query. For that reason, empty strings and NULL values have special syntax that must be used in result verification. NULL values should use the string
NULL, and empty strings should use the string
query II SELECT NULL, '' ---- NULL (empty)
In order to signify that an error is expected, the
statement error indicator can be used. The
statement error also takes an optional expected result - which is interpreted as the expected error message. Similar to
query, the expected error should be placed after the four dashes (
----) following the query. The test passes if the error message contains the text under
statement error - the entire error message does not need to be provided. It is recommended that you only use a subset of the error message, so that the test does not break unnecessarily if the formatting of error messages is changed.
statement error SELECT * FROM non_existent_table; ---- Table with name non_existent_table does not exist!
In certain cases result values might be very large or complex, and we might only be interested in whether or not the result contains a snippet of text. In that case, we can use the
<REGEX>: modifier followed by a certain regex. If the result value matches the regex the test is passed. This is primarily used for query plan analysis.
query II EXPLAIN SELECT tbl.a FROM "data/parquet-testing/arrow/alltypes_plain.parquet" tbl(a) WHERE a=1 OR a=2 ---- physical_plan <REGEX>:.*PARQUET_SCAN.*Filters: a=1 OR a=2.*
If we instead want the result not to contain a snippet of text, we can use the
As results can grow quite large, and we might want to re-use results over multiple files, it is also possible to read expected results from files using the
<FILE> command. The expected result is read from the given file. As convention the file path should be provided as relative to the root of the Github repository.
query I PRAGMA tpch(1) ---- <FILE>:extension/tpch/dbgen/answers/sf1/q01.csv
Row-Wise vs Value-Wise Result Ordering
The result values of a query can be either supplied in row-wise order, with the individual values separated by tabs, or in value-wise order. In value wise order the individual values of the query must appear in row, column order each on an individual line. Consider the following example in both row-wise and value-wise order:
# row-wise query II SELECT 42, 84 UNION ALL SELECT 10, 20; ---- 42 84 10 20 # value-wise query II SELECT 42, 84 UNION ALL SELECT 10, 20; ---- 42 84 10 20
Hashes & Outputting Values
Besides direct result verification, the sqllogic test suite also has the option of using MD5 hashes for value comparisons. A test using hashes for result verification looks like this:
query I SELECT g, STRING_AGG(x,',') FROM strings GROUP BY g ---- 200 values hashing to b8126ea73f21372cdb3f2dc483106a12
This approach is useful for reducing the size of tests when results have many output rows. However, it should be used sparingly, as hash values make the tests more difficult to debug if they do break.
After it is ensured that the system outputs the correct result, hashes of the queries in a test file can be computed by adding
mode output_hash to the test file. For example:
mode output_hash query II SELECT 42, 84 UNION ALL SELECT 10, 20; ---- 42 84 10 20
The expected output hashes for every query in the test file will then be printed to the terminal, as follows:
================================================================================ SQL Query SELECT 42, 84 UNION ALL SELECT 10, 20; ================================================================================ 4 values hashing to 498c69da8f30c24da3bd5b322a2fd455 ================================================================================
In a similar manner,
mode output_result can be used in order to force the program to print the result to the terminal for every query run in the test file.
Queries can have an optional field that indicates that the result should be sorted in a specific manner. This field goes in the same location as the connection label. Because of that, connection labels and result sorting cannot be mixed.
The possible values of this field are
valuesort. An example of how this might be used is given below:
query I rowsort SELECT 'world' UNION ALL SELECT 'hello' ---- hello world
In general, we prefer not to use this field and rely on
ORDER BY in the query to generate deterministic query answers. However, existing sqllogictests use this field extensively, hence it is important to know of its existance.
Another feature that can be used for result verification are
query labels. These can be used to verify that different queries provide the same result. This is useful for comparing queries that are logically equivalent, but formulated differently. Query labels are provided after the connection label or sorting specifier.
Queries that have a query label do not need to have a result provided. Instead, the results of each of the queries with the same label are compared to each other. For example, the following script verifies that the queries
SELECT 42+1 and
SELECT 44-1 provide the same result:
query I nosort r43 SELECT 42+1; ---- query I nosort r43 SELECT 44-1; ----