FROM clause specifies the source of the data on which the remainder of the query should operate. Logically, the
FROM clause is where the query starts execution. The
FROM clause can contain a single table, a combination of multiple tables that are joined together using
JOIN clauses, or another
SELECT query inside a subquery node. DuckDB also has an optional
FROM-first syntax which enables you to also query without a
-- select all columns from the table called "table_name" SELECT * FROM table_name; -- select all columns from the table called "table_name" using the FROM-first syntax FROM table_name SELECT *; -- select all columns using the FROM-first syntax and omitting the SELECT clause FROM table_name; -- select all columns from the table called "table_name" in the schema "schema_name SELECT * FROM schema_name.table_name; -- select the column "i" from the table function "range", where the first column of the range function is renamed to "i" SELECT t.i FROM range(100) AS t(i); -- select all columns from the CSV file called "test.csv" SELECT * FROM 'test.csv'; -- select all columns from a subquery SELECT * FROM (SELECT * FROM table_name); -- select the entire row of the table as a struct SELECT t FROM t; -- select the entire row of the subquery as a struct (i.e., a single column) SELECT t FROM (SELECT unnest(generate_series(41, 43)) AS x, 'hello' AS y) t; -- join two tables together SELECT * FROM table_name JOIN other_table ON (table_name.key = other_table.key); -- select a 10% sample from a table SELECT * FROM table_name TABLESAMPLE 10%; -- select a sample of 10 rows from a table SELECT * FROM table_name TABLESAMPLE 10 ROWS; -- use the FROM-first syntax with WHERE clause and aggregation FROM range(100) AS t(i) SELECT sum(t.i) WHERE i % 2 = 0;
Joins are a fundamental relational operation used to connect two tables or relations horizontally. The relations are referred to as the left and right sides of the join based on how they are written in the join clause. Each result row has the columns from both relations.
A join uses a rule to match pairs of rows from each relation. Often this is a predicate, but there are other implied rules that may be specified.
Rows that do not have any matches can still be returned if an
OUTER join is specified.
Outer joins can be one of:
LEFT(All rows from the left relation appear at least once)
RIGHT(All rows from the right relation appear at least once)
FULL(All rows from both relations appear at least once)
A join that is not
INNER (only rows that get paired are returned).
When an unpaired row is returned, the attributes from the other table are set to
The simplest type of join is a
There are no conditions for this type of join,
and it just returns all the possible pairs.
-- return all pairs of rows SELECT a.*, b.* FROM a CROSS JOIN b;
Most joins are specified by a predicate that connects
attributes from one side to attributes from the other side.
The conditions can be explicitly specified using an
with the join (clearer) or implied by the
WHERE clause (old-fashioned).
We use the
l_regions and the
l_nations tables from the TPC-H schema:
CREATE TABLE l_regions(r_regionkey INTEGER NOT NULL PRIMARY KEY, r_name CHAR(25) NOT NULL, r_comment VARCHAR(152)); CREATE TABLE l_nations (n_nationkey INTEGER NOT NULL PRIMARY KEY, n_name CHAR(25) NOT NULL, n_regionkey INTEGER NOT NULL, n_comment VARCHAR(152), FOREIGN KEY (n_regionkey) REFERENCES l_regions(r_regionkey));
-- return the regions for the nations SELECT n.*, r.* FROM l_nations n JOIN l_regions r ON (n_regionkey = r_regionkey);
If the column names are the same and are required to be equal,
then the simpler
USING syntax can be used:
CREATE TABLE l_regions(regionkey INTEGER NOT NULL PRIMARY KEY, name CHAR(25) NOT NULL, comment VARCHAR(152)); CREATE TABLE l_nations (nationkey INTEGER NOT NULL PRIMARY KEY, name CHAR(25) NOT NULL, regionkey INTEGER NOT NULL, comment VARCHAR(152), FOREIGN KEY (regionkey) REFERENCES l_regions(regionkey));
-- return the regions for the nations SELECT n.*, r.* FROM l_nations n JOIN l_regions r USING (regionkey);
The expressions to not have to be equalities - any predicate can be used:
-- return the pairs of jobs where one ran longer but cost less SELECT s1.t_id, s2.t_id FROM west s1, west s2 WHERE s1.time > s2.time AND s1.cost < s2.cost;
Semi joins return rows from the left table that have at least one match in the right table. Anti joins return rows from the left table that have no matches in the right table. When using a semi or anti join the result will never have more rows than the left hand side table. Semi and anti joins provide the same logic as (NOT) IN statements.
-- return a list of cars that have a valid region. SELECT cars.name, cars.manufacturer FROM cars SEMI JOIN region ON cars.region = region.id;
-- return a list of cars with no recorded safety data. SELECT cars.name, cars.manufacturer FROM cars ANTI JOIN safety_data ON cars.safety_report_id = safety_data.report_id;
LATERAL keyword allows subqueries in the
FROM clause to refer to previous subqueries. This feature is also known as a lateral join.
SELECT * FROM range(3) t(i), LATERAL (SELECT i + 1) t2(j);
┌───────┬───────┐ │ i │ j │ │ int64 │ int64 │ ├───────┼───────┤ │ 0 │ 1 │ │ 1 │ 2 │ │ 2 │ 3 │ └───────┴───────┘
Lateral joins are a generalization of correlated subqueries, as they can return multiple values per input value rather than only a single value.
SELECT * FROM generate_series(0, 1) t(i), LATERAL (SELECT i + 10 UNION ALL SELECT i + 100) t2(j);
┌───────┬───────┐ │ i │ j │ │ int64 │ int64 │ ├───────┼───────┤ │ 0 │ 10 │ │ 1 │ 11 │ │ 0 │ 100 │ │ 1 │ 101 │ └───────┴───────┘
It may be helpful to think about
LATERAL as a loop where we iterate through the rows of the first subquery and use it as input to the second (
In the examples above, we iterate through table
t and refer to its column
i from the definition of table
t2. The rows of
t2 form column
j in the result.
It is possible to refer to multiple attributes from the
LATERAL subquery. Using the table from the first example:
CREATE TABLE t1 AS SELECT * FROM range(3) t(i), LATERAL (SELECT i + 1) t2(j); SELECT * FROM t1, LATERAL (SELECT i + j) t2(k);
┌───────┬───────┬───────┐ │ i │ j │ k │ │ int64 │ int64 │ int64 │ ├───────┼───────┼───────┤ │ 0 │ 1 │ 1 │ │ 1 │ 2 │ 3 │ │ 2 │ 3 │ 5 │ └───────┴───────┴───────┘
DuckDB detects when
LATERALjoins should be used, making the use of the
When working with data frames or other embedded tables of the same size, the rows may have a natural correspondence based on their physical order. In scripting languages, this is easily expressed using a loop:
for (i=0;i<n;i++) f(t1.a[i], t2.b[i])
It is difficult to express this in standard SQL because relational tables are not ordered, but imported tables (like data frames) or disk files (like CSVs or Parquet files) do have a natural ordering.
Connecting them using this ordering is called a positional join:
-- treat two data frames as a single table SELECT df1.*, df2.* FROM df1 POSITIONAL JOIN df2;
Positional joins are always
FULL OUTER joins.
A common operation when working with temporal or similarly-ordered data is to find the nearest (first) event in a reference table (such as prices). This is called an as-of join:
-- attach prices to stock trades SELECT t.*, p.price FROM trades t ASOF JOIN prices p ON t.symbol = p.symbol AND t.when >= p.when;
ASOF join requires at least one inequality condition on the ordering field.
The inequality can be any inequality condition (
on any data type, but the most common form is
>= on a temporal type.
Any other conditions must be equalities (or
This means that the left/right order of the tables is significant.
ASOF joins each left side row with at most one right side row.
It can be specified as an
OUTER join to find unpaired rows
(e.g., trades without prices or prices which have no trades.)
-- attach prices or NULLs to stock trades SELECT * FROM trades t ASOF LEFT JOIN prices p ON t.symbol = p.symbol AND t.when >= p.when;
ASOF joins can also specify join conditions on matching column names with the
but the last attribute in the list must be the inequality,
which will be greater than or equal to (
SELECT * FROM trades t ASOF JOIN prices p USING (symbol, when); -- Returns symbol, trades.when, price (but NOT prices.when)
If you combine
USING with a
SELECT * like this,
the query will return the left side (probe) column values for the matches,
not the right side (build) column values.
To get the
prices times in the example, you will need to list the columns explicitly:
SELECT t.symbol, t.when AS trade_when, p.when AS price_when, price FROM trades t ASOF LEFT JOIN prices p USING (symbol, when);