The Relational API is an alternative API for incrementally constructing queries. It is centered around DuckDBPyRelation
nodes: a relation can be seen as a symbolic representation of a SQL query.
Lazy Evaluation
Relations do not hold any data, and nothing is executed, until a method that triggers execution is called.
For example, the following creates a relation representing 1 billion rows:
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("from range(1_000_000_000)")
At this point, rel
does not hold any data and no data has been retrieved from the database.
By calling rel.show()
or simply printing rel
on the terminal, the first 10,000 rows are fetched.
If there are more than 10,000 rows, the output window shows >9999 rows, since the total number of rows in the relation is not yet known.
By calling an output method, the data is retrieved and stored in the specified format:
rel.to_table("example_rel")
# 100% ▕████████████████████████████████████████████████████████████▏
Relation Creation
This section details how relations are created. These methods are lazily evaluated.
Name | Description |
---|---|
from_arrow |
Create a relation object from an Arrow object |
from_csv_auto |
Create a relation object from the CSV file(s) in path_or_buffer |
from_df |
Create a relation object from the DataFrame in df |
from_parquet |
Create a relation object from the Parquet files |
from_query |
Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is. |
query |
Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is. |
read_csv |
Create a relation object from the CSV file(s) in path_or_buffer |
read_json |
Create a relation object from the JSON file(s) in path_or_buffer |
read_parquet |
Create a relation object from the Parquet files |
sql |
Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is. |
table |
Create a relation object for the named table |
table_function |
Create a relation object from the named table function with given parameters |
values |
Create a relation object from the passed values |
view |
Create a relation object for the named view |
from_arrow
Signature
from_arrow(self: duckdb.duckdb.DuckDBPyConnection, arrow_object: object) -> duckdb.duckdb.DuckDBPyRelation
Description
Create a relation object from an Arrow object
Parameters
-
arrow_object : pyarrow.Table, pyarrow.RecordBatch
Arrow object to create a relation from
Example
import duckdb
import pyarrow as pa
ids = pa.array([1], type=pa.int8())
texts = pa.array(['a'], type=pa.string())
example_table = pa.table([ids, texts], names=["id", "text"])
duckdb_conn = duckdb.connect()
rel = duckdb_conn.from_arrow(example_table)
rel.show()
Result
┌──────┬─────────┐
│ id │ text │
│ int8 │ varchar │
├──────┼─────────┤
│ 1 │ a │
└──────┴─────────┘
from_csv_auto
Signature
from_csv_auto(self: duckdb.duckdb.DuckDBPyConnection, path_or_buffer: object, **kwargs) -> duckdb.duckdb.DuckDBPyRelation
Description
Create a relation object from the CSV file(s) in path_or_buffer
Aliases: read_csv
Parameters
-
path_or_buffer : Union[str, StringIO, TextIOBase]
Path to the CSV file or buffer to read from.
-
header : Optional[bool], Optional[int]
Row number(s) to use as the column names, or None if no header.
-
compression : Optional[str]
Compression type (e.g., 'gzip', 'bz2').
-
sep : Optional[str]
Delimiter to use; defaults to comma.
-
delimiter : Optional[str]
Alternative delimiter to use.
-
dtype : Optional[Dict[str, str]], Optional[List[str]]
Data types for columns.
-
na_values : Optional[str], Optional[List[str]]
Additional strings to recognize as NA/NaN.
-
skiprows : Optional[int]
Number of rows to skip at the start.
-
quotechar : Optional[str]
Character used to quote fields.
-
escapechar : Optional[str]
Character used to escape delimiter or quote characters.
-
encoding : Optional[str]
Character encoding of the CSV file (e.g., 'utf-8').
-
parallel : Optional[bool]
Enable parallel reading.
-
date_format : Optional[str]
Format to parse dates.
-
timestamp_format : Optional[str]
Format to parse timestamps.
-
sample_size : Optional[int]
Number of rows to sample for schema inference.
-
all_varchar : Optional[bool]
Treat all columns as VARCHAR.
-
normalize_names : Optional[bool]
Normalize column names to lowercase.
-
null_padding : Optional[bool]
Enable null padding for rows with missing columns.
-
names : Optional[List[str]]
List of column names to use.
-
lineterminator : Optional[str]
Character to break lines on.
-
columns : Optional[Dict[str, str]]
Mapping of column names to column types, defining the schema.
-
auto_type_candidates : Optional[List[str]]
List of SQL types to consider during automatic type detection (e.g., ['BIGINT', 'DATE', 'VARCHAR']).
-
max_line_size : Optional[int]
Maximum line size in bytes.
-
ignore_errors : Optional[bool]
Ignore parsing errors.
-
store_rejects : Optional[bool]
Store rejected rows.
-
rejects_table : Optional[str]
Table name to store rejected rows.
-
rejects_scan : Optional[str]
Scan to use for rejects.
-
rejects_limit : Optional[int]
Limit number of rejects stored.
-
force_not_null : Optional[List[str]]
List of columns to force as NOT NULL.
-
buffer_size : Optional[int]
Buffer size in bytes.
-
decimal : Optional[str]
Character to recognize as decimal point.
-
allow_quoted_nulls : Optional[bool]
Allow quoted NULL values.
-
filename : Optional[bool], Optional[str]
Add filename column or specify filename.
-
hive_partitioning : Optional[bool]
Enable Hive-style partitioning.
-
union_by_name : Optional[bool]
Union files by column name instead of position.
-
hive_types : Optional[Dict[str, str]]
Hive types for columns.
-
hive_types_autocast : Optional[bool]
Automatically cast Hive types.
-
connection : DuckDBPyConnection
DuckDB connection to use.
Example
import csv
import duckdb
duckdb_conn = duckdb.connect()
with open('code_example.csv', 'w', newline='') as csvfile:
fieldnames = ['id', 'text']
writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
writer.writeheader()
writer.writerow({'id': '1', 'text': 'a'})
rel = duckdb_conn.from_csv_auto("code_example.csv")
rel.show()
Result
┌───────┬─────────┐
│ id │ text │
│ int64 │ varchar │
├───────┼─────────┤
│ 1 │ a │
└───────┴─────────┘
from_df
Signature
from_df(self: duckdb.duckdb.DuckDBPyConnection, df: pandas.DataFrame) -> duckdb.duckdb.DuckDBPyRelation
Description
Create a relation object from the DataFrame in df
Parameters
-
df : pandas.DataFrame
A pandas DataFrame to be converted into a DuckDB relation.
Example
import duckdb
import pandas as pd
df = pd.DataFrame(data = {'id': [1], "text":["a"]})
duckdb_conn = duckdb.connect()
rel = duckdb_conn.from_df(df)
rel.show()
Result
┌───────┬─────────┐
│ id │ text │
│ int64 │ varchar │
├───────┼─────────┤
│ 1 │ a │
└───────┴─────────┘
from_parquet
Signature
from_parquet(*args, **kwargs)
Overloaded function.
1. from_parquet(self: duckdb.duckdb.DuckDBPyConnection, file_glob: str, binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> duckdb.duckdb.DuckDBPyRelation
Create a relation object from the Parquet files in file_glob
2. from_parquet(self: duckdb.duckdb.DuckDBPyConnection, file_globs: list[str], binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> duckdb.duckdb.DuckDBPyRelation
Create a relation object from the Parquet files in file_globs
Description
Create a relation object from the Parquet files
Aliases: read_parquet
Parameters
-
file_glob : str
File path or glob pattern pointing to Parquet files to be read.
-
binary_as_string : bool, default: False
Interpret binary columns as strings instead of blobs.
-
file_row_number : bool, default: False
Add a column containing the row number within each file.
-
filename : bool, default: False
Add a column containing the name of the file each row came from.
-
hive_partitioning : bool, default: False
Enable automatic detection of Hive-style partitions in file paths.
-
union_by_name : bool, default: False
Union Parquet files by matching column names instead of positions.
-
compression : object
Optional compression codec to use when reading the Parquet files.
Example
import duckdb
import pyarrow as pa
import pyarrow.parquet as pq
ids = pa.array([1], type=pa.int8())
texts = pa.array(['a'], type=pa.string())
example_table = pa.table([ids, texts], names=["id", "text"])
pq.write_table(example_table, "code_example.parquet")
duckdb_conn = duckdb.connect()
rel = duckdb_conn.from_parquet("code_example.parquet")
rel.show()
Result
┌──────┬─────────┐
│ id │ text │
│ int8 │ varchar │
├──────┼─────────┤
│ 1 │ a │
└──────┴─────────┘
from_query
Signature
from_query(self: duckdb.duckdb.DuckDBPyConnection, query: object, *, alias: str = '', params: object = None) -> duckdb.duckdb.DuckDBPyRelation
Description
Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.
Parameters
-
query : object
The SQL query or subquery to be executed and converted into a relation.
-
alias : str, default: ''
Optional alias name to assign to the resulting relation.
-
params : object
Optional query parameters to be used in the SQL query.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.from_query("from range(1,2) tbl(id)")
rel.show()
Result
┌───────┐
│ id │
│ int64 │
├───────┤
│ 1 │
└───────┘
query
Signature
query(self: duckdb.duckdb.DuckDBPyConnection, query: object, *, alias: str = '', params: object = None) -> duckdb.duckdb.DuckDBPyRelation
Description
Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.
Aliases: from_query, sql
Parameters
-
query : object
The SQL query or subquery to be executed and converted into a relation.
-
alias : str, default: ''
Optional alias name to assign to the resulting relation.
-
params : object
Optional query parameters to be used in the SQL query.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.query("from range(1,2) tbl(id)")
rel.show()
Result
┌───────┐
│ id │
│ int64 │
├───────┤
│ 1 │
└───────┘
read_csv
Signature
read_csv(self: duckdb.duckdb.DuckDBPyConnection, path_or_buffer: object, **kwargs) -> duckdb.duckdb.DuckDBPyRelation
Description
Create a relation object from the CSV file(s) in path_or_buffer
Aliases: from_csv_auto
Parameters
-
path_or_buffer : Union[str, StringIO, TextIOBase]
Path to the CSV file or buffer to read from.
-
header : Optional[bool], Optional[int]
Row number(s) to use as the column names, or None if no header.
-
compression : Optional[str]
Compression type (e.g., 'gzip', 'bz2').
-
sep : Optional[str]
Delimiter to use; defaults to comma.
-
delimiter : Optional[str]
Alternative delimiter to use.
-
dtype : Optional[Dict[str, str]], Optional[List[str]]
Data types for columns.
-
na_values : Optional[str], Optional[List[str]]
Additional strings to recognize as NA/NaN.
-
skiprows : Optional[int]
Number of rows to skip at the start.
-
quotechar : Optional[str]
Character used to quote fields.
-
escapechar : Optional[str]
Character used to escape delimiter or quote characters.
-
encoding : Optional[str]
Character encoding of the CSV file (e.g., 'utf-8').
-
parallel : Optional[bool]
Enable parallel reading.
-
date_format : Optional[str]
Format to parse dates.
-
timestamp_format : Optional[str]
Format to parse timestamps.
-
sample_size : Optional[int]
Number of rows to sample for schema inference.
-
all_varchar : Optional[bool]
Treat all columns as VARCHAR.
-
normalize_names : Optional[bool]
Normalize column names to lowercase.
-
null_padding : Optional[bool]
Enable null padding for rows with missing columns.
-
names : Optional[List[str]]
List of column names to use.
-
lineterminator : Optional[str]
Character to break lines on.
-
columns : Optional[Dict[str, str]]
Mapping of column names to column types, defining the schema.
-
auto_type_candidates : Optional[List[str]]
List of SQL types to consider during automatic type detection (e.g., ['BIGINT', 'DATE', 'VARCHAR']).
-
max_line_size : Optional[int]
Maximum line size in bytes.
-
ignore_errors : Optional[bool]
Ignore parsing errors.
-
store_rejects : Optional[bool]
Store rejected rows.
-
rejects_table : Optional[str]
Table name to store rejected rows.
-
rejects_scan : Optional[str]
Scan to use for rejects.
-
rejects_limit : Optional[int]
Limit number of rejects stored.
-
force_not_null : Optional[List[str]]
List of columns to force as NOT NULL.
-
buffer_size : Optional[int]
Buffer size in bytes.
-
decimal : Optional[str]
Character to recognize as decimal point.
-
allow_quoted_nulls : Optional[bool]
Allow quoted NULL values.
-
filename : Optional[bool], Optional[str]
Add filename column or specify filename.
-
hive_partitioning : Optional[bool]
Enable Hive-style partitioning.
-
union_by_name : Optional[bool]
Union files by column name instead of position.
-
hive_types : Optional[Dict[str, str]]
Hive types for columns.
-
hive_types_autocast : Optional[bool]
Automatically cast Hive types.
-
connection : DuckDBPyConnection
DuckDB connection to use.
Example
import csv
import duckdb
duckdb_conn = duckdb.connect()
with open('code_example.csv', 'w', newline='') as csvfile:
fieldnames = ['id', 'text']
writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
writer.writeheader()
writer.writerow({'id': '1', 'text': 'a'})
rel = duckdb_conn.read_csv("code_example.csv")
rel.show()
Result
┌───────┬─────────┐
│ id │ text │
│ int64 │ varchar │
├───────┼─────────┤
│ 1 │ a │
└───────┴─────────┘
read_json
Signature
read_json(self: duckdb.duckdb.DuckDBPyConnection, path_or_buffer: object, *, columns: typing.Optional[object] = None, sample_size: typing.Optional[object] = None, maximum_depth: typing.Optional[object] = None, records: typing.Optional[str] = None, format: typing.Optional[str] = None, date_format: typing.Optional[object] = None, timestamp_format: typing.Optional[object] = None, compression: typing.Optional[object] = None, maximum_object_size: typing.Optional[object] = None, ignore_errors: typing.Optional[object] = None, convert_strings_to_integers: typing.Optional[object] = None, field_appearance_threshold: typing.Optional[object] = None, map_inference_threshold: typing.Optional[object] = None, maximum_sample_files: typing.Optional[object] = None, filename: typing.Optional[object] = None, hive_partitioning: typing.Optional[object] = None, union_by_name: typing.Optional[object] = None, hive_types: typing.Optional[object] = None, hive_types_autocast: typing.Optional[object] = None) -> duckdb.duckdb.DuckDBPyRelation
Description
Create a relation object from the JSON file(s) in path_or_buffer
Parameters
-
path_or_buffer : object
File path or file-like object containing JSON data to be read.
-
columns : object
Optional list of column names to project from the JSON data.
-
sample_size : object
Number of rows to sample for inferring JSON schema.
-
maximum_depth : object
Maximum depth to which JSON objects should be parsed.
-
records : str
Whether the JSON consists of records: 'auto', 'true', or 'false'.
-
format : str
Format of the JSON data (e.g., 'auto', 'newline_delimited').
-
date_format : object
Format string for parsing date fields.
-
timestamp_format : object
Format string for parsing timestamp fields.
-
compression : object
Compression codec used on the JSON data (e.g., 'gzip').
-
maximum_object_size : object
Maximum size in bytes for individual JSON objects.
-
ignore_errors : object
If True, skip over JSON records with parsing errors.
-
convert_strings_to_integers : object
If True, attempt to convert strings to integers where appropriate.
-
field_appearance_threshold : object
Threshold for inferring optional fields in nested JSON.
-
map_inference_threshold : object
Threshold for inferring maps from JSON object patterns.
-
maximum_sample_files : object
Maximum number of files to sample for schema inference.
-
filename : object
If True, include a column with the source filename for each row.
-
hive_partitioning : object
If True, enable Hive partitioning based on directory structure.
-
union_by_name : object
If True, align JSON columns by name instead of position.
-
hive_types : object
Optional mapping of Hive partition columns to types, overriding autodetection.
-
hive_types_autocast : object
If True, automatically cast data types to match Hive types.
Example
import duckdb
import json
with open("code_example.json", mode="w") as f:
json.dump([{'id': 1, "text":"a"}], f)
duckdb_conn = duckdb.connect()
rel = duckdb_conn.read_json("code_example.json")
rel.show()
Result
┌───────┬─────────┐
│ id │ text │
│ int64 │ varchar │
├───────┼─────────┤
│ 1 │ a │
└───────┴─────────┘
read_parquet
Signature
read_parquet(*args, **kwargs)
Overloaded function.
1. read_parquet(self: duckdb.duckdb.DuckDBPyConnection, file_glob: str, binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> duckdb.duckdb.DuckDBPyRelation
Create a relation object from the Parquet files in file_glob
2. read_parquet(self: duckdb.duckdb.DuckDBPyConnection, file_globs: list[str], binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> duckdb.duckdb.DuckDBPyRelation
Create a relation object from the Parquet files in file_globs
Description
Create a relation object from the Parquet files
Aliases: from_parquet
Parameters
-
file_glob : str
File path or glob pattern pointing to Parquet files to be read.
-
binary_as_string : bool, default: False
Interpret binary columns as strings instead of blobs.
-
file_row_number : bool, default: False
Add a column containing the row number within each file.
-
filename : bool, default: False
Add a column containing the name of the file each row came from.
-
hive_partitioning : bool, default: False
Enable automatic detection of Hive-style partitions in file paths.
-
union_by_name : bool, default: False
Union Parquet files by matching column names instead of positions.
-
compression : object
Optional compression codec to use when reading the Parquet files.
Example
import duckdb
import pyarrow as pa
import pyarrow.parquet as pq
ids = pa.array([1], type=pa.int8())
texts = pa.array(['a'], type=pa.string())
example_table = pa.table([ids, texts], names=["id", "text"])
pq.write_table(example_table, "code_example.parquet")
duckdb_conn = duckdb.connect()
rel = duckdb_conn.read_parquet("code_example.parquet")
rel.show()
Result
┌──────┬─────────┐
│ id │ text │
│ int8 │ varchar │
├──────┼─────────┤
│ 1 │ a │
└──────┴─────────┘
sql
Signature
sql(self: duckdb.duckdb.DuckDBPyConnection, query: object, *, alias: str = '', params: object = None) -> duckdb.duckdb.DuckDBPyRelation
Description
Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.
Aliases: from_query, query
Parameters
-
query : object
The SQL query or subquery to be executed and converted into a relation.
-
alias : str, default: ''
Optional alias name to assign to the resulting relation.
-
params : object
Optional query parameters to be used in the SQL query.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("from range(1,2) tbl(id)")
rel.show()
Result
┌───────┐
│ id │
│ int64 │
├───────┤
│ 1 │
└───────┘
table
Signature
table(self: duckdb.duckdb.DuckDBPyConnection, table_name: str) -> duckdb.duckdb.DuckDBPyRelation
Description
Create a relation object for the named table
Parameters
-
table_name : str
Name of the table to create a relation from.
Example
import duckdb
duckdb_conn = duckdb.connect()
duckdb_conn.sql("create table code_example as select * from range(1,2) tbl(id)")
rel = duckdb_conn.table("code_example")
rel.show()
Result
┌───────┐
│ id │
│ int64 │
├───────┤
│ 1 │
└───────┘
table_function
Signature
table_function(self: duckdb.duckdb.DuckDBPyConnection, name: str, parameters: object = None) -> duckdb.duckdb.DuckDBPyRelation
Description
Create a relation object from the named table function with given parameters
Parameters
-
name : str
Name of the table function to call.
-
parameters : object
Optional parameters to pass to the table function.
Example
import duckdb
duckdb_conn = duckdb.connect()
duckdb_conn.sql("""
create macro get_record_for(x) as table
select x*range from range(1,2)
""")
rel = duckdb_conn.table_function(name="get_record_for", parameters=[1])
rel.show()
Result
┌───────────────┐
│ (1 * "range") │
│ int64 │
├───────────────┤
│ 1 │
└───────────────┘
values
Signature
values(self: duckdb.duckdb.DuckDBPyConnection, *args) -> duckdb.duckdb.DuckDBPyRelation
Description
Create a relation object from the passed values
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.values([1, 'a'])
rel.show()
Result
┌───────┬─────────┐
│ col0 │ col1 │
│ int32 │ varchar │
├───────┼─────────┤
│ 1 │ a │
└───────┴─────────┘
view
Signature
view(self: duckdb.duckdb.DuckDBPyConnection, view_name: str) -> duckdb.duckdb.DuckDBPyRelation
Description
Create a relation object for the named view
Parameters
-
view_name : str
Name of the view to create a relation from.
Example
import duckdb
duckdb_conn = duckdb.connect()
duckdb_conn.sql("create view code_example as select * from range(1,2) tbl(id)")
rel = duckdb_conn.view("code_example")
rel.show()
Result
┌───────┐
│ id │
│ int64 │
├───────┤
│ 1 │
└───────┘
Relation Definition Details
This section contains the details on how to inspect a relation.
Name | Description |
---|---|
alias |
Get the name of the current alias |
columns |
Return a list containing the names of the columns of the relation. |
describe |
Gives basic statistics (e.g., min, max) and if NULL exists for each column of the relation. |
description |
Return the description of the result |
dtypes |
Return a list containing the types of the columns of the relation. |
explain |
Return a string with the query plan of the relation |
query |
Run the given SQL query in sql_query on the view named virtual_table_name that refers to the relation object |
set_alias |
Rename the relation object to new alias |
shape |
Tuple of # of rows, # of columns in relation. |
show |
Display a summary of the data |
sql_query |
Get the SQL query that is equivalent to the relation |
type |
Get the type of the relation. |
types |
Return a list containing the types of the columns of the relation. |
alias
Description
Get the name of the current alias
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.alias
Result
unnamed_relation_43c808c247431be5
columns
Description
Return a list containing the names of the columns of the relation.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.columns
Result
['id', 'description', 'value', 'created_timestamp']
describe
Signature
describe(self: duckdb.duckdb.DuckDBPyRelation) -> duckdb.duckdb.DuckDBPyRelation
Description
Gives basic statistics (e.g., min, max) and if NULL exists for each column of the relation.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.describe()
Result
┌─────────┬──────────────────────────────────────┬─────────────────┬────────────────────┬────────────────────────────┐
│ aggr │ id │ description │ value │ created_timestamp │
│ varchar │ varchar │ varchar │ double │ varchar │
├─────────┼──────────────────────────────────────┼─────────────────┼────────────────────┼────────────────────────────┤
│ count │ 9 │ 9 │ 9.0 │ 9 │
│ mean │ NULL │ NULL │ 5.0 │ NULL │
│ stddev │ NULL │ NULL │ 2.7386127875258306 │ NULL │
│ min │ 08fdcbf8-4e53-4290-9e81-423af263b518 │ value is even │ 1.0 │ 2025-04-09 15:41:20.642+02 │
│ max │ fb10390e-fad5-4694-91cb-e82728cb6f9f │ value is uneven │ 9.0 │ 2025-04-09 15:49:20.642+02 │
│ median │ NULL │ NULL │ 5.0 │ NULL │
└─────────┴──────────────────────────────────────┴─────────────────┴────────────────────┴────────────────────────────┘
description
Description
Return the description of the result
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.description
Result
[('id', 'UUID', None, None, None, None, None),
('description', 'STRING', None, None, None, None, None),
('value', 'NUMBER', None, None, None, None, None),
('created_timestamp', 'DATETIME', None, None, None, None, None)]
dtypes
Description
Return a list containing the types of the columns of the relation.
Aliases: types
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.dtypes
Result
[UUID, VARCHAR, BIGINT, TIMESTAMP WITH TIME ZONE]
explain
Signature
explain(self: duckdb.duckdb.DuckDBPyRelation, type: duckdb.duckdb.ExplainType = 'standard') -> str
Description
Return a string with the query plan of the relation. With type='analyze', the query is executed and the returned plan includes runtime statistics.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.explain()
Result
┌───────────────────────────┐
│ PROJECTION │
│ ──────────────────── │
│ id │
│ description │
│ value │
│ created_timestamp │
│ │
│ ~9 Rows │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ RANGE │
│ ──────────────────── │
│ Function: RANGE │
│ │
│ ~9 Rows │
└───────────────────────────┘
query
Signature
query(self: duckdb.duckdb.DuckDBPyRelation, virtual_table_name: str, sql_query: str) -> duckdb.duckdb.DuckDBPyRelation
Description
Run the given SQL query in sql_query on the view named virtual_table_name that refers to the relation object
Parameters
-
virtual_table_name : str
The name to assign to the current relation when referenced in the SQL query.
-
sql_query : str
The SQL query string that uses the virtual table name to query the relation.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.query(virtual_table_name="rel_view", sql_query="from rel_view")
duckdb_conn.sql("show rel_view")
Result
┌───────────────────┬──────────────────────────┬─────────┬─────────┬─────────┬─────────┐
│ column_name │ column_type │ null │ key │ default │ extra │
│ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │
├───────────────────┼──────────────────────────┼─────────┼─────────┼─────────┼─────────┤
│ id │ UUID │ YES │ NULL │ NULL │ NULL │
│ description │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ value │ BIGINT │ YES │ NULL │ NULL │ NULL │
│ created_timestamp │ TIMESTAMP WITH TIME ZONE │ YES │ NULL │ NULL │ NULL │
└───────────────────┴──────────────────────────┴─────────┴─────────┴─────────┴─────────┘
set_alias
Signature
set_alias(self: duckdb.duckdb.DuckDBPyRelation, alias: str) -> duckdb.duckdb.DuckDBPyRelation
Description
Rename the relation object to new alias
Parameters
-
alias : str
The alias name to assign to the relation.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.set_alias('abc').select('abc.id')
Result
In the SQL query, the alias will be `abc`
shape
Description
Tuple of # of rows, # of columns in relation.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.shape
Result
(9, 4)
show
Signature
show(self: duckdb.duckdb.DuckDBPyRelation, *, max_width: typing.Optional[int] = None, max_rows: typing.Optional[int] = None, max_col_width: typing.Optional[int] = None, null_value: typing.Optional[str] = None, render_mode: object = None) -> None
Description
Display a summary of the data
Parameters
-
max_width : int
Maximum display width for the entire output in characters.
-
max_rows : int
Maximum number of rows to display.
-
max_col_width : int
Maximum number of characters to display per column.
-
null_value : str
String to display in place of NULL values.
-
render_mode : object
Render mode for displaying the output.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.show()
Result
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│ id │ description │ value │ created_timestamp │
│ uuid │ varchar │ int64 │ timestamp with time zone │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ 642ea3d7-793d-4867-a759-91c1226c25a0 │ value is uneven │ 1 │ 2025-04-09 15:41:20.642+02 │
│ 6817dd31-297c-40a8-8e40-8521f00b2d08 │ value is even │ 2 │ 2025-04-09 15:42:20.642+02 │
│ 45143f9a-e16e-4e59-91b2-3a0800eed6d6 │ value is uneven │ 3 │ 2025-04-09 15:43:20.642+02 │
│ fb10390e-fad5-4694-91cb-e82728cb6f9f │ value is even │ 4 │ 2025-04-09 15:44:20.642+02 │
│ 111ced5c-9155-418e-b087-c331b814db90 │ value is uneven │ 5 │ 2025-04-09 15:45:20.642+02 │
│ 66a870a6-aef0-4085-87d5-5d1b35d21c66 │ value is even │ 6 │ 2025-04-09 15:46:20.642+02 │
│ a7e8e796-bca0-44cd-a269-1d71090fb5cc │ value is uneven │ 7 │ 2025-04-09 15:47:20.642+02 │
│ 74908d48-7f2d-4bdd-9c92-1e7920b115b5 │ value is even │ 8 │ 2025-04-09 15:48:20.642+02 │
│ 08fdcbf8-4e53-4290-9e81-423af263b518 │ value is uneven │ 9 │ 2025-04-09 15:49:20.642+02 │
└──────────────────────────────────────┴─────────────────┴───────┴────────────────────────────┘
sql_query
Signature
sql_query(self: duckdb.duckdb.DuckDBPyRelation) -> str
Description
Get the SQL query that is equivalent to the relation
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.sql_query()
Result
SELECT
gen_random_uuid() AS id,
concat('value is ', CASE WHEN ((mod("range", 2) = 0)) THEN ('even') ELSE 'uneven' END) AS description,
"range" AS "value",
(now() + CAST(concat("range", ' ', 'minutes') AS INTERVAL)) AS created_timestamp
FROM "range"(1, 10)
type
Description
Get the type of the relation.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.type
Result
QUERY_RELATION
types
Description
Return a list containing the types of the columns of the relation.
Aliases: dtypes
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.types
Result
[UUID, VARCHAR, BIGINT, TIMESTAMP WITH TIME ZONE]
Transformation
This section contains the methods which can be used to chain queries. The methods are lazily evaluated.
Name | Description |
---|---|
aggregate | Compute the aggregate aggr_expr by the optional groups group_expr on the relation |
apply | Compute the function of a single column or a list of columns by the optional groups on the relation |
cross | Create cross/cartesian product of two relational objects |
except_ | Create the set except of this relation object with another relation object in other_rel |
filter | Filter the relation object by the filter in filter_expr |
insert | Inserts the given values into the relation |
insert_into | Inserts the relation object into an existing table named table_name |
intersect | Create the set intersection of this relation object with another relation object in other_rel |
join | Join the relation object with another relation object in other_rel using the join condition expression in condition. Types supported are 'inner', 'left', 'right', 'outer', 'semi' and 'anti' |
limit | Only retrieve the first n rows from this relation object, starting at offset |
map | Calls the passed function on the relation |
order | Reorder the relation object by order_expr |
project | Project the relation object by the projection in project_expr |
select | Project the relation object by the projection in project_expr |
sort | Reorder the relation object by the provided expressions |
union | Create the set union of this relation object with another relation object in other_rel |
update | Update the given relation with the provided expressions |
aggregate
Signature
aggregate(self: duckdb.duckdb.DuckDBPyRelation, aggr_expr: object, group_expr: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Compute the aggregate aggr_expr by the optional groups group_expr on the relation
Parameters
- aggr_expr : str, list[Expression]
  The list of columns and aggregation functions.
- group_expr : str, default: ''
  The list of columns to be included in the `group by`. If `None`, `group by all` is applied.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel = rel.aggregate('max(value)')
Result
┌──────────────┐
│ max("value") │
│ int64 │
├──────────────┤
│ 9 │
└──────────────┘
apply
Signature
apply(self: duckdb.duckdb.DuckDBPyRelation, function_name: str, function_aggr: str, group_expr: str = '', function_parameter: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Compute the function of a single column or a list of columns by the optional groups on the relation
Parameters
- function_name : str
  Name of the function to apply over the relation.
- function_aggr : str
  The list of columns to apply the function over.
- group_expr : str, default: ''
  Optional SQL expression for grouping.
- function_parameter : str, default: ''
  Optional parameters to pass into the function.
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.apply(
function_name="count",
function_aggr="id",
group_expr="description",
projected_columns="description"
)
Result
┌─────────────────┬───────────┐
│ description │ count(id) │
│ varchar │ int64 │
├─────────────────┼───────────┤
│ value is uneven │ 5 │
│ value is even │ 4 │
└─────────────────┴───────────┘
cross
Signature
cross(self: duckdb.duckdb.DuckDBPyRelation, other_rel: duckdb.duckdb.DuckDBPyRelation) -> duckdb.duckdb.DuckDBPyRelation
Description
Create cross/cartesian product of two relational objects
Parameters
- other_rel : duckdb.duckdb.DuckDBPyRelation
  Another relation to perform a cross product with.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.cross(other_rel=rel.set_alias("other_rel"))
Result
┌─────────────────────────────┬─────────────────┬───────┬───────────────────────────┬──────────────────────────────────────┬─────────────────┬───────┬───────────────────────────┐
│ id │ description │ value │ created_timestamp │ id │ description │ value │ created_timestamp │
│ uuid │ varchar │ int64 │ timestamp with time zone │ uuid │ varchar │ int64 │ timestamp with time zone │
├─────────────────────────────┼─────────────────┼───────┼───────────────────────────┼──────────────────────────────────────┼─────────────────┼───────┼───────────────────────────┤
│ cb2b453f-1a06-4f5e-abe1-b… │ value is uneven │ 1 │ 2025-04-10 09:53:29.78+02 │ cb2b453f-1a06-4f5e-abe1-bfd413581bcf │ value is uneven │ 1 │ 2025-04-10 09:53:29.78+02 │
...
except_
Signature
except_(self: duckdb.duckdb.DuckDBPyRelation, other_rel: duckdb.duckdb.DuckDBPyRelation) -> duckdb.duckdb.DuckDBPyRelation
Description
Create the set except of this relation object with another relation object in other_rel
Parameters
- other_rel : duckdb.duckdb.DuckDBPyRelation
  The relation to subtract from the current relation (set difference).
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.except_(other_rel=rel.set_alias("other_rel"))
Result
The relation query is executed twice, therefore generating different ids and timestamps:
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│ id │ description │ value │ created_timestamp │
│ uuid │ varchar │ int64 │ timestamp with time zone │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ f69ed6dd-a7fe-4de2-b6af-1c2418096d69 │ value is uneven │ 3 │ 2025-04-10 11:43:05.711+02 │
│ 08ad11dc-a9c2-4aaa-9272-760b27ad1f5d │ value is uneven │ 7 │ 2025-04-10 11:47:05.711+02 │
...
filter
Signature
filter(self: duckdb.duckdb.DuckDBPyRelation, filter_expr: object) -> duckdb.duckdb.DuckDBPyRelation
Description
Filter the relation object by the filter in filter_expr
Parameters
- filter_expr : str, Expression
  The filter expression to apply over the relation.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.filter("value = 2")
Result
┌──────────────────────────────────────┬───────────────┬───────┬───────────────────────────┐
│ id │ description │ value │ created_timestamp │
│ uuid │ varchar │ int64 │ timestamp with time zone │
├──────────────────────────────────────┼───────────────┼───────┼───────────────────────────┤
│ b0684ab7-fcbf-41c5-8e4a-a51bdde86926 │ value is even │ 2 │ 2025-04-10 09:54:29.78+02 │
└──────────────────────────────────────┴───────────────┴───────┴───────────────────────────┘
insert
Signature
insert(self: duckdb.duckdb.DuckDBPyRelation, values: object) -> None
Description
Inserts the given values into the relation
Parameters
- values : object
  A tuple of values matching the relation column list, to be inserted.
Example
import duckdb
from datetime import datetime
from uuid import uuid4
duckdb_conn = duckdb.connect()
duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
).to_table("code_example")
rel = duckdb_conn.table("code_example")
rel.insert(
(
uuid4(),
'value is even',
10,
datetime.now()
)
)
rel.filter("value = 10")
Result
┌──────────────────────────────────────┬───────────────┬───────┬───────────────────────────────┐
│ id │ description │ value │ created_timestamp │
│ uuid │ varchar │ int64 │ timestamp with time zone │
├──────────────────────────────────────┼───────────────┼───────┼───────────────────────────────┤
│ c6dfab87-fae6-4213-8f76-1b96a8d179f6 │ value is even │ 10 │ 2025-04-10 10:02:24.652218+02 │
└──────────────────────────────────────┴───────────────┴───────┴───────────────────────────────┘
insert_into
Signature
insert_into(self: duckdb.duckdb.DuckDBPyRelation, table_name: str) -> None
Description
Inserts the relation object into an existing table named table_name
Parameters
- table_name : str
  The table name to insert the data into. The relation must respect the column order of the table.
Example
import duckdb
from datetime import datetime
from uuid import uuid4
duckdb_conn = duckdb.connect()
duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
).to_table("code_example")
rel = duckdb_conn.values(
[
uuid4(),
'value is even',
10,
datetime.now()
]
)
rel.insert_into("code_example")
duckdb_conn.table("code_example").filter("value = 10")
Result
┌──────────────────────────────────────┬───────────────┬───────┬───────────────────────────────┐
│ id │ description │ value │ created_timestamp │
│ uuid │ varchar │ int64 │ timestamp with time zone │
├──────────────────────────────────────┼───────────────┼───────┼───────────────────────────────┤
│ 271c5ddd-c1d5-4638-b5a0-d8c7dc9e8220 │ value is even │ 10 │ 2025-04-10 14:29:18.616379+02 │
└──────────────────────────────────────┴───────────────┴───────┴───────────────────────────────┘
intersect
Signature
intersect(self: duckdb.duckdb.DuckDBPyRelation, other_rel: duckdb.duckdb.DuckDBPyRelation) -> duckdb.duckdb.DuckDBPyRelation
Description
Create the set intersection of this relation object with another relation object in other_rel
Parameters
- other_rel : duckdb.duckdb.DuckDBPyRelation
  The relation to intersect with the current relation (set intersection).
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.intersect(other_rel=rel.set_alias("other_rel"))
Result
The relation query is executed once with `rel` and once with `other_rel`,
therefore generating different ids and timestamps:
┌──────┬─────────────┬───────┬──────────────────────────┐
│ id │ description │ value │ created_timestamp │
│ uuid │ varchar │ int64 │ timestamp with time zone │
├──────┴─────────────┴───────┴──────────────────────────┤
│ 0 rows │
└───────────────────────────────────────────────────────┘
join
Signature
join(self: duckdb.duckdb.DuckDBPyRelation, other_rel: duckdb.duckdb.DuckDBPyRelation, condition: object, how: str = 'inner') -> duckdb.duckdb.DuckDBPyRelation
Description
Join the relation object with another relation object in other_rel using the join condition expression in condition. Types supported are 'inner', 'left', 'right', 'outer', 'semi' and 'anti'
Depending on how the `condition` parameter is provided, the generated JOIN clause is:
USING
import duckdb
duckdb_conn = duckdb.connect()
rel1 = duckdb_conn.sql("select range as id, concat('dummy 1', range) as text from range(1,10)")
rel2 = duckdb_conn.sql("select range as id, concat('dummy 2', range) as text from range(5,7)")
rel1.join(rel2, condition="id", how="inner").sql_query()
with the following SQL:
SELECT *
FROM (
SELECT "range" AS id,
concat('dummy 1', "range") AS "text"
FROM "range"(1, 10)
) AS unnamed_relation_41bc15e744037078
INNER JOIN (
SELECT "range" AS id,
concat('dummy 2', "range") AS "text"
FROM "range"(5, 7)
) AS unnamed_relation_307e245965aa2c2b
USING (id)
ON
import duckdb
duckdb_conn = duckdb.connect()
rel1 = duckdb_conn.sql("select range as id, concat('dummy 1', range) as text from range(1,10)")
rel2 = duckdb_conn.sql("select range as id, concat('dummy 2', range) as text from range(5,7)")
rel1.join(rel2, condition=f"{rel1.alias}.id = {rel2.alias}.id", how="inner").sql_query()
with the following SQL:
SELECT *
FROM (
SELECT "range" AS id,
concat('dummy 1', "range") AS "text"
FROM "range"(1, 10)
) AS unnamed_relation_41bc15e744037078
INNER JOIN (
SELECT "range" AS id,
concat('dummy 2', "range") AS "text"
FROM "range"(5, 7)
) AS unnamed_relation_307e245965aa2c2b
ON ((unnamed_relation_41bc15e744037078.id = unnamed_relation_307e245965aa2c2b.id))
`NATURAL`, `POSITIONAL`, and `ASOF` joins are not provided by the relational API. `CROSS` joins are provided through the cross method.
Parameters
- other_rel : duckdb.duckdb.DuckDBPyRelation
  The relation to join with the current relation.
- condition : object
  The join condition, typically a SQL expression or the duplicated column name to join on.
- how : str, default: 'inner'
  The type of join to perform: 'inner', 'left', 'right', 'outer', 'semi', or 'anti'.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel = rel.set_alias("rel").join(
other_rel=rel.set_alias("other_rel"),
condition="rel.id = other_rel.id",
how="left"
)
rel.count("*")
Result
┌──────────────┐
│ count_star() │
│ int64 │
├──────────────┤
│ 9 │
└──────────────┘
limit
Signature
limit(self: duckdb.duckdb.DuckDBPyRelation, n: int, offset: int = 0) -> duckdb.duckdb.DuckDBPyRelation
Description
Only retrieve the first n rows from this relation object, starting at offset
Parameters
- n : int
  The maximum number of rows to return.
- offset : int, default: 0
  The number of rows to skip before starting to return rows.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.limit(1)
Result
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│ id │ description │ value │ created_timestamp │
│ uuid │ varchar │ int64 │ timestamp with time zone │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ 4135597b-29e7-4cb9-a443-41f3d54f25df │ value is uneven │ 1 │ 2025-04-10 10:52:03.678+02 │
└──────────────────────────────────────┴─────────────────┴───────┴────────────────────────────┘
map
Signature
map(self: duckdb.duckdb.DuckDBPyRelation, map_function: Callable, *, schema: typing.Optional[object] = None) -> duckdb.duckdb.DuckDBPyRelation
Description
Calls the passed function on the relation
Parameters
- map_function : Callable
  A Python function that takes a DataFrame and returns a transformed DataFrame.
- schema : object, default: None
  Optional schema describing the structure of the output relation.
Example
import duckdb
from pandas import DataFrame
def multiply_by_2(df: DataFrame):
df["id"] = df["id"] * 2
return df
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("select range as id, 'dummy' as text from range(1,3)")
rel.map(multiply_by_2, schema={"id": int, "text": str})
Result
┌───────┬─────────┐
│ id │ text │
│ int64 │ varchar │
├───────┼─────────┤
│ 2 │ dummy │
│ 4 │ dummy │
└───────┴─────────┘
order
Signature
order(self: duckdb.duckdb.DuckDBPyRelation, order_expr: str) -> duckdb.duckdb.DuckDBPyRelation
Description
Reorder the relation object by order_expr
Parameters
- order_expr : str
  SQL expression defining the ordering of the result rows.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.order("value desc").limit(1, offset=4)
Result
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│ id │ description │ value │ created_timestamp │
│ uuid │ varchar │ int64 │ timestamp with time zone │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ 55899131-e3d3-463c-a215-f65cb8aef3bf │ value is uneven │ 5 │ 2025-04-10 10:56:03.678+02 │
└──────────────────────────────────────┴─────────────────┴───────┴────────────────────────────┘
project
Signature
project(self: duckdb.duckdb.DuckDBPyRelation, *args, groups: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Project the relation object by the projection in project_expr
Aliases: select
Parameters
- groups : str, default: ''
  Comma-separated list of columns to include in the `group by`.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.project("description").limit(1)
Result
┌─────────────────┐
│ description │
│ varchar │
├─────────────────┤
│ value is uneven │
└─────────────────┘
select
Signature
select(self: duckdb.duckdb.DuckDBPyRelation, *args, groups: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Project the relation object by the projection in project_expr
Aliases: project
Parameters
- groups : str, default: ''
  Comma-separated list of columns to include in the `group by`.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.select("description").limit(1)
Result
┌─────────────────┐
│ description │
│ varchar │
├─────────────────┤
│ value is uneven │
└─────────────────┘
sort
Signature
sort(self: duckdb.duckdb.DuckDBPyRelation, *args) -> duckdb.duckdb.DuckDBPyRelation
Description
Reorder the relation object by the provided expressions
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.sort("description")
Result
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│ id │ description │ value │ created_timestamp │
│ uuid │ varchar │ int64 │ timestamp with time zone │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ 5e0dfa8c-de4d-4ccd-8cff-450dabb86bde │ value is even │ 6 │ 2025-04-10 16:52:15.605+02 │
│ 95f1ad48-facf-4a84-a971-0a4fecce68c7 │ value is even │ 2 │ 2025-04-10 16:48:15.605+02 │
...
union
Signature
union(self: duckdb.duckdb.DuckDBPyRelation, union_rel: duckdb.duckdb.DuckDBPyRelation) -> duckdb.duckdb.DuckDBPyRelation
Description
Create the set union of this relation object with another relation object in other_rel
The union is a `union all`. To retrieve distinct values, apply distinct.
Parameters
- union_rel : duckdb.duckdb.DuckDBPyRelation
  The relation to union with the current relation (set union).
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel = rel.union(union_rel=rel)
rel.count("*")
Result
┌──────────────┐
│ count_star() │
│ int64 │
├──────────────┤
│ 18 │
└──────────────┘
update
Signature
update(self: duckdb.duckdb.DuckDBPyRelation, set: object, *, condition: object = None) -> None
Description
Update the given relation with the provided expressions
Parameters
- set : object
  Mapping of columns to new values for the update operation.
- condition : object, default: None
  Optional condition to filter which rows to update.
Example
import duckdb
from duckdb import ColumnExpression
duckdb_conn = duckdb.connect()
duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
).to_table("code_example")
rel = duckdb_conn.table("code_example")
rel.update(set={"description":None}, condition=ColumnExpression("value") == 1)
# the update is executed on the table, but not reflected in the relation
# the relation has to be recreated to retrieve the modified data
rel = duckdb_conn.table("code_example")
rel.show()
Result
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│ id │ description │ value │ created_timestamp │
│ uuid │ varchar │ int64 │ timestamp with time zone │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ 66dcaa14-f4a6-4a55-af3b-7f6aa23ab4ad │ NULL │ 1 │ 2025-04-10 16:54:49.317+02 │
│ c6a18a42-67fb-4c95-827b-c966f2f95b88 │ value is even │ 2 │ 2025-04-10 16:55:49.317+02 │
...
Functions
This section contains the functions which can be applied to a relation in order to get a (scalar) result. The functions are lazily evaluated.
Name | Description |
---|---|
any_value | Returns the first non-null value from a given column |
arg_max | Finds the row with the maximum value for a value column and returns the value of that row for an argument column |
arg_min | Finds the row with the minimum value for a value column and returns the value of that row for an argument column |
avg | Computes the average on a given column |
bit_and | Computes the bitwise AND of all bits present in a given column |
bit_or | Computes the bitwise OR of all bits present in a given column |
bit_xor | Computes the bitwise XOR of all bits present in a given column |
bitstring_agg | Computes a bitstring with bits set for each distinct value in a given column |
bool_and | Computes the logical AND of all values present in a given column |
bool_or | Computes the logical OR of all values present in a given column |
count | Computes the number of elements present in a given column |
cume_dist | Computes the cumulative distribution within the partition |
dense_rank | Computes the dense rank within the partition |
distinct | Retrieve distinct rows from this relation object |
favg | Computes the average of all values present in a given column using a more accurate floating point summation (Kahan Sum) |
first | Returns the first value of a given column |
first_value | Computes the first value within the group or partition |
fsum | Computes the sum of all values present in a given column using a more accurate floating point summation (Kahan Sum) |
geomean | Computes the geometric mean over all values present in a given column |
histogram | Computes the histogram over all values present in a given column |
lag | Computes the lag within the partition |
last | Returns the last value of a given column |
last_value | Computes the last value within the group or partition |
lead | Computes the lead within the partition |
list | Returns a list containing all values present in a given column |
max | Returns the maximum value present in a given column |
mean | Computes the average on a given column |
median | Computes the median over all values present in a given column |
min | Returns the minimum value present in a given column |
mode | Computes the mode over all values present in a given column |
n_tile | Divides the partition as equally as possible into num_buckets |
nth_value | Computes the nth value within the partition |
percent_rank | Computes the relative rank within the partition |
product | Returns the product of all values present in a given column |
quantile | Computes the exact quantile value for a given column |
quantile_cont | Computes the interpolated quantile value for a given column |
quantile_disc | Computes the exact quantile value for a given column |
rank | Computes the rank within the partition |
rank_dense | Computes the dense rank within the partition |
row_number | Computes the row number within the partition |
select_dtypes | Select columns from the relation, by filtering based on type(s) |
select_types | Select columns from the relation, by filtering based on type(s) |
std | Computes the sample standard deviation for a given column |
stddev | Computes the sample standard deviation for a given column |
stddev_pop | Computes the population standard deviation for a given column |
stddev_samp | Computes the sample standard deviation for a given column |
string_agg | Concatenates the values present in a given column with a separator |
sum | Computes the sum of all values present in a given column |
unique | Returns the distinct values in a column |
value_counts | Computes the number of elements present in a given column, also projecting the original column |
var | Computes the sample variance for a given column |
var_pop | Computes the population variance for a given column |
var_samp | Computes the sample variance for a given column |
variance | Computes the sample variance for a given column |
any_value
Signature
any_value(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Returns the first non-null value from a given column
Parameters
- column : str
  The column name from which to retrieve any value.
- groups : str, default: ''
  Comma-separated list of columns to include in the `group by`.
- window_spec : str, default: ''
  Optional window specification for window functions, provided as `over (partition by ... order by ...)`.
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.any_value('id')
Result
┌──────────────────────────────────────┐
│ any_value(id) │
│ uuid │
├──────────────────────────────────────┤
│ 642ea3d7-793d-4867-a759-91c1226c25a0 │
└──────────────────────────────────────┘
arg_max
Signature
arg_max(self: duckdb.duckdb.DuckDBPyRelation, arg_column: str, value_column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Finds the row with the maximum value for a value column and returns the value of that row for an argument column
Parameters
- arg_column : str
  The column name for which to find the argument maximizing the value.
- value_column : str
  The column name containing values used to determine the maximum.
- groups : str, default: ''
  Comma-separated list of columns to include in the `group by`.
- window_spec : str, default: ''
  Optional window specification for window functions, provided as `over (partition by ... order by ...)`.
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.arg_max(arg_column="value", value_column="value", groups="description", projected_columns="description")
Result
┌─────────────────┬───────────────────────────┐
│ description │ arg_max("value", "value") │
│ varchar │ int64 │
├─────────────────┼───────────────────────────┤
│ value is uneven │ 9 │
│ value is even │ 8 │
└─────────────────┴───────────────────────────┘
arg_min
Signature
arg_min(self: duckdb.duckdb.DuckDBPyRelation, arg_column: str, value_column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Finds the row with the minimum value for a value column and returns the value of that row for an argument column
Parameters
- arg_column : str
  The column name for which to find the argument minimizing the value.
- value_column : str
  The column name containing values used to determine the minimum.
- groups : str, default: ''
  Comma-separated list of columns to include in the `group by`.
- window_spec : str, default: ''
  Optional window specification for window functions, provided as `over (partition by ... order by ...)`.
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.arg_min(arg_column="value", value_column="value", groups="description", projected_columns="description")
Result
┌─────────────────┬───────────────────────────┐
│ description │ arg_min("value", "value") │
│ varchar │ int64 │
├─────────────────┼───────────────────────────┤
│ value is even │ 2 │
│ value is uneven │ 1 │
└─────────────────┴───────────────────────────┘
avg
Signature
avg(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the average on a given column
Parameters
- column : str
  The column name to calculate the average on.
- groups : str, default: ''
  Comma-separated list of columns to include in the `group by`.
- window_spec : str, default: ''
  Optional window specification for window functions, provided as `over (partition by ... order by ...)`.
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.avg('value')
Result
┌──────────────┐
│ avg("value") │
│ double │
├──────────────┤
│ 5.0 │
└──────────────┘
bit_and
Signature
bit_and(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the bitwise AND of all bits present in a given column
Parameters
- column : str
  The column name to perform the bitwise AND aggregation on.
- groups : str, default: ''
  Comma-separated list of columns to include in the group by.
- window_spec : str, default: ''
  Optional window specification for window functions, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel = rel.select("description, value::bit as value_bit")
rel.bit_and(column="value_bit", groups="description", projected_columns="description")
Result
┌─────────────────┬──────────────────────────────────────────────────────────────────┐
│ description │ bit_and(value_bit) │
│ varchar │ bit │
├─────────────────┼──────────────────────────────────────────────────────────────────┤
│ value is uneven │ 0000000000000000000000000000000000000000000000000000000000000001 │
│ value is even │ 0000000000000000000000000000000000000000000000000000000000000000 │
└─────────────────┴──────────────────────────────────────────────────────────────────┘
bit_or
Signature
bit_or(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the bitwise OR of all bits present in a given column
Parameters
- column : str
  The column name to perform the bitwise OR aggregation on.
- groups : str, default: ''
  Comma-separated list of columns to include in the group by.
- window_spec : str, default: ''
  Optional window specification for window functions, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel = rel.select("description, value::bit as value_bit")
rel.bit_or(column="value_bit", groups="description", projected_columns="description")
Result
┌─────────────────┬──────────────────────────────────────────────────────────────────┐
│ description │ bit_or(value_bit) │
│ varchar │ bit │
├─────────────────┼──────────────────────────────────────────────────────────────────┤
│ value is uneven │ 0000000000000000000000000000000000000000000000000000000000001111 │
│ value is even │ 0000000000000000000000000000000000000000000000000000000000001110 │
└─────────────────┴──────────────────────────────────────────────────────────────────┘
bit_xor
Signature
bit_xor(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the bitwise XOR of all bits present in a given column
Parameters
- column : str
  The column name to perform the bitwise XOR aggregation on.
- groups : str, default: ''
  Comma-separated list of columns to include in the group by.
- window_spec : str, default: ''
  Optional window specification for window functions, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel = rel.select("description, value::bit as value_bit")
rel.bit_xor(column="value_bit", groups="description", projected_columns="description")
Result
┌─────────────────┬──────────────────────────────────────────────────────────────────┐
│ description │ bit_xor(value_bit) │
│ varchar │ bit │
├─────────────────┼──────────────────────────────────────────────────────────────────┤
│ value is even │ 0000000000000000000000000000000000000000000000000000000000001000 │
│ value is uneven │ 0000000000000000000000000000000000000000000000000000000000001001 │
└─────────────────┴──────────────────────────────────────────────────────────────────┘
bitstring_agg
Signature
bitstring_agg(self: duckdb.duckdb.DuckDBPyRelation, column: str, min: typing.Optional[object] = None, max: typing.Optional[object] = None, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes a bitstring with bits set for each distinct value in a given column
Parameters
- column : str
  The column name to aggregate as a bitstring.
- min : object, default: None
  Optional minimum value of the range covered by the bitstring; together with max it fixes the bitstring length.
- max : object, default: None
  Optional maximum value of the range covered by the bitstring.
- groups : str, default: ''
  Comma-separated list of columns to include in the group by.
- window_spec : str, default: ''
  Optional window specification for window functions, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.bitstring_agg(column="value", groups="description", projected_columns="description", min=1, max=9)
Result
┌─────────────────┬────────────────────────┐
│ description │ bitstring_agg("value") │
│ varchar │ bit │
├─────────────────┼────────────────────────┤
│ value is uneven │ 101010101 │
│ value is even │ 010101010 │
└─────────────────┴────────────────────────┘
bool_and
Signature
bool_and(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the logical AND of all values present in a given column
Parameters
- column : str
  The column name to perform the boolean AND aggregation on.
- groups : str, default: ''
  Comma-separated list of columns to include in the group by.
- window_spec : str, default: ''
  Optional window specification for window functions, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel = rel.select("description, mod(value,2)::boolean as uneven")
rel.bool_and(column="uneven", groups="description", projected_columns="description")
Result
┌─────────────────┬──────────────────┐
│ description │ bool_and(uneven) │
│ varchar │ boolean │
├─────────────────┼──────────────────┤
│ value is even │ false │
│ value is uneven │ true │
└─────────────────┴──────────────────┘
bool_or
Signature
bool_or(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the logical OR of all values present in a given column
Parameters
- column : str
  The column name to perform the boolean OR aggregation on.
- groups : str, default: ''
  Comma-separated list of columns to include in the group by.
- window_spec : str, default: ''
  Optional window specification for window functions, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel = rel.select("description, mod(value,2)::boolean as uneven")
rel.bool_or(column="uneven", groups="description", projected_columns="description")
Result
┌─────────────────┬─────────────────┐
│ description │ bool_or(uneven) │
│ varchar │ boolean │
├─────────────────┼─────────────────┤
│ value is even │ false │
│ value is uneven │ true │
└─────────────────┴─────────────────┘
count
Signature
count(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the number of elements present in a given column
Parameters
- column : str
  The column name to perform count on.
- groups : str, default: ''
  Comma-separated list of columns to include in the group by.
- window_spec : str, default: ''
  Optional window specification for window functions, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.count("id")
Result
┌───────────┐
│ count(id) │
│ int64 │
├───────────┤
│ 9 │
└───────────┘
cume_dist
Signature
cume_dist(self: duckdb.duckdb.DuckDBPyRelation, window_spec: str, projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the cumulative distribution within the partition
Parameters
- window_spec : str
  Window specification, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.cume_dist(window_spec="over (partition by description order by value)", projected_columns="description, value")
Result
┌─────────────────┬───────┬──────────────────────────────────────────────────────────────┐
│ description │ value │ cume_dist() OVER (PARTITION BY description ORDER BY "value") │
│ varchar │ int64 │ double │
├─────────────────┼───────┼──────────────────────────────────────────────────────────────┤
│ value is uneven │ 1 │ 0.2 │
│ value is uneven │ 3 │ 0.4 │
│ value is uneven │ 5 │ 0.6 │
│ value is uneven │ 7 │ 0.8 │
│ value is uneven │ 9 │ 1.0 │
│ value is even │ 2 │ 0.25 │
│ value is even │ 4 │ 0.5 │
│ value is even │ 6 │ 0.75 │
│ value is even │ 8 │ 1.0 │
└─────────────────┴───────┴──────────────────────────────────────────────────────────────┘
dense_rank
Signature
dense_rank(self: duckdb.duckdb.DuckDBPyRelation, window_spec: str, projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the dense rank within the partition
Aliases: rank_dense
Parameters
- window_spec : str
  Window specification, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.dense_rank(window_spec="over (partition by description order by value)", projected_columns="description, value")
Result
┌─────────────────┬───────┬───────────────────────────────────────────────────────────────┐
│ description │ value │ dense_rank() OVER (PARTITION BY description ORDER BY "value") │
│ varchar │ int64 │ int64 │
├─────────────────┼───────┼───────────────────────────────────────────────────────────────┤
│ value is even │ 2 │ 1 │
│ value is even │ 4 │ 2 │
│ value is even │ 6 │ 3 │
│ value is even │ 8 │ 4 │
│ value is uneven │ 1 │ 1 │
│ value is uneven │ 3 │ 2 │
│ value is uneven │ 5 │ 3 │
│ value is uneven │ 7 │ 4 │
│ value is uneven │ 9 │ 5 │
└─────────────────┴───────┴───────────────────────────────────────────────────────────────┘
distinct
Signature
distinct(self: duckdb.duckdb.DuckDBPyRelation) -> duckdb.duckdb.DuckDBPyRelation
Description
Retrieve distinct rows from this relation object
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("select range from range(1,4)")
rel = rel.union(union_rel=rel)
rel.distinct().order("range")
Result
┌───────┐
│ range │
│ int64 │
├───────┤
│ 1 │
│ 2 │
│ 3 │
└───────┘
favg
Signature
favg(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the average of all values present in a given column using a more accurate floating point summation (Kahan Sum)
Parameters
- column : str
  The column name to calculate the average on.
- groups : str, default: ''
  Comma-separated list of columns to include in the group by.
- window_spec : str, default: ''
  Optional window specification for window functions, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.favg(column="value", groups="description", projected_columns="description")
Result
┌─────────────────┬───────────────┐
│ description │ favg("value") │
│ varchar │ double │
├─────────────────┼───────────────┤
│ value is uneven │ 5.0 │
│ value is even │ 5.0 │
└─────────────────┴───────────────┘
first
Signature
first(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Returns the first value of a given column
Parameters
- column : str
  The column name from which to retrieve the first value.
- groups : str, default: ''
  Comma-separated list of columns to include in the group by.
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.first(column="value", groups="description", projected_columns="description")
Result
┌─────────────────┬──────────────────┐
│ description │ "first"("value") │
│ varchar │ int64 │
├─────────────────┼──────────────────┤
│ value is even │ 2 │
│ value is uneven │ 1 │
└─────────────────┴──────────────────┘
first_value
Signature
first_value(self: duckdb.duckdb.DuckDBPyRelation, column: str, window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the first value within the group or partition
Parameters
- column : str
  The column name from which to retrieve the first value.
- window_spec : str, default: ''
  Optional window specification for window functions, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.first_value(column="value", window_spec="over (partition by description order by value)", projected_columns="description").distinct()
Result
┌─────────────────┬───────────────────────────────────────────────────────────────────────┐
│ description │ first_value("value") OVER (PARTITION BY description ORDER BY "value") │
│ varchar │ int64 │
├─────────────────┼───────────────────────────────────────────────────────────────────────┤
│ value is even │ 2 │
│ value is uneven │ 1 │
└─────────────────┴───────────────────────────────────────────────────────────────────────┘
fsum
Signature
fsum(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the sum of all values present in a given column using a more accurate floating point summation (Kahan Sum)
Parameters
- column : str
  The column name to calculate the sum on.
- groups : str, default: ''
  Comma-separated list of columns to include in the group by.
- window_spec : str, default: ''
  Optional window specification for window functions, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.fsum(column="value", groups="description", projected_columns="description")
Result
┌─────────────────┬───────────────┐
│ description │ fsum("value") │
│ varchar │ double │
├─────────────────┼───────────────┤
│ value is even │ 20.0 │
│ value is uneven │ 25.0 │
└─────────────────┴───────────────┘
geomean
Signature
geomean(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the geometric mean over all values present in a given column
Parameters
- column : str
  The column name to calculate the geometric mean on.
- groups : str, default: ''
  Comma-separated list of columns to include in the group by.
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.geomean(column="value", groups="description", projected_columns="description")
Result
┌─────────────────┬───────────────────┐
│ description │ geomean("value") │
│ varchar │ double │
├─────────────────┼───────────────────┤
│ value is uneven │ 3.936283427035351 │
│ value is even │ 4.426727678801287 │
└─────────────────┴───────────────────┘
histogram
Signature
histogram(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the histogram over all values present in a given column
Parameters
- column : str
  The column name to calculate the histogram on.
- groups : str, default: ''
  Comma-separated list of columns to include in the group by.
- window_spec : str, default: ''
  Optional window specification for window functions, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.histogram(column="value", groups="description", projected_columns="description")
Result
┌─────────────────┬───────────────────────────┐
│ description │ histogram("value") │
│ varchar │ map(bigint, ubigint) │
├─────────────────┼───────────────────────────┤
│ value is uneven │ {1=1, 3=1, 5=1, 7=1, 9=1} │
│ value is even │ {2=1, 4=1, 6=1, 8=1} │
└─────────────────┴───────────────────────────┘
lag
Signature
lag(self: duckdb.duckdb.DuckDBPyRelation, column: str, window_spec: str, offset: int = 1, default_value: str = 'NULL', ignore_nulls: bool = False, projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the lag within the partition
Parameters
- column : str
  The column name to apply the lag function on.
- window_spec : str
  Window specification, provided as over (partition by ... order by ...).
- offset : int, default: 1
  The number of rows to lag behind.
- default_value : str, default: 'NULL'
  The default value to return when the lag offset goes out of bounds.
- ignore_nulls : bool, default: False
  Whether to ignore NULL values when computing the lag.
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.lag(column="description", window_spec="over (order by value)", projected_columns="description, value")
Result
┌─────────────────┬───────┬───────────────────────────────────────────────────┐
│ description │ value │ lag(description, 1, NULL) OVER (ORDER BY "value") │
│ varchar │ int64 │ varchar │
├─────────────────┼───────┼───────────────────────────────────────────────────┤
│ value is uneven │ 1 │ NULL │
│ value is even │ 2 │ value is uneven │
│ value is uneven │ 3 │ value is even │
│ value is even │ 4 │ value is uneven │
│ value is uneven │ 5 │ value is even │
│ value is even │ 6 │ value is uneven │
│ value is uneven │ 7 │ value is even │
│ value is even │ 8 │ value is uneven │
│ value is uneven │ 9 │ value is even │
└─────────────────┴───────┴───────────────────────────────────────────────────┘
last
Signature
last(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Returns the last value of a given column
Parameters
- column : str
  The column name from which to retrieve the last value.
- groups : str, default: ''
  Comma-separated list of columns to include in the group by.
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.last(column="value", groups="description", projected_columns="description")
Result
┌─────────────────┬─────────────────┐
│ description │ "last"("value") │
│ varchar │ int64 │
├─────────────────┼─────────────────┤
│ value is even │ 8 │
│ value is uneven │ 9 │
└─────────────────┴─────────────────┘
last_value
Signature
last_value(self: duckdb.duckdb.DuckDBPyRelation, column: str, window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the last value within the group or partition
Parameters
- column : str
  The column name from which to retrieve the last value within the window.
- window_spec : str, default: ''
  Optional window specification for window functions, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.last_value(column="value", window_spec="over (order by description)", projected_columns="description").distinct()
Result
┌─────────────────┬─────────────────────────────────────────────────┐
│ description │ last_value("value") OVER (ORDER BY description) │
│ varchar │ int64 │
├─────────────────┼─────────────────────────────────────────────────┤
│ value is uneven │ 9 │
│ value is even │ 8 │
└─────────────────┴─────────────────────────────────────────────────┘
lead
Signature
lead(self: duckdb.duckdb.DuckDBPyRelation, column: str, window_spec: str, offset: int = 1, default_value: str = 'NULL', ignore_nulls: bool = False, projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the lead within the partition
Parameters
- column : str
  The column name to apply the lead function on.
- window_spec : str
  Window specification, provided as over (partition by ... order by ...).
- offset : int, default: 1
  The number of rows to lead ahead.
- default_value : str, default: 'NULL'
  The default value to return when the lead offset goes out of bounds.
- ignore_nulls : bool, default: False
  Whether to ignore NULL values when computing the lead.
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.lead(column="description", window_spec="over (order by value)", projected_columns="description, value")
Result
┌─────────────────┬───────┬────────────────────────────────────────────────────┐
│ description │ value │ lead(description, 1, NULL) OVER (ORDER BY "value") │
│ varchar │ int64 │ varchar │
├─────────────────┼───────┼────────────────────────────────────────────────────┤
│ value is uneven │ 1 │ value is even │
│ value is even │ 2 │ value is uneven │
│ value is uneven │ 3 │ value is even │
│ value is even │ 4 │ value is uneven │
│ value is uneven │ 5 │ value is even │
│ value is even │ 6 │ value is uneven │
│ value is uneven │ 7 │ value is even │
│ value is even │ 8 │ value is uneven │
│ value is uneven │ 9 │ NULL │
└─────────────────┴───────┴────────────────────────────────────────────────────┘
list
Signature
list(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Returns a list containing all values present in a given column
Parameters
- column : str
  The column name to aggregate values into a list.
- groups : str, default: ''
  Comma-separated list of columns to include in the group by.
- window_spec : str, default: ''
  Optional window specification for window functions, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.list(column="value", groups="description", projected_columns="description")
Result
┌─────────────────┬─────────────────┐
│ description │ list("value") │
│ varchar │ int64[] │
├─────────────────┼─────────────────┤
│ value is even │ [2, 4, 6, 8] │
│ value is uneven │ [1, 3, 5, 7, 9] │
└─────────────────┴─────────────────┘
max
Signature
max(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Returns the maximum value present in a given column
Parameters
- column : str
  The column name to calculate the maximum value of.
- groups : str, default: ''
  Comma-separated list of columns to include in the group by.
- window_spec : str, default: ''
  Optional window specification for window functions, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.max(column="value", groups="description", projected_columns="description")
Result
┌─────────────────┬──────────────┐
│ description │ max("value") │
│ varchar │ int64 │
├─────────────────┼──────────────┤
│ value is even │ 8 │
│ value is uneven │ 9 │
└─────────────────┴──────────────┘
mean
Signature
mean(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the average of a given column (alias of avg, as the result column name shows)
Parameters
- column : str
  The column name to calculate the mean value of.
- groups : str, default: ''
  Comma-separated list of columns to include in the group by.
- window_spec : str, default: ''
  Optional window specification for window functions, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.mean(column="value", groups="description", projected_columns="description")
Result
┌─────────────────┬──────────────┐
│ description │ avg("value") │
│ varchar │ double │
├─────────────────┼──────────────┤
│ value is even │ 5.0 │
│ value is uneven │ 5.0 │
└─────────────────┴──────────────┘
median
Signature
median(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the median over all values present in a given column
Parameters
- column : str
  The column name to calculate the median value of.
- groups : str, default: ''
  Comma-separated list of columns to include in the group by.
- window_spec : str, default: ''
  Optional window specification for window functions, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.median(column="value", groups="description", projected_columns="description")
Result
┌─────────────────┬─────────────────┐
│ description │ median("value") │
│ varchar │ double │
├─────────────────┼─────────────────┤
│ value is even │ 5.0 │
│ value is uneven │ 5.0 │
└─────────────────┴─────────────────┘
min
Signature
min(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Returns the minimum value present in a given column
Parameters
- column : str
  The column name to calculate the min value of.
- groups : str, default: ''
  Comma-separated list of columns to include in the group by.
- window_spec : str, default: ''
  Optional window specification for window functions, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.min(column="value", groups="description", projected_columns="description")
Result
┌─────────────────┬──────────────┐
│ description │ min("value") │
│ varchar │ int64 │
├─────────────────┼──────────────┤
│ value is uneven │ 1 │
│ value is even │ 2 │
└─────────────────┴──────────────┘
mode
Signature
mode(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the mode over all values present in a given column
Parameters
- column : str
  The column name to calculate the mode (most frequent value) of.
- groups : str, default: ''
  Comma-separated list of columns to include in the group by.
- window_spec : str, default: ''
  Optional window specification for window functions, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.mode(column="value", groups="description", projected_columns="description")
Result
┌─────────────────┬─────────────────┐
│ description │ "mode"("value") │
│ varchar │ int64 │
├─────────────────┼─────────────────┤
│ value is uneven │ 1 │
│ value is even │ 2 │
└─────────────────┴─────────────────┘
n_tile
Signature
n_tile(self: duckdb.duckdb.DuckDBPyRelation, window_spec: str, num_buckets: int, projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Divides the partition as equally as possible into num_buckets
Parameters
- window_spec : str
  Window specification for the window function, provided as over (partition by ... order by ...). Required, as ntile is always evaluated over a window.
- num_buckets : int
  The number of buckets to divide the rows into.
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.n_tile(window_spec="over (partition by description)", num_buckets=2, projected_columns="description, value")
Result
┌─────────────────┬───────┬──────────────────────────────────────────┐
│ description │ value │ ntile(2) OVER (PARTITION BY description) │
│ varchar │ int64 │ int64 │
├─────────────────┼───────┼──────────────────────────────────────────┤
│ value is uneven │ 1 │ 1 │
│ value is uneven │ 3 │ 1 │
│ value is uneven │ 5 │ 1 │
│ value is uneven │ 7 │ 2 │
│ value is uneven │ 9 │ 2 │
│ value is even │ 2 │ 1 │
│ value is even │ 4 │ 1 │
│ value is even │ 6 │ 2 │
│ value is even │ 8 │ 2 │
└─────────────────┴───────┴──────────────────────────────────────────┘
nth_value
Signature
nth_value(self: duckdb.duckdb.DuckDBPyRelation, column: str, window_spec: str, offset: int, ignore_nulls: bool = False, projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the nth value within the partition
Parameters
- column : str
  The column name from which to retrieve the nth value within the window.
- window_spec : str
  Window specification for the window function, provided as over (partition by ... order by ...).
- offset : int
  The position of the value to retrieve within the window (1-based index).
- ignore_nulls : bool, default: False
  Whether to ignore NULL values when computing the nth value.
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.nth_value(column="value", window_spec="over (partition by description)", projected_columns="description", offset=1)
Result
┌─────────────────┬───────────────────────────────────────────────────────┐
│ description │ nth_value("value", 1) OVER (PARTITION BY description) │
│ varchar │ int64 │
├─────────────────┼───────────────────────────────────────────────────────┤
│ value is even │ 2 │
│ value is even │ 2 │
│ value is even │ 2 │
│ value is even │ 2 │
│ value is uneven │ 1 │
│ value is uneven │ 1 │
│ value is uneven │ 1 │
│ value is uneven │ 1 │
│ value is uneven │ 1 │
└─────────────────┴───────────────────────────────────────────────────────┘
percent_rank
Signature
percent_rank(self: duckdb.duckdb.DuckDBPyRelation, window_spec: str, projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the relative rank within the partition
Parameters
- window_spec : str
  Window specification for the window function, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.percent_rank(window_spec="over (partition by description order by value)", projected_columns="description, value")
Result
┌─────────────────┬───────┬─────────────────────────────────────────────────────────────────┐
│ description │ value │ percent_rank() OVER (PARTITION BY description ORDER BY "value") │
│ varchar │ int64 │ double │
├─────────────────┼───────┼─────────────────────────────────────────────────────────────────┤
│ value is even │ 2 │ 0.0 │
│ value is even │ 4 │ 0.3333333333333333 │
│ value is even │ 6 │ 0.6666666666666666 │
│ value is even │ 8 │ 1.0 │
│ value is uneven │ 1 │ 0.0 │
│ value is uneven │ 3 │ 0.25 │
│ value is uneven │ 5 │ 0.5 │
│ value is uneven │ 7 │ 0.75 │
│ value is uneven │ 9 │ 1.0 │
└─────────────────┴───────┴─────────────────────────────────────────────────────────────────┘
product
Signature
product(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Returns the product of all values present in a given column
Parameters
- column : str
  The column name to calculate the product of.
- groups : str, default: ''
  Comma-separated list of columns to include in the group by.
- window_spec : str, default: ''
  Optional window specification for window functions, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.product(column="value", groups="description", projected_columns="description")
Result
┌─────────────────┬──────────────────┐
│ description │ product("value") │
│ varchar │ double │
├─────────────────┼──────────────────┤
│ value is uneven │ 945.0 │
│ value is even │ 384.0 │
└─────────────────┴──────────────────┘
quantile
Signature
quantile(self: duckdb.duckdb.DuckDBPyRelation, column: str, q: object = 0.5, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the exact quantile value for a given column
Parameters
- column : str
  The column name to compute the quantile for.
- q : object, default: 0.5
  The quantile value to compute (e.g., 0.5 for median).
- groups : str, default: ''
  Comma-separated list of columns to include in the group by.
- window_spec : str, default: ''
  Optional window specification for window functions, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.quantile(column="value", groups="description", projected_columns="description")
Result
┌─────────────────┬──────────────────────────────────┐
│ description │ quantile_disc("value", 0.500000) │
│ varchar │ int64 │
├─────────────────┼──────────────────────────────────┤
│ value is uneven │ 5 │
│ value is even │ 4 │
└─────────────────┴──────────────────────────────────┘
quantile_cont
Signature
quantile_cont(self: duckdb.duckdb.DuckDBPyRelation, column: str, q: object = 0.5, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the interpolated quantile value for a given column
Parameters
- column : str
  The column name to compute the continuous quantile for.
- q : object, default: 0.5
  The quantile value to compute (e.g., 0.5 for median).
- groups : str, default: ''
  Comma-separated list of columns to include in the group by.
- window_spec : str, default: ''
  Optional window specification for window functions, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.quantile_cont(column="value", groups="description", projected_columns="description")
Result
┌─────────────────┬──────────────────────────────────┐
│ description │ quantile_cont("value", 0.500000) │
│ varchar │ double │
├─────────────────┼──────────────────────────────────┤
│ value is even │ 5.0 │
│ value is uneven │ 5.0 │
└─────────────────┴──────────────────────────────────┘
quantile_disc
Signature
quantile_disc(self: duckdb.duckdb.DuckDBPyRelation, column: str, q: object = 0.5, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the exact quantile value for a given column
Parameters
- column : str
  The column name to compute the discrete quantile for.
- q : object, default: 0.5
  The quantile value to compute (e.g., 0.5 for median).
- groups : str, default: ''
  Comma-separated list of columns to include in the group by.
- window_spec : str, default: ''
  Optional window specification for window functions, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.quantile_disc(column="value", groups="description", projected_columns="description")
Result
┌─────────────────┬──────────────────────────────────┐
│ description │ quantile_disc("value", 0.500000) │
│ varchar │ int64 │
├─────────────────┼──────────────────────────────────┤
│ value is even │ 4 │
│ value is uneven │ 5 │
└─────────────────┴──────────────────────────────────┘
rank
Signature
rank(self: duckdb.duckdb.DuckDBPyRelation, window_spec: str, projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the rank within the partition
Parameters
- window_spec : str
  Window specification for the window function, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.rank(window_spec="over (partition by description order by value)", projected_columns="description, value")
Result
┌─────────────────┬───────┬─────────────────────────────────────────────────────────┐
│ description │ value │ rank() OVER (PARTITION BY description ORDER BY "value") │
│ varchar │ int64 │ int64 │
├─────────────────┼───────┼─────────────────────────────────────────────────────────┤
│ value is uneven │ 1 │ 1 │
│ value is uneven │ 3 │ 2 │
│ value is uneven │ 5 │ 3 │
│ value is uneven │ 7 │ 4 │
│ value is uneven │ 9 │ 5 │
│ value is even │ 2 │ 1 │
│ value is even │ 4 │ 2 │
│ value is even │ 6 │ 3 │
│ value is even │ 8 │ 4 │
└─────────────────┴───────┴─────────────────────────────────────────────────────────┘
rank_dense
Signature
rank_dense(self: duckdb.duckdb.DuckDBPyRelation, window_spec: str, projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the dense rank within the partition
Aliases: dense_rank
Parameters
- window_spec : str
  Window specification for the window function, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.rank_dense(window_spec="over (partition by description order by value)", projected_columns="description, value")
Result
┌─────────────────┬───────┬───────────────────────────────────────────────────────────────┐
│ description │ value │ dense_rank() OVER (PARTITION BY description ORDER BY "value") │
│ varchar │ int64 │ int64 │
├─────────────────┼───────┼───────────────────────────────────────────────────────────────┤
│ value is uneven │ 1 │ 1 │
│ value is uneven │ 3 │ 2 │
│ value is uneven │ 5 │ 3 │
│ value is uneven │ 7 │ 4 │
│ value is uneven │ 9 │ 5 │
│ value is even │ 2 │ 1 │
│ value is even │ 4 │ 2 │
│ value is even │ 6 │ 3 │
│ value is even │ 8 │ 4 │
└─────────────────┴───────┴───────────────────────────────────────────────────────────────┘
row_number
Signature
row_number(self: duckdb.duckdb.DuckDBPyRelation, window_spec: str, projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the row number within the partition
Parameters
- window_spec : str
  Window specification for the window function, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.row_number(window_spec="over (partition by description order by value)", projected_columns="description, value")
Result
┌─────────────────┬───────┬───────────────────────────────────────────────────────────────┐
│ description │ value │ row_number() OVER (PARTITION BY description ORDER BY "value") │
│ varchar │ int64 │ int64 │
├─────────────────┼───────┼───────────────────────────────────────────────────────────────┤
│ value is uneven │ 1 │ 1 │
│ value is uneven │ 3 │ 2 │
│ value is uneven │ 5 │ 3 │
│ value is uneven │ 7 │ 4 │
│ value is uneven │ 9 │ 5 │
│ value is even │ 2 │ 1 │
│ value is even │ 4 │ 2 │
│ value is even │ 6 │ 3 │
│ value is even │ 8 │ 4 │
└─────────────────┴───────┴───────────────────────────────────────────────────────────────┘
select_dtypes
Signature
select_dtypes(self: duckdb.duckdb.DuckDBPyRelation, types: object) -> duckdb.duckdb.DuckDBPyRelation
Description
Select columns from the relation, by filtering based on type(s)
Aliases: select_types
Parameters
- types : object
  Data type(s) to select columns by. Can be a single type or a collection of types.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.select_dtypes(types=[duckdb.typing.VARCHAR]).distinct()
Result
┌─────────────────┐
│ description │
│ varchar │
├─────────────────┤
│ value is even │
│ value is uneven │
└─────────────────┘
select_types
Signature
select_types(self: duckdb.duckdb.DuckDBPyRelation, types: object) -> duckdb.duckdb.DuckDBPyRelation
Description
Select columns from the relation, by filtering based on type(s)
Aliases: select_dtypes
Parameters
- types : object
  Data type(s) to select columns by. Can be a single type or a collection of types.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.select_types(types=[duckdb.typing.VARCHAR]).distinct()
Result
┌─────────────────┐
│ description │
│ varchar │
├─────────────────┤
│ value is even │
│ value is uneven │
└─────────────────┘
std
Signature
std(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the sample standard deviation for a given column
Aliases: stddev, stddev_samp
Parameters
- column : str
  The column name to calculate the standard deviation for.
- groups : str, default: ''
  Comma-separated list of columns to include in the group by.
- window_spec : str, default: ''
  Optional window specification for window functions, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.std(column="value", groups="description", projected_columns="description")
Result
┌─────────────────┬──────────────────────┐
│ description │ stddev_samp("value") │
│ varchar │ double │
├─────────────────┼──────────────────────┤
│ value is uneven │ 3.1622776601683795 │
│ value is even │ 2.581988897471611 │
└─────────────────┴──────────────────────┘
stddev
Signature
stddev(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the sample standard deviation for a given column
Aliases: std, stddev_samp
Parameters
- column : str
  The column name to calculate the standard deviation for.
- groups : str, default: ''
  Comma-separated list of columns to include in the group by.
- window_spec : str, default: ''
  Optional window specification for window functions, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.stddev(column="value", groups="description", projected_columns="description")
Result
┌─────────────────┬──────────────────────┐
│ description │ stddev_samp("value") │
│ varchar │ double │
├─────────────────┼──────────────────────┤
│ value is even │ 2.581988897471611 │
│ value is uneven │ 3.1622776601683795 │
└─────────────────┴──────────────────────┘
stddev_pop
Signature
stddev_pop(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the population standard deviation for a given column
Parameters
- column : str
  The column name to calculate the standard deviation for.
- groups : str, default: ''
  Comma-separated list of columns to include in the group by.
- window_spec : str, default: ''
  Optional window specification for window functions, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.stddev_pop(column="value", groups="description", projected_columns="description")
Result
┌─────────────────┬─────────────────────┐
│ description │ stddev_pop("value") │
│ varchar │ double │
├─────────────────┼─────────────────────┤
│ value is even │ 2.23606797749979 │
│ value is uneven │ 2.8284271247461903 │
└─────────────────┴─────────────────────┘
stddev_samp
Signature
stddev_samp(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the sample standard deviation for a given column
Parameters
- column : str
  The column name to calculate the standard deviation for.
- groups : str, default: ''
  Comma-separated list of columns to include in the group by.
- window_spec : str, default: ''
  Optional window specification for window functions, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.stddev_samp(column="value", groups="description", projected_columns="description")
Result
┌─────────────────┬──────────────────────┐
│ description │ stddev_samp("value") │
│ varchar │ double │
├─────────────────┼──────────────────────┤
│ value is even │ 2.581988897471611 │
│ value is uneven │ 3.1622776601683795 │
└─────────────────┴──────────────────────┘
string_agg
Signature
string_agg(self: duckdb.duckdb.DuckDBPyRelation, column: str, sep: str = ',', groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Concatenates the values present in a given column with a separator
Parameters
- column : str
  The column name to concatenate values from.
- sep : str, default: ','
  Separator string to use between concatenated values.
- groups : str, default: ''
  Comma-separated list of columns to include in the group by.
- window_spec : str, default: ''
  Optional window specification for window functions, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.string_agg(column="value", sep=",", groups="description", projected_columns="description")
Result
┌─────────────────┬──────────────────────────┐
│ description │ string_agg("value", ',') │
│ varchar │ varchar │
├─────────────────┼──────────────────────────┤
│ value is even │ 2,4,6,8 │
│ value is uneven │ 1,3,5,7,9 │
└─────────────────┴──────────────────────────┘
sum
Signature
sum(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the sum of all values present in a given column
Parameters
- column : str
  The column name to calculate the sum for.
- groups : str, default: ''
  Comma-separated list of columns to include in the group by.
- window_spec : str, default: ''
  Optional window specification for window functions, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.sum(column="value", groups="description", projected_columns="description")
Result
┌─────────────────┬──────────────┐
│ description │ sum("value") │
│ varchar │ int128 │
├─────────────────┼──────────────┤
│ value is even │ 20 │
│ value is uneven │ 25 │
└─────────────────┴──────────────┘
unique
Signature
unique(self: duckdb.duckdb.DuckDBPyRelation, unique_aggr: str) -> duckdb.duckdb.DuckDBPyRelation
Description
Returns the distinct values in a column.
Parameters
- unique_aggr : str
  The column to get the distinct values for.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.unique(unique_aggr="description")
Result
┌─────────────────┐
│ description │
│ varchar │
├─────────────────┤
│ value is even │
│ value is uneven │
└─────────────────┘
value_counts
Signature
value_counts(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the number of elements present in a given column, also projecting the original column
Parameters
- column : str
  The column name to count values from.
- groups : str, default: ''
  Comma-separated list of columns to include in the group by.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.value_counts(column="description", groups="description")
Result
┌─────────────────┬────────────────────┐
│ description │ count(description) │
│ varchar │ int64 │
├─────────────────┼────────────────────┤
│ value is uneven │ 5 │
│ value is even │ 4 │
└─────────────────┴────────────────────┘
var
Signature
var(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the sample variance for a given column
Parameters
- column : str
  The column name to calculate the sample variance for.
- groups : str, default: ''
  Comma-separated list of columns to include in the group by.
- window_spec : str, default: ''
  Optional window specification for window functions, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.var(column="value", groups="description", projected_columns="description")
Result
┌─────────────────┬───────────────────┐
│ description │ var_samp("value") │
│ varchar │ double │
├─────────────────┼───────────────────┤
│ value is even │ 6.666666666666667 │
│ value is uneven │ 10.0 │
└─────────────────┴───────────────────┘
var_pop
Signature
var_pop(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the population variance for a given column
Parameters
- column : str
  The column name to calculate the population variance for.
- groups : str, default: ''
  Comma-separated list of columns to include in the group by.
- window_spec : str, default: ''
  Optional window specification for window functions, provided as over (partition by ... order by ...).
- projected_columns : str, default: ''
  Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.var_pop(column="value", groups="description", projected_columns="description")
Result
┌─────────────────┬──────────────────┐
│ description │ var_pop("value") │
│ varchar │ double │
├─────────────────┼──────────────────┤
│ value is even │ 5.0 │
│ value is uneven │ 8.0 │
└─────────────────┴──────────────────┘
var_samp
Signature
var_samp(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the sample variance for a given column
Parameters
-
column : str
The column name to calculate the sample variance for.
-
groups : str, default: ''
Comma-separated list of columns to include in the group by.
-
window_spec : str, default: ''
Optional window specification for window functions, provided as over (partition by ... order by ...).
-
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.var_samp(column="value", groups="description", projected_columns="description")
Result
┌─────────────────┬───────────────────┐
│ description │ var_samp("value") │
│ varchar │ double │
├─────────────────┼───────────────────┤
│ value is even │ 6.666666666666667 │
│ value is uneven │ 10.0 │
└─────────────────┴───────────────────┘
variance
Signature
variance(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation
Description
Computes the sample variance for a given column
Parameters
-
column : str
The column name to calculate the sample variance for.
-
groups : str, default: ''
Comma-separated list of columns to include in the group by.
-
window_spec : str, default: ''
Optional window specification for window functions, provided as over (partition by ... order by ...).
-
projected_columns : str, default: ''
Comma-separated list of columns to include in the result.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.variance(column="value", groups="description", projected_columns="description")
Result
┌─────────────────┬───────────────────┐
│ description │ var_samp("value") │
│ varchar │ double │
├─────────────────┼───────────────────┤
│ value is even │ 6.666666666666667 │
│ value is uneven │ 10.0 │
└─────────────────┴───────────────────┘
Output
This section lists the functions that trigger SQL execution and retrieve the data.
Name | Description |
---|---|
arrow |
Execute and fetch all rows as an Arrow Table |
close |
Closes the result |
create |
Creates a new table named table_name with the contents of the relation object |
create_view |
Creates a view named view_name that refers to the relation object |
df |
Execute and fetch all rows as a pandas DataFrame |
execute |
Transform the relation into a result set |
fetch_arrow_reader |
Execute and return an Arrow Record Batch Reader that yields all rows |
fetch_arrow_table |
Execute and fetch all rows as an Arrow Table |
fetch_df_chunk |
Execute and fetch a chunk of the rows |
fetchall |
Execute and fetch all rows as a list of tuples |
fetchdf |
Execute and fetch all rows as a pandas DataFrame |
fetchmany |
Execute and fetch the next set of rows as a list of tuples |
fetchnumpy |
Execute and fetch all rows as a Python dict mapping each column to a numpy array |
fetchone |
Execute and fetch a single row as a tuple |
pl |
Execute and fetch all rows as a Polars DataFrame |
record_batch |
Execute and return an Arrow Record Batch Reader that yields all rows |
tf |
Fetch a result as dict of TensorFlow Tensors |
to_arrow_table |
Execute and fetch all rows as an Arrow Table |
to_csv |
Write the relation object to a CSV file in 'file_name' |
to_df |
Execute and fetch all rows as a pandas DataFrame |
to_parquet |
Write the relation object to a Parquet file in 'file_name' |
to_table |
Creates a new table named table_name with the contents of the relation object |
to_view |
Creates a view named view_name that refers to the relation object |
torch |
Fetch a result as dict of PyTorch Tensors |
write_csv |
Write the relation object to a CSV file in 'file_name' |
write_parquet |
Write the relation object to a Parquet file in 'file_name' |
arrow
Signature
arrow(self: duckdb.duckdb.DuckDBPyRelation, batch_size: int = 1000000) -> pyarrow.lib.Table
Description
Execute and fetch all rows as an Arrow Table
Aliases: fetch_arrow_table
, to_arrow_table
Parameters
-
batch_size : int, default: 1000000
The batch size used when writing the data to the Arrow table
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
pa_table = rel.arrow()
pa_table
Result
pyarrow.Table
id: string
description: string
value: int64
created_timestamp: timestamp[us, tz=Europe/Amsterdam]
----
id: [["3ac9e0ba-8390-4a02-ad72-33b1caea6354","8b844392-1404-4bbc-b731-120f42c8ca27","ca5584ca-8e97-4fca-a295-ae3c16c32f5b","926d071e-5f64-488f-ae02-d19e315f9f5c","aabeedf0-5783-4eff-9963-b3967a6ea5d8","1f20db9a-bee8-4b65-b7e8-e7c36b5b8fee","795c678e-3524-4b52-96ec-7b48c24eeab1","9ffbd403-169f-4fe4-bc41-09751066f1f1","8fdb0a60-29f0-4f5b-afcc-c736a03cd083"]]
description: [["value is uneven","value is even","value is uneven","value is even","value is uneven","value is even","value is uneven","value is even","value is uneven"]]
value: [[1,2,3,4,5,6,7,8,9]]
created_timestamp: [[2025-04-10 09:07:12.614000Z,2025-04-10 09:08:12.614000Z,2025-04-10 09:09:12.614000Z,2025-04-10 09:10:12.614000Z,2025-04-10 09:11:12.614000Z,2025-04-10 09:12:12.614000Z,2025-04-10 09:13:12.614000Z,2025-04-10 09:14:12.614000Z,2025-04-10 09:15:12.614000Z]]
close
Signature
close(self: duckdb.duckdb.DuckDBPyRelation) -> None
Description
Closes the result
create
Signature
create(self: duckdb.duckdb.DuckDBPyRelation, table_name: str) -> None
Description
Creates a new table named table_name with the contents of the relation object
Aliases: to_table
Parameters
-
table_name : str
The name of the table to be created. There shouldn't be any other table with the same name.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.create("table_code_example")
duckdb_conn.table("table_code_example").limit(1)
Result
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│ id │ description │ value │ created_timestamp │
│ uuid │ varchar │ int64 │ timestamp with time zone │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ 3ac9e0ba-8390-4a02-ad72-33b1caea6354 │ value is uneven │ 1 │ 2025-04-10 11:07:12.614+02 │
└──────────────────────────────────────┴─────────────────┴───────┴────────────────────────────┘
create_view
Signature
create_view(self: duckdb.duckdb.DuckDBPyRelation, view_name: str, replace: bool = True) -> duckdb.duckdb.DuckDBPyRelation
Description
Creates a view named view_name that refers to the relation object
Aliases: to_view
Parameters
-
view_name : str
The name of the view to be created.
-
replace : bool, default: True
If the view should be created with CREATE OR REPLACE. When set to False, there shouldn't be another view with the same view_name.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.create_view("view_code_example", replace=True)
duckdb_conn.table("view_code_example").limit(1)
Result
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│ id │ description │ value │ created_timestamp │
│ uuid │ varchar │ int64 │ timestamp with time zone │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ 3ac9e0ba-8390-4a02-ad72-33b1caea6354 │ value is uneven │ 1 │ 2025-04-10 11:07:12.614+02 │
└──────────────────────────────────────┴─────────────────┴───────┴────────────────────────────┘
df
Signature
df(self: duckdb.duckdb.DuckDBPyRelation, *, date_as_object: bool = False) -> pandas.DataFrame
Description
Execute and fetch all rows as a pandas DataFrame
Parameters
-
date_as_object : bool, default: False
Whether the date columns should be interpreted as Python date objects.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.df()
Result
id description value created_timestamp
0 3ac9e0ba-8390-4a02-ad72-33b1caea6354 value is uneven 1 2025-04-10 11:07:12.614000+02:00
1 8b844392-1404-4bbc-b731-120f42c8ca27 value is even 2 2025-04-10 11:08:12.614000+02:00
2 ca5584ca-8e97-4fca-a295-ae3c16c32f5b value is uneven 3 2025-04-10 11:09:12.614000+02:00
...
execute
Signature
execute(self: duckdb.duckdb.DuckDBPyRelation) -> duckdb.duckdb.DuckDBPyRelation
Description
Transform the relation into a result set
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.execute()
Result
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│ id │ description │ value │ created_timestamp │
│ uuid │ varchar │ int64 │ timestamp with time zone │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ 3ac9e0ba-8390-4a02-ad72-33b1caea6354 │ value is uneven │ 1 │ 2025-04-10 11:07:12.614+02 │
│ 8b844392-1404-4bbc-b731-120f42c8ca27 │ value is even │ 2 │ 2025-04-10 11:08:12.614+02 │
│ ca5584ca-8e97-4fca-a295-ae3c16c32f5b │ value is uneven │ 3 │ 2025-04-10 11:09:12.614+02 │
fetch_arrow_reader
Signature
fetch_arrow_reader(self: duckdb.duckdb.DuckDBPyRelation, batch_size: int = 1000000) -> pyarrow.lib.RecordBatchReader
Description
Execute and return an Arrow Record Batch Reader that yields all rows
Parameters
-
batch_size : int, default: 1000000
The batch size for fetching the data.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
pa_reader = rel.fetch_arrow_reader(batch_size=1)
pa_reader.read_next_batch()
Result
pyarrow.RecordBatch
id: string
description: string
value: int64
created_timestamp: timestamp[us, tz=Europe/Amsterdam]
----
id: ["e4ab8cb4-4609-40cb-ad7e-4304ed5ed4bd"]
description: ["value is even"]
value: [2]
created_timestamp: [2025-04-10 09:25:51.259000Z]
fetch_arrow_table
Signature
fetch_arrow_table(self: duckdb.duckdb.DuckDBPyRelation, batch_size: int = 1000000) -> pyarrow.lib.Table
Description
Execute and fetch all rows as an Arrow Table
Aliases: arrow
, to_arrow_table
Parameters
-
batch_size : int, default: 1000000
The batch size for fetching the data.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.fetch_arrow_table()
Result
pyarrow.Table
id: string
description: string
value: int64
created_timestamp: timestamp[us, tz=Europe/Amsterdam]
----
id: [["1587b4b0-3023-49fe-82cf-06303ca136ac","e4ab8cb4-4609-40cb-ad7e-4304ed5ed4bd","3f8ad67a-290f-4a22-b41b-0173b8e45afa","9a4e37ef-d8bd-46dd-ab01-51cf4973549f","12baa624-ebc9-45ae-b73e-6f4029e31d2d","56d41292-53cc-48be-a1b8-e1f5d6ca5581","1accca18-c950-47c1-9108-aef8afbd5249","56d8db75-72c4-4d40-90d2-a3c840579c37","e19f6201-8646-401c-b019-e37c42c39632"]]
description: [["value is uneven","value is even","value is uneven","value is even","value is uneven","value is even","value is uneven","value is even","value is uneven"]]
value: [[1,2,3,4,5,6,7,8,9]]
created_timestamp: [[2025-04-10 09:24:51.259000Z,2025-04-10 09:25:51.259000Z,2025-04-10 09:26:51.259000Z,2025-04-10 09:27:51.259000Z,2025-04-10 09:28:51.259000Z,2025-04-10 09:29:51.259000Z,2025-04-10 09:30:51.259000Z,2025-04-10 09:31:51.259000Z,2025-04-10 09:32:51.259000Z]]
fetch_df_chunk
Signature
fetch_df_chunk(self: duckdb.duckdb.DuckDBPyRelation, vectors_per_chunk: int = 1, *, date_as_object: bool = False) -> pandas.DataFrame
Description
Execute and fetch a chunk of the rows
Parameters
-
vectors_per_chunk : int, default: 1
The number of data chunks to process before converting to a DataFrame.
-
date_as_object : bool, default: False
Whether the date columns should be interpreted as Python date objects.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.fetch_df_chunk()
Result
id description value created_timestamp
0 1587b4b0-3023-49fe-82cf-06303ca136ac value is uneven 1 2025-04-10 11:24:51.259000+02:00
1 e4ab8cb4-4609-40cb-ad7e-4304ed5ed4bd value is even 2 2025-04-10 11:25:51.259000+02:00
2 3f8ad67a-290f-4a22-b41b-0173b8e45afa value is uneven 3 2025-04-10 11:26:51.259000+02:00
...
fetchall
Signature
fetchall(self: duckdb.duckdb.DuckDBPyRelation) -> list
Description
Execute and fetch all rows as a list of tuples
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.limit(1).fetchall()
Result
[(UUID('1587b4b0-3023-49fe-82cf-06303ca136ac'),
'value is uneven',
1,
datetime.datetime(2025, 4, 10, 11, 24, 51, 259000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))]
fetchdf
Signature
fetchdf(self: duckdb.duckdb.DuckDBPyRelation, *, date_as_object: bool = False) -> pandas.DataFrame
Description
Execute and fetch all rows as a pandas DataFrame
Parameters
-
date_as_object : bool, default: False
Whether the date columns should be interpreted as Python date objects.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.fetchdf()
Result
id description value created_timestamp
0 1587b4b0-3023-49fe-82cf-06303ca136ac value is uneven 1 2025-04-10 11:24:51.259000+02:00
1 e4ab8cb4-4609-40cb-ad7e-4304ed5ed4bd value is even 2 2025-04-10 11:25:51.259000+02:00
2 3f8ad67a-290f-4a22-b41b-0173b8e45afa value is uneven 3 2025-04-10 11:26:51.259000+02:00
...
fetchmany
Signature
fetchmany(self: duckdb.duckdb.DuckDBPyRelation, size: int = 1) -> list
Description
Execute and fetch the next set of rows as a list of tuples
Warning: Executing any other operation while retrieving data from an aggregate relation will close the result set.
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
agg_rel = rel.aggregate("value")
while res := agg_rel.fetchmany(size=1):
    print(res)
rel.show()
Parameters
-
size : int, default: 1
The number of records to be fetched.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
while res := rel.fetchmany(size=1):
print(res)
Result
[(UUID('cf4c5e32-d0aa-4699-a3ee-0092e900f263'), 'value is uneven', 1, datetime.datetime(2025, 4, 30, 16, 23, 5, 310000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))]
[(UUID('cec335ac-24ac-49a3-ae9a-bb35f71fc88d'), 'value is even', 2, datetime.datetime(2025, 4, 30, 16, 24, 5, 310000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))]
[(UUID('2423295d-9bb0-453c-a385-21bdacba03b6'), 'value is uneven', 3, datetime.datetime(2025, 4, 30, 16, 25, 5, 310000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))]
[(UUID('88806b21-192d-41e7-a293-c789aad636ba'), 'value is even', 4, datetime.datetime(2025, 4, 30, 16, 26, 5, 310000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))]
[(UUID('05837a28-dacf-4121-88a6-a374aefb8a07'), 'value is uneven', 5, datetime.datetime(2025, 4, 30, 16, 27, 5, 310000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))]
[(UUID('b9c1f7e9-6156-4554-b80e-67d3b5d810bb'), 'value is even', 6, datetime.datetime(2025, 4, 30, 16, 28, 5, 310000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))]
[(UUID('4709c7fa-d286-4864-bb48-69748b447157'), 'value is uneven', 7, datetime.datetime(2025, 4, 30, 16, 29, 5, 310000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))]
[(UUID('30e48457-b103-4fa5-95cf-1c7f0143335b'), 'value is even', 8, datetime.datetime(2025, 4, 30, 16, 30, 5, 310000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))]
[(UUID('036b7f4b-bd78-4ffb-a351-964d93f267b7'), 'value is uneven', 9, datetime.datetime(2025, 4, 30, 16, 31, 5, 310000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))]
fetchnumpy
Signature
fetchnumpy(self: duckdb.duckdb.DuckDBPyRelation) -> dict
Description
Execute and fetch all rows as a Python dict mapping each column to a numpy array
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.fetchnumpy()
Result
{'id': array([UUID('1587b4b0-3023-49fe-82cf-06303ca136ac'),
UUID('e4ab8cb4-4609-40cb-ad7e-4304ed5ed4bd'),
UUID('3f8ad67a-290f-4a22-b41b-0173b8e45afa'),
UUID('9a4e37ef-d8bd-46dd-ab01-51cf4973549f'),
UUID('12baa624-ebc9-45ae-b73e-6f4029e31d2d'),
UUID('56d41292-53cc-48be-a1b8-e1f5d6ca5581'),
UUID('1accca18-c950-47c1-9108-aef8afbd5249'),
UUID('56d8db75-72c4-4d40-90d2-a3c840579c37'),
UUID('e19f6201-8646-401c-b019-e37c42c39632')], dtype=object),
'description': array(['value is uneven', 'value is even', 'value is uneven',
'value is even', 'value is uneven', 'value is even',
'value is uneven', 'value is even', 'value is uneven'],
dtype=object),
'value': array([1, 2, 3, 4, 5, 6, 7, 8, 9]),
'created_timestamp': array(['2025-04-10T09:24:51.259000', '2025-04-10T09:25:51.259000',
'2025-04-10T09:26:51.259000', '2025-04-10T09:27:51.259000',
'2025-04-10T09:28:51.259000', '2025-04-10T09:29:51.259000',
'2025-04-10T09:30:51.259000', '2025-04-10T09:31:51.259000',
'2025-04-10T09:32:51.259000'], dtype='datetime64[us]')}
fetchone
Signature
fetchone(self: duckdb.duckdb.DuckDBPyRelation) -> typing.Optional[tuple]
Description
Execute and fetch a single row as a tuple
Warning: Executing any other operation while retrieving data from an aggregate relation will close the result set.
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
agg_rel = rel.aggregate("value")
while res := agg_rel.fetchone():
    print(res)
rel.show()
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
while res := rel.fetchone():
print(res)
Result
(UUID('fe036411-f4c7-4f52-9ddd-80cd2bb56613'), 'value is uneven', 1, datetime.datetime(2025, 4, 30, 12, 59, 8, 912000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))
(UUID('466c9b43-e9f0-4237-8f26-155f259a5b59'), 'value is even', 2, datetime.datetime(2025, 4, 30, 13, 0, 8, 912000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))
(UUID('5755cf16-a94f-41ef-a16d-21e856d71f9f'), 'value is uneven', 3, datetime.datetime(2025, 4, 30, 13, 1, 8, 912000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))
(UUID('05b52c93-bd68-45e1-b02a-a08d682c33d5'), 'value is even', 4, datetime.datetime(2025, 4, 30, 13, 2, 8, 912000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))
(UUID('cf61ef13-2840-4541-900d-f493767d7622'), 'value is uneven', 5, datetime.datetime(2025, 4, 30, 13, 3, 8, 912000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))
(UUID('033e7c68-e800-4ee8-9787-6cf50aabc27b'), 'value is even', 6, datetime.datetime(2025, 4, 30, 13, 4, 8, 912000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))
(UUID('8b8d6545-ff54-45d6-b69a-97edb63dfe43'), 'value is uneven', 7, datetime.datetime(2025, 4, 30, 13, 5, 8, 912000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))
(UUID('7da79dfe-b29c-462b-a414-9d5e3cc80139'), 'value is even', 8, datetime.datetime(2025, 4, 30, 13, 6, 8, 912000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))
(UUID('f83ffff2-33b9-4f86-9d14-46974b546bab'), 'value is uneven', 9, datetime.datetime(2025, 4, 30, 13, 7, 8, 912000, tzinfo=<DstTzInfo 'Europe/Amsterdam' CEST+2:00:00 DST>))
pl
Signature
pl(self: duckdb.duckdb.DuckDBPyRelation, batch_size: int = 1000000) -> duckdb::PolarsDataFrame
Description
Execute and fetch all rows as a Polars DataFrame
Parameters
-
batch_size : int, default: 1000000
The number of records to be fetched per batch.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.pl(batch_size=1)
Result
shape: (9, 4)
┌─────────────────────────────────┬─────────────────┬───────┬────────────────────────────────┐
│ id ┆ description ┆ value ┆ created_timestamp │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 ┆ datetime[μs, Europe/Amsterdam] │
╞═════════════════════════════════╪═════════════════╪═══════╪════════════════════════════════╡
│ b2f92c3c-9372-49f3-897f-2c86fc… ┆ value is uneven ┆ 1 ┆ 2025-04-10 11:49:51.886 CEST │
record_batch
Signature
record_batch(self: duckdb.duckdb.DuckDBPyRelation, batch_size: int = 1000000) -> pyarrow.lib.RecordBatchReader
Description
Execute and return an Arrow Record Batch Reader that yields all rows
Parameters
-
batch_size : int, default: 1000000
The batch size for fetching the data.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
pa_batch = rel.record_batch(batch_size=1)
pa_batch.read_next_batch()
Result
pyarrow.RecordBatch
id: string
description: string
value: int64
created_timestamp: timestamp[us, tz=Europe/Amsterdam]
----
id: ["908cf67c-a086-4b94-9017-2089a83e4a6c"]
description: ["value is uneven"]
value: [1]
created_timestamp: [2025-04-10 09:52:55.249000Z]
tf
Signature
tf(self: duckdb.duckdb.DuckDBPyRelation) -> dict
Description
Fetch a result as dict of TensorFlow Tensors
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.select("description, value").tf()
Result
{'description': <tf.Tensor: shape=(9,), dtype=string, numpy=
array([b'value is uneven', b'value is even', b'value is uneven',
b'value is even', b'value is uneven', b'value is even',
b'value is uneven', b'value is even', b'value is uneven'],
dtype=object)>,
'value': <tf.Tensor: shape=(9,), dtype=int64, numpy=array([1, 2, 3, 4, 5, 6, 7, 8, 9])>}
to_arrow_table
Signature
to_arrow_table(self: duckdb.duckdb.DuckDBPyRelation, batch_size: int = 1000000) -> pyarrow.lib.Table
Description
Execute and fetch all rows as an Arrow Table
Aliases: fetch_arrow_table
, arrow
Parameters
-
batch_size : int, default: 1000000
The batch size for fetching the data.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.to_arrow_table()
Result
pyarrow.Table
id: string
description: string
value: int64
created_timestamp: timestamp[us, tz=Europe/Amsterdam]
----
id: [["86b2011d-3818-426f-a41e-7cd5c7321f79","07fa4f89-0bba-4049-9acd-c933332a66d5","f2f1479e-f582-4fe4-b82f-9b753b69634c","529d3c63-5961-4adb-b0a8-8249188fc82a","aa9eea7d-7fac-4dcf-8f32-4a0b5d64f864","4852aa32-03f2-40d3-8006-b8213904775a","c0127203-f2e3-4925-9810-655bc02a3c19","2a1356ba-5707-44d6-a492-abd0a67e5efb","800a1c24-231c-4dae-bd68-627654c8a110"]]
description: [["value is uneven","value is even","value is uneven","value is even","value is uneven","value is even","value is uneven","value is even","value is uneven"]]
value: [[1,2,3,4,5,6,7,8,9]]
created_timestamp: [[2025-04-10 09:54:24.015000Z,2025-04-10 09:55:24.015000Z,2025-04-10 09:56:24.015000Z,2025-04-10 09:57:24.015000Z,2025-04-10 09:58:24.015000Z,2025-04-10 09:59:24.015000Z,2025-04-10 10:00:24.015000Z,2025-04-10 10:01:24.015000Z,2025-04-10 10:02:24.015000Z]]
to_csv
Signature
to_csv(self: duckdb.duckdb.DuckDBPyRelation, file_name: str, *, sep: object = None, na_rep: object = None, header: object = None, quotechar: object = None, escapechar: object = None, date_format: object = None, timestamp_format: object = None, quoting: object = None, encoding: object = None, compression: object = None, overwrite: object = None, per_thread_output: object = None, use_tmp_file: object = None, partition_by: object = None, write_partition_columns: object = None) -> None
Description
Write the relation object to a CSV file in 'file_name'
Aliases: write_csv
Parameters
-
file_name : str
The name of the output CSV file.
-
sep : str, default: ','
Field delimiter for the output file.
-
na_rep : str, default: ''
Missing data representation.
-
header : bool, default: True
Whether to write column headers.
-
quotechar : str, default: '"'
Character used to quote fields containing special characters.
-
escapechar : str, default: None
Character used to escape the delimiter if quoting is set to QUOTE_NONE.
-
date_format : str, default: None
Custom format string for DATE values.
-
timestamp_format : str, default: None
Custom format string for TIMESTAMP values.
-
quoting : int, default: csv.QUOTE_MINIMAL
Control field quoting behavior (e.g., QUOTE_MINIMAL, QUOTE_ALL).
-
encoding : str, default: 'utf-8'
Character encoding for the output file.
-
compression : str, default: auto
Compression type (e.g., 'gzip', 'bz2', 'zstd').
-
overwrite : bool, default: False
When true, all existing files inside targeted directories will be removed (not supported on remote filesystems). Only has an effect when used with partition_by.
-
per_thread_output : bool, default: False
When true, write one file per thread, rather than one file in total. This allows for faster parallel writing.
-
use_tmp_file : bool, default: False
Write to a temporary file before renaming to final name to avoid partial writes.
-
partition_by : list[str], default: None
List of column names to partition output by (creates folder structure).
-
write_partition_columns : bool, default: False
Whether or not to write partition columns into files. Only has an effect when used with partition_by.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.to_csv("code_example.csv")
Result
The data is exported to a CSV file, named code_example.csv
to_df
Signature
to_df(self: duckdb.duckdb.DuckDBPyRelation, *, date_as_object: bool = False) -> pandas.DataFrame
Description
Execute and fetch all rows as a pandas DataFrame
Parameters
-
date_as_object : bool, default: False
Whether the date columns should be interpreted as Python date objects.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.to_df()
Result
id description value created_timestamp
0 e1f79925-60fd-4ee2-ae67-5eff6b0543d1 value is uneven 1 2025-04-10 11:56:04.452000+02:00
1 caa619d4-d79c-4c00-b82e-9319b086b6f8 value is even 2 2025-04-10 11:57:04.452000+02:00
2 64c68032-99b9-4e8f-b4a3-6c522d5419b3 value is uneven 3 2025-04-10 11:58:04.452000+02:00
...
to_parquet
Signature
to_parquet(self: duckdb.duckdb.DuckDBPyRelation, file_name: str, *, compression: object = None, field_ids: object = None, row_group_size_bytes: object = None, row_group_size: object = None, overwrite: object = None, per_thread_output: object = None, use_tmp_file: object = None, partition_by: object = None, write_partition_columns: object = None, append: object = None) -> None
Description
Write the relation object to a Parquet file in 'file_name'
Aliases: write_parquet
Parameters
-
file_name : str
The name of the output Parquet file.
-
compression : str, default: 'snappy'
The compression format to use (uncompressed, snappy, gzip, zstd, brotli, lz4, lz4_raw).
-
field_ids : STRUCT
The field_id for each column. Pass auto to attempt to infer automatically.
-
row_group_size_bytes : int, default: row_group_size * 1024
The target size of each row group. You can pass either a human-readable string, e.g., 2MB, or an integer, i.e., the number of bytes. This option is only used when you have issued SET preserve_insertion_order = false; otherwise, it is ignored.
-
row_group_size : int, default: 122880
The target size, i.e., number of rows, of each row group.
-
overwrite : bool, default: False
If True, overwrite the file if it exists.
-
per_thread_output : bool, default: False
When
True
, write one file per thread, rather than one file in total. This allows for faster parallel writing. -
use_tmp_file : bool, default: False
Write to a temporary file before renaming to final name to avoid partial writes.
-
partition_by : list[str], default: None
List of column names to partition output by (creates folder structure).
-
write_partition_columns : bool, default: False
Whether or not to write partition columns into files. Only has an effect when used with
partition_by
. -
append : bool, default: False
When
True
, in the event a filename pattern is generated that already exists, the path will be regenerated to ensure no existing files are overwritten. Only has an effect when used withpartition_by
.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.to_parquet("code_example.parquet")
Result
The data is exported to a Parquet file named code_example.parquet
to_table
Signature
to_table(self: duckdb.duckdb.DuckDBPyRelation, table_name: str) -> None
Description
Creates a new table named table_name with the contents of the relation object
Aliases: create
Parameters
- table_name : str
  The name of the table to be created. No other table with the same name may exist.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.to_table("table_code_example")
Result
A table named table_code_example is created with the data of the relation
to_view
Signature
to_view(self: duckdb.duckdb.DuckDBPyRelation, view_name: str, replace: bool = True) -> duckdb.duckdb.DuckDBPyRelation
Description
Creates a view named view_name that refers to the relation object
Aliases: create_view
Parameters
- view_name : str
  The name of the view to be created.
- replace : bool, default: True
  Whether the view should be created with CREATE OR REPLACE. When set to False, no other view with the same view_name may exist.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.to_view("view_code_example", replace=True)
Result
A view named view_code_example is created with the query definition of the relation
torch
Signature
torch(self: duckdb.duckdb.DuckDBPyRelation) -> dict
Description
Fetch the result as a dict of PyTorch tensors
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.select("value").torch()
Result
{'value': tensor([1, 2, 3, 4, 5, 6, 7, 8, 9])}
write_csv
Signature
write_csv(self: duckdb.duckdb.DuckDBPyRelation, file_name: str, *, sep: object = None, na_rep: object = None, header: object = None, quotechar: object = None, escapechar: object = None, date_format: object = None, timestamp_format: object = None, quoting: object = None, encoding: object = None, compression: object = None, overwrite: object = None, per_thread_output: object = None, use_tmp_file: object = None, partition_by: object = None, write_partition_columns: object = None) -> None
Description
Write the relation object to a CSV file in 'file_name'
Aliases: to_csv
Parameters
- file_name : str
  The name of the output CSV file.
- sep : str, default: ','
  Field delimiter for the output file.
- na_rep : str, default: ''
  Missing data representation.
- header : bool, default: True
  Whether to write column headers.
- quotechar : str, default: '"'
  Character used to quote fields containing special characters.
- escapechar : str, default: None
  Character used to escape the delimiter if quoting is set to QUOTE_NONE.
- date_format : str, default: None
  Custom format string for DATE values.
- timestamp_format : str, default: None
  Custom format string for TIMESTAMP values.
- quoting : int, default: csv.QUOTE_MINIMAL
  Controls field quoting behavior (e.g., QUOTE_MINIMAL, QUOTE_ALL).
- encoding : str, default: 'utf-8'
  Character encoding for the output file.
- compression : str, default: auto
  Compression type (e.g., 'gzip', 'bz2', 'zstd').
- overwrite : bool, default: False
  When True, all existing files inside the targeted directories are removed (not supported on remote filesystems). Only has an effect when used with partition_by.
- per_thread_output : bool, default: False
  When True, write one file per thread rather than a single file. This allows for faster parallel writing.
- use_tmp_file : bool, default: False
  Write to a temporary file before renaming it to the final name, to avoid partial writes.
- partition_by : list[str], default: None
  List of column names to partition the output by (creates a folder structure).
- write_partition_columns : bool, default: False
  Whether to write partition columns into the files. Only has an effect when used with partition_by.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.write_csv("code_example.csv")
Result
The data is exported to a CSV file named code_example.csv
write_parquet
Signature
write_parquet(self: duckdb.duckdb.DuckDBPyRelation, file_name: str, *, compression: object = None, field_ids: object = None, row_group_size_bytes: object = None, row_group_size: object = None, overwrite: object = None, per_thread_output: object = None, use_tmp_file: object = None, partition_by: object = None, write_partition_columns: object = None, append: object = None) -> None
Description
Write the relation object to a Parquet file in 'file_name'
Aliases: to_parquet
Parameters
- file_name : str
  The name of the output Parquet file.
- compression : str, default: 'snappy'
  The compression format to use (uncompressed, snappy, gzip, zstd, brotli, lz4, lz4_raw).
- field_ids : STRUCT
  The field_id for each column. Pass auto to attempt to infer the field IDs automatically.
- row_group_size_bytes : int, default: row_group_size * 1024
  The target size of each row group, given either as a human-readable string (e.g., 2MB) or as an integer number of bytes. This option is only used when SET preserve_insertion_order = false; has been issued; otherwise, it is ignored.
- row_group_size : int, default: 122880
  The target size, in number of rows, of each row group.
- overwrite : bool, default: False
  If True, overwrite the file if it exists.
- per_thread_output : bool, default: False
  When True, write one file per thread rather than a single file. This allows for faster parallel writing.
- use_tmp_file : bool, default: False
  Write to a temporary file before renaming it to the final name, to avoid partial writes.
- partition_by : list[str], default: None
  List of column names to partition the output by (creates a folder structure).
- write_partition_columns : bool, default: False
  Whether to write partition columns into the files. Only has an effect when used with partition_by.
- append : bool, default: False
  When True, if a generated filename pattern already exists, the path is regenerated to ensure no existing files are overwritten. Only has an effect when used with partition_by.
Example
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.write_parquet("code_example.parquet")
Result
The data is exported to a Parquet file named code_example.parquet