COPY
moves data between DuckDB tables and external Comma Separated Value (CSV) or Parquet files. The Parquet extension must be installed in order to operate on Parquet files, although many clients bundle the Parquet extension by default (Ex: Python, the Command Line Interface/CLI etc.). For more Parquet examples, see the Parquet Files page.
CSV or Parquet Import
COPY ... FROM
imports data into DuckDB from an external CSV or Parquet file into an existing table. The data is appended to whatever data is in the table already. The amount of columns inside the file must match the amount of columns in the table table_name
, and the contents of the columns must be convertible to the column types of the table. In case this is not possible, an error will be thrown.
If a list of columns is specified, COPY
will only copy the data in the specified columns from the file. If there are any columns in the table that are not in the column list, COPY ... FROM
will insert the default values for those columns
-- Copy the contents of a comma-separated file 'test.csv' without a header into the table 'test'
COPY test FROM 'test.csv';
-- Copy the contents of a comma-separated file with a header into the 'category' table
COPY category FROM 'categories.csv' ( HEADER );
-- Copy the contents of 'lineitem.tbl' into the 'lineitem' table, where the contents are delimited by a pipe character ('|')
COPY lineitem FROM 'lineitem.tbl' ( DELIMITER '|' );
-- Copy the contents of 'lineitem.tbl' into the 'lineitem' table, where the delimiter, quote character, and presence of a header are automatically detected
COPY lineitem FROM 'lineitem.tbl' ( AUTO_DETECT TRUE );
-- Read the contents of a comma-separated file 'names.csv' into the 'name' column of the 'category' table. Any other columns of this table are filled with their default value.
COPY category(name) FROM 'names.csv';
-- Read the contents of a parquet file 'lineitem.parquet' into the lineitem table
COPY lineitem FROM 'lineitem.parquet' ( FORMAT PARQUET );
Syntax
CSV or Parquet Export
COPY ... TO
exports data from DuckDB to an external CSV or Parquet file. It has mostly the same set of options as COPY ... FROM
, however, in the case of COPY ... TO
the options specify how the file should be written to disk. Any file created by COPY ... TO
can be copied back into the database by using COPY ... FROM
with a similar set of options.
The COPY ... TO
function can be called specifying either a table name, or a query. When a table name is specified, the contents of the entire table will be written into the resulting file. When a query is specified, the query is executed and the result of the query is written to the resulting file.
-- Copy the contents of the 'lineitem' table to the file 'lineitem.tbl', where the columns are delimited by a pipe character ('|'), including a header line.
COPY lineitem TO 'lineitem.tbl' ( DELIMITER '|', HEADER );
-- Copy the l_orderkey column of the 'lineitem' table to the file 'orderkey.tbl'
COPY lineitem(l_orderkey) TO 'orderkey.tbl' ( DELIMITER '|' );
-- Copy the result of a query to the file 'query.csv', including a header with column names
COPY (SELECT 42 AS a, 'hello' AS b) TO 'query.csv' WITH (HEADER 1, DELIMITER ',');
-- Copy the result of a query to the Parquet file 'query.parquet'
COPY (SELECT 42 AS a, 'hello' AS b) TO 'query.parquet' (FORMAT PARQUET);
Syntax
Parameters
Name | Description |
---|---|
table_name |
The name (optionally schema-qualified) of an existing table. |
column_name |
An optional list of columns to be copied. If no column list is specified, all columns of the table will be copied. |
filename |
The path and name of the input or output file. |
boolean |
Specifies whether the selected option should be turned on or off. You can write TRUE , ON , or 1 to enable the option, and FALSE , OFF , or 0 to disable it. The boolean value can also be omitted, in which case TRUE is assumed. |
Copy Options
Zero or more copy options may be provided as a part of the copy operation. The WITH
specifier is optional, but if any options are specified, the parentheses are required. Parameter values can be passed in with or without wrapping in single quotes. See the above note about boolean
options as well.
Name | To / From | Description |
---|---|---|
FORMAT |
To & From | Specifies the copy function to use. This defaults to CSV , but other options can be available (e.g. PARQUET , JSON ). |
DELIMITER |
To & From | Specifies the string that separates columns within each row (line) of the file. The default value is a comma (, ). |
NULL |
To & From | Specifies the string that represents a NULL value. The default is an empty string. Please note that in COPY ... FROM both an unquoted empty string and a quoted empty string represent a NULL value. If any other NULL string is specified, again both its quoted and its unquoted appearance represent a NULL value. COPY ... TO does not quote NULL values on output, even if FORCE_QUOTE is true. |
HEADER |
To & From | Specifies that the file contains a header line with the names of each column in the file. Thus, COPY ... FROM ignores the first line when importing data, whereas on output (COPY ... TO ) the first line of the file contains the column names of the exported columns. |
QUOTE |
To & From | Specifies the quoting string to be used when a data value is quoted. The default is double-quote (" ). |
ESCAPE |
To & From | Specifies the string that should appear before a data character sequence that matches the QUOTE value. The default is the same as the QUOTE value (so that the quoting string is doubled if it appears in the data). |
DATEFORMAT |
To & From | Specifies the date format to use when parsing dates. See Date Format |
TIMESTAMPFORMAT |
To & From | Specifies the date format to use when parsing timestamps. See Date Format |
FORCE_QUOTE |
To | Forces quoting to be used for all non-NULL values in each specified column. NULL output is never quoted. If * is specified, non-NULL values will be quoted in all columns. |
FORCE_NOT_NULL |
From | Do not match the specified columns’ values against the NULL string. In the default case where the NULL string is empty, this means that empty values will be read as zero-length strings rather than NULLs. |
ENCODING |
To & From | If this option is used, its value must be UTF8 . With any other encoding an error will be thrown. |
AUTO_DETECT |
From | Option for CSV parsing. If TRUE , the parser will attempt to detect the input format and data types automatically. DELIM /SEP , QUOTE , ESCAPE , and HEADER parameters become optional. |
SAMPLE_SIZE |
From | Option to define number of sample rows for automatic CSV type detection. Chunks of sample rows will be drawn from different locations of the input file. Set to -1 to scan the entire input file. Only the first max. 1024 rows will be used for dialect detection. |
ALL_VARCHAR |
From | Option to skip type detection for CSV parsing and assume all columns to be of type VARCHAR. |
COMPRESSION |
To & From | Option to compress the file. Value can be UNCOMPRESSED , GZIP , SNAPPY , or ZSTD . |
CODEC |
To & From | Alias for compression, but only valid for Parquet files. Option to compress the file. Value can be UNCOMPRESSED , GZIP , SNAPPY , or ZSTD . |
ROW_GROUP_SIZE |
To | Option for Parquet writing. Specifies the minimum number of rows in a parquet row group, with a minimum value equal to DuckDB’s vector size (currently 2048, but adjustable when compiling DuckDB). A parquet row group is a partition of rows, consisting of a column chunk for each column in the dataset. |
It is recommended that the file name used in
COPY
always be specified as an absolute path.The values in each record are separated by the
DELIMITER
string. If the value contains theDELIMITER
string, theQUOTE
string, theNULL
string, a carriage return, or line feed character, then the whole value is prefixed and suffixed by theQUOTE
string, and any occurrence within the value of aQUOTE
string or theESCAPE
string is preceded by theESCAPE
string. You can also useFORCE_QUOTE
to force quotes when outputting non-NULL values in specific columns.The CSV format has no standard way to distinguish a NULL value from an empty string. You can use
FORCE_NOT_NULL
to preventNULL
input comparisons for specific columns.In CSV format, all characters are significant. A quoted value surrounded by white space, or any characters other than
DELIMITER
, will include those characters. This can cause errors if you import data from a system that pads CSV lines with white space out to some fixed width. If such a situation arises you might need to preprocess the CSV file to remove the trailing white space, before importing the data into DuckDB.