Parquet Import

To read data from a Parquet file, use the read_parquet function in the FROM clause of a query:

SELECT * FROM read_parquet('input.parquet');

Alternatively, you can omit the read_parquet function and let DuckDB infer it from the extension:

SELECT * FROM 'input.parquet';
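
The shorthand above depends on the file name ending in .parquet. As a minimal sketch, assuming a hypothetical Parquet file named input.data whose name lacks that suffix, calling read_parquet explicitly avoids relying on the file name:

```sql
-- 'input.data' is a hypothetical file name used only for illustration.
-- The explicit function call tells DuckDB to use the Parquet reader
-- regardless of the file's extension.
SELECT * FROM read_parquet('input.data');
```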

To create a new table from the result of a query, use a CREATE TABLE ... AS SELECT statement:

CREATE TABLE new_tbl AS
    SELECT * FROM read_parquet('input.parquet');

To load data into an existing table from a query, use INSERT INTO with a SELECT statement:

INSERT INTO tbl
    SELECT * FROM read_parquet('input.parquet');

Alternatively, use the COPY statement to load data from a Parquet file into an existing table:

COPY tbl FROM 'input.parquet' (FORMAT parquet);
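
The loading statements above can be tried end to end with the following sketch. The sample columns id and label are assumptions made up for illustration, and the first statement writes (and would overwrite) a file named input.parquet in the working directory:

```sql
-- Write hypothetical sample data so there is a Parquet file to read.
COPY (SELECT 1 AS id, 'a' AS label UNION ALL SELECT 2, 'b')
TO 'input.parquet' (FORMAT parquet);

-- Create an empty table with the file's schema (LIMIT 0 keeps the columns, drops the rows).
CREATE TABLE tbl AS
    SELECT * FROM read_parquet('input.parquet') LIMIT 0;

-- Load the file into the existing table with COPY.
COPY tbl FROM 'input.parquet' (FORMAT parquet);

-- Append the same rows again with INSERT INTO ... SELECT.
INSERT INTO tbl
    SELECT * FROM read_parquet('input.parquet');

-- Both statements append, so the table now holds 4 rows.
SELECT count(*) FROM tbl;
```

Both COPY ... FROM and INSERT INTO append to the target table, so running either one twice duplicates the rows.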

Adjusting the Schema on the Fly

You can load a Parquet file into a slightly different schema (e.g., a different number of columns or more relaxed types) using the following trick.

Suppose we have a Parquet file with two columns, c1 and c2:

COPY (FROM (VALUES (42, 43)) t(c1, c2))
TO 'f.parquet';

If we want to add another column c3 that is not present in the file, we can run:

FROM (VALUES(NULL::VARCHAR, NULL, NULL)) t(c1, c2, c3)
WHERE false
UNION ALL BY NAME
FROM 'f.parquet';

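The first FROM clause generates an empty table with three columns, where c1 is a VARCHAR; the WHERE false condition filters out its single row of NULLs. UNION ALL BY NAME then appends the rows of the Parquet file, matching columns by name. Because c3 is not present in the file, it is filled with NULL. The result is: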

┌─────────┬───────┬───────┐
│   c1    │  c2   │  c3   │
│ varchar │ int32 │ int32 │
├─────────┼───────┼───────┤
│ 42      │  43   │ NULL  │
└─────────┴───────┴───────┘
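
For comparison, the same target schema can be written as an explicit projection. This is a different technique from the UNION ALL BY NAME trick, sketched here against the f.parquet file created above:

```sql
-- c1 is cast to VARCHAR, c2 is kept as is,
-- and c3 is added as a typed NULL column that the file does not contain.
SELECT
    c1::VARCHAR AS c1,
    c2,
    NULL::INTEGER AS c3
FROM 'f.parquet';
```

The projection has to name every column, and it fails if the file lacks a column it refers to. The UNION ALL BY NAME form avoids both constraints because it matches columns by name and fills missing ones with NULL.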

For additional options, see the Parquet loading reference.
