0.10 (stable)
S3 Parquet Import

Prerequisites

To load a Parquet file from S3, the httpfs extension is required. It can be installed using the INSTALL SQL command. This only needs to be run once.

INSTALL httpfs;

To load the httpfs extension, use the LOAD SQL command:

LOAD httpfs;
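
To confirm that the extension is available, one option is to query the duckdb_extensions() table function. This is a minimal sketch, not part of the original guide; it only filters the function's output down to the httpfs row:

```sql
-- Check whether httpfs is installed and loaded.
SELECT extension_name, installed, loaded
FROM duckdb_extensions()
WHERE extension_name = 'httpfs';
```

After INSTALL and LOAD have run, both the installed and loaded columns should be true.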

Credentials and Configuration

After loading the httpfs extension, set up the credentials and S3 region to read data:

CREATE SECRET (
    TYPE S3,
    KEY_ID 'AKIAIOSFODNN7EXAMPLE',
    SECRET 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
    REGION 'us-east-1'
);

Tip: If you get an IO Error (Connection error for HTTP HEAD), configure the endpoint explicitly via ENDPOINT 's3.⟨your-region⟩.amazonaws.com'.
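
As a sketch of the tip above, the endpoint can be passed as an additional parameter of the secret. The region us-east-1 is assumed here only to match the earlier example, and OR REPLACE is used so that the statement overwrites a previously created default secret instead of failing:

```sql
-- Same example credentials as above, with the endpoint set explicitly.
-- Replace us-east-1 with the region of your bucket.
CREATE OR REPLACE SECRET (
    TYPE S3,
    KEY_ID 'AKIAIOSFODNN7EXAMPLE',
    SECRET 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
    REGION 'us-east-1',
    ENDPOINT 's3.us-east-1.amazonaws.com'
);
```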

Alternatively, use the aws extension to retrieve the credentials automatically:

CREATE SECRET (
    TYPE S3,
    PROVIDER CREDENTIAL_CHAIN
);
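
The CREDENTIAL_CHAIN provider is supplied by the aws extension. The following sketch, which assumes the extension is not yet installed, installs and loads it explicitly and then lists the registered secrets so you can check that the secret exists:

```sql
-- Install and load the aws extension (installation is only needed once).
INSTALL aws;
LOAD aws;

-- List the secrets known to the current session.
SELECT * FROM duckdb_secrets();
```

Run the INSTALL and LOAD statements before creating the secret; the duckdb_secrets() query is only a sanity check and is optional.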

Querying

Once the httpfs extension is loaded and the S3 credentials are configured, Parquet files can be read from S3 using the following query:

SELECT * FROM read_parquet('s3://⟨bucket⟩/⟨file⟩');
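
To import the data instead of querying it in place, the result can be materialized into a local table. The sketch below assumes a hypothetical table name, imported_data, and keeps the ⟨bucket⟩ and ⟨file⟩ placeholders; the second statement assumes the bucket holds several Parquet files with a compatible schema:

```sql
-- Copy the remote Parquet file into a local table (hypothetical name).
CREATE TABLE imported_data AS
    SELECT * FROM read_parquet('s3://⟨bucket⟩/⟨file⟩');

-- Count the rows across all Parquet files matching a glob pattern.
SELECT count(*) FROM read_parquet('s3://⟨bucket⟩/*.parquet');
```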

Google Cloud Storage (GCS) and Cloudflare R2

DuckDB can also read from Google Cloud Storage (GCS) and Cloudflare R2 via the S3 API. See the relevant guides for details.
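
As a rough sketch of what those guides cover, both services can be configured with their own secret types. All values in angle brackets are placeholders introduced here for illustration; GCS is assumed to use HMAC keys, and R2 additionally needs the account ID:

```sql
-- Google Cloud Storage, using HMAC keys (placeholder values).
CREATE SECRET (
    TYPE GCS,
    KEY_ID '⟨hmac_access_id⟩',
    SECRET '⟨hmac_secret_key⟩'
);
SELECT * FROM read_parquet('gcs://⟨bucket⟩/⟨file⟩');

-- Cloudflare R2 (placeholder values).
CREATE SECRET (
    TYPE R2,
    KEY_ID '⟨r2_access_key_id⟩',
    SECRET '⟨r2_secret_access_key⟩',
    ACCOUNT_ID '⟨r2_account_id⟩'
);
SELECT * FROM read_parquet('r2://⟨bucket⟩/⟨file⟩');
```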

About this page

Last modified: 2024-05-22