1.1.3 (stable)
S3 Express One

In late 2023, AWS announced S3 Express One Zone, a high-speed variant of traditional S3 buckets. DuckDB can read S3 Express One buckets using the httpfs extension.

Credentials and Configuration

The configuration of S3 Express One buckets is similar to regular S3 buckets with one exception: we have to specify the endpoint according to the following pattern:

s3express-⟨availability zone⟩.⟨region⟩.amazonaws.com

where the ⟨availability zone⟩ (e.g., use1-az5) can be obtained from the S3 Express One bucket's configuration page and the ⟨region⟩ is the AWS region (e.g., us-east-1).
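
As a minimal sketch of how the pattern is filled in, the following shell snippet assembles the endpoint from a zone ID and a region. The variable names (zone_id, region, endpoint) are illustrative assumptions, and the values are the example values used on this page; substitute your own.

```shell
# Illustrative values: replace with your bucket's zone ID and AWS region.
zone_id="use1-az5"
region="us-east-1"

# Fill in the pattern s3express-<availability zone>.<region>.amazonaws.com
endpoint="s3express-${zone_id}.${region}.amazonaws.com"

echo "$endpoint"
# prints: s3express-use1-az5.us-east-1.amazonaws.com
```

The printed value is what goes into the ENDPOINT field of the secret.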

For example, to allow DuckDB to use an S3 Express One bucket, configure the Secrets manager as follows:

CREATE SECRET (
    TYPE S3,
    REGION 'us-east-1',
    KEY_ID 'AKIAIOSFODNN7EXAMPLE',
    SECRET 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
    ENDPOINT 's3express-use1-az5.us-east-1.amazonaws.com'
);

Instance Location

For best performance, make sure that the EC2 instance is in the same availability zone as the S3 Express One bucket you are querying. To determine the mapping between zone names and zone IDs, use the aws ec2 describe-availability-zones command.

  • Zone name to zone ID mapping:

    aws ec2 describe-availability-zones --output json \
        | jq -r '.AvailabilityZones[] | select(.ZoneName == "us-east-1f") | .ZoneId'
    
    use1-az5
    
  • Zone ID to zone name mapping:

    aws ec2 describe-availability-zones --output json \
        | jq -r '.AvailabilityZones[] | select(.ZoneId == "use1-az5") | .ZoneName'
    
    us-east-1f
    

Querying

You can query an S3 Express One bucket like any other S3 bucket:

SELECT *
FROM 's3://express-bucket-name--use1-az5--x-s3/my-file.parquet';
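
The bucket name in the query above embeds the zone ID (use1-az5), which is useful when checking that the EC2 instance sits in the same availability zone as the bucket. The sketch below extracts the zone ID with plain shell parameter expansion. It assumes the bucket name follows the ⟨base name⟩--⟨zone ID⟩--x-s3 layout seen in the example; the variable names are hypothetical.

```shell
# Example bucket name taken from the query above.
bucket="express-bucket-name--use1-az5--x-s3"

# Strip the trailing "--x-s3" suffix, then keep what follows the last "--".
without_suffix="${bucket%--x-s3}"
zone_id="${without_suffix##*--}"

echo "$zone_id"
# prints: use1-az5
```

The resulting zone ID can be compared against the output of the aws ec2 describe-availability-zones command to find the matching zone name.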

Performance

We ran two experiments on a c7gd.12xlarge instance using the LDBC SF300 Comments creationDate Parquet file (also used in the microbenchmarks of the performance guide).

Experiment                          File size   Runtime
Loading only from Parquet           4.1 GB      3.5s
Creating local table from Parquet   4.1 GB      5.1s

The “loading only” variant runs the load as part of an EXPLAIN ANALYZE statement to measure the runtime without accounting for the creation of a local table, while the “creating local table” variant uses CREATE TABLE ... AS SELECT to create a persistent table on the local disk.
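
The two variants might look roughly like the script below. This is a sketch rather than the exact queries used for the measurements: the script name benchmark.sql, the table name comments, and the bucket and file names (reused from the Querying example) are all placeholders.

```shell
# Write both benchmark variants to a SQL script (hypothetical file name).
cat > benchmark.sql <<'SQL'
.timer on

-- "Loading only": read the Parquet file without creating a table.
EXPLAIN ANALYZE
SELECT *
FROM 's3://express-bucket-name--use1-az5--x-s3/my-file.parquet';

-- "Creating local table": persist the data in the local database file.
CREATE TABLE comments AS
SELECT *
FROM 's3://express-bucket-name--use1-az5--x-s3/my-file.parquet';
SQL
```

Assuming the DuckDB CLI is installed and the S3 secret is available in the session (for example, by adding the CREATE SECRET statement to the top of the script), the script can be run with duckdb my-database.duckdb < benchmark.sql, where my-database.duckdb is again a placeholder.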