Search Shortcut cmd + k | ctrl + k
bigquery

Integrates DuckDB with Google BigQuery, allowing direct querying and management of BigQuery datasets

Maintainer(s): hafenkran

Installing and Loading

INSTALL bigquery FROM community;
LOAD bigquery;

Example

-- Attach to your BigQuery Project
D ATTACH 'project=my_gcp_project' AS bq (TYPE bigquery, READ_ONLY);

-- Show all tables in all datasets in the attached BigQuery project
D SHOW ALL TABLES;
┌──────────┬──────────────────┬──────────┬──────────────┬───────────────────┬───────────┐
 database       schema         name    column_names    column_types     temporary 
 varchar       varchar        varchar   varchar[]        varchar[]       boolean  
├──────────┼──────────────────┼──────────┼──────────────┼───────────────────┼───────────┤
 bq        quacking_dataset  duck_tbl  [i, s]        [BIGINT, VARCHAR]  false     
| bq       | barking_dataset  | dog_tbl  | [i, s]       | [BIGINT, VARCHAR]  false     |
└──────────┴──────────────────┴──────────┴──────────────┴───────────────────┴───────────┘

-- Select data from a specific table in BigQuery
D SELECT * FROM bq.quacking_dataset.duck_tbl;
┌───────┬────────────────┐
   i          s        
 int32     varchar     
├───────┼────────────────┤
    12  quack 🦆       
    13  quack quack 🦆 
└───────┴────────────────┘

About bigquery

This extension allows DuckDB to connect to Google BigQuery using the BigQuery Storage (read/write) and REST APIs. It enables users to read, write, and manage their BigQuery datasets/tables directly from DuckDB using standard SQL queries. For detailed setup and usage instructions, visit the extension repository.

Added Functions

function_name function_type description comment examples
bigquery_attach table Attach to a BigQuery project. NULL [ATTACH 'project=my_gcp_project' as bq (TYPE bigquery);]
bigquery_scan table Scan a single table directly from BigQuery. NULL [SELECT * FROM bigquery_scan('my_gcp_project.quacking_dataset.duck_tbl');]
bigquery_query table Run a custom GoogleSQL query in BigQuery and read the results. NULL [SELECT * FROM bigquery_query('bq', 'SELECT * FROM quacking_dataset.duck_tbl WHERE duck_id = 123');]
bigquery_execute table Execute an arbitrary GoogleSQL query in BigQuery. NULL [CALL bigquery_execute('bq', 'CREATE SCHEMA deluxe_dataset OPTIONS(location="us", default_table_expiration_days=3.75);')]
bigquery_jobs table List jobs in a BigQuery project. NULL [SELECT * FROM bigquery_jobs('bq');]
bigquery_clear_cache table Clear the internal caches to refetch the most current project information from BigQuery. NULL [CALL bigquery_clear_cache();]
bigquery_arrow_scan table NULL NULL NULL

Added Settings

name description input_type scope
bq_arrow_compression Compression codec for BigQuery Storage Read API. Options: UNSPECIFIED, LZ4_FRAME, ZSTD.Default is LZ4_FRAME. VARCHAR GLOBAL
bq_bignumeric_as_varchar Read BigQuery BIGNUMERIC data type as VARCHAR BOOLEAN GLOBAL
bq_curl_ca_bundle_path Path to the CA bundle for curl VARCHAR GLOBAL
bq_debug_show_queries DEBUG SETTING: print all queries sent to BigQuery to stdout BOOLEAN GLOBAL
bq_default_location Default location for BigQuery queries VARCHAR GLOBAL
bq_experimental_enable_bigquery_options Whether to enable BigQuery OPTIONS in CREATE statements BOOLEAN GLOBAL
bq_experimental_filter_pushdown Whether to use filter pushdown (currently experimental) BOOLEAN GLOBAL
bq_experimental_use_incubating_scan Whether to use the incubating BigQuery scan implementation. This is currently experimental and is targeted to become the default in the future. DEPRECATED: Use bq_use_legacy_scan instead. This setting will be removed in a future version. BOOLEAN GLOBAL
bq_experimental_use_info_schema Whether to fetch table infos from BQ information schema (currently experimental). Can be significantly faster than fetching from REST API. BOOLEAN GLOBAL
bq_max_read_streams Maximum number of read streams for BigQuery Storage Read. Set to 0 to automatically match the number of DuckDB threads. preserve_insertion_order must be false for parallelization to work. BIGINT GLOBAL
bq_query_timeout_ms Timeout for BigQuery queries in milliseconds BIGINT GLOBAL
bq_use_legacy_scan Whether to use legacy scan implementation for BigQuery tables. Default is false (uses optimized Arrow-based implementation). BOOLEAN GLOBAL