Integrates DuckDB with Google BigQuery, allowing direct querying and management of BigQuery datasets
Maintainer(s):
hafenkran
Installing and Loading
INSTALL bigquery FROM community;
LOAD bigquery;
Example
-- Attach to your BigQuery Project
D ATTACH 'project=my_gcp_project' AS bq (TYPE bigquery, READ_ONLY);
-- Show all tables in all datasets in the attached BigQuery project
D SHOW ALL TABLES;
┌──────────┬──────────────────┬──────────┬──────────────┬───────────────────┬───────────┐
│ database │ schema │ name │ column_names │ column_types │ temporary │
│ varchar │ varchar │ varchar │ varchar[] │ varchar[] │ boolean │
├──────────┼──────────────────┼──────────┼──────────────┼───────────────────┼───────────┤
│ bq │ quacking_dataset │ duck_tbl │ [i, s] │ [BIGINT, VARCHAR] │ false │
│ bq       │ barking_dataset  │ dog_tbl  │ [i, s]       │ [BIGINT, VARCHAR] │ false     │
└──────────┴──────────────────┴──────────┴──────────────┴───────────────────┴───────────┘
-- Select data from a specific table in BigQuery
D SELECT * FROM bq.quacking_dataset.duck_tbl;
┌───────┬────────────────┐
│ i │ s │
│ int32 │ varchar │
├───────┼────────────────┤
│ 12 │ quack 🦆 │
│ 13 │ quack quack 🦆 │
└───────┴────────────────┘
About bigquery
This extension allows DuckDB to connect to Google BigQuery using the BigQuery Storage (read/write) and REST APIs. It enables users to read, write, and manage their BigQuery datasets/tables directly from DuckDB using standard SQL queries. For detailed setup and usage instructions, visit the extension repository.
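Beyond reads, the attached catalog can be written to. The following is a minimal sketch of the write path, assuming write access to the project and reusing the dataset from the example above; the table name new_duck_tbl is a placeholder chosen for illustration.
-- Attach without READ_ONLY so that writes are allowed
ATTACH 'project=my_gcp_project' AS bq (TYPE bigquery);
-- Create a table in an existing dataset and insert a row
CREATE TABLE bq.quacking_dataset.new_duck_tbl (i BIGINT, s VARCHAR);
INSERT INTO bq.quacking_dataset.new_duck_tbl VALUES (14, 'quack?');
-- Read it back through the attached catalog
SELECT * FROM bq.quacking_dataset.new_duck_tbl;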
Added Functions
function_name | function_type | description | comment | examples |
---|---|---|---|---|
bigquery_attach | table | Attach to a BigQuery project. | NULL | [ATTACH 'project=my_gcp_project' as bq (TYPE bigquery);] |
bigquery_scan | table | Scan a single table directly from BigQuery. | NULL | [SELECT * FROM bigquery_scan('my_gcp_project.quacking_dataset.duck_tbl');] |
bigquery_query | table | Run a custom GoogleSQL query in BigQuery and read the results. | NULL | [SELECT * FROM bigquery_query('bq', 'SELECT * FROM quacking_dataset.duck_tbl WHERE duck_id = 123');] |
bigquery_execute | table | Execute an arbitrary GoogleSQL query in BigQuery. | NULL | [CALL bigquery_execute('bq', 'CREATE SCHEMA deluxe_dataset OPTIONS(location="us", default_table_expiration_days=3.75);')] |
bigquery_jobs | table | List jobs in a BigQuery project. | NULL | [SELECT * FROM bigquery_jobs('bq');] |
bigquery_clear_cache | table | Clear the internal caches to refetch the most current project information from BigQuery. | NULL | [CALL bigquery_clear_cache();] |
bigquery_arrow_scan | table | NULL | NULL | NULL |
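As a sketch of how the two main table functions differ (reusing the placeholder project, dataset, and table names from the example above): bigquery_scan reads the table contents and leaves any filtering or aggregation to DuckDB, while bigquery_query sends the full GoogleSQL statement to BigQuery and only streams the result back.
-- Read the table rows and let DuckDB compute the aggregate
SELECT count(*) AS n, max(i) AS max_i
FROM bigquery_scan('my_gcp_project.quacking_dataset.duck_tbl');
-- Run the equivalent GoogleSQL inside BigQuery via the attached alias 'bq'
SELECT * FROM bigquery_query('bq',
    'SELECT COUNT(*) AS n, MAX(i) AS max_i FROM quacking_dataset.duck_tbl');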
Added Settings
name | description | input_type | scope |
---|---|---|---|
bq_arrow_compression | Compression codec for the BigQuery Storage Read API. Options: UNSPECIFIED, LZ4_FRAME, ZSTD. Default is LZ4_FRAME. | VARCHAR | GLOBAL |
bq_bignumeric_as_varchar | Read BigQuery BIGNUMERIC data type as VARCHAR | BOOLEAN | GLOBAL |
bq_curl_ca_bundle_path | Path to the CA bundle for curl | VARCHAR | GLOBAL |
bq_debug_show_queries | DEBUG SETTING: print all queries sent to BigQuery to stdout | BOOLEAN | GLOBAL |
bq_default_location | Default location for BigQuery queries | VARCHAR | GLOBAL |
bq_experimental_enable_bigquery_options | Whether to enable BigQuery OPTIONS in CREATE statements | BOOLEAN | GLOBAL |
bq_experimental_filter_pushdown | Whether to use filter pushdown (currently experimental) | BOOLEAN | GLOBAL |
bq_experimental_use_incubating_scan | Whether to use the incubating BigQuery scan implementation. This is currently experimental and is targeted to become the default in the future. DEPRECATED: Use bq_use_legacy_scan instead. This setting will be removed in a future version. | BOOLEAN | GLOBAL |
bq_experimental_use_info_schema | Whether to fetch table infos from BQ information schema (currently experimental). Can be significantly faster than fetching from REST API. | BOOLEAN | GLOBAL |
bq_max_read_streams | Maximum number of read streams for BigQuery Storage Read. Set to 0 to automatically match the number of DuckDB threads. preserve_insertion_order must be false for parallelization to work. | BIGINT | GLOBAL |
bq_query_timeout_ms | Timeout for BigQuery queries in milliseconds | BIGINT | GLOBAL |
bq_use_legacy_scan | Whether to use legacy scan implementation for BigQuery tables. Default is false (uses optimized Arrow-based implementation). | BOOLEAN | GLOBAL |
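These settings are applied with DuckDB's standard SET statement. A short sketch of a session configuration follows; the values are illustrative, not recommendations.
-- Enable experimental filter pushdown and parallel reads
SET bq_experimental_filter_pushdown = true;
SET bq_max_read_streams = 0;            -- 0 = match the number of DuckDB threads
SET preserve_insertion_order = false;   -- required for parallel reads (see bq_max_read_streams above)
-- Switch the Storage Read API compression codec to ZSTD
SET bq_arrow_compression = 'ZSTD';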