fire_duck_ext

Search Shortcut cmd + k | ctrl + k

Documentation

fire_duck_ext

Downloads 225this week

GitHub stars 2

Extension repository on GitHub

Extension descriptor (YAML)

Query Google Cloud Firestore directly from DuckDB using SQL

Maintainer(s): BorisBesky

Installing and Loading

INSTALL fire_duck_ext FROM community;
LOAD fire_duck_ext;

Example

LOAD fire_duck_ext;

-- Authenticate with a service account JSON file
CREATE SECRET my_firestore (
    TYPE firestore,
    PROJECT_ID 'my-gcp-project',
    SERVICE_ACCOUNT_JSON '/path/to/service-account.json'
);

-- Or use inline JSON for the service account
CREATE SECRET my_firestore (
    TYPE firestore,
    PROJECT_ID 'my-gcp-project',
    SERVICE_ACCOUNT_JSON '{
        "type": "service_account",
        "project_id": "my-gcp-project",
        "private_key_id": "key123abc",
        "private_key": "-----BEGIN RSA PRIVATE KEY-----\n...\n-----END RSA PRIVATE KEY-----\n",
        "client_email": "[email protected]",
        "client_id": "123456789",
        "auth_uri": "https://accounts.google.com/o/oauth2/auth",
        "token_uri": "https://oauth2.googleapis.com/token"
    }'
);

-- Or authenticate with an API key (for dev/testing)
CREATE SECRET my_firestore (
    TYPE firestore,
    PROJECT_ID 'my-gcp-project',
    API_KEY 'AIzaSyYourApiKeyHere'
);

-- Read documents
SELECT * FROM firestore_scan('users');

-- Filter documents
SELECT __document_id, name, email
FROM firestore_scan('users')
WHERE status = 'active';

-- Update documents
CALL firestore_update('users', 'user123', 'status', 'verified');

-- Batch update with DuckDB filtering
SET VARIABLE ids = (
    SELECT list(__document_id)
    FROM firestore_scan('users')
    WHERE status = 'pending'
);
CALL firestore_update_batch('users', getvariable('ids'), 'status', 'reviewed', 'updated_at', now());

-- Insert documents from a subquery (auto-generated IDs)
CALL firestore_insert('users', (SELECT name, age FROM read_csv('new_users.csv')));

-- Insert with explicit document IDs from a column
CALL firestore_insert('users',
    (SELECT * FROM read_csv('new_users.csv')),
    document_id := 'user_id');

-- Insert with explicit document ID selected from a column
CALL firestore_insert('users',
    (SELECT 'Alice' AS name, 30 AS age, 'alice_user_id' AS user_id),
    document_id := 'user_id');

-- Target a specific named database
CREATE SECRET my_named_db (
    TYPE firestore,
    PROJECT_ID 'my-gcp-project',
    SERVICE_ACCOUNT_JSON '/path/to/credentials.json',
    DATABASE 'my-database'
);

-- Target multiple databases with a single secret
CREATE SECRET my_multi_db (
    TYPE firestore,
    PROJECT_ID 'my-gcp-project',
    SERVICE_ACCOUNT_JSON '/path/to/credentials.json',
    DATABASES ['(default)', 'analytics-db']
);

-- Target multiple databases with an API key
CREATE SECRET my_multi_db (
    TYPE firestore,
    PROJECT_ID 'my-gcp-project',
    API_KEY 'AIzaSyYourApiKeyHere',
    DATABASES ['(default)', 'analytics-db']
);

-- Connect to a specific database for the session
CALL firestore_connect('analytics-db');

-- Collection group queries
SELECT __document_id, name, email
FROM firestore_scan('~data_group')
WHERE status = 'active';

The fire_duck_ext extension enables direct SQL access to Google Cloud Firestore collections from DuckDB. It supports reading data with firestore_scan(), writing with firestore_update() and firestore_delete(), batch operations, array transforms, collection group queries, SQL ORDER BY / LIMIT pushdown, collection ID listings, and DuckDB secret management for credential storage. Filter pushdown optimizes queries by sending supported filters directly to Firestore.

Authentication

Secrets require PROJECT_ID and one of SERVICE_ACCOUNT_JSON or API_KEY.

SERVICE_ACCOUNT_JSON accepts either a file path or the full inline JSON text of a Google Cloud service account key. The JSON must contain at least type, project_id, private_key, and client_email.

API_KEY provides unauthenticated access suitable for development, testing, or public databases. API key auth does not support batchWrite, so batch operations fall back to individual requests.

If the GOOGLE_APPLICATION_CREDENTIALS environment variable is set, the extension automatically creates an internal secret on startup that matches all databases (DATABASE '*'), so no CREATE SECRET is needed.

Type Mapping

string → VARCHAR
integer → BIGINT
double → DOUBLE
boolean → BOOLEAN
timestamp → TIMESTAMP
array → LIST
map → VARCHAR (JSON string)
vector → ARRAY(DOUBLE, N) (embedding)
null → NULL
geoPoint → STRUCT(latitude DOUBLE, longitude DOUBLE)
reference → VARCHAR
bytes → BLOB

Schema Inference

When you call firestore_scan(), the extension automatically infers the schema by sampling up to 100 documents from the collection. For each field, it determines the DuckDB type using a voting system: the most common non-null Firestore type across the sampled documents wins. Fields that do not appear in every sampled document are marked nullable.

For array fields, element types are also inferred by sampling elements across documents and selecting the most common element type. If a field is only ever null across all samples, it defaults to VARCHAR.

A virtual __document_id column (VARCHAR) is always added as the first column, containing the Firestore document ID (or the full document path for collection group queries).

Documents with heterogeneous field types are handled gracefully at read time: values that cannot be converted to the inferred column type are returned as NULL.

Null Semantics

Both missing fields and explicit Firestore null values appear as NULL in DuckDB. There is no way to distinguish between the two on read.

Writing NULL to a field sets it to an explicit Firestore null value; it does not delete the field from the document.

WHERE field IS NULL is not pushed down to Firestore because Firestore's IS_NULL operator only matches fields that exist and are explicitly null, while DuckDB treats missing fields as NULL. WHERE field IS NOT NULL is pushed down safely.

Batch Operations

Batch updates and deletes group writes into requests of up to 500 operations each using Firestore's batchWrite API. Note that batchWrite is not atomic: individual writes within the batch may succeed or fail independently. If batchWrite is unavailable (e.g., API key auth), the extension falls back to individual operations automatically.

Database Targeting

By default, secrets target the (default) Firestore database. Use DATABASE 'name' for a single named database, DATABASE '*' as a wildcard that matches any database, or DATABASES ['db1', 'db2'] to allow a single secret to match multiple named databases. You cannot specify both DATABASE and DATABASES in the same secret.

Use CALL firestore_connect('db-name') to set the session database. All subsequent queries use this database until CALL firestore_disconnect() is called. A per-query database := parameter always takes priority over the connected session database.

Collection Group Queries

Use the ~collection prefix to query across all subcollections with a given name. For example, firestore_scan('~orders') reads all documents from every subcollection named "orders" regardless of parent path.

SQL ORDER BY / LIMIT Pushdown

firestore_scan() can automatically push simple SQL ORDER BY, LIMIT, and OFFSET clauses down to Firestore, reducing the number of documents fetched for sorted and top-N queries.

Supported patterns include:

ORDER BY field
ORDER BY field DESC
ORDER BY field1, field2
LIMIT n
ORDER BY ... LIMIT n
ORDER BY ... LIMIT n OFFSET m (pushed as LIMIT n + m, with DuckDB applying the final offset)

Named parameters still work and take precedence over SQL pushdown:

If order_by= is provided, that server-side ordering is used and DuckDB applies any SQL ORDER BY afterward.
If scan_limit= is provided, that fetch limit is used and DuckDB applies any SQL LIMIT afterward.

Use EXPLAIN to verify when SQL ordering and limits are being pushed. Firestore will show entries like Firestore Pushed Order: score DESC and Firestore Pushed Limit: 5.

Collection ID Listings

When firestore_scan() is given a document path instead of a collection path, it lists that document's direct subcollection IDs instead of reading documents. This is useful for discovering unknown nested collection names.

The result contains a single __document_id column, where each row is a subcollection ID.

Document-path scans support:

Pagination across large numbers of subcollections
SQL ORDER BY __document_id pushdown
SQL LIMIT pushdown
Named order_by='__document_id' or order_by='__document_id DESC', plus scan_limit=...

Other ordering expressions still work, but they are evaluated in DuckDB after fetching the subcollection IDs.

Missing Documents

By default, firestore_scan() includes phantom/missing documents — documents that have no fields but exist as parent paths for subcollections. This matches Firebase Console behavior and is controlled by show_missing (default: true). Set show_missing := false to only return documents with fields. When a collection contains only phantom documents, the result includes just the __document_id column.

Added Functions

function_name	function_type	description	comment	examples
firestore_scan	table	Read all documents from a Firestore collection or collection group.	NULL	[SELECT * FROM firestore_scan('users');]
firestore_insert	table	Insert documents into a Firestore collection from a subquery.	NULL	[CALL firestore_insert('users', (SELECT name, age FROM read_csv('new_users.csv')));]
firestore_update	table	Update fields on a single Firestore document.	NULL	[CALL firestore_update('users', 'user123', 'status', 'verified');]
firestore_delete	table	Delete a single Firestore document.	NULL	[CALL firestore_delete('users', 'user123');]
firestore_update_batch	table	Batch update multiple Firestore documents by ID list.	NULL	[CALL firestore_update_batch('users', ['id1', 'id2'], 'status', 'reviewed');]
firestore_delete_batch	table	Batch delete multiple Firestore documents by ID list.	NULL	[CALL firestore_delete_batch('users', ['id1', 'id2']);]
firestore_array_union	table	Add elements to an array field without duplicates.	NULL	[CALL firestore_array_union('users', 'user123', 'tags', ['vip', 'active']);]
firestore_array_remove	table	Remove elements from an array field.	NULL	[CALL firestore_array_remove('users', 'user123', 'tags', ['inactive']);]
firestore_array_append	table	Append elements to an array field (allows duplicates).	NULL	[CALL firestore_array_append('users', 'user123', 'log', ['event1']);]
firestore_clear_cache	table	Clear the cached schema for all or a specific collection.	NULL	[CALL firestore_clear_cache();]
firestore_connect	table	Set the session-scoped Firestore database for subsequent queries.	NULL	[CALL firestore_connect('analytics-db');]
firestore_disconnect	table	Clear the session-scoped Firestore database override.	NULL	[CALL firestore_disconnect();]

name	description	input_type	scope	aliases
firestore_schema_cache_ttl	Schema cache TTL in seconds (0 to disable caching)	BIGINT	GLOBAL	[]