Search Shortcut cmd + k | ctrl + k
Search cmd+k ctrl+k
1.0 (stable)
Python DB API

The standard DuckDB Python API provides a SQL interface compliant with the DB-API 2.0 specification described by PEP 249 similar to the SQLite Python API.

Connection

To use the module, you must first create a DuckDBPyConnection object that represents the database.

The connection object takes as a parameter the database file to read and write from.

File-Based Connection

If the database file does not exist, it will be created (the file extension may be .db, .duckdb, or anything else).

In-Memory Connection

The special value :memory: (the default) can be used to create an in-memory database. Note that for an in-memory database no data is persisted to disk (i.e., all data is lost when you exit the Python process). If you would like to connect to an existing database in read-only mode, you can set the read_only flag to True. Read-only mode is required if multiple Python processes want to access the same database file at the same time.

By default we create an in-memory-database that lives inside the duckdb module.

Default Connection

Every method of DuckDBPyConnection is also available on the duckdb module, this connection is what's used by these methods. You can also get a reference to this connection by providing the special value :default: to connect or by using duckdb.default_connection.

import duckdb

duckdb.execute("CREATE TABLE tbl AS SELECT 42 a")
con = duckdb.connect(":default:")
con.sql("SELECT * FROM tbl")
# or
duckdb.default_connection.sql("SELECT * FROM tbl")
┌───────┐
│   a   │
│ int32 │
├───────┤
│    42 │
└───────┘
import duckdb
# to start an in-memory database
con = duckdb.connect(database = ":memory:")
# to use a database file (not shared between processes)
con = duckdb.connect(database = "my-db.duckdb", read_only = False)
# to use a database file (shared between processes)
con = duckdb.connect(database = "my-db.duckdb", read_only = True)
# to explicitly get the default connection
con = duckdb.connect(database = ":default:")

If you want to create a second connection to an existing database, you can use the cursor() method. This might be useful for example to allow parallel threads running queries independently. A single connection is thread-safe but is locked for the duration of the queries, effectively serializing database access in this case.

Connections are closed implicitly when they go out of scope or if they are explicitly closed using close(). Once the last connection to a database instance is closed, the database instance is closed as well.

Querying

SQL queries can be sent to DuckDB using the execute() method of connections. Once a query has been executed, results can be retrieved using the fetchone and fetchall methods on the connection. fetchall will retrieve all results and complete the transaction. fetchone will retrieve a single row of results each time that it is invoked until no more results are available. The transaction will only close once fetchone is called and there are no more results remaining (the return value will be None). As an example, in the case of a query only returning a single row, fetchone should be called once to retrieve the results and a second time to close the transaction. Below are some short examples:

# create a table
con.execute("CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER)")
# insert two items into the table
con.execute("INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)")

# retrieve the items again
con.execute("SELECT * FROM items")
print(con.fetchall())
# [('jeans', Decimal('20.00'), 1), ('hammer', Decimal('42.20'), 2)]

# retrieve the items one at a time
con.execute("SELECT * FROM items")
print(con.fetchone())
# ('jeans', Decimal('20.00'), 1)
print(con.fetchone())
# ('hammer', Decimal('42.20'), 2)
print(con.fetchone()) # This closes the transaction. Any subsequent calls to .fetchone will return None
# None

The description property of the connection object contains the column names as per the standard.

Prepared Statements

DuckDB also supports prepared statements in the API with the execute and executemany methods. The values may be passed as an additional parameter after a query that contains ? or $1 (dollar symbol and a number) placeholders. Using the ? notation adds the values in the same sequence as passed within the Python parameter. Using the $ notation allows for values to be reused within the SQL statement based on the number and index of the value found within the Python parameter. Values are converted according to the conversion rules.

Here are some examples. First, insert a row using a prepared statement:

con.execute("INSERT INTO items VALUES (?, ?, ?)", ["laptop", 2000, 1])

Second, insert several rows using a prepared statement:

con.executemany("INSERT INTO items VALUES (?, ?, ?)", [["chainsaw", 500, 10], ["iphone", 300, 2]] )

Query the database using a prepared statement:

con.execute("SELECT item FROM items WHERE value > ?", [400])
print(con.fetchall())
[('laptop',), ('chainsaw',)]

Query using the $ notation for a prepared statement and reused values:

con.execute("SELECT $1, $1, $2", ["duck", "goose"])
print(con.fetchall())
[('duck', 'duck', 'goose')]

Warning Do not use executemany to insert large amounts of data into DuckDB. See the data ingestion page for better options.

Named Parameters

Besides the standard unnamed parameters, like $1, $2 etc., it's also possible to supply named parameters, like $my_parameter. When using named parameters, you have to provide a dictionary mapping of str to value in the parameters argument. An example use is the following:

import duckdb

res = duckdb.execute("""
    SELECT
        $my_param,
        $other_param,
        $also_param
    """,
    {
        "my_param": 5,
        "other_param": "DuckDB",
        "also_param": [42]
    }
).fetchall()
print(res)
[(5, 'DuckDB', [42])]
About this page

Last modified: 2024-06-18