The names of the fields matter and the two lists need to have the same size.
This page documents the rules for converting Python objects to DuckDB and DuckDB results to Python.
Object Conversion: Python Object to DuckDB
This is a mapping of Python object types to DuckDB Logical Types:
None
→NULL
bool
→BOOLEAN
datetime.timedelta
→INTERVAL
str
→VARCHAR
bytearray
→BLOB
memoryview
→BLOB
decimal.Decimal
→DECIMAL
/DOUBLE
uuid.UUID
→UUID
The rest of the conversion rules are as follows.
int
Since integers can be of arbitrary size in Python, there is not a one-to-one conversion possible for ints. Instead we perform these casts in order until one succeeds:
BIGINT
INTEGER
UBIGINT
UINTEGER
DOUBLE
When using the DuckDB Value class, it's possible to set a target type, which will influence the conversion.
float
These casts are tried in order until one succeeds:
DOUBLE
FLOAT
datetime.datetime
For datetime
we will check pandas.isnull
if it's available and return NULL
if it returns true
.
We check against datetime.datetime.min
and datetime.datetime.max
to convert to -inf
and +inf
respectively.
If the datetime
has tzinfo, we will use TIMESTAMPTZ
, otherwise it becomes TIMESTAMP
.
datetime.time
If the time
has tzinfo, we will use TIMETZ
, otherwise it becomes TIME
.
datetime.date
date
converts to the DATE
type.
We check against datetime.date.min
and datetime.date.max
to convert to -inf
and +inf
respectively.
bytes
bytes
converts to BLOB
by default, when it's used to construct a Value object of type BITSTRING
, it maps to BITSTRING
instead.
list
list
becomes a LIST
type of the “most permissive” type of its children, for example:
my_list_value = [
12345,
"test"
]
Will become VARCHAR[]
because 12345 can convert to VARCHAR
but test
can not convert to INTEGER
.
[12345, test]
dict
The dict
object can convert to either STRUCT(...)
or MAP(..., ...)
depending on its structure.
If the dict has a structure similar to:
my_map_dict = {
"key": [
1, 2, 3
],
"value": [
"one", "two", "three"
]
}
Then we'll convert it to a MAP
of key-value pairs of the two lists zipped together.
The example above becomes a MAP(INTEGER, VARCHAR)
:
{1=one, 2=two, 3=three}
Otherwise we'll try to convert it to a STRUCT
.
my_struct_dict = {
1: "one",
"2": 2,
"three": [1, 2, 3],
False: True
}
Becomes:
{'1': one, '2': 2, 'three': [1, 2, 3], 'False': true}
Every
key
of the dictionary is converted to string.
tuple
tuple
converts to LIST
by default, when it's used to construct a Value object of type STRUCT
it will convert to STRUCT
instead.
numpy.ndarray
and numpy.datetime64
ndarray
and datetime64
are converted by calling tolist()
and converting the result of that.
Result Conversion: DuckDB Results to Python
DuckDB's Python client provides multiple additional methods that can be used to efficiently retrieve data.
NumPy
fetchnumpy()
fetches the data as a dictionary of NumPy arrays
Pandas
df()
fetches the data as a Pandas DataFramefetchdf()
is an alias ofdf()
fetch_df()
is an alias ofdf()
fetch_df_chunk(vector_multiple)
fetches a portion of the results into a DataFrame. The number of rows returned in each chunk is the vector size (2048 by default) * vector_multiple (1 by default).
Apache Arrow
arrow()
fetches the data as an Arrow tablefetch_arrow_table()
is an alias ofarrow()
fetch_record_batch(chunk_size)
returns an Arrow record batch reader withchunk_size
rows per batch
Polars
pl()
fetches the data as a Polars DataFrame
Examples
Below are some examples using this functionality. See the Python guides for more examples.
Fetch as Pandas DataFrame:
df = con.execute("SELECT * FROM items").fetchdf()
print(df)
item value count
0 jeans 20.0 1
1 hammer 42.2 2
2 laptop 2000.0 1
3 chainsaw 500.0 10
4 iphone 300.0 2
Fetch as dictionary of NumPy arrays:
arr = con.execute("SELECT * FROM items").fetchnumpy()
print(arr)
{'item': masked_array(data=['jeans', 'hammer', 'laptop', 'chainsaw', 'iphone'],
mask=[False, False, False, False, False],
fill_value='?',
dtype=object), 'value': masked_array(data=[20.0, 42.2, 2000.0, 500.0, 300.0],
mask=[False, False, False, False, False],
fill_value=1e+20), 'count': masked_array(data=[1, 2, 1, 10, 2],
mask=[False, False, False, False, False],
fill_value=999999,
dtype=int32)}
Fetch as an Arrow table. Converting to Pandas afterwards just for pretty printing:
tbl = con.execute("SELECT * FROM items").fetch_arrow_table()
print(tbl.to_pandas())
item value count
0 jeans 20.00 1
1 hammer 42.20 2
2 laptop 2000.00 1
3 chainsaw 500.00 10
4 iphone 300.00 2