Export To Apache Arrow
Version current
Version:

How to export data to Apache Arrow

All results of a query can be exported to an Apache Arrow Table using the arrow function. Alternatively, results can be returned as a RecordBatchReader using the fetch_record_batch function and results can be read one batch at a time. In addition, relations built using DuckDB’s Relational API can also be exported.

Export to an Arrow Table

import duckdb
import pyarrow as pa

# connect to an in-memory database
con = duckdb.connect()

my_arrow_table = pa.Table.from_pydict({'i':[1,2,3,4],
                                       'j':["one", "two", "three", "four"]})

# query the Apache Arrow Table "my_arrow_table" and return as an Arrow Table
results = con.execute("SELECT * FROM my_arrow_table").arrow()

Export as a RecordBatchReader

import duckdb
import pyarrow as pa

# connect to an in-memory database
con = duckdb.connect()

my_arrow_table = pa.Table.from_pydict({'i':[1,2,3,4],
                                       'j':["one", "two", "three", "four"]})
                                       
# query the Apache Arrow Table "my_arrow_table" and return as an Arrow RecordBatchReader
chunk_size = 1_000_000
results = con.execute("SELECT * FROM my_arrow_table").fetch_record_batch(chunk_size)

# Loop through the results. A StopIteration exception is thrown when the RecordBatchReader is empty
while True:
    try:
        # Process a single chunk here (just printing as an example)
        print(results.read_next_batch().to_pandas())
    except StopIteration:
        print('Already fetched all batches')
        break

Export from Relational API

Arrow objects can also be exported from the Relational API. A relation can be converted to an Arrow table using the arrow or to_arrow_table functions, or a record batch using record_batch.
A result can be exported to an Arrow table with arrow or the alias fetch_arrow_table, or to a RecordBatchReader using fetch_arrow_reader.

import duckdb

# connect to an in-memory database
con = duckdb.connect()

con.execute('CREATE TABLE integers (i integer)')
con.execute('INSERT INTO integers VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(NULL)')

# Create a relation from the table and export the entire relation as Arrow
rel = con.table("integers")
relation_as_arrow = rel.arrow() # or .to_arrow_table()

# Or, calculate a result using that relation and export that result to Arrow
res = rel.aggregate("sum(i)").execute()
result_as_arrow = res.arrow() # or fetch_arrow_table()