nanoarrow

Allows the consumption and production of the Apache Arrow interprocess communication (IPC) format, both from files and directly from stream buffers.

Maintainer(s): paleolimbot, pdet

Installing and Loading

INSTALL nanoarrow FROM community;
LOAD nanoarrow;
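To confirm that the statements above took effect, the sketch below queries DuckDB's built-in duckdb_extensions() table function and then calls nanoarrow_version() from the function list further down. It assumes nanoarrow_version() takes no arguments, which this page does not state explicitly.

```sql
-- Assumes INSTALL nanoarrow FROM community; has already been run once
LOAD nanoarrow;

-- duckdb_extensions() is built into DuckDB and lists every known extension
SELECT extension_name, installed, loaded
FROM duckdb_extensions()
WHERE extension_name = 'nanoarrow';

-- Assumption: nanoarrow_version() takes no arguments
SELECT nanoarrow_version();
```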

Example

-- Read from a file in Arrow IPC format
FROM 'arrow_file.arrow';
FROM 'arrow_file.arrows';
FROM read_arrow('arrow_file.arrow');

-- Write a file in Arrow IPC stream format
CREATE TABLE arrow_libraries AS SELECT 'nanoarrow' as name, '0.6' as version;
COPY arrow_libraries TO 'test.arrows' (FORMAT ARROWS, BATCH_SIZE 100);

-- Write to buffers: This returns IPC message BLOBs and indicates which one is the header.
FROM to_arrow_ipc((FROM arrow_libraries));
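Writing and reading can be combined into a round-trip check. The sketch below uses only the COPY ... (FORMAT ARROWS, BATCH_SIZE ...) and read_arrow calls shown above; the table roundtrip_src, its columns, and the file name roundtrip.arrows are hypothetical. It assumes that read_arrow accepts IPC stream (.arrows) files, as the FROM 'arrow_file.arrows' example suggests, and that BATCH_SIZE sets the number of rows per record batch.

```sql
-- Assumes INSTALL nanoarrow FROM community; has already been run once
LOAD nanoarrow;

-- Hypothetical source table: 1000 rows of illustrative data
CREATE OR REPLACE TABLE roundtrip_src AS
    SELECT i AS id, concat('row_', i) AS label
    FROM range(1000) t(i);

-- Write an Arrow IPC stream; with BATCH_SIZE 100 the 1000 rows
-- should be split across several record batches
COPY roundtrip_src TO 'roundtrip.arrows' (FORMAT ARROWS, BATCH_SIZE 100);

-- Read the file back: both counts are 0 if the data survived unchanged
SELECT
    (SELECT count(*) FROM (
        SELECT * FROM roundtrip_src
        EXCEPT ALL
        SELECT * FROM read_arrow('roundtrip.arrows'))) AS missing_rows,
    (SELECT count(*) FROM (
        SELECT * FROM read_arrow('roundtrip.arrows')
        EXCEPT ALL
        SELECT * FROM roundtrip_src)) AS extra_rows;
```

EXCEPT ALL is used in both directions so that duplicated or dropped rows are caught, not just rows whose values changed.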

About nanoarrow

The nanoarrow extension allows users to read and write data in the Arrow IPC stream format, either by reading and producing .arrow files or by reading buffers directly through their pointers and sizes. Reading buffers directly is dangerous: an incorrect pointer can crash the database system. This buffer-based mechanism is temporary and will be deprecated once clients (e.g., the Python DuckDB client) provide a function that extracts these buffers from an Arrow stream internally.
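Before working with raw pointers, it can help to look at the buffers the extension produces. The sketch below recreates the arrow_libraries table from the example and inspects the result of to_arrow_ipc with DuckDB's DESCRIBE. The column names of that result are not documented on this page, so the queries select * instead of naming them; scan_arrow_ipc is deliberately left out because its argument format is not described here either.

```sql
-- Assumes INSTALL nanoarrow FROM community; has already been run once
LOAD nanoarrow;

CREATE OR REPLACE TABLE arrow_libraries AS
    SELECT 'nanoarrow' AS name, '0.6' AS version;

-- Show the column names and types of the to_arrow_ipc result
DESCRIBE SELECT * FROM to_arrow_ipc((FROM arrow_libraries));

-- One row per IPC message; one row is flagged as the header
-- (presumably the schema message that opens the stream)
SELECT * FROM to_arrow_ipc((FROM arrow_libraries));
```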

Added Functions

function_name      function_type  description  comment  examples
nanoarrow_version  scalar         NULL         NULL     []
read_arrow         table          NULL         NULL     []
scan_arrow_ipc     table          NULL         NULL     []
to_arrow_ipc       table          NULL         NULL     []