nanoarrow

Allows the consumption and production of the Apache Arrow interprocess communication (IPC) format, both from files and directly from stream buffers.

Maintainer(s): paleolimbot, pdet

Installing and Loading

INSTALL nanoarrow FROM community;
LOAD nanoarrow;

Example

-- Read from a file in Arrow IPC format
FROM 'arrow_file.arrow';
FROM 'arrow_file.arrows';
FROM read_arrow('arrow_file.arrow');

-- Write a file in Arrow IPC stream format
CREATE TABLE arrow_libraries AS SELECT 'nanoarrow' as name, '0.6' as version;
COPY arrow_libraries TO 'test.arrows' (FORMAT ARROWS, BATCH_SIZE 100);

-- Write to buffers: This returns IPC message BLOBs and indicates which one is the header.
FROM to_arrow_ipc((FROM arrow_libraries));

About nanoarrow

This extension allows users to read and write data in the Arrow IPC stream format, either by reading and producing .arrow files or by reading buffers directly from their pointers and sizes. Reading buffers is dangerous: an incorrect pointer can crash the database system. The buffer-based interface is temporary and will be deprecated in the future, once clients (e.g., the Python DuckDB client) provide a function that internally extracts these buffers from an Arrow stream.
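The file-based path described above can be checked with a simple round trip. The following is a minimal sketch, not a definitive test: the table name roundtrip_src and the file name roundtrip.arrows are made up for illustration, and it assumes the extension is already installed and loaded and that the column types used here survive the round trip unchanged.

```sql
-- Hypothetical source table: 250 rows, so a BATCH_SIZE of 100 should need more than one batch
CREATE TABLE roundtrip_src AS
    SELECT range AS id, 'row_' || CAST(range AS VARCHAR) AS label
    FROM range(250);

-- Write the table as an Arrow IPC stream file (same options as in the example above)
COPY roundtrip_src TO 'roundtrip.arrows' (FORMAT ARROWS, BATCH_SIZE 100);

-- Read the file back and count rows that appear on one side but not the other
SELECT count(*) AS mismatched_rows
FROM (
    (SELECT * FROM roundtrip_src
     EXCEPT
     SELECT * FROM read_arrow('roundtrip.arrows'))
    UNION ALL
    (SELECT * FROM read_arrow('roundtrip.arrows')
     EXCEPT
     SELECT * FROM roundtrip_src)
);
```

If the write and read both behave as described, mismatched_rows should come back as zero; a non-zero value would point to rows lost or altered somewhere in the round trip.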

Added Functions

function_name	function_type	description	comment
nanoarrow_version	scalar	NULL	NULL
read_arrow	table	NULL	NULL
scan_arrow_ipc	table	NULL	NULL
to_arrow_ipc	table	NULL	NULL
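
To confirm that these functions are registered after loading the extension, one option is to query DuckDB's built-in duckdb_functions() catalog function. This is a sketch under the assumption that the extension has been loaded in the current session. The call to nanoarrow_version() assumes the function takes no arguments, which the table above does not state.

```sql
-- List the functions from the table above as the catalog reports them
SELECT function_name, function_type
FROM duckdb_functions()
WHERE function_name IN ('nanoarrow_version', 'read_arrow', 'scan_arrow_ipc', 'to_arrow_ipc')
ORDER BY function_name;

-- Assumption: nanoarrow_version is callable with no arguments
SELECT nanoarrow_version();
```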
