Allows the consumption and production of the Apache Arrow interprocess communication (IPC) format, both from files and directly from stream buffers.
Maintainer(s):
paleolimbot,
pdet
Installing and Loading
INSTALL nanoarrow FROM community;
LOAD nanoarrow;
Example
-- Read from a file in Arrow IPC format
FROM 'arrow_file.arrow';
FROM 'arrow_file.arrows';
FROM read_arrow('arrow_file.arrow');
-- Write a file in Arrow IPC stream format
CREATE TABLE arrow_libraries AS SELECT 'nanoarrow' as name, '0.6' as version;
COPY arrow_libraries TO 'test.arrows' (FORMAT ARROWS, BATCH_SIZE 100);
-- Write to buffers: This returns IPC message BLOBs and indicates which one is the header.
FROM to_arrow_ipc((FROM arrow_libraries));
About nanoarrow
The Arrow IPC library allows users to read and write data in the Arrow IPC stream format.
This can be done by either reading and producing .arrow
files or by directly reading buffers using their pointers and sizes.
It is important to note that reading buffers is dangerous, as an incorrect pointer can crash the database system.
This process is temporary and will be deprecated in the future, as clients (e.g., the Python DuckDB client) will have a function that internally extracts these buffers from an Arrow stream.
Added Functions
function_name | function_type | description | comment | examples |
---|---|---|---|---|
nanoarrow_version | scalar | NULL | NULL | [] |
read_arrow | table | NULL | NULL | [] |
scan_arrow_ipc | table | NULL | NULL | [] |
to_arrow_ipc | table | NULL | NULL | [] |