Search Shortcut cmd + k | ctrl + k
sheetreader

Fast XLSX file importer

Maintainer(s): freddie-freeloader

Installing and Loading

INSTALL sheetreader FROM community;
LOAD sheetreader;

Example

-- Create table from XLSX file & use default values for parameters
CREATE TABLE data AS FROM sheetreader('data.xlsx');

-- Example usage of available named parameters
CREATE TABLE data2 AS FROM sheetreader(
    'data2.xlsx',
    sheet_index = 1,
    threads = 16,
    skip_rows = 0,
    has_header = true,
    types = [BOOLEAN, VARCHAR],
    coerce_to_string = true,
    force_types = true
);

About sheetreader

About SheetReader

sheetreader is a DuckDB extension that allows reading XLSX files into DuckDB tables with SheetReader, our blazingly fast XLSX parser (https://github.com/polydbms/sheetreader-core).

Usage

Parameters

Name Description Type Default
sheet_index Index of the sheet to read. Starts at 1. INTEGER 1
sheet_name Name of the sheet to read.
Only either sheet_index or sheet_name can be set.
VARCHAR ""
threads Number of threads to use, while parsing INTEGER Half of available cores; minimum 1
skip_rows Number of rows to skip INTEGER 0
has_header Force to treat first row as header row.
<ul> <li> If successful, the cell contents are used for column names. </li> <li> If set to false (which is the default), the extension will still try to treat the first row as header row.
The difference is that it will not fail, if the first row is not usable. </li> </ul>
BOOLEAN false
types List of types for all columns <ul> <li> Types currently available:
VARCHAR,BOOLEAN,DOUBLE, DATE.</li> <li> Useful in combination with coerce_to_string and force_types. </li> </ul>
LIST(VARCHAR) Uses types determined by first & second row (after skipped rows)
coerce_to_string Coerce all cells in column of type VARCHAR to string (i.e. VARCHAR). BOOLEAN false
force_types Use types even if they are not compatible with types determined by first/second row.
Cells, that are not of the column type, are set to NULL or coerced to string, if option is set.
BOOLEAN false

More Information

SheetReader was published in the Information Systems Journal

@article{DBLP:journals/is/GavriilidisHZM23,
  author       = {Haralampos Gavriilidis and
                  Felix Henze and
                  Eleni Tzirita Zacharatou and
                  Volker Markl},
  title        = {SheetReader: Efficient Specialized Spreadsheet Parsing},
  journal      = {Inf. Syst.},
  volume       = {115},
  pages        = {102183},
  year         = {2023},
  url          = {https://doi.org/10.1016/j.is.2023.102183},
  doi          = {10.1016/J.IS.2023.102183},
  timestamp    = {Mon, 26 Jun 2023 20:54:32 +0200},
  biburl       = {https://dblp.org/rec/journals/is/GavriilidisHZM23.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}

Added Functions

function_name function_type description comment example
sheetreader table