Practical Spreadsheet Parsing with SheetReader

Haralampos Gavriilidis, Felix Henze, Joel Ziegler, Jonas Benn, Eleni Tzirita Zacharatou, Volker Markl

2026-03-24


Paper	Practical Spreadsheet Parsing with SheetReader (PDF)
Venue	EDBT 2026

Abstract

Spreadsheets remain a ubiquitous tool for data management and analysis. Since systems like Excel offer limited analytical capabilities, users routinely load spreadsheets into richer ecosystems such as Python, R, and DBMSes. However, existing spreadsheet loaders rely on general-purpose XML parsers that are ill-suited for the XLSX format, resulting in severe CPU and memory bottlenecks. In prior work, we introduced SheetReader, a specialized spreadsheet parser that leverages the structure of XLSX files and employs parallelism to significantly reduce ingestion costs, achieving up to an order of magnitude speedup and multi-gigabyte memory savings compared to state-of-the-art methods. This demonstration provides an interactive workbench where visitors can visualize XLSX internals, benchmark SheetReader against baseline parsers with live resource monitoring, and explore integrations for Python, R, PostgreSQL, and DuckDB, including running SQL directly over spreadsheets.
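
To make the "XLSX internals" mentioned above concrete: an XLSX file is a ZIP archive whose worksheets are stored as XML members such as xl/worksheets/sheet1.xml. The sketch below is not SheetReader's implementation. It illustrates the general-purpose approach the abstract describes as the baseline: decompress a member, build a full XML tree, then walk it. The helper names (build_minimal_xlsx_like_archive, read_numeric_cells) are hypothetical, and the in-memory archive holds only a single worksheet member, so it is not a complete, Excel-openable workbook.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML worksheet documents.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def build_minimal_xlsx_like_archive() -> bytes:
    """Create an in-memory ZIP with one worksheet member (illustration only).

    A real XLSX also contains [Content_Types].xml, xl/workbook.xml, and
    usually xl/sharedStrings.xml; they are omitted here for brevity.
    """
    sheet_xml = (
        '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
        "<sheetData>"
        '<row r="1"><c r="A1"><v>1</v></c><c r="B1"><v>2.5</v></c></row>'
        '<row r="2"><c r="A2"><v>3</v></c><c r="B2"><v>4</v></c></row>'
        "</sheetData></worksheet>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("xl/worksheets/sheet1.xml", sheet_xml)
    return buf.getvalue()


def read_numeric_cells(archive_bytes: bytes,
                       member: str = "xl/worksheets/sheet1.xml") -> dict:
    """Decompress one worksheet and parse it with a generic XML tree parser.

    The whole document is materialized as a tree before any cell is read,
    which is the kind of CPU and memory overhead a specialized parser avoids.
    """
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        root = ET.fromstring(zf.read(member))

    cells = {}
    for cell in root.iterfind(".//m:c", NS):   # <c r="A1"> is one cell
        value = cell.find("m:v", NS)           # <v> holds the stored value
        if value is not None and value.text is not None:
            cells[cell.get("r")] = float(value.text)
    return cells


if __name__ == "__main__":
    print(read_numeric_cells(build_minimal_xlsx_like_archive()))
```

On large worksheets, the intermediate tree built by a generic parser can be many times the size of the cell data itself, which is one plausible source of the memory bottleneck described above.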

Implementation

SheetReader is available as a DuckDB community extension.
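
A minimal sketch of how running SQL over a spreadsheet through the extension might look from Python. The `INSTALL ... FROM community` syntax is DuckDB's standard way to install community extensions; the extension name `sheetreader` and the table function `sheetreader(path)` are assumptions not stated on this page and should be checked against the extension's documentation. The helper names are hypothetical, and `query_spreadsheet` needs the third-party `duckdb` package plus network access on first install.

```python
def sheetreader_statements(xlsx_path: str) -> list:
    """Build the SQL statements needed to query one spreadsheet.

    Assumptions: the community extension is named 'sheetreader' and exposes
    a table function of the same name that takes a file path.
    """
    escaped = xlsx_path.replace("'", "''")  # escape quotes for a SQL literal
    return [
        "INSTALL sheetreader FROM community",       # assumed extension name
        "LOAD sheetreader",
        f"SELECT * FROM sheetreader('{escaped}')",  # assumed table function
    ]


def query_spreadsheet(xlsx_path: str):
    """Run the statements against an in-memory DuckDB database."""
    import duckdb  # third-party dependency: pip install duckdb

    con = duckdb.connect()
    *setup, query = sheetreader_statements(xlsx_path)
    for statement in setup:
        con.execute(statement)
    return con.execute(query).fetchall()


if __name__ == "__main__":
    # Only prints the SQL; call query_spreadsheet("data.xlsx") to execute it.
    for statement in sheetreader_statements("data.xlsx"):
        print(statement)
```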

Other Library Resources

Iceberg Summit 2026

Talk

Building DuckDB-Iceberg: Exploring the Iceberg Ecosystem

2026-04-08

Tom Ebergen (DuckDB Labs)

arXiv

Paper

The Science Data Lake: A Unified Open Infrastructure Integrating 293 Million Papers Across Eight Scholarly Sources with Embedding-Based Ontology Alignment

2026-03-03

Jonas Wilinski

arXiv

Paper

Should I Hide My Duck in the Lake?

2026-02-21

Jonas Dann, Gustavo Alonso
