Excel Import
DuckDB supports reading Excel .xlsx files; however, .xls files are not supported.
Importing Excel Sheets
Use the read_xlsx function in the FROM clause of a query:
SELECT * FROM read_xlsx('test_excel.xlsx');
Alternatively, you can omit the read_xlsx function and let DuckDB infer it from the extension:
SELECT * FROM 'test_excel.xlsx';
However, if you want to pass options that control the import behavior, use the read_xlsx function. One such option is the sheet parameter, which specifies the name of the Excel worksheet to read:
SELECT * FROM read_xlsx('test_excel.xlsx', sheet = 'Sheet1');
By default, the first sheet is loaded if no sheet is specified.
Importing a Specific Range
To select a specific range of cells, use the range parameter with a string in the format A1:B2, where A1 is the top-left cell and B2 is the bottom-right cell:
SELECT * FROM read_xlsx('test_excel.xlsx', range = 'A1:B2');
This can also be used to, e.g., skip the first 5 rows:
SELECT * FROM read_xlsx('test_excel.xlsx', range = 'A5:Z');
Or skip the first 5 columns:
SELECT * FROM read_xlsx('test_excel.xlsx', range = 'E:Z');
If no range parameter is provided, the range is automatically inferred as the rectangular region of cells between the first row of consecutive non-empty cells and the first empty row spanning the same columns.
By default, if no range is provided, DuckDB stops reading the Excel file when it encounters an empty row. When a range is provided, the default is to read until the end of the range. This behavior can be controlled with the stop_at_empty parameter:
-- Read the first 100 rows, or until the first empty row, whichever comes first
SELECT * FROM read_xlsx('test_excel.xlsx', range = '1:100', stop_at_empty = true);
-- Always read the whole sheet, even if it contains empty rows
SELECT * FROM read_xlsx('test_excel.xlsx', stop_at_empty = false);
Creating a New Table
To create a new table using the result from a query, use CREATE TABLE ... AS from a SELECT statement:
CREATE TABLE new_tbl AS
SELECT * FROM read_xlsx('test_excel.xlsx', sheet = 'Sheet1');
Loading to an Existing Table
To load data into an existing table from a query, use INSERT INTO from a SELECT statement:
INSERT INTO tbl
SELECT * FROM read_xlsx('test_excel.xlsx', sheet = 'Sheet1');
Alternatively, you can use the COPY statement with the XLSX format option to import an Excel file into an existing table:
COPY tbl FROM 'test_excel.xlsx' (FORMAT XLSX, sheet 'Sheet1');
When using the COPY statement to load an Excel file into an existing table, the types of the columns in the target table are used to coerce the types of the cells in the Excel sheet.
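As a minimal sketch of this coercion (the table name and column types below are hypothetical, not part of this page's running example):
-- The declared types of the target table drive the coercion
CREATE TABLE sales (id INTEGER, product VARCHAR, amount DOUBLE);
-- Cells in the first column are coerced to INTEGER, the second to VARCHAR, the third to DOUBLE
COPY sales FROM 'test_excel.xlsx' (FORMAT XLSX);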
Importing a Sheet with/without a Header
To treat the first row as containing the names of the resulting columns, use the header parameter:
SELECT * FROM read_xlsx('test_excel.xlsx', header = true);
By default, the first row is treated as a header if all the cells in the first row (within the inferred or supplied range) are non-empty strings. To disable this behavior, set header to false.
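For example, to force the first row to be read as data rather than as column names (a small sketch using the running example file):
SELECT * FROM read_xlsx('test_excel.xlsx', header = false);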
Detecting Types
When not importing into an existing table, DuckDB will attempt to infer the types of the columns in the Excel sheet based on their contents and/or "number format".
- TIMESTAMP, TIME, DATE and BOOLEAN types are inferred when possible based on the "number format" applied to the cell.
- Text cells containing TRUE and FALSE are inferred as BOOLEAN.
- Empty cells are considered to be of type DOUBLE by default.
- Otherwise, cells are inferred as VARCHAR or DOUBLE based on their contents.
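To see which types were inferred for a given file, one option is to DESCRIBE the result of the read (an illustrative query, not from the original page):
DESCRIBE SELECT * FROM read_xlsx('test_excel.xlsx');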
This behavior can be adjusted in the following ways.
To treat all empty cells as VARCHAR instead of DOUBLE, set empty_as_varchar to true:
SELECT * FROM read_xlsx('test_excel.xlsx', empty_as_varchar = true);
To disable type inference completely and treat all cells as VARCHAR, set all_varchar to true:
SELECT * FROM read_xlsx('test_excel.xlsx', all_varchar = true);
Additionally, if the ignore_errors parameter is set to true, DuckDB will silently replace cells that can't be cast to the corresponding inferred column type with NULL:
SELECT * FROM read_xlsx('test_excel.xlsx', ignore_errors = true);
See Also
DuckDB can also export Excel files. For additional details on Excel support, see the excel extension page.
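As a brief sketch of the export direction (assuming the excel extension is available; the table name tbl and the output file name are placeholders, and the extension page has the authoritative syntax):
COPY tbl TO 'output.xlsx' (FORMAT XLSX, HEADER true);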