The lance extension adds support for reading and writing Lance tables. Lance is a modern lakehouse format optimized for ML/AI workloads, with native cloud storage support.
Installing and Loading
You can install the lance extension from DuckDB's core extensions repository and load it using the following commands:
INSTALL lance;
LOAD lance;
Usage
Query a Lance Dataset
Local file:
SELECT *
FROM 'path/to/dataset.lance'
LIMIT 10;
S3:
SELECT *
FROM 's3://bucket/path/to/dataset.lance'
LIMIT 10;
To access object store URIs (e.g., s3://...), configure a TYPE lance secret using the Secrets Manager:
CREATE SECRET (
    TYPE lance,
    PROVIDER credential_chain,
    SCOPE 's3://bucket/'
);
SELECT *
FROM 's3://bucket/path/to/dataset.lance'
LIMIT 10;
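Lance datasets scanned this way behave like any other table expression, so filters and aggregates compose as usual. A small sketch (the column names `id` and `label` are illustrative, not part of any particular dataset):

```sql
-- Aggregate over a Lance dataset like any other table
SELECT label, count(*) AS n
FROM 's3://bucket/path/to/dataset.lance'
WHERE id > 100
GROUP BY label
ORDER BY n DESC;
```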
Write a Lance Dataset
Use the COPY ... TO ... statement to materialize query results as a Lance dataset.
-- Create/overwrite a Lance dataset from a query
COPY (
    SELECT 1::BIGINT AS id, 'a'::VARCHAR AS s
    UNION ALL
    SELECT 2::BIGINT AS id, 'b'::VARCHAR AS s
) TO 'path/to/out.lance' (
    FORMAT lance,
    MODE 'overwrite'
);
-- Read it back via the replacement scan
SELECT count(*) FROM 'path/to/out.lance';
-- Append more rows to an existing dataset
COPY (
    SELECT 3::BIGINT AS id, 'c'::VARCHAR AS s
) TO 'path/to/out.lance' (
    FORMAT lance,
    MODE 'append'
);
-- Optionally create an empty dataset (schema only):
-- a zero-row query together with WRITE_EMPTY_FILE still writes the schema
COPY (
    SELECT 1::BIGINT AS id, 'x'::VARCHAR AS s
    LIMIT 0
) TO 'path/to/empty.lance' (
    FORMAT lance,
    MODE 'overwrite',
    WRITE_EMPTY_FILE true
);
To write to s3://... paths, configure a TYPE lance secret for that scope using the Secrets Manager:
CREATE SECRET (
    TYPE lance,
    PROVIDER credential_chain,
    SCOPE 's3://bucket/'
);
COPY (SELECT 1 AS id)
TO 's3://bucket/path/to/out.lance'
(FORMAT lance, MODE 'overwrite');
Create a Lance Dataset via CREATE TABLE (Directory Namespace)
When you ATTACH a directory as a Lance namespace, you can create new datasets using CREATE TABLE (schema-only) or CREATE TABLE AS SELECT (CTAS). The dataset is written to namespace_root/table_name.lance.
ATTACH 'path/to/dir' AS lance_ns (TYPE lance);
-- Schema-only (creates an empty dataset)
CREATE TABLE lance_ns.main.my_empty (id BIGINT, s VARCHAR);
-- CTAS (writes query results)
CREATE TABLE lance_ns.main.my_dataset AS
SELECT 1::BIGINT AS id, 'a'::VARCHAR AS s
UNION ALL
SELECT 2::BIGINT AS id, 'b'::VARCHAR AS s;
SELECT count(*) FROM lance_ns.main.my_dataset;
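After ATTACH, the directory behaves like a regular DuckDB catalog, so the usual catalog commands should apply. A sketch, assuming each *.lance dataset under the namespace root shows up as a table:

```sql
ATTACH 'path/to/dir' AS lance_ns (TYPE lance);
USE lance_ns;
SHOW TABLES;  -- expect the datasets in the directory, e.g., my_dataset
```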
Vector Search
-- Search a vector column, returning distances in `_distance` (smaller is closer)
SELECT id, label, _distance
FROM lance_vector_search(
    'path/to/dataset.lance', 'vec',
    [0.1, 0.2, 0.3, 0.4]::FLOAT[4],
    k = 5,
    prefilter = true
)
ORDER BY _distance ASC;
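Because lance_vector_search is a table function, its results compose with ordinary SQL. For example, the top-k matches can be joined back to the full dataset to pull in columns the search did not return (the column name `extra_info` is illustrative):

```sql
-- Join top-k matches back to the dataset for extra columns
SELECT s.id, s._distance, d.extra_info
FROM lance_vector_search(
    'path/to/dataset.lance', 'vec',
    [0.1, 0.2, 0.3, 0.4]::FLOAT[4],
    k = 5
) AS s
JOIN 'path/to/dataset.lance' AS d USING (id);
```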
See the SQL reference for full parameter documentation.
Full-Text Search (FTS)
-- Search a text column, returning BM25-like scores in `_score`
SELECT id, text, _score
FROM lance_fts(
    'path/to/dataset.lance',
    'text',
    'puppy',
    k = 10,
    prefilter = true
)
ORDER BY _score DESC;
See the SQL reference for full parameter documentation.
Hybrid Search (Vector + FTS)
-- Combine vector and text scores, returning `_hybrid_score` in addition to `_distance` / `_score`
SELECT id, _hybrid_score, _distance, _score
FROM lance_hybrid_search(
    'path/to/dataset.lance',
    'vec', [0.1, 0.2, 0.3, 0.4]::FLOAT[4],
    'text', 'puppy',
    k = 10, prefilter = false,
    alpha = 0.5, oversample_factor = 4
)
ORDER BY _hybrid_score DESC;
Limitations
The lance extension is currently available for the following platforms: linux_amd64, linux_arm64, osx_arm64, and windows_amd64.