- Installation
- Guides
- Overview
- Data Import & Export
- Overview
- CSV Import
- CSV Export
- Parquet Import
- Parquet Export
- Querying Parquet Files
- HTTP Parquet Import
- S3 Parquet Import
- S3 Parquet Export
- S3 Iceberg Import
- S3 Express One
- GCS Import
- Cloudflare R2 Import
- JSON Import
- JSON Export
- Excel Import
- Excel Export
- MySQL Import
- PostgreSQL Import
- SQLite Import
- Directly Reading Files
- Performance
- Overview
- Schema
- Indexing
- Environment
- File Formats
- How to Tune Workloads
- My Workload Is Slow
- Benchmarks
- Meta Queries
- Describe Table
- EXPLAIN: Inspect Query Plans
- EXPLAIN ANALYZE: Profile Queries
- List Tables
- Summarize
- DuckDB Environment
- ODBC
- Python
- Installation
- Executing SQL
- Jupyter Notebooks
- SQL on Pandas
- Import from Pandas
- Export to Pandas
- SQL on Arrow
- Import from Arrow
- Export to Arrow
- Relational API on Pandas
- Multiple Python Threads
- Integration with Ibis
- Integration with Polars
- Using fsspec Filesystems
- SQL Features
- SQL Editors
- Data Viewers
- Documentation
- Overview
- Connect
- Data Import
- Overview
- CSV Files
- JSON Files
- Multiple Files
- Parquet Files
- Partitioning
- Appender
- INSERT Statements
- Client APIs
- Overview
- C
- Overview
- Startup
- Configuration
- Query
- Data Chunks
- Values
- Types
- Prepared Statements
- Appender
- Table Functions
- Replacement Scans
- API Reference
- C++
- CLI
- Go
- Java
- Julia
- Node.js
- Python
- Overview
- Data Ingestion
- Conversion between DuckDB and Python
- DB API
- Relational API
- Function API
- Types API
- Expression API
- Spark API
- API Reference
- Known Python Issues
- R
- Rust
- Swift
- Wasm
- ADBC
- ODBC
- Configuration
- SQL
- Introduction
- Statements
- Overview
- ALTER TABLE
- ALTER VIEW
- ATTACH/DETACH
- CALL
- CHECKPOINT
- COMMENT ON
- COPY
- CREATE INDEX
- CREATE MACRO
- CREATE SCHEMA
- CREATE SECRET
- CREATE SEQUENCE
- CREATE TABLE
- CREATE VIEW
- CREATE TYPE
- DELETE
- DROP
- EXPORT/IMPORT DATABASE
- INSERT
- PIVOT
- Profiling
- SELECT
- SET/RESET
- Transaction Management
- UNPIVOT
- UPDATE
- USE
- VACUUM
- Query Syntax
- SELECT
- FROM & JOIN
- WHERE
- GROUP BY
- GROUPING SETS
- HAVING
- ORDER BY
- LIMIT
- SAMPLE
- Unnesting
- WITH
- WINDOW
- QUALIFY
- VALUES
- FILTER
- Set Operations
- Prepared Statements
- Data Types
- Overview
- Array
- Bitstring
- Blob
- Boolean
- Date
- Enum
- Interval
- List
- Literal Types
- Map
- NULL Values
- Numeric
- Struct
- Text
- Time
- Timestamp
- Time Zones
- Union
- Typecasting
- Expressions
- Overview
- CASE Statement
- Casting
- Collations
- Comparisons
- IN Operator
- Logical Operators
- Star Expression
- Subqueries
- Functions
- Overview
- Bitstring Functions
- Blob Functions
- Date Format Functions
- Date Functions
- Date Part Functions
- Enum Functions
- Interval Functions
- Lambda Functions
- Nested Functions
- Numeric Functions
- Pattern Matching
- Regular Expressions
- Text Functions
- Time Functions
- Timestamp Functions
- Timestamp with Time Zone Functions
- Utility Functions
- Aggregate Functions
- Constraints
- Indexes
- Information Schema
- Metadata Functions
- Keywords and Identifiers
- Samples
- Window Functions
- Extensions
- Development
- DuckDB Repositories
- Testing
- Overview
- Writing Tests
- sqllogictest
- Debugging
- Result Verification
- Persistent Testing
- Loops
- Multiple Connections
- Catch
- Internals Overview
- Storage Versions & Format
- Execution Format
- Profiling
- Release Calendar
- Building
- Benchmark Suite
- Sitemap
- Why DuckDB
- Media
- FAQ
- Code of Conduct
- Live Demo
Index Types
DuckDB currently uses two index types:
- A min-max index (also known as zonemap and block range index) is automatically created for columns of all general-purpose data types.
- An Adaptive Radix Tree (ART) is mainly used to ensure primary key constraints and to speed up point and very highly selective (i.e., < 0.1%) queries. Such an index is automatically created for columns with a
UNIQUE
orPRIMARY KEY
constraint and can be defined usingCREATE INDEX
.
Warning ART indexes must currently be able to fit in-memory. Avoid creating ART indexes if the index does not fit in memory.
Persistence
Both min-max indexes and ART indexes are persisted on disk.
CREATE INDEX
and DROP INDEX
To create an index, use the CREATE INDEX
statement.
To drop an index, use the DROP INDEX
statement.
Index Limitations
ART indexes create a secondary copy of the data in a second location - this complicates processing, particularly when combined with transactions. Certain limitations apply when it comes to modifying data that is also stored in secondary indexes.
As expected, indexes have a strong effect on performance, slowing down loading and updates, but speeding up certain queries. Please consult the Performance Guide for details.
Updates Become Deletes and Inserts
When an update statement is executed on a column that is present in an index - the statement is transformed into a delete of the original row followed by an insert. This has certain performance implications, particularly for wide tables, as entire rows are rewritten instead of only the affected columns.
Over-Eager Unique Constraint Checking
Due to the presence of transactions, data can only be removed from the index after (1) the transaction that performed the delete is committed, and (2) no further transactions exist that refer to the old entry still present in the index. As a result of this - transactions that perform deletions followed by insertions may trigger unexpected unique constraint violations, as the deleted tuple has not actually been removed from the index yet. For example:
CREATE TABLE students (id INTEGER PRIMARY KEY, name VARCHAR);
INSERT INTO students VALUES (1, 'Student 1');
BEGIN;
DELETE FROM students WHERE id = 1;
INSERT INTO students VALUES (1, 'Student 2');
-- Constraint Error: Duplicate key "id: 1" violates primary key constraint
This, combined with the fact that updates are turned into deletions and insertions within the same transaction, means that updating rows in the presence of unique or primary key constraints can often lead to unexpected unique constraint violations.
CREATE TABLE students (id INTEGER PRIMARY KEY, name VARCHAR);
INSERT INTO students VALUES (1, 'Student 1');
UPDATE students SET name = 'Student 2', id = 1 WHERE id = 1;
-- Constraint Error: Duplicate key "id: 1" violates primary key constraint
Currently, this is an expected limitation of the system - although we aim to resolve this in the future.
About this page
Last modified: 2024-03-18