Filtered HNSW vector search using the ACORN-1 algorithm for correct results with WHERE clauses
Installing and Loading
INSTALL hnsw_acorn FROM community;
LOAD hnsw_acorn;
Example
-- Create a table with vectors and categories
CREATE TABLE docs (id INTEGER, vec FLOAT[3], category VARCHAR);
INSERT INTO docs VALUES
(1, [1.0, 0.0, 0.0], 'science'),
(2, [0.9, 0.1, 0.0], 'science'),
(3, [0.0, 1.0, 0.0], 'history'),
(4, [0.0, 0.0, 1.0], 'history'),
(5, [0.8, 0.2, 0.1], 'science'),
(6, [0.1, 0.9, 0.0], 'history');
-- Create HNSW index
CREATE INDEX idx ON docs USING HNSW (vec);
-- Filtered vector search: only 'science' docs, nearest to query
-- Returns exactly 2 rows (ids 1 and 2); upstream duckdb-vss may return fewer
SELECT id, category, array_distance(vec, [1.0, 0.0, 0.0]::FLOAT[3]) AS dist
FROM docs
WHERE category = 'science'
ORDER BY dist
LIMIT 2;
About hnsw_acorn
Fork of duckdb-vss that adds ACORN-1 filtered HNSW search (arXiv:2403.04871). The upstream extension applies WHERE clauses only after the index has returned its top-k results, so any of those k neighbors that fail the filter are dropped and a filtered query often returns fewer rows than its LIMIT. This extension pushes filter predicates into the HNSW graph traversal itself, so filtered queries return the full LIMIT row count (whenever enough rows match) with high recall.
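The difference between post-filtering and filter-aware search can be reproduced on the six rows from the example. The sketch below is a simplified model, assuming the index hands back exactly LIMIT candidates and that exact L2 ordering stands in for the HNSW result; `post_filter` and `filter_first` are hypothetical helper names, not functions of either extension. With the query `[0, 1, 0]`, the two nearest rows are both 'history', so post-filtering for 'science' leaves nothing.

```python
import math

# The six rows from the example table: id -> (vector, category)
DOCS = {
    1: ([1.0, 0.0, 0.0], "science"),
    2: ([0.9, 0.1, 0.0], "science"),
    3: ([0.0, 1.0, 0.0], "history"),
    4: ([0.0, 0.0, 1.0], "history"),
    5: ([0.8, 0.2, 0.1], "science"),
    6: ([0.1, 0.9, 0.0], "history"),
}

def l2(a, b):
    """Euclidean distance, the metric used by array_distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k(ids, query, k):
    """Exact k nearest ids by L2 distance (stands in for the index's top-k)."""
    return sorted(ids, key=lambda i: l2(DOCS[i][0], query))[:k]

def post_filter(query, category, k):
    """Upstream-style (simplified): take the top-k first, then apply WHERE."""
    return [i for i in top_k(DOCS, query, k) if DOCS[i][1] == category]

def filter_first(query, category, k):
    """What a filter-aware search should return: top-k among matching rows."""
    matching = [i for i in DOCS if DOCS[i][1] == category]
    return top_k(matching, query, k)

query = [0.0, 1.0, 0.0]
print(post_filter(query, "science", 2))   # [] : rows 3 and 6 are 'history'
print(filter_first(query, "science", 2))  # [5, 2]
```

In the real upstream extension the index may return more than LIMIT candidates (governed by ef_search), which softens but does not remove the problem when the filter is selective.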
Features:
- ACORN-1 two-hop expansion for graph connectivity under filtering
- Three-hop expansion for very low selectivity
- Selectivity-based strategy switching (post-filter / ACORN-1 / brute-force)
- Zone map pruning for efficient filter evaluation
- Per-node 90% expansion threshold (an optimization borrowed from Apache Lucene)
- All three distance metrics: L2, cosine, inner product
- Prepared statement support for parameterized query vectors
- Configurable via SET hnsw_acorn_threshold and SET hnsw_bruteforce_threshold
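The core idea behind ACORN-1 two-hop expansion can be illustrated on a toy graph. This is a minimal sketch, not the extension's implementation: it uses a single hand-built graph layer (real HNSW has several), and the `expand_threshold` parameter is an assumed reading of the per-node 90% rule, namely that the two-hop expansion is skipped when at least 90% of a node's direct neighbors already pass the filter. Three-hop expansion is not shown. Only filter-passing nodes are scored; filtered-out nodes serve purely as bridges.

```python
import heapq
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def filtered_neighbors(graph, node, passes, expand_threshold=0.9):
    """Filter-passing neighbors of `node`, ACORN-1 style (sketch).

    Passing direct neighbors are always kept. Unless at least
    `expand_threshold` of the direct neighbors pass (assumed reading of
    the 90% rule), neighbors of each failing neighbor are collected too.
    """
    direct = graph[node]
    passing = [n for n in direct if passes(n)]
    if direct and len(passing) / len(direct) >= expand_threshold:
        return passing
    seen = set(passing)
    result = list(passing)
    for n in direct:
        if passes(n):
            continue
        for m in graph[n]:  # second hop through a filtered-out node
            if m != node and m not in seen and passes(m):
                seen.add(m)
                result.append(m)
    return result

def acorn1_search(graph, vectors, query, k, passes, entry, ef=8,
                  expand_threshold=0.9):
    """Single-layer beam search that only scores filter-passing nodes."""
    visited = {entry}
    d0 = l2(vectors[entry], query)
    candidates = [(d0, entry)]  # min-heap of nodes still to expand
    results = []                # max-heap (negated distance) of best matches
    if passes(entry):
        results.append((-d0, entry))
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(results) >= ef and d > -results[0][0]:
            break  # no remaining candidate can improve the result set
        for n in filtered_neighbors(graph, node, passes, expand_threshold):
            if n in visited:
                continue
            visited.add(n)
            dn = l2(vectors[n], query)
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, n))
                heapq.heappush(results, (-dn, n))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst match
    return [n for _, n in sorted((-neg, n) for neg, n in results)][:k]

# Hypothetical ring graph 1-2-4-6-5-3-1 over the example vectors.
VECTORS = {1: [1.0, 0.0, 0.0], 2: [0.9, 0.1, 0.0], 3: [0.0, 1.0, 0.0],
           4: [0.0, 0.0, 1.0], 5: [0.8, 0.2, 0.1], 6: [0.1, 0.9, 0.0]}
GRAPH = {1: [2, 3], 2: [1, 4], 3: [1, 5], 4: [2, 6], 5: [3, 6], 6: [4, 5]}
SCIENCE = {1, 2, 5}

def is_science(n):
    return n in SCIENCE

query = [0.0, 1.0, 0.0]
print(acorn1_search(GRAPH, VECTORS, query, 2, is_science, entry=1))  # [5, 2]
print(acorn1_search(GRAPH, VECTORS, query, 2, is_science, entry=1,
                    expand_threshold=0.0))                           # [2, 1]
```

In this graph, row 5 is reachable from the other 'science' rows only through 'history' nodes. With `expand_threshold=0.0` the two-hop expansion never triggers, so the search stays inside the disconnected subgraph {1, 2} and misses row 5, the true nearest match.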
Benchmark (228k movies, 768-dimensional embeddings, LIMIT 10):

| Filter (approx. selectivity) | Result |
|---|---|
| English (~60%) | 10/10 |
| Japanese (~3%) | 10/10 |
| Korean (~1%) | 10/10 |
| Rating >= 8.0 (~5%) | 10/10 |
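Zone map pruning can speed up evaluating range predicates such as Rating >= 8.0. How hnsw_acorn uses zone maps is not described here, so this is a generic sketch assuming zone maps are per-block (min, max) statistics, as in DuckDB's storage layer. `Block`, `rows_at_least`, and the rating values are hypothetical and unrelated to the benchmark data. Blocks whose maximum is below the threshold are skipped, blocks whose minimum meets it pass wholesale, and only mixed blocks are checked row by row.

```python
from dataclasses import dataclass

@dataclass
class Block:
    start: int          # row id of the first row in the block
    values: list        # column values (e.g. ratings) stored in the block
    lo: float = 0.0
    hi: float = 0.0

    def __post_init__(self):
        # Zone map entry, computed once when the block is written.
        self.lo, self.hi = min(self.values), max(self.values)

def rows_at_least(blocks, threshold):
    """Row ids with value >= threshold, using zone maps to skip work.

    Returns (matching_row_ids, number_of_blocks_checked_row_by_row).
    """
    matches, scanned = [], 0
    for b in blocks:
        if b.hi < threshold:
            continue  # no row can pass: block pruned
        if b.lo >= threshold:
            # every row passes: no per-row check needed
            matches.extend(range(b.start, b.start + len(b.values)))
            continue
        scanned += 1  # mixed block: evaluate each row
        matches.extend(b.start + i for i, v in enumerate(b.values)
                       if v >= threshold)
    return matches, scanned

BLOCKS = [Block(0, [6.1, 7.0, 5.5]),
          Block(3, [8.2, 8.9, 9.1]),
          Block(6, [7.9, 8.0, 6.4])]
print(rows_at_least(BLOCKS, 8.0))  # ([3, 4, 5, 7], 1)
```

The resulting row set could then back a `passes(row)` check during graph traversal, which is the role a filter bitmap would play.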
Added Functions
| function_name | function_type | description | comment | examples |
|---|---|---|---|---|
| hnsw_compact_index | pragma | NULL | NULL | |
| hnsw_index_scan | table | NULL | NULL | |
| pragma_hnsw_index_info | table | NULL | NULL | |
| vss_join | table_macro | NULL | NULL | |
| vss_match | table_macro | NULL | NULL | |
Overloaded Functions
| function_name | function_type | description | comment | examples |
|---|---|---|---|---|
Added Types
| type_name | type_size | logical_type | type_category | internal |
|---|---:|---|---|---|
Added Settings
| name | description | input_type | scope | aliases |
|---|---|---|---|---|
| hnsw_acorn_threshold | selectivity above which ACORN-1 is skipped (standard HNSW + post-filter used instead) | FLOAT | GLOBAL | [] |
| hnsw_bruteforce_threshold | selectivity below which brute-force exact scan is used instead of ACORN-1 | FLOAT | GLOBAL | [] |
| hnsw_ef_search | experimental: override the ef_search parameter when scanning HNSW indexes | BIGINT | GLOBAL | [] |
| hnsw_enable_experimental_persistence | experimental: enable creating HNSW indexes in persistent databases | BOOLEAN | GLOBAL | [] |
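The two threshold settings above imply a three-way choice based on filter selectivity (the fraction of rows that satisfy the WHERE clause). The sketch below mirrors the setting descriptions; the strict comparisons at the boundaries and the threshold values used are assumptions, since the actual defaults are not listed, and it says nothing about which strategy actually ran in the benchmark. The usual rationale is that very selective filters leave so few matching rows that an exact scan is cheap, while permissive filters rarely starve plain HNSW of results.

```python
def selectivity(matching_rows, total_rows):
    """Fraction of rows that satisfy the WHERE clause."""
    return matching_rows / total_rows if total_rows else 0.0

def choose_strategy(sel, acorn_threshold, bruteforce_threshold):
    """Pick a search strategy from filter selectivity (sketch).

    Above hnsw_acorn_threshold: standard HNSW plus post-filter.
    Below hnsw_bruteforce_threshold: exact brute-force scan.
    Otherwise: ACORN-1 filtered traversal.
    """
    if sel > acorn_threshold:
        return "post-filter"
    if sel < bruteforce_threshold:
        return "brute-force"
    return "acorn-1"

# Hypothetical threshold values, for illustration only.
ACORN_T, BRUTE_T = 0.5, 0.02
for name, sel in [("English", 0.60), ("Japanese", 0.03),
                  ("Korean", 0.01), ("Rating >= 8.0", 0.05)]:
    print(f"{name:14} {sel:5.0%} -> {choose_strategy(sel, ACORN_T, BRUTE_T)}")
```

For scale, a 1% filter on 228k rows leaves roughly 2,280 candidate rows for the exact scan.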