Extension for locality-sensitive hashing (LSH)
Maintainer(s): yoonspark, ericmanning
Installing and Loading
INSTALL lsh FROM community;
LOAD lsh;
Example
-- Install and load the extension
INSTALL lsh FROM community;
LOAD lsh;
-- Create toy data
CREATE TEMPORARY TABLE temp_names AS
SELECT * FROM (
    VALUES
        ('Alice Johnson'),
        ('Robert Smith'),
        (NULL),
        ('Charlotte Brown'),
        ('David Martinez'),
        ('Emily Davis'),
        ('Michael Wilson'),
        ('Sophia Taylor'),
        (NULL),
        ('James Anderson'),
        ('Olivia Thomas'),
        ('Benjamin Lee')
) AS t(name);
-- Apply MinHash (ngram_width = 2, band_count = 3, band_size = 2, seed = 123)
SELECT lsh_min(name, 2, 3, 2, 123) AS hash FROM temp_names;
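To make the role of the four parameters more concrete, here is a minimal Python sketch of MinHash with banding. It is not the extension's implementation: the use of character n-grams, the BLAKE2b-based hash, and the helper names (`seeded_hash64`, `char_ngrams`, `minhash_bands`, `candidate_pairs`) are all assumptions made for illustration, so the numbers it produces will not match the output of `lsh_min`. Skipping `None` inputs is likewise a choice of the sketch, not a statement about how the extension treats NULL.

```python
import hashlib
import struct
from collections import defaultdict


def seeded_hash64(data: bytes, seed: int, index: int) -> int:
    """64-bit hash keyed by (seed, index). BLAKE2b is an illustrative choice."""
    salt = struct.pack("<QQ", seed, index)  # 16 bytes, the BLAKE2b salt limit
    digest = hashlib.blake2b(data, digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little")


def char_ngrams(text: str, width: int) -> set:
    """Set of character n-grams (assumed tokenization); short strings map to themselves."""
    if len(text) < width:
        return {text}
    return {text[i:i + width] for i in range(len(text) - width + 1)}


def minhash_bands(text, ngram_width, band_count, band_size, seed):
    """Return band_count band hashes built from a MinHash signature."""
    grams = char_ngrams(text, ngram_width)
    # The signature holds band_count * band_size minima, one per hash function.
    signature = [
        min(seeded_hash64(g.encode("utf-8"), seed, i) for g in grams)
        for i in range(band_count * band_size)
    ]
    bands = []
    for b in range(band_count):
        chunk = signature[b * band_size:(b + 1) * band_size]
        payload = struct.pack(f"<{band_size}Q", *chunk)
        # Collapse each band of the signature into a single hash value.
        bands.append(seeded_hash64(payload, seed, band_count * band_size + b))
    return bands


def candidate_pairs(names, ngram_width, band_count, band_size, seed):
    """Pair up distinct strings that agree on at least one (band index, band hash)."""
    buckets = defaultdict(list)
    for name in names:
        if name is None:  # the sketch skips missing values
            continue
        hashes = minhash_bands(name, ngram_width, band_count, band_size, seed)
        for band_index, band_hash in enumerate(hashes):
            buckets[(band_index, band_hash)].append(name)
    pairs = set()
    for members in buckets.values():
        for i, left in enumerate(members):
            for right in members[i + 1:]:
                if left != right:
                    pairs.add(tuple(sorted((left, right))))
    return pairs
```

In this sketch, two strings become candidates when any one band matches at the same position. Raising `band_size` makes each band harder to match (fewer, more similar candidates), while raising `band_count` gives each pair more chances to collide.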
About lsh
For more information regarding usage, see the documentation.
Added Functions
| function_name | function_type | description | comment | examples |
|---|---|---|---|---|
| lsh_min | scalar | Computes a band hash vector for each input string based on its MinHash signature | lsh_min(string, ngram_width, band_count, band_size, seed) | `SELECT lsh_min('Princeton University', 2, 3, 2, 123);` |
| lsh_min32 | scalar | Computes a band hash vector for each input string based on its MinHash signature | Same arguments as lsh_min; reduces each band hash to 32 bits | `SELECT lsh_min32('Princeton University', 2, 3, 2, 123);` |
| lsh_euclidean | scalar | Computes a band hash vector for each input point based on its Euclidean LSH signature | lsh_euclidean(array, bucket_width, band_count, band_size, seed) | `SELECT lsh_euclidean(ARRAY[1.1, 2.2, 3.3, 5.8, 3.9], 0.5, 2, 3, 123);` |
| lsh_euclidean32 | scalar | Computes a band hash vector for each input point based on its Euclidean LSH signature | Same arguments as lsh_euclidean; reduces each band hash to 32 bits | `SELECT lsh_euclidean32(ARRAY[1.1, 2.2, 3.3, 5.8, 3.9], 0.5, 2, 3, 123);` |
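
For readers unfamiliar with Euclidean LSH, the following Python sketch shows one common construction: project the point onto random Gaussian directions, shift by a random offset, and divide by the bucket width. It is an illustration under stated assumptions rather than the extension's code. The function name `euclidean_bands`, the use of Python's `random` module for the projections, and the BLAKE2b band hash are all hypothetical, and the values it returns will differ from those of `lsh_euclidean`.

```python
import hashlib
import math
import random
import struct


def euclidean_bands(point, bucket_width, band_count, band_size, seed):
    """Return band_count band hashes from a random-projection LSH signature."""
    rng = random.Random(seed)  # same seed and dimension give the same projections
    signature = []
    for _ in range(band_count * band_size):
        direction = [rng.gauss(0.0, 1.0) for _ in point]
        offset = rng.uniform(0.0, bucket_width)
        projection = sum(a * x for a, x in zip(direction, point))
        # Nearby points tend to fall into the same bucket along each direction.
        signature.append(math.floor((projection + offset) / bucket_width))
    bands = []
    for b in range(band_count):
        chunk = signature[b * band_size:(b + 1) * band_size]
        payload = struct.pack(f"<{band_size}q", *chunk)
        salt = struct.pack("<QQ", seed, b)  # keeps bands at different positions distinct
        digest = hashlib.blake2b(payload, digest_size=8, salt=salt).digest()
        bands.append(int.from_bytes(digest, "little"))
    return bands


# Mirrors the argument order of the lsh_euclidean example in the table above.
bands = euclidean_bands([1.1, 2.2, 3.3, 5.8, 3.9], 0.5, 2, 3, 123)
```

In this construction a larger `bucket_width` places more distant points in the same bucket, so it plays the role of a distance threshold, while `band_count` and `band_size` trade recall against precision in the same way as for MinHash.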