Search Shortcut cmd + k | ctrl + k
lsh

Extension for locality-sensitive hashing (LSH)

Maintainer(s): yoonspark, ericmanning

Installing and Loading

INSTALL lsh FROM community;
LOAD lsh;

Example

-- Install and load the extension
INSTALL lsh FROM community;
LOAD lsh;

-- Create toy data
CREATE TEMPORARY TABLE temp_names AS
SELECT * FROM (
    VALUES
        ('Alice Johnson'),
        ('Robert Smith'),
        (NULL),
        ('Charlotte Brown'),
        ('David Martinez'),
        ('Emily Davis'),
        ('Michael Wilson'),
        ('Sophia Taylor'),
        (NULL),
        ('James Anderson'),
        ('Olivia Thomas'),
        ('Benjamin Lee')
) AS t(name);

-- Apply MinHash
SELECT lsh_min(name, 2, 3, 2, 123) AS hash FROM temp_names;

About lsh

For more information regarding usage, see the documentation.

Added Functions

function_name function_type description comment examples
lsh_min scalar Computes a band hash vector for each input string based on its MinHash signature lsh_min(string, ngram_width, band_count, band_size, seed) [SELECT lsh_min('Princeton University', 2, 3, 2, 123);]
lsh_min32 scalar Computes a band hash vector for each input string based on its MinHash signature Reduces each band hash to 32 bits [SELECT lsh_min32('Princeton University', 2, 3, 2, 123);]
lsh_euclidean scalar Computes a band hash vector for each input point based on its Euclidean LSH signature lsh_euclidean(array, bucket_width, band_count, band_size, seed) [SELECT lsh_euclidean(ARRAY[1.1, 2.2, 3.3, 5.8, 3.9], 0.5, 2, 3, 123);]
lsh_euclidean32 scalar Computes a band hash vector for each input point based on its Euclidean LSH signature Reduces each band hash to 32 bits [SELECT lsh_euclidean32(ARRAY[1.1, 2.2, 3.3, 5.8, 3.9], 0.5, 2, 3, 123);]