Phonetic, text normalization and fuzzy matching functions for record linkage.
Maintainer(s):
RobinL
Installing and Loading
INSTALL splink_udfs FROM community;
LOAD splink_udfs;
Example
LOAD splink_udfs;
SELECT soundex(unaccent('Jürgen')); -- returns 'J625'
About splink_udfs
The splink_udfs extension provides functions for data cleaning and phonetic matching.
It includes soundex(str), strip_diacritics(str), unaccent(str), ngrams(list, n), double_metaphone(str), and faster versions of levenshtein and damerau_levenshtein.
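As a rough illustration of how these functions might be combined when preparing names for linkage, the query below applies the normalization and phonetic functions to a small inline list of names. The sample names and column aliases are made up for the example. Outputs are not shown, since the return values and types of strip_diacritics and double_metaphone are not documented on this page.

```sql
LOAD splink_udfs;

-- Hypothetical sample names; any VARCHAR column would work the same way.
SELECT
    name,
    unaccent(name)                   AS name_unaccent,
    strip_diacritics(name)           AS name_stripped,
    soundex(unaccent(name))          AS soundex_code,
    double_metaphone(unaccent(name)) AS metaphone_code
FROM (VALUES ('Jürgen'), ('Jurgen'), ('José')) AS t(name);
```

Codes like these are commonly used as blocking keys in record linkage, so that spelling variants such as 'Jürgen' and 'Jurgen' fall into the same candidate group before a more expensive comparison is run.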
Added Functions
function_name | function_type | description | comment | examples |
---|---|---|---|---|
double_metaphone | scalar | NULL | NULL | |
ngrams | scalar | NULL | NULL | |
soundex | scalar | NULL | NULL | |
strip_diacritics | scalar | NULL | NULL | |
unaccent | scalar | NULL | NULL | |
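The table lists no examples, so here is a minimal sketch of a call using the ngrams(list, n) signature given in the description above. The token list is invented for illustration, and the exact shape of the result is not stated on this page, so it should be checked against the extension's actual output.

```sql
LOAD splink_udfs;

-- Hypothetical token list; n = 2 requests n-grams of length two.
SELECT ngrams(['10', 'downing', 'street', 'london'], 2) AS token_bigrams;
```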
Overloaded Functions
function_name | function_type | description | comment | examples |
---|---|---|---|---|
damerau_levenshtein | scalar | Extension of the Levenshtein distance that also allows transposition of adjacent characters as an edit operation: the minimum number of edit operations (insertions, deletions, substitutions, or transpositions) required to change one string into the other. The comparison is case-sensitive. | NULL | [damerau_levenshtein('hello', 'world')] |
levenshtein | scalar | The minimum number of single-character edits (insertions, deletions, or substitutions) required to change one string into the other. The comparison is case-sensitive. | NULL | [levenshtein('duck','db')] |
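The difference between the two distances shows up when adjacent characters are swapped. The sketch below compares them on an invented pair of strings. The values in the comments follow from the definitions in the table, not from a recorded run of the extension.

```sql
LOAD splink_udfs;

-- 'smiht' is 'smith' with its last two characters swapped (hypothetical typo).
SELECT
    levenshtein('smith', 'smiht')         AS lev,      -- 2 by definition: two substitutions
    damerau_levenshtein('smith', 'smiht') AS dam_lev;  -- 1 by definition: one adjacent transposition
```

Because both functions treat case differences as edits, lowercasing inputs first (for example with lower(str)) may be worthwhile when case is not meaningful in the data being linked.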