⌘+k ctrl+k
1.3 (stable)
Search Shortcut cmd + k | ctrl + k
Text Functions

Text Functions and Operators

This section describes functions and operators for examining and manipulating STRING values.

Name Description
string[index] Extracts a single character using a (1-based) index.
string[begin:end] Extracts a string using slice conventions similar to Python. Missing begin or end arguments are interpreted as the beginning or end of the list respectively. Negative values are accepted.
string LIKE target Returns true if the string matches the like specifier (see Pattern Matching).
string SIMILAR TO regex Returns true if the string matches the regex (see Pattern Matching).
string ^@ search_string Returns true if string begins with search_string.
arg1 || arg2 Concatenates two strings, lists, or blobs. Any NULL input results in NULL. See also concat(arg1, arg2, ...) and list_concat(list1, list2).
array_extract(string, index) Extracts a single character from a string using a (1-based) index.
array_slice(list, begin, end) Extracts a sublist or substring using slice conventions. Negative values are accepted.
ascii(string) Returns an integer that represents the Unicode code point of the first character of the string.
bar(x, min, max[, width]) Draws a band whose width is proportional to (x - min) and equal to width characters when x = max. width defaults to 80.
base64(blob) Converts a blob to a base64 encoded string.
bin(string) Converts the string to binary representation.
bit_length(string) Number of bits in a string.
char_length(string) Number of characters in string.
character_length(string) Number of characters in string.
chr(code_point) Returns a character which is corresponding the ASCII code value or Unicode code point.
concat(value, ...) Concatenates multiple strings, lists, or blobs. NULL inputs are skipped. See also operator ||.
concat_ws(separator, string, ...) Concatenates many strings, separated by separator. NULL inputs are skipped.
contains(string, search_string) Returns true if search_string is found within string.
ends_with(string, search_string) Returns true if string ends with search_string.
format(format, ...) Formats a string using the fmt syntax.
formatReadableDecimalSize(integer) Converts integer to a human-readable representation using units based on powers of 10 (KB, MB, GB, etc.).
formatReadableSize(integer) Converts integer to a human-readable representation using units based on powers of 2 (KiB, MiB, GiB, etc.).
format_bytes(integer) Converts integer to a human-readable representation using units based on powers of 2 (KiB, MiB, GiB, etc.).
from_base64(string) Converts a base64 encoded string to a character string (BLOB).
from_binary(value) Converts a value from binary representation to a blob.
from_hex(value) Converts a value from hexadecimal representation to a blob.
greatest(arg1, ...) Returns the largest value. For strings lexicographical ordering is used. Note that lowercase characters are considered “larger” than uppercase characters and collations are not supported.
hash(value, ...) Returns a UBIGINT with the hash of the value. Note that this is not a cryptographic hash.
hex(string) Converts the string to hexadecimal representation.
ilike_escape(string, like_specifier, escape_character) Returns true if the string matches the like_specifier (see Pattern Matching) using case-insensitive matching. escape_character is used to search for wildcard characters in the string.
instr(string, search_string) Returns location of first occurrence of search_string in string, counting from 1. Returns 0 if no match found.
lcase(string) Converts string to lower case.
least(arg1, ...) Returns the smallest value. For strings lexicographical ordering is used. Note that uppercase characters are considered “smaller” than lowercase characters, and collations are not supported.
left(string, count) Extracts the left-most count characters.
left_grapheme(string, count) Extracts the left-most count grapheme clusters.
len(string) Number of characters in string.
length(string) Number of characters in string.
length_grapheme(string) Number of grapheme clusters in string.
like_escape(string, like_specifier, escape_character) Returns true if the string matches the like_specifier (see Pattern Matching) using case-sensitive matching. escape_character is used to search for wildcard characters in the string.
lower(string) Converts string to lower case.
lpad(string, count, character) Pads the string with the character on the left until it has count characters. Truncates the string on the right if it has more than count characters.
ltrim(string[, characters]) Removes any occurrences of any of the characters from the left side of the string. characters defaults to space.
md5(string) Returns the MD5 hash of the string as a VARCHAR.
md5_number(string) Returns the MD5 hash of the string as a HUGEINT.
md5_number_lower(string) Returns the lower 64-bit segment of the MD5 hash of the string as a UBIGINT.
md5_number_upper(string) Returns the upper 64-bit segment of the MD5 hash of the string as a UBIGINT.
nfc_normalize(string) Converts string to Unicode NFC normalized string. Useful for comparisons and ordering if text data is mixed between NFC normalized and not.
not_ilike_escape(string, like_specifier, escape_character) Returns false if the string matches the like_specifier (see Pattern Matching) using case-insensitive matching. escape_character is used to search for wildcard characters in the string.
not_like_escape(string, like_specifier, escape_character) Returns false if the string matches the like_specifier (see Pattern Matching) using case-sensitive matching. escape_character is used to search for wildcard characters in the string.
ord(string) Returns an INTEGER representing the unicode codepoint of the first character in the string.
parse_dirname(path[, separator]) Returns the top-level directory name from the given path. separator options: system, both_slash (default), forward_slash, backslash.
parse_dirpath(path[, separator]) Returns the head of the path (the pathname until the last slash) similarly to Python's os.path.dirname. separator options: system, both_slash (default), forward_slash, backslash.
parse_filename(string[, trim_extension][, separator]) Returns the last component of the path similarly to Python's os.path.basename function. If trim_extension is true, the file extension will be removed (defaults to false). separator options: system, both_slash (default), forward_slash, backslash.
parse_path(path[, separator]) Returns a list of the components (directories and filename) in the path similarly to Python's pathlib.parts function. separator options: system, both_slash (default), forward_slash, backslash.
position(search_string IN string) Return location of first occurrence of search_string in string, counting from 1. Returns 0 if no match found.
prefix(string, search_string) Returns true if string starts with search_string.
printf(format, ...) Formats a string using printf syntax.
read_text(source) Returns the content from source (a filename, a list of filenames, or a glob pattern) as a VARCHAR. The file content is first validated to be valid UTF-8. If read_text attempts to read a file with invalid UTF-8 an error is thrown suggesting to use read_blob instead. See the read_text guide for more details.
regexp_escape(string) Escapes special patterns to turn string into a regular expression similarly to Python's re.escape function.
regexp_extract(string, regex[, group][, options]) If string contains the regex pattern, returns the capturing group specified by optional parameter group; otherwise, returns the empty string. The group must be a constant value. If no group is given, it defaults to 0. A set of optional regex options can be set.
regexp_extract(string, regex, name_list[, options]) If string contains the regex pattern, returns the capturing groups as a struct with corresponding names from name_list; otherwise, returns a struct with the same keys and empty strings as values. A set of optional regex options can be set.
regexp_extract_all(string, regex[, group][, options]) Finds non-overlapping occurrences of the regex in the string and returns the corresponding values of the capturing group. A set of optional regex options can be set.
regexp_full_match(string, regex[, col2]) Returns true if the entire string matches the regex. A set of optional regex options can be set.
regexp_matches(string, regex[, options]) Returns true if string contains the regex, false otherwise. A set of optional regex options can be set.
regexp_replace(string, regex, replacement[, options]) If string contains the regex, replaces the matching part with replacement. A set of optional regex options can be set.
regexp_split_to_array(string, regex[, options]) Splits the string along the regex. A set of optional regex options can be set.
regexp_split_to_table(string, regex) Splits the string along the regex and returns a row for each part.
repeat(string, count) Repeats the string count number of times.
replace(string, source, target) Replaces any occurrences of the source with target in string.
reverse(string) Reverses the string.
right(string, count) Extract the right-most count characters.
right_grapheme(string, count) Extracts the right-most count grapheme clusters.
rpad(string, count, character) Pads the string with the character on the right until it has count characters. Truncates the string on the right if it has more than count characters.
rtrim(string[, characters]) Removes any occurrences of any of the characters from the right side of the string. characters defaults to space.
sha1(value) Returns a VARCHAR with the SHA-1 hash of the value.
sha256(value) Returns a VARCHAR with the SHA-256 hash of the value
split(string, separator) Splits the string along the separator.
split_part(string, separator, index) Splits the string along the separator and returns the data at the (1-based) index of the list. If the index is outside the bounds of the list, return an empty string (to match PostgreSQL's behavior).
starts_with(string, search_string) Returns true if string begins with search_string.
str_split(string, separator) Splits the string along the separator.
str_split_regex(string, regex[, options]) Splits the string along the regex. A set of optional regex options can be set.
string_split(string, separator) Splits the string along the separator.
string_split_regex(string, regex[, options]) Splits the string along the regex. A set of optional regex options can be set.
string_to_array(string, separator) Splits the string along the separator.
strip_accents(string) Strips accents from string.
strlen(string) Number of bytes in string.
strpos(string, search_string) Returns location of first occurrence of search_string in string, counting from 1. Returns 0 if no match found.
substr(string, start[, length]) Extracts substring starting from character start up to the end of the string. If optional argument length is set, extracts a substring of length characters instead. Note that a start value of 1 refers to the first character of the string.
substring(string, start[, length]) Extracts substring starting from character start up to the end of the string. If optional argument length is set, extracts a substring of length characters instead. Note that a start value of 1 refers to the first character of the string.
substring_grapheme(string, start[, length]) Extracts substring starting from grapheme clusters start up to the end of the string. If optional argument length is set, extracts a substring of length grapheme clusters instead. Note that a start value of 1 refers to the first character of the string.
suffix(string, search_string) Returns true if string ends with search_string.
to_base(number, radix[, min_length]) Converts number to a string in the given base radix, optionally padding with leading zeros to min_length.
to_base64(blob) Converts a blob to a base64 encoded string.
to_binary(string) Converts the string to binary representation.
to_hex(string) Converts the string to hexadecimal representation.
translate(string, from, to) Replaces each character in string that matches a character in the from set with the corresponding character in the to set. If from is longer than to, occurrences of the extra characters in from are deleted.
trim(string[, characters]) Removes any occurrences of any of the characters from either side of the string. characters defaults to space.
ucase(string) Converts string to upper case.
unbin(value) Converts a value from binary representation to a blob.
unhex(value) Converts a value from hexadecimal representation to a blob.
unicode(string) Returns an INTEGER representing the unicode codepoint of the first character in the string.
upper(string) Converts string to upper case.
url_decode(string) Decodes a URL from a representation using Percent-Encoding.
url_encode(string) Encodes a URL to a representation using Percent-Encoding.

string[index]

Description Extracts a single character using a (1-based) index.
Example 'DuckDB'[4]
Result k
Alias array_extract

string[begin:end]

Description Extracts a string using slice conventions similar to Python. Missing begin or end arguments are interpreted as the beginning or end of the list respectively. Negative values are accepted.
Example 'DuckDB'[:4]
Result Duck
Alias array_slice

string LIKE target

Description Returns true if the string matches the like specifier (see Pattern Matching).
Example 'hello' LIKE '%lo'
Result true

string SIMILAR TO regex

Description Returns true if the string matches the regex (see Pattern Matching).
Example 'hello' SIMILAR TO 'l+'
Result false
Alias regexp_full_match

string ^@ search_string

Description Returns true if string begins with search_string.
Example 'abc' ^@ 'a'
Result true
Alias starts_with

arg1 || arg2

Description Concatenates two strings, lists, or blobs. Any NULL input results in NULL. See also concat(arg1, arg2, ...) and list_concat(list1, list2).
Example 1 'Duck' || 'DB'
Result DuckDB
Example 2 [1, 2, 3] || [4, 5, 6]
Result [1, 2, 3, 4, 5, 6]
Example 3 '\xAA'::BLOB || '\xBB'::BLOB
Result \xAA\xBB

array_extract(string, index)

Description Extracts a single character from a string using a (1-based) index.
Example array_extract('DuckDB', 2)
Result u

array_slice(list, begin, end)

Description Extracts a sublist or substring using slice conventions. Negative values are accepted.
Example 1 array_slice('DuckDB', 3, 4)
Result ck
Example 2 array_slice('DuckDB', 3, NULL)
Result NULL
Example 3 array_slice('DuckDB', 0, -3)
Result Duck
Alias list_slice

ascii(string)

Description Returns an integer that represents the Unicode code point of the first character of the string.
Example ascii('Ω')
Result 937

bar(x, min, max[, width])

Description Draws a band whose width is proportional to (x - min) and equal to width characters when x = max. width defaults to 80.
Example bar(5, 0, 20, 10)
Result ██▌

base64(blob)

Description Converts a blob to a base64 encoded string.
Example base64('A'::BLOB)
Result QQ==
Alias to_base64

bin(string)

Description Converts the string to binary representation.
Example bin('Aa')
Result 0100000101100001
Alias to_binary

bit_length(string)

Description Number of bits in a string.
Example bit_length('abc')
Result 24

char_length(string)

Description Number of characters in string.
Example char_length('Hello🦆')
Result 6
Aliases character_length, len, length

character_length(string)

Description Number of characters in string.
Example character_length('Hello🦆')
Result 6
Aliases char_length, len, length

chr(code_point)

Description Returns a character which is corresponding the ASCII code value or Unicode code point.
Example chr(65)
Result A

concat(value, ...)

Description Concatenates multiple strings, lists, or blobs. NULL inputs are skipped. See also operator ||.
Example concat('Hello', ' ', 'World')
Result Hello World

concat_ws(separator, string, ...)

Description Concatenates many strings, separated by separator. NULL inputs are skipped.
Example concat_ws(', ', 'Banana', 'Apple', 'Melon')
Result Banana, Apple, Melon

contains(string, search_string)

Description Returns true if search_string is found within string.
Example contains('abc', 'a')
Result true

ends_with(string, search_string)

Description Returns true if string ends with search_string.
Example ends_with('abc', 'bc')
Result true
Alias suffix

format(format, ...)

Description Formats a string using the fmt syntax.
Example format('Benchmark "{}" took {} seconds', 'CSV', 42)
Result Benchmark "CSV" took 42 seconds

formatReadableDecimalSize(integer)

Description Converts integer to a human-readable representation using units based on powers of 10 (KB, MB, GB, etc.).
Example formatReadableDecimalSize(16_000)
Result 16.0 kB

formatReadableSize(integer)

Description Converts integer to a human-readable representation using units based on powers of 2 (KiB, MiB, GiB, etc.).
Example formatReadableSize(16_000)
Result 15.6 KiB
Alias format_bytes

format_bytes(integer)

Description Converts integer to a human-readable representation using units based on powers of 2 (KiB, MiB, GiB, etc.).
Example format_bytes(16_000)
Result 15.6 KiB
Alias formatReadableSize

from_base64(string)

Description Converts a base64 encoded string to a character string (BLOB).
Example from_base64('QQ==')
Result A

from_binary(value)

Description Converts a value from binary representation to a blob.
Example from_binary('0110')
Result \x06
Alias unbin

from_hex(value)

Description Converts a value from hexadecimal representation to a blob.
Example from_hex('2A')
Result *
Alias unhex

greatest(arg1, ...)

Description Returns the largest value. For strings lexicographical ordering is used. Note that lowercase characters are considered “larger” than uppercase characters and collations are not supported.
Example 1 greatest(42, 84)
Result 84
Example 2 greatest('abc', 'bcd', 'cde', 'EFG')
Result cde

hash(value, ...)

Description Returns a UBIGINT with the hash of the value. Note that this is not a cryptographic hash.
Example hash('🦆')
Result 4164431626903154684

hex(string)

Description Converts the string to hexadecimal representation.
Example hex('Hello')
Result 48656C6C6F
Alias to_hex

ilike_escape(string, like_specifier, escape_character)

Description Returns true if the string matches the like_specifier (see Pattern Matching) using case-insensitive matching. escape_character is used to search for wildcard characters in the string.
Example ilike_escape('A%c', 'a$%C', '$')
Result true

instr(string, search_string)

Description Returns location of first occurrence of search_string in string, counting from 1. Returns 0 if no match found.
Example instr('test test', 'es')
Result 2
Aliases position, strpos

lcase(string)

Description Converts string to lower case.
Example lcase('Hello')
Result hello
Alias lower

least(arg1, ...)

Description Returns the smallest value. For strings lexicographical ordering is used. Note that uppercase characters are considered “smaller” than lowercase characters, and collations are not supported.
Example 1 least(42, 84)
Result 42
Example 2 least('abc', 'bcd', 'cde', 'EFG')
Result EFG

left(string, count)

Description Extracts the left-most count characters.
Example left('Hello🦆', 2)
Result He

left_grapheme(string, count)

Description Extracts the left-most count grapheme clusters.
Example left_grapheme('🤦🏼‍♂️🤦🏽‍♀️', 1)
Result 🤦🏼‍♂️

len(string)

Description Number of characters in string.
Example length('Hello🦆')
Result 6
Aliases char_length, character_length, length

length(string)

Description Number of characters in string.
Example length('Hello🦆')
Result 6
Aliases char_length, character_length, len

length_grapheme(string)

Description Number of grapheme clusters in string.
Example length_grapheme('🤦🏼‍♂️🤦🏽‍♀️')
Result 2

like_escape(string, like_specifier, escape_character)

Description Returns true if the string matches the like_specifier (see Pattern Matching) using case-sensitive matching. escape_character is used to search for wildcard characters in the string.
Example like_escape('a%c', 'a$%c', '$')
Result true

lower(string)

Description Converts string to lower case.
Example lower('Hello')
Result hello
Alias lcase

lpad(string, count, character)

Description Pads the string with the character on the left until it has count characters. Truncates the string on the right if it has more than count characters.
Example lpad('hello', 8, '>')
Result >>>hello

ltrim(string[, characters])

Description Removes any occurrences of any of the characters from the left side of the string. characters defaults to space.
Example 1 ltrim(' test ')
Result test
Example 2 ltrim('>>>>test<<', '><')
Result test<<

md5(string)

Description Returns the MD5 hash of the string as a VARCHAR.
Example md5('abc')
Result 900150983cd24fb0d6963f7d28e17f72

md5_number(string)

Description Returns the MD5 hash of the string as a HUGEINT.
Example md5_number('abc')
Result 152195979970564155685860391459828531600

md5_number_lower(string)

Description Returns the lower 64-bit segment of the MD5 hash of the string as a UBIGINT.
Example md5_number_lower('abc')
Result 8250560606382298838

md5_number_upper(string)

Description Returns the upper 64-bit segment of the MD5 hash of the string as a UBIGINT.
Example md5_number_upper('abc')
Result 12704604231530709392

nfc_normalize(string)

Description Converts string to Unicode NFC normalized string. Useful for comparisons and ordering if text data is mixed between NFC normalized and not.
Example nfc_normalize('ardèch')
Result ardèch

not_ilike_escape(string, like_specifier, escape_character)

Description Returns false if the string matches the like_specifier (see Pattern Matching) using case-insensitive matching. escape_character is used to search for wildcard characters in the string.
Example not_ilike_escape('A%c', 'a$%C', '$')
Result false

not_like_escape(string, like_specifier, escape_character)

Description Returns false if the string matches the like_specifier (see Pattern Matching) using case-sensitive matching. escape_character is used to search for wildcard characters in the string.
Example not_like_escape('a%c', 'a$%c', '$')
Result false

ord(string)

Description Returns an INTEGER representing the unicode codepoint of the first character in the string.
Example [unicode('âbcd'), unicode('â'), unicode(''), unicode(NULL)]
Result [226, 226, -1, NULL]
Alias unicode

parse_dirname(path[, separator])

Description Returns the top-level directory name from the given path. separator options: system, both_slash (default), forward_slash, backslash.
Example parse_dirname('path/to/file.csv', 'system')
Result path

parse_dirpath(path[, separator])

Description Returns the head of the path (the pathname until the last slash) similarly to Python's os.path.dirname. separator options: system, both_slash (default), forward_slash, backslash.
Example parse_dirpath('path/to/file.csv', 'forward_slash')
Result path/to

parse_filename(string[, trim_extension][, separator])

Description Returns the last component of the path similarly to Python's os.path.basename function. If trim_extension is true, the file extension will be removed (defaults to false). separator options: system, both_slash (default), forward_slash, backslash.
Example parse_filename('path/to/file.csv', true, 'forward_slash')
Result file

parse_path(path[, separator])

Description Returns a list of the components (directories and filename) in the path similarly to Python's pathlib.parts function. separator options: system, both_slash (default), forward_slash, backslash.
Example parse_path('path/to/file.csv', 'system')
Result [path, to, file.csv]

position(search_string IN string)

Description Return location of first occurrence of search_string in string, counting from 1. Returns 0 if no match found.
Example position('b' IN 'abc')
Result 2
Aliases instr, strpos

prefix(string, search_string)

Description Returns true if string starts with search_string.
Example prefix('abc', 'ab')
Result true

printf(format, ...)

Description Formats a string using printf syntax.
Example printf('Benchmark "%s" took %d seconds', 'CSV', 42)
Result Benchmark "CSV" took 42 seconds

read_text(source)

Description Returns the content from source (a filename, a list of filenames, or a glob pattern) as a VARCHAR. The file content is first validated to be valid UTF-8. If read_text attempts to read a file with invalid UTF-8 an error is thrown suggesting to use read_blob instead. See the read_text guide for more details.
Example read_text('hello.txt')
Result hello\n

regexp_escape(string)

Description Escapes special patterns to turn string into a regular expression similarly to Python's re.escape function.
Example regexp_escape('https://duckdb.org')
Result https\:\/\/duckdb\.org

regexp_extract(string, regex[, group][, options])

Description If string contains the regex pattern, returns the capturing group specified by optional parameter group; otherwise, returns the empty string. The group must be a constant value. If no group is given, it defaults to 0. A set of optional regex options can be set.
Example regexp_extract('ABC', '([a-z])(b)', 1, 'i')
Result A

regexp_extract(string, regex, name_list[, options])

Description If string contains the regex pattern, returns the capturing groups as a struct with corresponding names from name_list; otherwise, returns a struct with the same keys and empty strings as values. A set of optional regex options can be set.
Example regexp_extract('John Doe', '([a-z]+) ([a-z]+)', ['first_name', 'last_name'], 'i')
Result {'first_name': John, 'last_name': Doe}

regexp_extract_all(string, regex[, group][, options])

Description Finds non-overlapping occurrences of the regex in the string and returns the corresponding values of the capturing group. A set of optional regex options can be set.
Example regexp_extract_all('Peter: 33, Paul:14', '(\w+):\s*(\d+)', 2)
Result [33, 14]

regexp_full_match(string, regex[, col2])

Description Returns true if the entire string matches the regex. A set of optional regex options can be set.
Example regexp_full_match('anabanana', '(an)*')
Result false

regexp_matches(string, regex[, options])

Description Returns true if string contains the regex, false otherwise. A set of optional regex options can be set.
Example regexp_matches('anabanana', '(an)*')
Result true

regexp_replace(string, regex, replacement[, options])

Description If string contains the regex, replaces the matching part with replacement. A set of optional regex options can be set.
Example regexp_replace('hello', '[lo]', '-')
Result he-lo

regexp_split_to_array(string, regex[, options])

Description Splits the string along the regex. A set of optional regex options can be set.
Example regexp_split_to_array('hello world; 42', ';? ')
Result [hello, world, 42]
Aliases str_split_regex, string_split_regex

regexp_split_to_table(string, regex)

Description Splits the string along the regex and returns a row for each part.
Example regexp_split_to_table('hello world; 42', ';? ')
Result Multiple rows: 'hello', 'world', '42'

repeat(string, count)

Description Repeats the string count number of times.
Example repeat('A', 5)
Result AAAAA

replace(string, source, target)

Description Replaces any occurrences of the source with target in string.
Example replace('hello', 'l', '-')
Result he--o

reverse(string)

Description Reverses the string.
Example reverse('hello')
Result olleh

right(string, count)

Description Extract the right-most count characters.
Example right('Hello🦆', 3)
Result lo🦆

right_grapheme(string, count)

Description Extracts the right-most count grapheme clusters.
Example right_grapheme('🤦🏼‍♂️🤦🏽‍♀️', 1)
Result 🤦🏽‍♀️

rpad(string, count, character)

Description Pads the string with the character on the right until it has count characters. Truncates the string on the right if it has more than count characters.
Example rpad('hello', 10, '<')
Result hello<<<<<

rtrim(string[, characters])

| Description | Removes any occurrences of any of the characters from the right side of the string. characters defaults to space. | | Example 1 | rtrim(' test ') | | Result | ` test | | **Example 2** | rtrim('»»test«', '><') | | **Result** | »»test` |

sha1(value)

Description Returns a VARCHAR with the SHA-1 hash of the value.
Example sha1('🦆')
Result 949bf843dc338be348fb9525d1eb535d31241d76

sha256(value)

Description Returns a VARCHAR with the SHA-256 hash of the value
Example sha256('🦆')
Result d7a5c5e0d1d94c32218539e7e47d4ba9c3c7b77d61332fb60d633dde89e473fb

split(string, separator)

Description Splits the string along the separator.
Example split('hello-world', '-')
Result [hello, world]
Aliases str_split, string_split, string_to_array

split_part(string, separator, index)

Description Splits the string along the separator and returns the data at the (1-based) index of the list. If the index is outside the bounds of the list, return an empty string (to match PostgreSQL's behavior).
Example split_part('a;b;c', ';', 2)
Result b

starts_with(string, search_string)

Description Returns true if string begins with search_string.
Example starts_with('abc', 'a')
Result true
Alias ^@

str_split(string, separator)

Description Splits the string along the separator.
Example str_split('hello-world', '-')
Result [hello, world]
Aliases split, string_split, string_to_array

str_split_regex(string, regex[, options])

Description Splits the string along the regex. A set of optional regex options can be set.
Example str_split_regex('hello world; 42', ';? ')
Result [hello, world, 42]
Aliases regexp_split_to_array, string_split_regex

string_split(string, separator)

Description Splits the string along the separator.
Example string_split('hello-world', '-')
Result [hello, world]
Aliases split, str_split, string_to_array

string_split_regex(string, regex[, options])

Description Splits the string along the regex. A set of optional regex options can be set.
Example string_split_regex('hello world; 42', ';? ')
Result [hello, world, 42]
Aliases regexp_split_to_array, str_split_regex

string_to_array(string, separator)

Description Splits the string along the separator.
Example string_to_array('hello-world', '-')
Result [hello, world]
Aliases split, str_split, string_split

strip_accents(string)

Description Strips accents from string.
Example strip_accents('mühleisen')
Result muhleisen

strlen(string)

Description Number of bytes in string.
Example strlen('🦆')
Result 4

strpos(string, search_string)

Description Returns location of first occurrence of search_string in string, counting from 1. Returns 0 if no match found.
Example strpos('test test', 'es')
Result 2
Aliases instr, position

substr(string, start[, length])

Description Extracts substring starting from character start up to the end of the string. If optional argument length is set, extracts a substring of length characters instead. Note that a start value of 1 refers to the first character of the string.
Example 1 substring('Hello', 2)
Result ello
Example 2 substring('Hello', 2, 2)
Result el
Alias substring

substring(string, start[, length])

Description Extracts substring starting from character start up to the end of the string. If optional argument length is set, extracts a substring of length characters instead. Note that a start value of 1 refers to the first character of the string.
Example 1 substring('Hello', 2)
Result ello
Example 2 substring('Hello', 2, 2)
Result el
Alias substr

substring_grapheme(string, start[, length])

Description Extracts substring starting from grapheme clusters start up to the end of the string. If optional argument length is set, extracts a substring of length grapheme clusters instead. Note that a start value of 1 refers to the first character of the string.
Example 1 substring_grapheme('🦆🤦🏼‍♂️🤦🏽‍♀️🦆', 3)
Result 🤦🏽‍♀️🦆
Example 2 substring_grapheme('🦆🤦🏼‍♂️🤦🏽‍♀️🦆', 3, 2)
Result 🤦🏽‍♀️🦆

suffix(string, search_string)

Description Returns true if string ends with search_string.
Example suffix('abc', 'bc')
Result true
Alias ends_with

to_base(number, radix[, min_length])

Description Converts number to a string in the given base radix, optionally padding with leading zeros to min_length.
Example to_base(42, 16, 5)
Result 0002A

to_base64(blob)

Description Converts a blob to a base64 encoded string.
Example to_base64('A'::BLOB)
Result QQ==
Alias base64

to_binary(string)

Description Converts the string to binary representation.
Example to_binary('Aa')
Result 0100000101100001
Alias bin

to_hex(string)

Description Converts the string to hexadecimal representation.
Example to_hex('Hello')
Result 48656C6C6F
Alias hex

translate(string, from, to)

Description Replaces each character in string that matches a character in the from set with the corresponding character in the to set. If from is longer than to, occurrences of the extra characters in from are deleted.
Example translate('12345', '143', 'ax')
Result a2x5

trim(string[, characters])

Description Removes any occurrences of any of the characters from either side of the string. characters defaults to space.
Example 1 trim(' test ')
Result test
Example 2 trim('>>>>test<<', '><')
Result test

ucase(string)

Description Converts string to upper case.
Example ucase('Hello')
Result HELLO
Alias upper

unbin(value)

Description Converts a value from binary representation to a blob.
Example unbin('0110')
Result \x06
Alias from_binary

unhex(value)

Description Converts a value from hexadecimal representation to a blob.
Example unhex('2A')
Result *
Alias from_hex

unicode(string)

Description Returns an INTEGER representing the unicode codepoint of the first character in the string.
Example [unicode('âbcd'), unicode('â'), unicode(''), unicode(NULL)]
Result [226, 226, -1, NULL]
Alias ord

upper(string)

Description Converts string to upper case.
Example upper('Hello')
Result HELLO
Alias ucase

url_decode(string)

Description Decodes a URL from a representation using Percent-Encoding.
Example url_decode('https%3A%2F%2Fduckdb.org%2Fwhy_duckdb%23portable')
Result https://duckdb.org/why_duckdb#portable

url_encode(string)

Description Encodes a URL to a representation using Percent-Encoding.
Example url_encode('this string has/ special+ characters>')
Result this%20string%20has%2F%20special%2B%20characters%3E

Text Similarity Functions

These functions are used to measure the similarity of two strings using various similarity measures.

Name Description
damerau_levenshtein(s1, s2) Extension of Levenshtein distance to also include transposition of adjacent characters as an allowed edit operation. In other words, the minimum number of edit operations (insertions, deletions, substitutions or transpositions) required to change one string to another. Characters of different cases (e.g., a and A) are considered different.
editdist3(s1, s2) The minimum number of single-character edits (insertions, deletions or substitutions) required to change one string to the other. Characters of different cases (e.g., a and A) are considered different.
hamming(s1, s2) The Hamming distance between to strings, i.e., the number of positions with different characters for two strings of equal length. Strings must be of equal length. Characters of different cases (e.g., a and A) are considered different.
jaccard(s1, s2) The Jaccard similarity between two strings. Characters of different cases (e.g., a and A) are considered different. Returns a number between 0 and 1.
jaro_similarity(s1, s2[, score_cutoff]) The Jaro similarity between two strings. Characters of different cases (e.g., a and A) are considered different. Returns a number between 0 and 1. For similarity < score_cutoff, 0 is returned instead. score_cutoff defaults to 0.
jaro_winkler_similarity(s1, s2[, score_cutoff]) The Jaro-Winkler similarity between two strings. Characters of different cases (e.g., a and A) are considered different. Returns a number between 0 and 1. For similarity < score_cutoff, 0 is returned instead. score_cutoff defaults to 0.
levenshtein(s1, s2) The minimum number of single-character edits (insertions, deletions or substitutions) required to change one string to the other. Characters of different cases (e.g., a and A) are considered different.
mismatches(s1, s2) The Hamming distance between to strings, i.e., the number of positions with different characters for two strings of equal length. Strings must be of equal length. Characters of different cases (e.g., a and A) are considered different.

damerau_levenshtein(s1, s2)

Description Extension of Levenshtein distance to also include transposition of adjacent characters as an allowed edit operation. In other words, the minimum number of edit operations (insertions, deletions, substitutions or transpositions) required to change one string to another. Characters of different cases (e.g., a and A) are considered different.
Example damerau_levenshtein('duckdb', 'udckbd')
Result 2

editdist3(s1, s2)

Description The minimum number of single-character edits (insertions, deletions or substitutions) required to change one string to the other. Characters of different cases (e.g., a and A) are considered different.
Example editdist3('duck', 'db')
Result 3
Alias levenshtein

hamming(s1, s2)

Description The Hamming distance between to strings, i.e., the number of positions with different characters for two strings of equal length. Strings must be of equal length. Characters of different cases (e.g., a and A) are considered different.
Example hamming('duck', 'luck')
Result 1
Alias mismatches

jaccard(s1, s2)

Description The Jaccard similarity between two strings. Characters of different cases (e.g., a and A) are considered different. Returns a number between 0 and 1.
Example jaccard('duck', 'luck')
Result 0.6

jaro_similarity(s1, s2[, score_cutoff])

Description The Jaro similarity between two strings. Characters of different cases (e.g., a and A) are considered different. Returns a number between 0 and 1. For similarity < score_cutoff, 0 is returned instead. score_cutoff defaults to 0.
Example jaro_similarity('duck', 'duckdb')
Result 0.8888888888888888

jaro_winkler_similarity(s1, s2[, score_cutoff])

Description The Jaro-Winkler similarity between two strings. Characters of different cases (e.g., a and A) are considered different. Returns a number between 0 and 1. For similarity < score_cutoff, 0 is returned instead. score_cutoff defaults to 0.
Example jaro_winkler_similarity('duck', 'duckdb')
Result 0.9333333333333333

levenshtein(s1, s2)

Description The minimum number of single-character edits (insertions, deletions or substitutions) required to change one string to the other. Characters of different cases (e.g., a and A) are considered different.
Example levenshtein('duck', 'db')
Result 3
Alias editdist3

mismatches(s1, s2)

Description The Hamming distance between to strings, i.e., the number of positions with different characters for two strings of equal length. Strings must be of equal length. Characters of different cases (e.g., a and A) are considered different.
Example mismatches('duck', 'luck')
Result 1
Alias hamming

Formatters

fmt Syntax

The format(format, parameters...) function formats strings, loosely following the syntax of the {fmt} open-source formatting library.

Format without additional parameters:

SELECT format('Hello world'); -- Hello world

Format a string using {}:

SELECT format('The answer is {}', 42); -- The answer is 42

Format a string using positional arguments:

SELECT format('I''d rather be {1} than {0}.', 'right', 'happy'); -- I'd rather be happy than right.

Format Specifiers

Specifier Description Example
{:d} integer 654321
{:E} scientific notation 3.141593E+00
{:f} float 4.560000
{:o} octal 2375761
{:s} string asd
{:x} hexadecimal 9fbf1
{:tX} integer, X is the thousand separator 654 321

Formatting Types

Integers:

SELECT format('{} + {} = {}', 3, 5, 3 + 5); -- 3 + 5 = 8

Booleans:

SELECT format('{} != {}', true, false); -- true != false

Format datetime values:

SELECT format('{}', DATE '1992-01-01'); -- 1992-01-01
SELECT format('{}', TIME '12:01:00'); -- 12:01:00
SELECT format('{}', TIMESTAMP '1992-01-01 12:01:00'); -- 1992-01-01 12:01:00

Format BLOB:

SELECT format('{}', BLOB '\x00hello'); -- \x00hello

Pad integers with 0s:

SELECT format('{:04d}', 33); -- 0033

Create timestamps from integers:

SELECT format('{:02d}:{:02d}:{:02d} {}', 12, 3, 16, 'AM'); -- 12:03:16 AM

Convert to hexadecimal:

SELECT format('{:x}', 123_456_789); -- 75bcd15

Convert to binary:

SELECT format('{:b}', 123_456_789); -- 111010110111100110100010101

Integers:

SELECT format('{:,}',  123_456_789); -- 123,456,789
SELECT format('{:t.}', 123_456_789); -- 123.456.789
SELECT format('{:''}', 123_456_789); -- 123'456'789
SELECT format('{:_}',  123_456_789); -- 123_456_789
SELECT format('{:t }', 123_456_789); -- 123 456 789
SELECT format('{:tX}', 123_456_789); -- 123X456X789

Float, double and decimal:

SELECT format('{:,f}',    123456.789); -- 123,456.78900
SELECT format('{:,.2f}',  123456.789); -- 123,456.79
SELECT format('{:t..2f}', 123456.789); -- 123.456,79

printf Syntax

The printf(format, parameters...) function formats strings using the printf syntax.

Format without additional parameters:

SELECT printf('Hello world');
Hello world

Format a string using arguments in a given order:

SELECT printf('The answer to %s is %d', 'life', 42);
The answer to life is 42

Format a string using positional arguments %position$formatter, e.g., the second parameter as a string is encoded as %2$s:

SELECT printf('I''d rather be %2$s than %1$s.', 'right', 'happy');
I'd rather be happy than right.

Format Specifiers

Specifier Description Example
%c character code to character a
%d integer 654321
%Xd integer with thousand seperarator X from ,, ., '', _ 654_321
%E scientific notation 3.141593E+00
%f float 4.560000
%hd integer 654321
%hhd integer 654321
%lld integer 654321
%o octal 2375761
%s string asd
%x hexadecimal 9fbf1

Formatting Types

Integers:

SELECT printf('%d + %d = %d', 3, 5, 3 + 5); -- 3 + 5 = 8

Booleans:

SELECT printf('%s != %s', true, false); -- true != false

Format datetime values:

SELECT printf('%s', DATE '1992-01-01'); -- 1992-01-01
SELECT printf('%s', TIME '12:01:00'); -- 12:01:00
SELECT printf('%s', TIMESTAMP '1992-01-01 12:01:00'); -- 1992-01-01 12:01:00

Format BLOB:

SELECT printf('%s', BLOB '\x00hello'); -- \x00hello

Pad integers with 0s:

SELECT printf('%04d', 33); -- 0033

Create timestamps from integers:

SELECT printf('%02d:%02d:%02d %s', 12, 3, 16, 'AM'); -- 12:03:16 AM

Convert to hexadecimal:

SELECT printf('%x', 123_456_789); -- 75bcd15

Convert to binary:

SELECT printf('%b', 123_456_789); -- 111010110111100110100010101

Thousand Separators

Integers:

SELECT printf('%,d',  123_456_789); -- 123,456,789
SELECT printf('%.d',  123_456_789); -- 123.456.789
SELECT printf('%''d', 123_456_789); -- 123'456'789
SELECT printf('%_d',  123_456_789); -- 123_456_789

Float, double and decimal:

SELECT printf('%,f',   123456.789); -- 123,456.789000
SELECT printf('%,.2f', 123456.789); -- 123,456.79