Text Functions

Search Shortcut cmd + k | ctrl + k

Documentation / SQL / Functions

Text Functions

Text Functions and Operators

This section describes functions and operators for examining and manipulating STRING values.

Name	Description
`string[index]`	Extracts a single character using a (1-based) index.
`string[begin:end]`	Extracts a string using slice conventions similar to Python. Missing `begin` or `end` arguments are interpreted as the beginning or end of the list respectively. Negative values are accepted.
`string LIKE target`	Returns `true` if the `string` matches the like specifier (see Pattern Matching).
`string SIMILAR TO regex`	Returns `true` if the `string` matches the `regex` (see Pattern Matching).
`string ^@ search_string`	Returns `true` if `string` begins with `search_string`.
`arg1 \|\| arg2`	Concatenates two strings, lists, or blobs. Any `NULL` input results in `NULL`. See also `concat(arg1, arg2, ...)` and `list_concat(list1, list2)`.
`array_extract(string, index)`	Extracts a single character from a `string` using a (1-based) `index`.
`array_slice(list, begin, end)`	Extracts a sublist or substring using slice conventions. Negative values are accepted.
`ascii(string)`	Returns an integer that represents the Unicode code point of the first character of the `string`.
`bar(x, min, max[, width])`	Draws a band whose width is proportional to (`x - min`) and equal to `width` characters when `x` = `max`. `width` defaults to 80.
`base64(blob)`	Converts a `blob` to a base64 encoded string.
`bin(string)`	Converts the `string` to binary representation.
`bit_length(string)`	Number of bits in a `string`.
`char_length(string)`	Number of characters in `string`.
`character_length(string)`	Number of characters in `string`.
`chr(code_point)`	Returns a character which is corresponding the ASCII code value or Unicode code point.
`concat(value, ...)`	Concatenates multiple strings, lists, or blobs. `NULL` inputs are skipped. See also operator `\|\|`.
`concat_ws(separator, string, ...)`	Concatenates many strings, separated by `separator`. `NULL` inputs are skipped.
`contains(string, search_string)`	Returns `true` if `search_string` is found within `string`. Note that collations are not supported.
`ends_with(string, search_string)`	Returns `true` if `string` ends with `search_string`.
`format(format, ...)`	Formats a string using the fmt syntax.
`formatReadableDecimalSize(integer)`	Converts `integer` to a human-readable representation using units based on powers of 10 (KB, MB, GB, etc.).
`formatReadableSize(integer)`	Converts `integer` to a human-readable representation using units based on powers of 2 (KiB, MiB, GiB, etc.).
`format_bytes(integer)`	Converts `integer` to a human-readable representation using units based on powers of 2 (KiB, MiB, GiB, etc.).
`from_base64(string)`	Converts a base64 encoded `string` to a character string (`BLOB`).
`from_binary(value)`	Converts a `value` from binary representation to a blob.
`from_hex(value)`	Converts a `value` from hexadecimal representation to a blob.
`greatest(arg1, ...)`	Returns the largest value in lexicographical order. Note that lowercase characters are considered larger than uppercase characters and collations are not supported.
`hash(value, ...)`	Returns a `UBIGINT` with the hash of the `value`. Note that this is not a cryptographic hash.
`hex(string)`	Converts the `string` to hexadecimal representation.
`ilike_escape(string, like_specifier, escape_character)`	Returns `true` if the `string` matches the `like_specifier` (see Pattern Matching) using case-insensitive matching. `escape_character` is used to search for wildcard characters in the `string`.
`instr(string, search_string)`	Returns location of first occurrence of `search_string` in `string`, counting from 1. Returns 0 if no match found.
`lcase(string)`	Converts `string` to lower case.
`least(arg1, ...)`	Returns the smallest value in lexicographical order. Note that uppercase characters are considered smaller than lowercase characters and collations are not supported.
`left(string, count)`	Extracts the left-most count characters.
`left_grapheme(string, count)`	Extracts the left-most count grapheme clusters.
`len(string)`	Number of characters in `string`.
`length(string)`	Number of characters in `string`.
`length_grapheme(string)`	Number of grapheme clusters in `string`.
`like_escape(string, like_specifier, escape_character)`	Returns `true` if the `string` matches the `like_specifier` (see Pattern Matching) using case-sensitive matching. `escape_character` is used to search for wildcard characters in the `string`.
`lower(string)`	Converts `string` to lower case.
`lpad(string, count, character)`	Pads the `string` with the `character` on the left until it has `count` characters. Truncates the `string` on the right if it has more than `count` characters.
`ltrim(string[, characters])`	Removes any occurrences of any of the `characters` from the left side of the `string`. `characters` defaults to `space`.
`md5(string)`	Returns the MD5 hash of the `string` as a `VARCHAR`.
`md5_number(string)`	Returns the MD5 hash of the `string` as a `HUGEINT`.
`md5_number_lower(string)`	Returns the lower 64-bit segment of the MD5 hash of the `string` as a `UBIGINT`.
`md5_number_upper(string)`	Returns the upper 64-bit segment of the MD5 hash of the `string` as a `UBIGINT`.
`nfc_normalize(string)`	Converts `string` to Unicode NFC normalized string. Useful for comparisons and ordering if text data is mixed between NFC normalized and not.
`not_ilike_escape(string, like_specifier, escape_character)`	Returns `false` if the `string` matches the `like_specifier` (see Pattern Matching) using case-insensitive matching. `escape_character` is used to search for wildcard characters in the `string`.
`not_like_escape(string, like_specifier, escape_character)`	Returns `false` if the `string` matches the `like_specifier` (see Pattern Matching) using case-sensitive matching. `escape_character` is used to search for wildcard characters in the `string`.
`ord(string)`	Returns an `INTEGER` representing the `unicode` codepoint of the first character in the `string`.
`parse_dirname(path[, separator])`	Returns the top-level directory name from the given `path`. `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`.
`parse_dirpath(path[, separator])`	Returns the head of the `path` (the pathname until the last slash) similarly to Python's `os.path.dirname`. `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`.
`parse_filename(string[, trim_extension][, separator])`	Returns the last component of the `path` similarly to Python's `os.path.basename` function. If `trim_extension` is `true`, the file extension will be removed (defaults to `false`). `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`.
`parse_path(path[, separator])`	Returns a list of the components (directories and filename) in the `path` similarly to Python's `pathlib.parts` function. `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`.
`position(search_string IN string)`	Return location of first occurrence of `search_string` in `string`, counting from 1. Returns 0 if no match found.
`prefix(string, search_string)`	Returns `true` if `string` starts with `search_string`.
`printf(format, ...)`	Formats a `string` using printf syntax.
`read_text(source)`	Returns the content from `source` (a filename, a list of filenames, or a glob pattern) as a `VARCHAR`. The file content is first validated to be valid UTF-8. If `read_text` attempts to read a file with invalid UTF-8 an error is thrown suggesting to use `read_blob` instead. See the `read_text` guide for more details.
`regexp_escape(string)`	Escapes special patterns to turn `string` into a regular expression similarly to Python's `re.escape` function.
`regexp_extract(string, regex[, group][, options])`	If `string` contains the `regex` pattern, returns the capturing group specified by optional parameter `group`; otherwise, returns the empty string. The `group` must be a constant value. If no `group` is given, it defaults to 0. A set of optional regex `options` can be set.
`regexp_extract(string, regex, name_list[, options])`	If `string` contains the `regex` pattern, returns the capturing groups as a struct with corresponding names from `name_list`; otherwise, returns a struct with the same keys and empty strings as values. A set of optional regex `options` can be set.
`regexp_extract_all(string, regex[, group][, options])`	Finds non-overlapping occurrences of the `regex` in the `string` and returns the corresponding values of the capturing `group`. A set of optional regex `options` can be set.
`regexp_full_match(string, regex[, col2])`	Returns `true` if the entire `string` matches the `regex`. A set of optional regex `options` can be set.
`regexp_matches(string, regex[, options])`	Returns `true` if `string` contains the `regex`, `false` otherwise. A set of optional regex `options` can be set.
`regexp_replace(string, regex, replacement[, options])`	If `string` contains the `regex`, replaces the matching part with `replacement`. A set of optional regex `options` can be set.
`regexp_split_to_array(string, regex[, options])`	Splits the `string` along the `regex`. A set of optional regex `options` can be set.
`regexp_split_to_table(string, regex)`	Splits the `string` along the `regex` and returns a row for each part.
`repeat(string, count)`	Repeats the `string` `count` number of times.
`replace(string, source, target)`	Replaces any occurrences of the `source` with `target` in `string`.
`reverse(string)`	Reverses the `string`.
`right(string, count)`	Extract the right-most `count` characters.
`right_grapheme(string, count)`	Extracts the right-most `count` grapheme clusters.
`rpad(string, count, character)`	Pads the `string` with the `character` on the right until it has `count` characters. Truncates the `string` on the right if it has more than `count` characters.
`rtrim(string[, characters])`	Removes any occurrences of any of the `characters` from the right side of the `string`. `characters` defaults to `space`.
`sha1(value)`	Returns a `VARCHAR` with the SHA-1 hash of the `value`.
`sha256(value)`	Returns a `VARCHAR` with the SHA-256 hash of the `value`
`split(string, separator)`	Splits the `string` along the `separator`.
`split_part(string, separator, index)`	Splits the `string` along the `separator` and returns the data at the (1-based) `index` of the list. If the `index` is outside the bounds of the list, return an empty string (to match PostgreSQL's behavior).
`starts_with(string, search_string)`	Returns `true` if `string` begins with `search_string`.
`str_split(string, separator)`	Splits the `string` along the `separator`.
`str_split_regex(string, regex[, options])`	Splits the `string` along the `regex`. A set of optional regex `options` can be set.
`string_split(string, separator)`	Splits the `string` along the `separator`.
`string_split_regex(string, regex[, options])`	Splits the `string` along the `regex`. A set of optional regex `options` can be set.
`string_to_array(string, separator)`	Splits the `string` along the `separator`.
`strip_accents(string)`	Strips accents from `string`.
`strlen(string)`	Number of bytes in `string`.
`strpos(string, search_string)`	Returns location of first occurrence of `search_string` in `string`, counting from 1. Returns 0 if no match found.
`substr(string, start[, length])`	Extracts substring starting from character `start` up to the end of the string. If optional argument `length` is set, extracts a substring of `length` characters instead. Note that a `start` value of `1` refers to the first character of the `string`.
`substring(string, start[, length])`	Extracts substring starting from character `start` up to the end of the string. If optional argument `length` is set, extracts a substring of `length` characters instead. Note that a `start` value of `1` refers to the first character of the `string`.
`substring_grapheme(string, start[, length])`	Extracts substring starting from grapheme clusters `start` up to the end of the string. If optional argument `length` is set, extracts a substring of `length` grapheme clusters instead. Note that a `start` value of `1` refers to the `first` character of the `string`.
`suffix(string, search_string)`	Returns `true` if `string` ends with `search_string`.
`to_base(number, radix[, min_length])`	Converts `number` to a string in the given base `radix`, optionally padding with leading zeros to `min_length`.
`to_base64(blob)`	Converts a `blob` to a base64 encoded string.
`to_binary(string)`	Converts the `string` to binary representation.
`to_hex(string)`	Converts the `string` to hexadecimal representation.
`translate(string, from, to)`	Replaces each character in `string` that matches a character in the `from` set with the corresponding character in the `to` set. If `from` is longer than `to`, occurrences of the extra characters in `from` are deleted.
`trim(string[, characters])`	Removes any occurrences of any of the `characters` from either side of the `string`. `characters` defaults to `space`.
`ucase(string)`	Converts `string` to upper case.
`unbin(value)`	Converts a `value` from binary representation to a blob.
`unhex(value)`	Converts a `value` from hexadecimal representation to a blob.
`unicode(string)`	Returns an `INTEGER` representing the `unicode` codepoint of the first character in the `string`.
`upper(string)`	Converts `string` to upper case.
`url_decode(string)`	Decodes a URL from a representation using Percent-Encoding.
`url_encode(string)`	Encodes a URL to a representation using Percent-Encoding.

`string[index]`

Description	Extracts a single character using a (1-based) index.
Example	`'DuckDB'[4]`
Result	`k`
Alias	`array_extract`

`string[begin:end]`

Description	Extracts a string using slice conventions similar to Python. Missing `begin` or `end` arguments are interpreted as the beginning or end of the list respectively. Negative values are accepted.
Example	`'DuckDB'[:4]`
Result	`Duck`
Alias	`array_slice`

`string LIKE target`

Description	Returns `true` if the `string` matches the like specifier (see Pattern Matching).
Example	`'hello' LIKE '%lo'`
Result	`true`

`string SIMILAR TO regex`

Description	Returns `true` if the `string` matches the `regex` (see Pattern Matching).
Example	`'hello' SIMILAR TO 'l+'`
Result	`false`
Alias	`regexp_full_match`

`string ^@ search_string`

Description	Returns `true` if `string` begins with `search_string`.
Example	`'abc' ^@ 'a'`
Result	`true`
Alias	`starts_with`

`arg1 || arg2`

Description	Concatenates two strings, lists, or blobs. Any `NULL` input results in `NULL`. See also `concat(arg1, arg2, ...)` and `list_concat(list1, list2)`.
Example 1	`'Duck' \|\| 'DB'`
Result	`DuckDB`
Example 2	`[1, 2, 3] \|\| [4, 5, 6]`
Result	`[1, 2, 3, 4, 5, 6]`
Example 3	`'\xAA'::BLOB \|\| '\xBB'::BLOB`
Result	`\xAA\xBB`

`array_extract(string, index)`

Description	Extracts a single character from a `string` using a (1-based) `index`.
Example	`array_extract('DuckDB', 2)`
Result	`u`

`array_slice(list, begin, end)`

Description	Extracts a sublist or substring using slice conventions. Negative values are accepted.
Example 1	`array_slice('DuckDB', 3, 4)`
Result	`ck`
Example 2	`array_slice('DuckDB', 3, NULL)`
Result	`NULL`
Example 3	`array_slice('DuckDB', 0, -3)`
Result	`Duck`
Alias	`list_slice`

`ascii(string)`

Description	Returns an integer that represents the Unicode code point of the first character of the `string`.
Example	`ascii('Ω')`
Result	`937`

`bar(x, min, max[, width])`

Description	Draws a band whose width is proportional to (`x - min`) and equal to `width` characters when `x` = `max`. `width` defaults to 80.
Example	`bar(5, 0, 20, 10)`
Result	`██▌`

`base64(blob)`

Description	Converts a `blob` to a base64 encoded string.
Example	`base64('A'::BLOB)`
Result	`QQ==`
Alias	`to_base64`

`bin(string)`

Description	Converts the `string` to binary representation.
Example	`bin('Aa')`
Result	`0100000101100001`
Alias	`to_binary`

`bit_length(string)`

Description	Number of bits in a `string`.
Example	`bit_length('abc')`
Result	`24`

`char_length(string)`

Description	Number of characters in `string`.
Example	`char_length('Hello🦆')`
Result	`6`
Aliases	`character_length`, `len`, `length`

`character_length(string)`

Description	Number of characters in `string`.
Example	`character_length('Hello🦆')`
Result	`6`
Aliases	`char_length`, `len`, `length`

`chr(code_point)`

Description	Returns a character which is corresponding the ASCII code value or Unicode code point.
Example	`chr(65)`
Result	`A`

`concat(value, ...)`

Description	Concatenates multiple strings, lists, or blobs. `NULL` inputs are skipped. See also operator `\|\|`.
Example	`concat('Hello', ' ', 'World')`
Result	`Hello World`

`concat_ws(separator, string, ...)`

Description	Concatenates many strings, separated by `separator`. `NULL` inputs are skipped.
Example	`concat_ws(', ', 'Banana', 'Apple', 'Melon')`
Result	`Banana, Apple, Melon`

`contains(string, search_string)`

Description	Returns `true` if `search_string` is found within `string`.
Example	`contains('abc', 'a')`
Result	`true`

`ends_with(string, search_string)`

Description	Returns `true` if `string` ends with `search_string`. Note that collations are not supported.
Example	`ends_with('abc', 'bc')`
Result	`true`
Alias	`suffix`

`format(format, ...)`

Description	Formats a string using the fmt syntax.
Example	`format('Benchmark "{}" took {} seconds', 'CSV', 42)`
Result	`Benchmark "CSV" took 42 seconds`

`formatReadableDecimalSize(integer)`

Description	Converts `integer` to a human-readable representation using units based on powers of 10 (KB, MB, GB, etc.).
Example	`formatReadableDecimalSize(16_000)`
Result	`16.0 kB`

`formatReadableSize(integer)`

Description	Converts `integer` to a human-readable representation using units based on powers of 2 (KiB, MiB, GiB, etc.).
Example	`formatReadableSize(16_000)`
Result	`15.6 KiB`
Alias	`format_bytes`

`format_bytes(integer)`

Description	Converts `integer` to a human-readable representation using units based on powers of 2 (KiB, MiB, GiB, etc.).
Example	`format_bytes(16_000)`
Result	`15.6 KiB`
Alias	`formatReadableSize`

`from_base64(string)`

Description	Converts a base64 encoded `string` to a character string (`BLOB`).
Example	`from_base64('QQ==')`
Result	`A`

`from_binary(value)`

Description	Converts a `value` from binary representation to a blob.
Example	`from_binary('0110')`
Result	`\x06`
Alias	`unbin`

`from_hex(value)`

Description	Converts a `value` from hexadecimal representation to a blob.
Example	`from_hex('2A')`
Result	`*`
Alias	`unhex`

`greatest(arg1, ...)`

Description	Returns the largest value in lexicographical order. Note that lowercase characters are considered larger than uppercase characters and collations are not supported.
Example 1	`greatest(42, 84)`
Result	`84`
Example 2	`greatest('abc', 'bcd', 'cde', 'EFG')`
Result	`cde`

`hash(value, ...)`

Description	Returns a `UBIGINT` with the hash of the `value`. Note that this is not a cryptographic hash.
Example	`hash('🦆')`
Result	`4164431626903154684`

`hex(string)`

Description	Converts the `string` to hexadecimal representation.
Example	`hex('Hello')`
Result	`48656C6C6F`
Alias	`to_hex`

`ilike_escape(string, like_specifier, escape_character)`

Description	Returns `true` if the `string` matches the `like_specifier` (see Pattern Matching) using case-insensitive matching. `escape_character` is used to search for wildcard characters in the `string`.
Example	`ilike_escape('A%c', 'a$%C', '$')`
Result	`true`

`instr(string, search_string)`

Description	Returns location of first occurrence of `search_string` in `string`, counting from 1. Returns 0 if no match found.
Example	`instr('test test', 'es')`
Result	`2`
Aliases	`position`, `strpos`

`lcase(string)`

Description	Converts `string` to lower case.
Example	`lcase('Hello')`
Result	`hello`
Alias	`lower`

`least(arg1, ...)`

Description	Returns the smallest value in lexicographical order. Note that uppercase characters are considered smaller than lowercase characters and collations are not supported.
Example 1	`least(42, 84)`
Result	`42`
Example 2	`least('abc', 'bcd', 'cde', 'EFG')`
Result	`EFG`

`left(string, count)`

Description	Extracts the left-most count characters.
Example	`left('Hello🦆', 2)`
Result	`He`

`left_grapheme(string, count)`

Description	Extracts the left-most count grapheme clusters.
Example	`left_grapheme('🤦🏼‍♂️🤦🏽‍♀️', 1)`
Result	`🤦🏼‍♂️`

`len(string)`

Description	Number of characters in `string`.
Example	`length('Hello🦆')`
Result	`6`
Aliases	`char_length`, `character_length`, `length`

`length(string)`

Description	Number of characters in `string`.
Example	`length('Hello🦆')`
Result	`6`
Aliases	`char_length`, `character_length`, `len`

`length_grapheme(string)`

Description	Number of grapheme clusters in `string`.
Example	`length_grapheme('🤦🏼‍♂️🤦🏽‍♀️')`
Result	`2`

`like_escape(string, like_specifier, escape_character)`

Description	Returns `true` if the `string` matches the `like_specifier` (see Pattern Matching) using case-sensitive matching. `escape_character` is used to search for wildcard characters in the `string`.
Example	`like_escape('a%c', 'a$%c', '$')`
Result	`true`

`lower(string)`

Description	Converts `string` to lower case.
Example	`lower('Hello')`
Result	`hello`
Alias	`lcase`

`lpad(string, count, character)`

Description	Pads the `string` with the `character` on the left until it has `count` characters. Truncates the `string` on the right if it has more than `count` characters.
Example	`lpad('hello', 8, '>')`
Result	`>>>hello`

`ltrim(string[, characters])`

Description	Removes any occurrences of any of the `characters` from the left side of the `string`. `characters` defaults to `space`.
Example 1	`ltrim(' test ')`
Result	`test`
Example 2	`ltrim('>>>>test<<', '><')`
Result	`test<<`

`md5(string)`

Description	Returns the MD5 hash of the `string` as a `VARCHAR`.
Example	`md5('abc')`
Result	`900150983cd24fb0d6963f7d28e17f72`

`md5_number(string)`

Description	Returns the MD5 hash of the `string` as a `HUGEINT`.
Example	`md5_number('abc')`
Result	`152195979970564155685860391459828531600`

`md5_number_lower(string)`

Description	Returns the lower 64-bit segment of the MD5 hash of the `string` as a `UBIGINT`.
Example	`md5_number_lower('abc')`
Result	`8250560606382298838`

`md5_number_upper(string)`

Description	Returns the upper 64-bit segment of the MD5 hash of the `string` as a `UBIGINT`.
Example	`md5_number_upper('abc')`
Result	`12704604231530709392`

`nfc_normalize(string)`

Description	Converts `string` to Unicode NFC normalized string. Useful for comparisons and ordering if text data is mixed between NFC normalized and not.
Example	`nfc_normalize('ardèch')`
Result	`ardèch`

`not_ilike_escape(string, like_specifier, escape_character)`

Description	Returns `false` if the `string` matches the `like_specifier` (see Pattern Matching) using case-insensitive matching. `escape_character` is used to search for wildcard characters in the `string`.
Example	`not_ilike_escape('A%c', 'a$%C', '$')`
Result	`false`

`not_like_escape(string, like_specifier, escape_character)`

Description	Returns `false` if the `string` matches the `like_specifier` (see Pattern Matching) using case-sensitive matching. `escape_character` is used to search for wildcard characters in the `string`.
Example	`not_like_escape('a%c', 'a$%c', '$')`
Result	`false`

`ord(string)`

Description	Returns an `INTEGER` representing the `unicode` codepoint of the first character in the `string`.
Example	`[unicode('âbcd'), unicode('â'), unicode(''), unicode(NULL)]`
Result	`[226, 226, -1, NULL]`
Alias	`unicode`

`parse_dirname(path[, separator])`

Description	Returns the top-level directory name from the given `path`. `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`.
Example	`parse_dirname('path/to/file.csv', 'system')`
Result	`path`

`parse_dirpath(path[, separator])`

Description	Returns the head of the `path` (the pathname until the last slash) similarly to Python's `os.path.dirname`. `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`.
Example	`parse_dirpath('path/to/file.csv', 'forward_slash')`
Result	`path/to`

`parse_filename(string[, trim_extension][, separator])`

Description	Returns the last component of the `path` similarly to Python's `os.path.basename` function. If `trim_extension` is `true`, the file extension will be removed (defaults to `false`). `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`.
Example	`parse_filename('path/to/file.csv', true, 'forward_slash')`
Result	`file`

`parse_path(path[, separator])`

Description	Returns a list of the components (directories and filename) in the `path` similarly to Python's `pathlib.parts` function. `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`.
Example	`parse_path('path/to/file.csv', 'system')`
Result	`[path, to, file.csv]`

`position(search_string IN string)`

Description	Return location of first occurrence of `search_string` in `string`, counting from 1. Returns 0 if no match found.
Example	`position('b' IN 'abc')`
Result	`2`
Aliases	`instr`, `strpos`

`prefix(string, search_string)`

Description	Returns `true` if `string` starts with `search_string`.
Example	`prefix('abc', 'ab')`
Result	`true`

`printf(format, ...)`

Description	Formats a `string` using printf syntax.
Example	`printf('Benchmark "%s" took %d seconds', 'CSV', 42)`
Result	`Benchmark "CSV" took 42 seconds`

`read_text(source)`

Description	Returns the content from `source` (a filename, a list of filenames, or a glob pattern) as a `VARCHAR`. The file content is first validated to be valid UTF-8. If `read_text` attempts to read a file with invalid UTF-8 an error is thrown suggesting to use `read_blob` instead. See the `read_text` guide for more details.
Example	`read_text('hello.txt')`
Result	`hello\n`

`regexp_escape(string)`

Description	Escapes special patterns to turn `string` into a regular expression similarly to Python's `re.escape` function.
Example	`regexp_escape('https://duckdb.org')`
Result	`https\:\/\/duckdb\.org`

`regexp_extract(string, regex[, group][, options])`

Description	If `string` contains the `regex` pattern, returns the capturing group specified by optional parameter `group`; otherwise, returns the empty string. The `group` must be a constant value. If no `group` is given, it defaults to 0. A set of optional regex `options` can be set.
Example	`regexp_extract('ABC', '([a-z])(b)', 1, 'i')`
Result	`A`

`regexp_extract(string, regex, name_list[, options])`

Description	If `string` contains the `regex` pattern, returns the capturing groups as a struct with corresponding names from `name_list`; otherwise, returns a struct with the same keys and empty strings as values. A set of optional regex `options` can be set.
Example	`regexp_extract('John Doe', '([a-z]+) ([a-z]+)', ['first_name', 'last_name'], 'i')`
Result	`{'first_name': John, 'last_name': Doe}`

`regexp_extract_all(string, regex[, group][, options])`

Description	Finds non-overlapping occurrences of the `regex` in the `string` and returns the corresponding values of the capturing `group`. A set of optional regex `options` can be set.
Example	`regexp_extract_all('Peter: 33, Paul:14', '(\w+):\s*(\d+)', 2)`
Result	`[33, 14]`

`regexp_full_match(string, regex[, col2])`

Description	Returns `true` if the entire `string` matches the `regex`. A set of optional regex `options` can be set.
Example	`regexp_full_match('anabanana', '(an)*')`
Result	`false`

`regexp_matches(string, regex[, options])`

Description	Returns `true` if `string` contains the `regex`, `false` otherwise. A set of optional regex `options` can be set.
Example	`regexp_matches('anabanana', '(an)*')`
Result	`true`

`regexp_replace(string, regex, replacement[, options])`

Description	If `string` contains the `regex`, replaces the matching part with `replacement`. A set of optional regex `options` can be set.
Example	`regexp_replace('hello', '[lo]', '-')`
Result	`he-lo`

`regexp_split_to_array(string, regex[, options])`

Description	Splits the `string` along the `regex`. A set of optional regex `options` can be set.
Example	`regexp_split_to_array('hello world; 42', ';? ')`
Result	`[hello, world, 42]`
Aliases	`str_split_regex`, `string_split_regex`

`regexp_split_to_table(string, regex)`

Description	Splits the `string` along the `regex` and returns a row for each part.
Example	`regexp_split_to_table('hello world; 42', ';? ')`
Result	Multiple rows: `'hello'`, `'world'`, `'42'`

`repeat(string, count)`

Description	Repeats the `string` `count` number of times.
Example	`repeat('A', 5)`
Result	`AAAAA`

`replace(string, source, target)`

Description	Replaces any occurrences of the `source` with `target` in `string`.
Example	`replace('hello', 'l', '-')`
Result	`he--o`

`reverse(string)`

Description	Reverses the `string`.
Example	`reverse('hello')`
Result	`olleh`

`right(string, count)`

Description	Extract the right-most `count` characters.
Example	`right('Hello🦆', 3)`
Result	`lo🦆`

`right_grapheme(string, count)`

Description	Extracts the right-most `count` grapheme clusters.
Example	`right_grapheme('🤦🏼‍♂️🤦🏽‍♀️', 1)`
Result	`🤦🏽‍♀️`

`rpad(string, count, character)`

Description	Pads the `string` with the `character` on the right until it has `count` characters. Truncates the `string` on the right if it has more than `count` characters.
Example	`rpad('hello', 10, '<')`
Result	`hello<<<<<`

`sha1(value)`

Description	Returns a `VARCHAR` with the SHA-1 hash of the `value`.
Example	`sha1('🦆')`
Result	`949bf843dc338be348fb9525d1eb535d31241d76`

`sha256(value)`

Description	Returns a `VARCHAR` with the SHA-256 hash of the `value`
Example	`sha256('🦆')`
Result	`d7a5c5e0d1d94c32218539e7e47d4ba9c3c7b77d61332fb60d633dde89e473fb`

`split(string, separator)`

Description	Splits the `string` along the `separator`.
Example	`split('hello-world', '-')`
Result	`[hello, world]`
Aliases	`str_split`, `string_split`, `string_to_array`

`split_part(string, separator, index)`

Description	Splits the `string` along the `separator` and returns the data at the (1-based) `index` of the list. If the `index` is outside the bounds of the list, return an empty string (to match PostgreSQL's behavior).
Example	`split_part('a;b;c', ';', 2)`
Result	`b`

`starts_with(string, search_string)`

Description	Returns `true` if `string` begins with `search_string`.
Example	`starts_with('abc', 'a')`
Result	`true`
Alias	`^@`

`str_split(string, separator)`

Description	Splits the `string` along the `separator`.
Example	`str_split('hello-world', '-')`
Result	`[hello, world]`
Aliases	`split`, `string_split`, `string_to_array`

`str_split_regex(string, regex[, options])`

Description	Splits the `string` along the `regex`. A set of optional regex `options` can be set.
Example	`str_split_regex('hello world; 42', ';? ')`
Result	`[hello, world, 42]`
Aliases	`regexp_split_to_array`, `string_split_regex`

`string_split(string, separator)`

Description	Splits the `string` along the `separator`.
Example	`string_split('hello-world', '-')`
Result	`[hello, world]`
Aliases	`split`, `str_split`, `string_to_array`

`string_split_regex(string, regex[, options])`

Description	Splits the `string` along the `regex`. A set of optional regex `options` can be set.
Example	`string_split_regex('hello world; 42', ';? ')`
Result	`[hello, world, 42]`
Aliases	`regexp_split_to_array`, `str_split_regex`

`string_to_array(string, separator)`

Description	Splits the `string` along the `separator`.
Example	`string_to_array('hello-world', '-')`
Result	`[hello, world]`
Aliases	`split`, `str_split`, `string_split`

`strip_accents(string)`

Description	Strips accents from `string`.
Example	`strip_accents('mühleisen')`
Result	`muhleisen`

`strlen(string)`

Description	Number of bytes in `string`.
Example	`strlen('🦆')`
Result	`4`

`strpos(string, search_string)`

Description	Returns location of first occurrence of `search_string` in `string`, counting from 1. Returns 0 if no match found.
Example	`strpos('test test', 'es')`
Result	`2`
Aliases	`instr`, `position`

`substr(string, start[, length])`

Description	Extracts substring starting from character `start` up to the end of the string. If optional argument `length` is set, extracts a substring of `length` characters instead. Note that a `start` value of `1` refers to the first character of the `string`.
Example 1	`substring('Hello', 2)`
Result	`ello`
Example 2	`substring('Hello', 2, 2)`
Result	`el`
Alias	`substring`

`substring(string, start[, length])`

Description	Extracts substring starting from character `start` up to the end of the string. If optional argument `length` is set, extracts a substring of `length` characters instead. Note that a `start` value of `1` refers to the first character of the `string`.
Example 1	`substring('Hello', 2)`
Result	`ello`
Example 2	`substring('Hello', 2, 2)`
Result	`el`
Alias	`substr`

`substring_grapheme(string, start[, length])`

Description	Extracts substring starting from grapheme clusters `start` up to the end of the string. If optional argument `length` is set, extracts a substring of `length` grapheme clusters instead. Note that a `start` value of `1` refers to the `first` character of the `string`.
Example 1	`substring_grapheme('🦆🤦🏼‍♂️🤦🏽‍♀️🦆', 3)`
Result	`🤦🏽‍♀️🦆`
Example 2	`substring_grapheme('🦆🤦🏼‍♂️🤦🏽‍♀️🦆', 3, 2)`
Result	`🤦🏽‍♀️🦆`

`suffix(string, search_string)`

Description	Returns `true` if `string` ends with `search_string`.
Example	`suffix('abc', 'bc')`
Result	`true`
Alias	`ends_with`

`to_base(number, radix[, min_length])`

Description	Converts `number` to a string in the given base `radix`, optionally padding with leading zeros to `min_length`.
Example	`to_base(42, 16, 5)`
Result	`0002A`

`to_base64(blob)`

Description	Converts a `blob` to a base64 encoded string.
Example	`to_base64('A'::BLOB)`
Result	`QQ==`
Alias	`base64`

`to_binary(string)`

Description	Converts the `string` to binary representation.
Example	`to_binary('Aa')`
Result	`0100000101100001`
Alias	`bin`

`to_hex(string)`

Description	Converts the `string` to hexadecimal representation.
Example	`to_hex('Hello')`
Result	`48656C6C6F`
Alias	`hex`

`translate(string, from, to)`

Description	Replaces each character in `string` that matches a character in the `from` set with the corresponding character in the `to` set. If `from` is longer than `to`, occurrences of the extra characters in `from` are deleted.
Example	`translate('12345', '143', 'ax')`
Result	`a2x5`

`trim(string[, characters])`

Description	Removes any occurrences of any of the `characters` from either side of the `string`. `characters` defaults to `space`.
Example 1	`trim(' test ')`
Result	`test`
Example 2	`trim('>>>>test<<', '><')`
Result	`test`

`ucase(string)`

Description	Converts `string` to upper case.
Example	`ucase('Hello')`
Result	`HELLO`
Alias	`upper`

`unbin(value)`

Description	Converts a `value` from binary representation to a blob.
Example	`unbin('0110')`
Result	`\x06`
Alias	`from_binary`

`unhex(value)`

Description	Converts a `value` from hexadecimal representation to a blob.
Example	`unhex('2A')`
Result	`*`
Alias	`from_hex`

`unicode(string)`

Description	Returns an `INTEGER` representing the `unicode` codepoint of the first character in the `string`.
Example	`[unicode('âbcd'), unicode('â'), unicode(''), unicode(NULL)]`
Result	`[226, 226, -1, NULL]`
Alias	`ord`

`upper(string)`

Description	Converts `string` to upper case.
Example	`upper('Hello')`
Result	`HELLO`
Alias	`ucase`

`url_decode(string)`

Description	Decodes a URL from a representation using Percent-Encoding.
Example	`url_decode('https%3A%2F%2Fduckdb.org%2Fwhy_duckdb%23portable')`
Result	`https://duckdb.org/why_duckdb#portable`

`url_encode(string)`

Description	Encodes a URL to a representation using Percent-Encoding.
Example	`url_encode('this string has/ special+ characters>')`
Result	`this%20string%20has%2F%20special%2B%20characters%3E`

Text Similarity Functions

These functions are used to measure the similarity of two strings using various similarity measures.

Name	Description
`damerau_levenshtein(s1, s2)`	Extension of Levenshtein distance to also include transposition of adjacent characters as an allowed edit operation. In other words, the minimum number of edit operations (insertions, deletions, substitutions or transpositions) required to change one string to another. Characters of different cases (e.g., `a` and `A`) are considered different.
`editdist3(s1, s2)`	The minimum number of single-character edits (insertions, deletions or substitutions) required to change one string to the other. Characters of different cases (e.g., `a` and `A`) are considered different.
`hamming(s1, s2)`	The Hamming distance between to strings, i.e., the number of positions with different characters for two strings of equal length. Strings must be of equal length. Characters of different cases (e.g., `a` and `A`) are considered different.
`jaccard(s1, s2)`	The Jaccard similarity between two strings. Characters of different cases (e.g., `a` and `A`) are considered different. Returns a number between 0 and 1.
`jaro_similarity(s1, s2[, score_cutoff])`	The Jaro similarity between two strings. Characters of different cases (e.g., `a` and `A`) are considered different. Returns a number between 0 and 1. For similarity < `score_cutoff`, 0 is returned instead. `score_cutoff` defaults to 0.
`jaro_winkler_similarity(s1, s2[, score_cutoff])`	The Jaro-Winkler similarity between two strings. Characters of different cases (e.g., `a` and `A`) are considered different. Returns a number between 0 and 1. For similarity < `score_cutoff`, 0 is returned instead. `score_cutoff` defaults to 0.
`levenshtein(s1, s2)`	The minimum number of single-character edits (insertions, deletions or substitutions) required to change one string to the other. Characters of different cases (e.g., `a` and `A`) are considered different.
`mismatches(s1, s2)`	The Hamming distance between to strings, i.e., the number of positions with different characters for two strings of equal length. Strings must be of equal length. Characters of different cases (e.g., `a` and `A`) are considered different.

`damerau_levenshtein(s1, s2)`

Description	Extension of Levenshtein distance to also include transposition of adjacent characters as an allowed edit operation. In other words, the minimum number of edit operations (insertions, deletions, substitutions or transpositions) required to change one string to another. Characters of different cases (e.g., `a` and `A`) are considered different.
Example	`damerau_levenshtein('duckdb', 'udckbd')`
Result	`2`

`editdist3(s1, s2)`

Description	The minimum number of single-character edits (insertions, deletions or substitutions) required to change one string to the other. Characters of different cases (e.g., `a` and `A`) are considered different.
Example	`editdist3('duck', 'db')`
Result	`3`
Alias	`levenshtein`

`hamming(s1, s2)`

Description	The Hamming distance between to strings, i.e., the number of positions with different characters for two strings of equal length. Strings must be of equal length. Characters of different cases (e.g., `a` and `A`) are considered different.
Example	`hamming('duck', 'luck')`
Result	`1`
Alias	`mismatches`

`jaccard(s1, s2)`

Description	The Jaccard similarity between two strings. Characters of different cases (e.g., `a` and `A`) are considered different. Returns a number between 0 and 1.
Example	`jaccard('duck', 'luck')`
Result	`0.6`

`jaro_similarity(s1, s2[, score_cutoff])`

Description	The Jaro similarity between two strings. Characters of different cases (e.g., `a` and `A`) are considered different. Returns a number between 0 and 1. For similarity < `score_cutoff`, 0 is returned instead. `score_cutoff` defaults to 0.
Example	`jaro_similarity('duck', 'duckdb')`
Result	`0.8888888888888888`

`jaro_winkler_similarity(s1, s2[, score_cutoff])`

Description	The Jaro-Winkler similarity between two strings. Characters of different cases (e.g., `a` and `A`) are considered different. Returns a number between 0 and 1. For similarity < `score_cutoff`, 0 is returned instead. `score_cutoff` defaults to 0.
Example	`jaro_winkler_similarity('duck', 'duckdb')`
Result	`0.9333333333333333`

`levenshtein(s1, s2)`

Description	The minimum number of single-character edits (insertions, deletions or substitutions) required to change one string to the other. Characters of different cases (e.g., `a` and `A`) are considered different.
Example	`levenshtein('duck', 'db')`
Result	`3`
Alias	`editdist3`

`mismatches(s1, s2)`

Description	The Hamming distance between to strings, i.e., the number of positions with different characters for two strings of equal length. Strings must be of equal length. Characters of different cases (e.g., `a` and `A`) are considered different.
Example	`mismatches('duck', 'luck')`
Result	`1`
Alias	`hamming`

Formatters

`fmt` Syntax

The format(format, parameters...) function formats strings, loosely following the syntax of the {fmt} open-source formatting library.

Format without additional parameters:

SELECT format('Hello world'); -- Hello world

Format a string using {}:

SELECT format('The answer is {}', 42); -- The answer is 42

Format a string using positional arguments:

SELECT format('I''d rather be {1} than {0}.', 'right', 'happy'); -- I'd rather be happy than right.

Format Specifiers

Specifier	Description	Example
`{:d}`	integer	`654321`
`{:E}`	scientific notation	`3.141593E+00`
`{:f}`	float	`4.560000`
`{:o}`	octal	`2375761`
`{:s}`	string	`asd`
`{:x}`	hexadecimal	`9fbf1`
`{:tX}`	integer, `X` is the thousand separator	`654 321`

Formatting Types

Integers:

SELECT format('{} + {} = {}', 3, 5, 3 + 5); -- 3 + 5 = 8

Booleans:

SELECT format('{} != {}', true, false); -- true != false

Format datetime values:

SELECT format('{}', DATE '1992-01-01'); -- 1992-01-01
SELECT format('{}', TIME '12:01:00'); -- 12:01:00
SELECT format('{}', TIMESTAMP '1992-01-01 12:01:00'); -- 1992-01-01 12:01:00

Format BLOB:

SELECT format('{}', BLOB '\x00hello'); -- \x00hello

Pad integers with 0s:

SELECT format('{:04d}', 33); -- 0033

Create timestamps from integers:

SELECT format('{:02d}:{:02d}:{:02d} {}', 12, 3, 16, 'AM'); -- 12:03:16 AM

Convert to hexadecimal:

SELECT format('{:x}', 123_456_789); -- 75bcd15

Convert to binary:

SELECT format('{:b}', 123_456_789); -- 111010110111100110100010101

Print Numbers with Thousand Separators

Integers:

SELECT format('{:,}',  123_456_789); -- 123,456,789
SELECT format('{:t.}', 123_456_789); -- 123.456.789
SELECT format('{:''}', 123_456_789); -- 123'456'789
SELECT format('{:_}',  123_456_789); -- 123_456_789
SELECT format('{:t }', 123_456_789); -- 123 456 789
SELECT format('{:tX}', 123_456_789); -- 123X456X789

Float, double and decimal:

SELECT format('{:,f}',    123456.789); -- 123,456.78900
SELECT format('{:,.2f}',  123456.789); -- 123,456.79
SELECT format('{:t..2f}', 123456.789); -- 123.456,79

`printf` Syntax

The printf(format, parameters...) function formats strings using the printf syntax.

Format without additional parameters:

SELECT printf('Hello world');

Hello world

Format a string using arguments in a given order:

SELECT printf('The answer to %s is %d', 'life', 42);

The answer to life is 42

Format a string using positional arguments %position$formatter, e.g., the second parameter as a string is encoded as %2$s:

SELECT printf('I''d rather be %2$s than %1$s.', 'right', 'happy');

I'd rather be happy than right.

Format Specifiers

Specifier	Description	Example
`%c`	character code to character	`a`
`%d`	integer	`654321`
`%Xd`	integer with thousand seperarator `X` from `,`, `.`, `''`, `_`	`654_321`
`%E`	scientific notation	`3.141593E+00`
`%f`	float	`4.560000`
`%hd`	integer	`654321`
`%hhd`	integer	`654321`
`%lld`	integer	`654321`
`%o`	octal	`2375761`
`%s`	string	`asd`
`%x`	hexadecimal	`9fbf1`

Formatting Types

Integers:

SELECT printf('%d + %d = %d', 3, 5, 3 + 5); -- 3 + 5 = 8

Booleans:

SELECT printf('%s != %s', true, false); -- true != false

Format datetime values:

SELECT printf('%s', DATE '1992-01-01'); -- 1992-01-01
SELECT printf('%s', TIME '12:01:00'); -- 12:01:00
SELECT printf('%s', TIMESTAMP '1992-01-01 12:01:00'); -- 1992-01-01 12:01:00

Format BLOB:

SELECT printf('%s', BLOB '\x00hello'); -- \x00hello

Pad integers with 0s:

SELECT printf('%04d', 33); -- 0033

Create timestamps from integers:

SELECT printf('%02d:%02d:%02d %s', 12, 3, 16, 'AM'); -- 12:03:16 AM

Convert to hexadecimal:

SELECT printf('%x', 123_456_789); -- 75bcd15

Convert to binary:

SELECT printf('%b', 123_456_789); -- 111010110111100110100010101

Thousand Separators

Integers:

SELECT printf('%,d',  123_456_789); -- 123,456,789
SELECT printf('%.d',  123_456_789); -- 123.456.789
SELECT printf('%''d', 123_456_789); -- 123'456'789
SELECT printf('%_d',  123_456_789); -- 123_456_789

Float, double and decimal:

SELECT printf('%,f',   123456.789); -- 123,456.789000
SELECT printf('%,.2f', 123456.789); -- 123,456.79

About this page

In this article