Solving Letter Scramble Puzzles with DuckDB
TL;DR: In this lighthearted post, we solve a puzzle type that's on display in Dutch trains.
The Dutch National Railways (NS) is releasing a “letter scramble”-style puzzle every week where
they give a term whose letters can be found in a Dutch train station's name.
In order to have a match, it doesn't have to be a perfect anagram – for example, Amsterdam
(9 letters) matches both mastered
(8 letters) and Dream Master
(11 letters) because all three terms contain the same letters, just with different amounts of repetitions. Let's call this a “weak anagram”.
The puzzle in the first week of September was Clumsy Rental Red
. Let's try to find the solution using DuckDB!
Letter Scrambling Macro
First, let's create a macro that turns a string into an ordered list of unique characters:
CREATE MACRO order_letters(s) AS
lower(s) -- convert all characters to lowercase
.regexp_replace(
'[^\p{L}]', '', 'g'
) -- remove all non-Unicode letters
.string_to_array('') -- turn the string into a list
.list_distinct() -- eliminate duplicate elements from the list
.list_sort(); -- sort the list
We can use this to see whether two terms are weak anagrams:
SELECT
order_letters('Amsterdam') AS letters_1,
order_letters('mastered') AS letters_2,
order_letters('Dream Master') AS letters_3,
letters_1 = letters_2 AS matches_1,
letters_1 = letters_3 AS matches_2;
letters_1 | letters_2 | letters_3 | matches_1 | matches_2 |
---|---|---|---|---|
[a, d, e, m, r, s, t] | [a, d, e, m, r, s, t] | [a, d, e, m, r, s, t] | true | true |
Indeed, both expressions are weak anagrams of Amsterdam
!
Station Names
To solve the puzzle, we need a list of train stations. Luckily, one of our go-to datasets at DuckDB is the Dutch railway datasets, including its services and train stations. We can create a table with the station names:
CREATE TABLE stations AS
FROM 'https://blobs.duckdb.org/nl-railway/stations-2023-09.csv';
Then, we can select the station names that are weak anagrams of the puzzle:
SELECT name_long
FROM stations
WHERE order_letters(name_long) = order_letters('Clumsy Rental Red');
Click to see the solution
Solve with a Table Macro
To find a weak anagram station name, we can use a table macro:
CREATE MACRO find_weak_anagram(s) AS TABLE
SELECT name_long
FROM stations
WHERE order_letters(name_long) = order_letters(s);
Then, we can find the solution with a short SQL statement:
FROM find_weak_anagram('Clumsy Rental Red');
Weak Anagrams Station Pairs
We got curious: are there two stations that are weak anagrams of each other? We can create a Cartesian product from the station names and compare their ordered letters to find out:
SELECT s1.name_long AS station_1, s2.name_long AS station_2
FROM stations s1, stations s2
WHERE s1.name_long.order_letters() = s2.name_long.order_letters()
-- ensure symmetry-breaking
AND s1.name_long < s2.name_long
-- make sure the station names don't contain each other
AND NOT s1.name_long.contains(s2.name_long)
AND NOT s2.name_long.contains(s1.name_long);
There are in fact three station pairs:
station_1 | station_2 |
---|---|
Melsele | Selm |
Etten-Leur | Lunteren |
Diemen Zuid | Emmen Zuid |
Cleaning Up
Most of the time, you don't have to clean up after running a simple DuckDB script: you simply close the in-memory database session, which takes care of the cleanup. However, it's worth pointing out that macros in DuckDB are persisted and this can get in the way – e.g., when copying the database into a DuckLake:
ATTACH 'ducklake:metadata.ducklake' AS my_ducklake;
COPY FROM DATABASE memory TO my_ducklake;
DuckLake does not support macros (functions), and throws the following error message:
Not implemented Error:
DuckLake does not support functions
You can drop the macros with the following commands:
DROP MACRO order_letters;
DROP MACRO TABLE find_weak_anagram;
With this, the COPY FROM DATABASE
call succeeds.
Summary
That was our quick guide to solving the NS puzzle! Is this a database problem? Not really, but DuckDB's SQL allows you to succinctly formulate and solve it! Happy puzzle solving!
This week, the puzzle is
Zere Tanda Voozan
. You can find the weekly puzzle's solution on the NS website.