Solving Letter Scramble Puzzles with DuckDB

Author Avatar
Gabor Szarnyas
2025-09-11 · 4 min

TL;DR: In this lighthearted post, we solve a puzzle type that's on display in Dutch trains.

The Dutch National Railways (NS) is releasing a “letter scramble”-style puzzle every week where they give a term whose letters can be found in a Dutch train station's name. In order to have a match, it doesn't have to be a perfect anagram – for example, Amsterdam (9 letters) matches both mastered (8 letters) and Dream Master (11 letters) because all three terms contain the same letters, just with different amounts of repetitions. Let's call this a “weak anagram”.

The puzzle in the first week of September was Clumsy Rental Red. Let's try to find the solution using DuckDB!

Letter Scrambling Macro

First, let's create a macro that turns a string into an ordered list of unique characters:

CREATE MACRO order_letters(s) AS 
    lower(s)                  -- convert all characters to lowercase
    .regexp_replace(          -- remove all non-Unicode letters
        '[^\p{L}]', '', 'g'
    )
    .string_to_array('')      -- turn the string into a list
    .list_distinct()          -- eliminate duplicate elements from the list
    .list_sort();             -- sort the list

We can use this to see whether two terms are weak anagrams:

SELECT
    order_letters('Amsterdam') AS letters_1,
    order_letters('mastered') AS letters_2,
    order_letters('Dream Master') AS letters_3,
    letters_1 = letters_2 AS matches_1,
    letters_1 = letters_3 AS matches_2;
letters_1 letters_2 letters_3 matches_1 matches_2
[a, d, e, m, r, s, t] [a, d, e, m, r, s, t] [a, d, e, m, r, s, t] true true

Indeed, both expressions are weak anagrams of Amsterdam!

Matching against Station Names

To solve the puzzle, we need a list of train stations. Luckily, one of our go-to datasets at DuckDB is the Dutch railway datasets, including its services and train stations. We can create a table with the station names:

CREATE TABLE stations AS
    FROM 'https://blobs.duckdb.org/nl-railway/stations-2023-09.csv';

Then, we can select the station names that are weak anagrams of the puzzle:

SELECT name_long
FROM stations
WHERE order_letters(name_long) = order_letters('Clumsy Rental Red');

We are not going to spoil the solution but you can reveal it below.

Click to see the solution.

Lelystad Centrum

Table Macro for Finding Weak Anagrams

To find station name, which is a weak anagram to a term, we can use a table macro:

CREATE MACRO find_weak_anagram(s) AS TABLE
    SELECT name_long
    FROM stations
    WHERE order_letters(name_long) = order_letters(s);

Then, we can find the solution using a simple SQL statement:

FROM find_weak_anagram('Clumsy Rental Red');

Weak Anagram Station Pairs

We got curious: are there two stations whose names are weak anagrams of each other? We can create a Cartesian product from the station names and compare their ordered letters to find out:

SELECT s1.name_long AS station_1, s2.name_long AS station_2
FROM stations s1, stations s2
WHERE s1.name_long.order_letters() = s2.name_long.order_letters()
  -- ensure symmetry-breaking
  AND s1.name_long < s2.name_long
  -- make sure the station names don't contain each other
  AND NOT s1.name_long.contains(s2.name_long)
  AND NOT s2.name_long.contains(s1.name_long);

There are in fact three station pairs whose names are weak anagrams of each other:

station_1 station_2
Melsele Selm
Etten-Leur Lunteren
Diemen Zuid Emmen Zuid

Cleaning Up

Most of the time, you don't have to clean up after running a simple DuckDB script: simply closing the in-memory database session takes care of the cleanup. However, it's worth pointing out that macros in DuckDB are persisted and this can get in the way – e.g., when copying the database into a DuckLake:

ATTACH 'ducklake:metadata.ducklake' AS my_ducklake;
COPY FROM DATABASE memory TO my_ducklake;

DuckLake does not support macros (functions), so it throws the following error:

Not implemented Error:
DuckLake does not support functions

There are two options to work around this problem.

  • If you need to keep the macros and you are using DuckDB as your catalog database for DuckLake, you can use the DuckDB to DuckLake migration script. This will migrate the macros into the catalog of your DuckLake.

  • If you do not need the macros or your destination's catalog database does not support them, you can drop them with the following commands:

    DROP MACRO order_letters;
    DROP MACRO TABLE find_weak_anagram;
    

    Without the macros, the copying to DuckLake call succeeds.

Summary

That was our quick guide to solving the NS puzzle. Is this a database problem? Not really, but DuckDB's SQL allows you to succinctly formulate it and solve it in less than 0.1 seconds! And sure, ChatGPT can solve this puzzle – but it spends almost a minute (using a ton of computing resources) to reason its way through it.

Happy puzzle solving!

This week, the puzzle is Zere Tanda Voozan. You can find the weekly puzzle's solution on the NS website.

Update on Weak vs. Strong Anagrams

Reader feedback indicated that the NS puzzles are strong anagrams (with the same amount of letters as the station name) and the puzzle term for the first week of September was actually Clumsy Rental Ted and not Clumsy Rental Red! The latter is impossible to fact-check but we looked into historical data, ran it through our DuckDB solver script, and it turned out that 95% of the time there are strong anagrams. However, occasionally weak anagrams are also given such as Alleen Costume Hut which stands for Houten Castellum even though the puzzle term and the station name have different numbers of e letters.