DuckDB on LoongArch
TL;DR: In today's “What's on your desk?” episode, we test a Loongson CPU with the LoongArch architecture.
It’s not every day that a new CPU architecture arrives on your desk. I grew up on the Intel 486 back in the early 90s. I also still remember AMD releasing its 64-bit x86 extension in 2000. Then not a lot happened until Apple released the ARM-based M1 architecture in 2020. But today is the day again (for me), with the long-awaited arrival of the “MOREFINE M700S” in our office.

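The M700S contains a Loongson CPU. Loongson processors, historically also known as “Godson”, were developed in China. Earlier generations were based on the (somewhat esoteric) MIPS architecture, while current models such as the 3A6000 use LoongArch, Loongson's own instruction set architecture that grew out of this MIPS heritage. Their development is part of China's push for technological self-sufficiency under the government-funded Made in China 2025 plan.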
It is probably safe to assume that – given the ongoing trade shenanigans – the Loongson will become much more popular in China as time goes on. DuckDB already sees quite a lot of usage from China, so naturally we want to make sure that DuckDB runs well on the Loongson. Thankfully, one of our community members has already opened a pull request with two minimal changes to allow DuckDB to compile. We became curious.
We purchased the M700S on (where else?) AliExpress for around 500 EUR. Besides the Loongson 3A6000 CPU (4 cores, 8 threads), it contains 16 GB of main memory and a 256 GB solid-state disk.

Once plugged in and booted up, things feel pretty normal apart from the loud fan, which seems to be always on. On the screen, a variant of Debian called Loongnix boots up. The GUI seems to be KDE-based and comes with a custom browser, “LBrowser”, which is a fork of Chromium. Because it was not obvious, we document it here: the default root password is M700S. There is also a user account m700s with the same password.

Overall, the software seems a little dated, even after running `apt upgrade`: the Linux kernel appears to be version 4.19, which was released back in 2018 and has been EOL for a year now. The GCC version is 8.3, which is similarly old, having come out in 2019.
With the aforementioned patch, we managed to compile DuckDB 1.4.3 on Loongnix. There was one small issue: the CMake file `append_metadata.cmake` was not compatible with the older CMake version (3.13.4) available on Loongnix. Simply replacing that file with an empty one allowed us to complete the build. Of course, we could also have updated CMake, but life is short. Once the build completed, we ran DuckDB’s extensive unit test suite (`make allunit`) to confirm that our build runs correctly on the Loongson CPU. The results looked good.
For a performance comparison, we reused the methodology from our previous blog post that ran DuckDB on a Raspberry Pi. In short, we run the 22 TPC-H benchmark queries at scale factors 100 and 300, which correspond to DuckDB database files of 25 GB and 78 GB, respectively. We compare those numbers with the computer closest at hand, my day-to-day MacBook Pro with an M3 Max CPU. For fairness, we limit DuckDB to 14 GB of RAM on both platforms. The reported timings are from “hot” runs, meaning that we ran the query set twice and took the timings from the second run.
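The hot-run methodology boils down to a small timing harness. The following is a minimal sketch, not the script we actually used: `hot_run_timings` and `summarize` are hypothetical helpers, and `run_query` is a stand-in for whatever executes one TPC-H query. The harness itself only needs the Python standard library; the commented-out DuckDB wiring at the end (including the file name `tpch_sf100.duckdb`) is an illustrative assumption using DuckDB's `memory_limit` setting and the `tpch` extension's `PRAGMA tpch(n)`.

```python
import math
import time
from typing import Callable, Dict, Iterable


def hot_run_timings(
    run_query: Callable[[int], object],
    queries: Iterable[int] = range(1, 23),
    runs: int = 2,
) -> Dict[int, float]:
    """Run the whole query set `runs` times; return per-query seconds of the last pass.

    Earlier passes only serve to warm up caches (e.g. the OS page cache),
    so their timings are discarded.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    query_list = list(queries)
    timings: Dict[int, float] = {}
    for _ in range(runs):
        timings = {}  # only the final pass survives
        for q in query_list:
            start = time.perf_counter()
            run_query(q)
            timings[q] = time.perf_counter() - start
    return timings


def summarize(timings: Dict[int, float]) -> Dict[str, float]:
    """Aggregate per-query runtimes (seconds) into geometric mean and sum."""
    values = list(timings.values())
    geomean = math.exp(sum(math.log(v) for v in values) / len(values))
    return {"geomean": geomean, "sum": sum(values)}


# Hypothetical wiring to DuckDB's Python client (not a dependency of this sketch):
#   import duckdb
#   con = duckdb.connect("tpch_sf100.duckdb")
#   con.execute("SET memory_limit = '14GB'")
#   con.execute("INSTALL tpch")
#   con.execute("LOAD tpch")
#   print(summarize(hot_run_timings(lambda q: con.execute(f"PRAGMA tpch({q})").fetchall())))
```

Resetting `timings` at the start of each pass is what makes only the last (hot) pass count; with `runs=2` this matches running the query set twice and keeping the second run.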
Here are the results, and they are not great. We start with aggregated timings:
| SF | System | Geometric mean (s) | Sum (s) |
|---|---|---|---|
| SF100 | MacBook | 0.6 | 16.9 |
| SF100 | MOREFINE | 6.1 | 192.8 |
| SF300 | MacBook | 2.8 | 78.8 |
| SF300 | MOREFINE | 27.3 | 791.6 |
We can see that the MacBook is around ten times faster than the MOREFINE on this benchmark, both in the geometric mean of runtimes and in the sum. If you are interested in the individual query runtimes, you can find them below.
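The “around ten times” figure is simply the ratio of the MOREFINE's aggregates to the MacBook's. A minimal sketch that recomputes those ratios from the rounded values in the aggregated table; the `RESULTS` dictionary and the `slowdown` helper are illustrative names, not part of any tool we used.

```python
from typing import Tuple

# Aggregated TPC-H timings from the results table, in seconds:
# (geometric mean, sum) per scale factor and machine.
RESULTS = {
    "SF100": {"MacBook": (0.6, 16.9), "MOREFINE": (6.1, 192.8)},
    "SF300": {"MacBook": (2.8, 78.8), "MOREFINE": (27.3, 791.6)},
}


def slowdown(scale_factor: str) -> Tuple[float, float]:
    """How many times slower the MOREFINE is than the MacBook (by geomean, by sum)."""
    mac_geo, mac_sum = RESULTS[scale_factor]["MacBook"]
    mf_geo, mf_sum = RESULTS[scale_factor]["MOREFINE"]
    return mf_geo / mac_geo, mf_sum / mac_sum


for sf in RESULTS:
    by_geomean, by_sum = slowdown(sf)
    print(f"{sf}: {by_geomean:.1f}x by geometric mean, {by_sum:.1f}x by sum")
```

All four ratios fall between roughly 9.7x and 11.4x. Because the MacBook's SF100 geometric mean is rounded to 0.6 s, the corresponding ratio is only approximate.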
Individual query runtimes, in seconds:
| Q | SF100/MacBook | SF100/MOREFINE | SF300/MacBook | SF300/MOREFINE |
|---|---|---|---|---|
| 1 | 1.247 | 7.363 | 4.528 | 26.475 |
| 2 | 0.117 | 1.058 | 0.474 | 4.101 |
| 3 | 0.697 | 8.563 | 2.759 | 32.432 |
| 4 | 0.570 | 7.348 | 2.331 | 27.185 |
| 5 | 0.631 | 8.498 | 3.217 | 34.462 |
| 6 | 0.180 | 1.236 | 1.395 | 13.225 |
| 7 | 0.620 | 7.702 | 3.119 | 37.411 |
| 8 | 0.640 | 5.593 | 3.611 | 29.914 |
| 9 | 1.906 | 30.560 | 6.670 | 99.884 |
| 10 | 0.923 | 11.755 | 4.036 | 40.412 |
| 11 | 0.102 | 1.037 | 0.709 | 4.444 |
| 12 | 0.535 | 6.422 | 2.918 | 31.501 |
| 13 | 1.847 | 21.185 | 6.394 | 74.081 |
| 14 | 0.408 | 5.616 | 3.240 | 26.613 |
| 15 | 0.252 | 2.652 | 1.906 | 17.454 |
| 16 | 0.273 | 3.108 | 0.879 | 11.480 |
| 17 | 0.805 | 5.184 | 4.655 | 28.469 |
| 18 | 1.538 | 15.492 | 7.619 | 71.845 |
| 19 | 0.779 | 9.143 | 4.379 | 39.111 |
| 20 | 0.441 | 4.993 | 3.234 | 25.967 |
| 21 | 1.996 | 23.231 | 9.503 | 96.452 |
| 22 | 0.441 | 5.036 | 1.237 | 18.709 |
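The aggregated table can be reproduced from these per-query runtimes. This is a minimal sketch using only the Python standard library (`statistics.geometric_mean`, available since Python 3.8), not the script that produced our numbers; it also reports where the per-query gap between the two machines is smallest and largest at SF100.

```python
import statistics

# Per-query runtimes in seconds for TPC-H Q1..Q22, copied from the table above.
RUNTIMES = {
    "SF100/MacBook": [
        1.247, 0.117, 0.697, 0.570, 0.631, 0.180, 0.620, 0.640, 1.906, 0.923, 0.102,
        0.535, 1.847, 0.408, 0.252, 0.273, 0.805, 1.538, 0.779, 0.441, 1.996, 0.441,
    ],
    "SF100/MOREFINE": [
        7.363, 1.058, 8.563, 7.348, 8.498, 1.236, 7.702, 5.593, 30.560, 11.755, 1.037,
        6.422, 21.185, 5.616, 2.652, 3.108, 5.184, 15.492, 9.143, 4.993, 23.231, 5.036,
    ],
    "SF300/MacBook": [
        4.528, 0.474, 2.759, 2.331, 3.217, 1.395, 3.119, 3.611, 6.670, 4.036, 0.709,
        2.918, 6.394, 3.240, 1.906, 0.879, 4.655, 7.619, 4.379, 3.234, 9.503, 1.237,
    ],
    "SF300/MOREFINE": [
        26.475, 4.101, 32.432, 27.185, 34.462, 13.225, 37.411, 29.914, 99.884, 40.412, 4.444,
        31.501, 74.081, 26.613, 17.454, 11.480, 28.469, 71.845, 39.111, 25.967, 96.452, 18.709,
    ],
}

for column, values in RUNTIMES.items():
    assert len(values) == 22, column  # one runtime per TPC-H query
    geomean = statistics.geometric_mean(values)
    print(f"{column}: geometric mean {geomean:.1f} s, sum {sum(values):.1f} s")

# Per-query slowdown of the MOREFINE relative to the MacBook at SF100.
slowdowns = [
    mf / mac for mf, mac in zip(RUNTIMES["SF100/MOREFINE"], RUNTIMES["SF100/MacBook"])
]
smallest_gap = min(range(len(slowdowns)), key=slowdowns.__getitem__)
largest_gap = max(range(len(slowdowns)), key=slowdowns.__getitem__)
print(f"Smallest gap: Q{smallest_gap + 1} ({slowdowns[smallest_gap]:.1f}x)")
print(f"Largest gap: Q{largest_gap + 1} ({slowdowns[largest_gap]:.1f}x)")
```

The sums match the aggregated table, and the geometric means agree with it to within rounding. At SF100 the per-query gap ranges from about 5.9x (Q1) to about 16x (Q9), so the “ten times” average hides a fair amount of spread.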
It is always exciting to get DuckDB running on a new platform. We have built DuckDB to be ultra-portable and agnostic to its hardware environment while still delivering excellent performance, so it was not surprising that getting it to run on the MOREFINE with its new-ish CPU was fairly easy. However, performance on the standard TPC-H benchmark was not that impressive, with the MacBook being around ten times faster than the MOREFINE.
Of course, there are many opportunities for improvement. For starters, the GCC toolchain on LoongArch is likely far less mature than its x86 and ARM counterparts, so advances there could make a big difference. I/O performance could also play a role, but we have not measured it separately. But hey, the “glass half full” department could also rightfully claim that the Loongson CPU can complete TPC-H SF300!
One could also argue that a MacBook Pro is much more expensive than the 500 EUR MOREFINE. However, a recent M4 Mac Mini with the same memory and storage specs costs around 700 EUR, which is not that much more, all things considered. It will run circles around the MOREFINE, and it will not constantly annoy you with its fan.