DuckDB in Python in the Browser with Pyodide, PyScript, and JupyterLite

Author Avatar
Alex Monahan
Published on 2024-10-02

TL;DR: Run DuckDB in an in-browser Python environment to enable simple querying on remote files, interactive documentation, and easy to use training materials.

Time to “Hello World”

The first time that you are using a new library, the most important thing is how quickly you can get to “Hello World”.

Note Want to see “Hello World”? Jump to the fully interactive examples!

Likewise, if someone is visiting any documentation you have written, you want them to quickly and easily get your tool up and running. When you are giving a demo, you want to avoid "demo hell" and have it work the first try!

If you want to try “expert mode”, try leading an entire conference room of people through those setup steps! The classroom or conference workshop environment makes it far more critical that installation be bulletproof.

Python is one of our favorite ways to use DuckDB, but Python is notoriously difficult to set up – doubly so for a novice programmer. What the heck is a virtual environment? Are you on Windows, Linux, or Mac? Pip or Conda? The new kid on the block uv?

Experienced Pythonistas are not immune either! Many, like me, have been forced to celebrate the time honored and xkcd-chronicled tradition of just wiping everything related to Python and starting from scratch.

How can we make it as easy and as fast as possible to test out DuckDB in Python?

Difficulties of Server-Side Python

One response to this challenge is to host a Python environment on a server for each of your users. This has a number of issues.

Hosting Python on a server yourself is not free. If you have many users, it can be far from free.

If you want to use a free solution like Google Colab, each visitor will need a Google account, and you'll need to be comfortable with Google accessing your data. Plus, it is hard to embed within an existing web page for a seamless experience.

Enter Pyodide

Pyodide is a way to run Python directly in your browser with no installation and no setup, thanks to the power of WebAssembly. That makes it the easiest and fastest way to get a Python environment up and running – just load a web page! All computation happens locally, so it can be served like any static website with tools like GitHub Pages. No server-side Python required!

Another benefit is that Pyodide is nicely sandboxed in the browser environment. Each user gets their own workspace, and since it is all local, it is nice and secure.

Part of what sets Pyodide apart from other in-browser-Python approaches is that it can even run libraries that are written in C, C++, or even Fortran, including much of the Python data science stack. This means that now you can use DuckDB in Pyodide as well! You can even combine it with NumPy, SciPy, and Pandas (in addition to many pure-Python libraries). PyArrow and Ibis have experimental support also.

Use Cases for Pyodide DuckDB

Want to quickly analyze some remote data using either Python or DuckDB?

Pyodide is the fastest way to get your questions answered using Python.

Want to quickly analyze some local data?

Pyodide can also query local files!

Want to make your documentation interactive?

Let your users test out your DuckDB-powered library with ease. We will see an example below that demonstrates the magic-duckdb Jupyter plugin to enable SQL cells.

Leading a training session with DuckDB and Python?

Skip the hassles of local installation. There is no need to work 1:1 with the 15% of folks in the audience with some quirky setup! Everyone will get this to work on the first try, in seconds, so you can get to the content you want to teach. Plus, it is free, with no signup required of any kind!

Pyodide Examples

We will cover multiple ways to embed Pyodide-powered-Python directly into your site, so your users can try out your new DuckDB-backed tool with a single click!

  • PyScript Editor
    • An editor with nice syntax highlighting
  • JupyterLite Notebook
    • A classic notebook environment
  • JupyterLite Lab IDE
    • A full development environment

PyScript Editor

This HTML snippet will embed a runnable PyScript editor into any page!

<script type="module" src="https://pyscript.net/releases/2024.8.2/core.js"></script>
<script type="py-editor" config='{"packages":["duckdb"]}'>
    import duckdb
    print(duckdb.sql("SELECT '42 in an editor' AS s").fetchall())
</script>

Just click the play button and you can execute a DuckDB query directly in the browser. You can edit the code, add new lines, etc. Try it out!

JupyterLite Notebook

Here is an example of using an iframe that points to a JupyterLite environment that was deployed to GitHub Pages!

<iframe src="https://alex-monahan.github.io/jupyterlite_duckdb_demo/notebooks/index.html?path=hello_duckdb.ipynb" style="height: 600px; width: 100%;"></iframe>

This is a fully interactive Python notebook environment, with DuckDB running inside. Feel free to give it a run!

Configuring a full JupyterLite environment is only a few steps! The JupyterLite folks have built a demo page that serves as a template and have some great documentation. The main steps are to:

  1. Use the JupyterLite Demo Template to create your own repo
  2. Enable GitHub Pages for that repo
  3. Add and commit a .ipynb file in the content folder
  4. Visit https://⟨YOUR_GITHUB_USERNAME⟩.github.io/⟨YOUR_REPOSITORY_NAME⟩/notebooks/index.html?path=⟨YOUR_NOTEBOOK.ipynb⟩

Note that it can take a couple of minutes for GitHub Pages to deploy. You can monitor the progress on GitHub's Actions tab.

JupyterLite Lab IDE

After following the steps in the JupterLite Notebook setup, if you change your URL from /notebooks/ to /lab/, you can have a full IDE experience instead! This form factor is a bit harder to embed in another page, but great for interactive use.

This example uses the magic-duckdb Jupyter extension that allows us to create SQL cells using %%dql.

Follow this link to see the Lab IDE interface, or experiment with the Notebook-style version below.

<iframe src="https://alex-monahan.github.io/jupyterlite_duckdb_demo/notebooks/index.html?path=magic_duckdb.ipynb" style="height: 600px; width: 100%;"></iframe>

Architecture of DuckDB in Pyodide

So how does DuckDB work in Pyodide exactly? The DuckDB Python client is compiled to WebAssembly (Wasm) in its entirety. This is different than the existing DuckDB Wasm approach, since that is compiling the C++ side of the library only and wrapping it with a JavaScript API. Both approaches use the Emscripten toolchain to do the Wasm compilation. It is DuckDB's design decision to avoid dependencies and the prior investments in DuckDB-Wasm that made this feasible to build in such a short period of time!

The Pyodide team has added DuckDB to their hosted repository of libraries, and even set up DuckDB to run as a part of their CI/CD workflow. That is what enables JupyterLite to simply run %pip install duckdb, and PyScript to specify DuckDB as a package in the py-editor config parameter or in the <py-config> tag. Pyodide then downloads the Wasm-compiled version of the DuckDB library from Pyodide's repository. We want to send a big thank you to the Pyodide team, including Hood Chatham and Gyeongjae Choi, as well as the Voltron Data team including Phillip Cloud for leading the effort to get this to work.

Limitations

Running in the browser is a more restrictive environment (for security purposes), so there are some limitations when using DuckDB in Pyodide. There is no free lunch!

  • Single-threaded
    • Pyodide currently limits execution to a single thread
  • A few extra steps to query remote files
    • Remote files can't be accessed by DuckDB directly
    • Instead, pull the files locally with Pyodide first
    • DuckDB-Wasm has custom enhancements to make this possible, but these are not present in DuckDB's Python client
  • No runtime-loaded extensions
    • Several extensions are automatically included: parquet, json, icu, tpcds, and tpch.
  • Release cadence aligned with Pyodide
    • At the time of writing, duckdb-pyodide is at 1.0.0 rather than 1.1.1

Conclusion

Pyodide is now the fastest way to use Python and DuckDB together! It is also an approach that scales to an arbitrary number of users because Pyodide's computations happen entirely locally.

We have seen how to embed Pyodide in a static site in multiple ways, as well as how to read remote files.

If you are excited about DuckDB in Pyodide, feel free to join us on Discord. We have a #show-and-tell channel where you can share what you build with the community. You are also welcome to explore the duckdb-pyodide repo and report any issues you find. We would also really love some help with enabling runtime-loaded extensions – please reach out if you can help!

Happy quacking about!

Recent Posts

Optimizers: The Low-Key MVP
deep dive

Optimizers: The Low-Key MVP

2024-11-14
Tom Ebergen
Analytics-Optimized Concurrent Transactions
deep dive

Analytics-Optimized Concurrent Transactions

2024-10-30
Mark Raasveldt and Hannes Mühleisen
All blog posts