The Littlest JupyterHub
This idea comes from brainstorming along with Lindsey Heagy, Carol Willing, Tim Head & Nick Bollweg at the Jupyter Team Meeting 2018. Most of the good ideas are theirs! The name is inspired by one of the favorite TV series of one of my favorite people.
I really love the idea of JupyterHub distributions - opinionated combinations of components that each target a specific use case. The Zero to JupyterHub distribution is awesome & works for most people. However, it requires Kubernetes - a distributed system with inherent complexities that are not worth taking on below a certain scale.
This blog post lays out ideas for implementing a simpler, smaller distribution called The Littlest JupyterHub. The Littlest JupyterHub serves the long tail of potential JupyterHub users who have only the following needs:
- Support a very small number of students (around 20–30, maybe 50)
- Run on only one node, either a cheap VPS or a VM on their favorite cloud provider
- Provide the same environment for all students
- Allow the instructor / admin to easily modify the environment for students with no specialized knowledge
- Be extremely low maintenance once set up & easily fixable when it breaks
- Allow easy upgrades
- Enforce memory / CPU limits for students
The target audience is primarily educators teaching small classes with Jupyter Notebooks. It should be an extremely focused distribution, with new feature requests facing higher scrutiny than usual. It has a legitimate chance of actually reaching 1.0 & being stable, requiring minimal ongoing upgrades!
JupyterHub setup
JupyterHub + ConfigurableHTTPProxy run as standard systemd services. Systemd spawner is used - it is lightweight, allows JupyterHub restarts without killing user servers & provides CPU / memory isolation.
Something like First Use Authenticator, combined with a user whitelist, might be good enough for a large number of users. New authenticators are added whenever users ask for them.
The JupyterHub system is in its own root owned conda environment or virtualenv, to prevent accidental damage from users.
User environment
There is a single conda environment shared by all the users. JupyterHub admins have write access to this environment, and everyone else has read access. Admins can install new libraries for all users with conda/pip. No extra steps needed, and you can do this from inside JupyterHub without needing to ssh.
Each user gets their own home directory, and can install packages there if they wish. systemdspawner puts each user server in a systemd service, and provides fine grained control over memory & cpu usage. Users also get their own system user, providing an additional layer of security & standardized home directory locations.
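As a rough illustration of the per-user limits described above, here is a minimal sketch of how the spawner settings might look. It assumes the `mem_limit` and `cpu_limit` options that systemdspawner documents; the helper name `configure_spawner` and the values `2G` and `0.5` are made up for this example. In a real jupyterhub_config.py the `c` object is provided by JupyterHub, so the `SimpleNamespace` stand-in below only exists to keep the snippet runnable on its own.

```python
from types import SimpleNamespace


def configure_spawner(c, mem_limit="1G", cpu_limit=1.0):
    """Point JupyterHub at systemdspawner and set per-user resource limits.

    configure_spawner is a hypothetical helper; the default values are
    placeholders, not recommendations from the post.
    """
    c.JupyterHub.spawner_class = "systemdspawner.SystemdSpawner"
    c.SystemdSpawner.mem_limit = mem_limit  # memory cap per user server
    c.SystemdSpawner.cpu_limit = cpu_limit  # CPU cap, in fractions of a core
    return c


# Stand-in for the config object JupyterHub normally supplies as `c`.
c = SimpleNamespace(JupyterHub=SimpleNamespace(), SystemdSpawner=SimpleNamespace())
configure_spawner(c, mem_limit="2G", cpu_limit=0.5)
print(c.SystemdSpawner.mem_limit, c.SystemdSpawner.cpu_limit)
```

Keeping the limits in one small function would make it easy to feed them from the distribution's config file later.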
Configuration
YAML is used for config - it is the least bad of all the currently available languages, IMO. Ideally, something like the visudo command would exist for editing & applying this config. It’ll open the config file in an editor, allow users to edit it, and apply it only if it is valid. Advanced users can sidestep this and edit files directly. The YAML file is read and processed directly in jupyterhub_config.py. This simplifies things & gives us fewer things to break.
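The "apply it only if it is valid" step could be sketched roughly like this. Everything here is an assumption for illustration: the post does not define any config keys, so `ALLOWED_KEYS` is a placeholder schema, and `validate_config` and `apply_if_valid` are hypothetical names. The `parse` argument stands in for a YAML loader (for example `yaml.safe_load` from PyYAML), which keeps the sketch free of third-party imports.

```python
import os
import tempfile

# Placeholder schema: these keys are invented for the example.
ALLOWED_KEYS = {
    "mem_limit": str,
    "cpu_limit": (int, float),
    "allowed_users": list,
}


def validate_config(config):
    """Return a list of problems; an empty list means the config is usable."""
    if not isinstance(config, dict):
        return ["top level of the config must be a mapping"]
    problems = []
    for key, value in config.items():
        expected = ALLOWED_KEYS.get(key)
        if expected is None:
            problems.append(f"unknown key: {key}")
        elif isinstance(value, bool) or not isinstance(value, expected):
            # bool is a subclass of int in Python, so reject it explicitly.
            problems.append(f"wrong type for {key}: {type(value).__name__}")
    return problems


def apply_if_valid(new_text, target_path, parse):
    """Replace target_path with new_text only if it parses and validates."""
    try:
        config = parse(new_text)
    except Exception as exc:
        return [f"could not parse config: {exc}"]
    problems = validate_config(config)
    if problems:
        return problems  # the existing file is left untouched
    directory = os.path.dirname(os.path.abspath(target_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write(new_text)
    os.replace(tmp_path, target_path)  # swap the new file in atomically
    return []
```

Writing to a temporary file and swapping it in with `os.replace` means a half-written or invalid config never becomes the live one, which is the property visudo provides for sudoers.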
Upgrading the distribution
Backwards compatible upgrading will be supported across one minor version only - so you can go from 0.7 to 0.8, but not directly from 0.7 to 0.9. Upgrades should not cause outages.
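The one-minor-version rule is simple enough to check in code. This is only a sketch with a hypothetical function name, `can_upgrade`. It also makes two assumptions the post does not spell out: that staying on the same minor version (a patch release) is allowed, and that jumps across major versions are not.

```python
def can_upgrade(installed, target):
    """Return True if target is at most one minor version ahead of installed.

    Versions are strings like "0.7" or "0.7.2"; only major and minor are
    compared. Same-minor upgrades are allowed, major jumps are refused
    (both are assumptions made for this example).
    """
    cur_major, cur_minor = (int(part) for part in installed.split(".")[:2])
    new_major, new_minor = (int(part) for part in target.split(".")[:2])
    return cur_major == new_major and 0 <= new_minor - cur_minor <= 1


print(can_upgrade("0.7", "0.8"))
print(can_upgrade("0.7", "0.9"))
```

An installer could run a check like this first and refuse to continue, pointing the admin at the intermediate version instead.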
Installation mechanism
Users run a command on a fresh server to install this distribution. This could use conda constructor (thanks to Nick Bollweg & Matt Rocklin for convincing me!) or Debian packages (with fpm or dh-virtualenv). The user environments will be conda environments.
A curl <some-url> | sudo bash command is available in a nice looking website for the distribution that users can copy paste into their fresh VM. This website also has instructions for creating a fresh VM in popular cloud providers & VPS providers.
Debuggability
All systems exist in a partially degraded state all the time. Good systems self-heal & continue to run as well as they can. When they can’t, they break cleanly in known ways. They are observable enough to debug the issues that cause 80% of the problems.
The Littlest JupyterHub should be a good system. Systemd captures logs from JupyterHub, user servers & the proxy. Strong validation of the config file catches fatal misconfigurations. Reboots actually fix most issues and never make anything worse. Screwed up user environments are recoverable.
We’ll discover how this breaks as users of varying skill levels use it, and update our tooling accordingly.
But, No Docker?
Docker has been explicitly excluded from this tech stack. Building custom Docker images & dealing with registries is too complex for most educators. A good distribution embraces its constraints & does well!
Contribute
Are you a person who would use a distribution like this? We would love to hear from you! Make an issue on GitHub, tweet at me, or send me an email.
Author Yuvi
LastMod 2018-06-18