JupyterHub tip: Don't touch NFS unless you must

2022-11-07

z2jh jupyterhub nfs

206 words 1 min read

If you are running a JupyterHub on Kubernetes and use NFS for home directory storage (a common occurance), I highly recommend the following settings:

singleuser:
  extraEnv:
    # notebook server writes secure files that don't need to survive a
    # restart here. Writing 'secure' files on some file systems (like
    # Azure Files with SMB or NFS) seems buggy, so we just put runtime dir on
    # /tmp. This is ok in our case, since no two users are on the same
    # container.
    JUPYTER_RUNTIME_DIR: /tmp/.jupyter-runtime
  extraFiles:
    ipython_kernel_config.json:
      mountPath: /usr/local/etc/ipython/ipython_kernel_config.json
      data:
        # This keeps a history of all executed code under $HOME, which is almost always on
        # NFS. This file is kept as a sqlite file, and sqlite and NFS do not go together very
        # well! Disable this to save ourselves from debugging random NFS oddities that are caused
        # by this unholy sqlite + NFS mixture.
        HistoryManager:
          enabled: false

These aren’t specific to kubernetes, but to NFS (or AzureFile, or any other shared filesystem). sqlite and NFS do not mix, and you’ll run into weird errors you will not be able to really debug. Save yourself the pain :) Let your users know too, and tell them to not put sqlite databases on NFS.

Installing Python (and other packages) from conda-forge in GitHub Actions

2022-11-06

github code conda-forge

237 words 2 mins read

Alex Merose asked me on Mastodon how to best set up a conda environment on GitHub Actions. I thought I’d write a short blog post about it!

The short version is, use the conda-incubator/setup-miniconda GitHub action instead of the official setup-python action. Specifically, try out these options for size:

  steps:
    - uses: conda-incubator/setup-miniconda@v2
      with:
        # Specify python version your environment will have. Remember to quote this, or
        # YAML will think you want python 3.1 not 3.10
        python-version: "3.10"
        # This uses *miniforge*, rather than *minicond*. The primary difference is that
        # the defaults channel is not enabled at all
        miniforge-version: latest
        # These properties enable the use of mamba, which is much faster and far less error
        # prone than conda while being completely compatible with the conda CLI
        use-mamba: true
        mamba-version: "*"

This should give you a python environment named test with python 3.10 (or whatever version you specify) and nothing much else. You can then use mamba to install packages from conda-forge to your heart’s content in future steps like this:

    - name: Install some packages
      # The `-l` is needed so conda environment activation works correctly
      shell: bash -l {0}
      run: |
        mamba install -c conda-forge numpy scipy matplotlib

My recommendation is to continue using the default setup-python action for almost all your python needs in GitHub actions, and use conda-incubator/setup-miniconda@v2 only if you explicitly need to use mamba (or conda) for something.

Using Python (instead of bash) to write GitHub actions

2022-11-01

python code github

306 words 2 mins read

I am not smart enough to consistently write and debug shell scripts that use any conditional or looping constructs. So as soon as I’m writing something in bash that requires use of those, I switch to python. This works fine when writing scripts, but what to do when writing GitHub Actions workflows? You can only write bash in run: stanzas in your steps, right?

Not at all! You can set the shell parameter to anything you want, and the contents of run will be passed to the shell in the form of a file. This allows you to use not just Python, but basically any langauge when writing your GitHub actions workflows.

Here is an example step that used python.

  steps:
    - name: Something in python!
      # The -u means 'unbuffered', so print() statements in your python code are output correctly
      # otherwise, they might be out of order with stdout from commands your code calls
      # {0} is replaced with the name of the temporary file GitHub Actions creates with
      # the contents of the run:
      shell: python -u {0}
      run: |
        import sys
        import subprocess

        print("Hello, I am python")

        # We have to use string substitution for getting inputs, which is bad and ugly
        # However, it isn't as bad me trying to write conditionals in bash.
        # It might be possible to use environment values here, I haven't explored.
        variable = '${{ inputs.some_input }}'

        # Use subprocess.check_call to call out to external process. stdout is
        # handled correctly
        subprocess.check_call([
          sys.executable,
          '-m',
          'pip', 'install', 'django'
        ])

If you instead use mamba or conda to set up your Python, perhaps with the setup-miniconda action, you need to set shell: bash -l -c "python -u {0}". The bash -l makes sure that the appropriate conda environment is loaded, and then passes off to the ‘correct’ python.

Motorcycle Ride Report 2022 06 19

2022-06-21

motorcycle

956 words 5 mins read

I decided to write publicly and publish my motorcycle riding trip notes now.

Montarey Trip notes

Need to learn to plan when to pee. This is important to do! Can’t just get off any exit, as that often doesn’t actually lead to wilderness but to more concrete jungles.

Need to learn more about lane splitting and how and when to do it safely. I did it for a little bit while stuck on CA-1S, but was avoiding it for most of the ride. Feels like it’s only safe to do when traffic is almost at a standstill, and I can go at least 5mph faster than traffic. Also, I don’t wanna get hit by other motorcyclists lanesplitting. More research needed.

I need to adjust my levers to see if it helps with my skin near thumb getting sore.

Get a different seat maybe for more comfort? My butt does hurt and is sore.

I definitely need to feel much more confident in curves. Taking the TC class is a good next step.

DO NOT BELIEVE GOOGLE MAPS TIME ESTIMATES!! I am much slower than that haha.

I need to work on my core strenght, back to the Gym. I think without that, long rides are going to be painful.

Need to plan meals better too. I’m eating ’lunch’ at like 4:30, after having only eaten 3/4 cup of oats and 1 butter croissant all day. That isn’t enough. Partially this is because I couldn’t leave on time either - as usual. Still meant I left at 12 rather than 2 - but could’ve left at 10!

Post trip notes

This is at night. The ride back was all interstate, and quite different!

Food makes a huge difference. I think I need to eat more than I think too. Just everything was better after food. I should just always eat, and eat on time.
I am getting more confident about turns at speed. I think by default I end up slowing down for each turn from fear. That is pretty dangerous! I need to have a better understanding of how much traction is available, and just turn. Which way to lean is also important - but I think my current intuition (counter lean or neutral) is good enough for me. I saw some guy shred the twisties when I was up in the berkeley hills, and he had to lean quite a bit. Not what I’m doing though.

I should take more classes and read the total control book properly. Towards the end I was feeling pretty confident, and that was quite nice!
The way back was north, with the sun directly into my eyes for almost an hour. Was intense, I could barely see! I had sunvisor down and transition glasses and that wasn’t enough at all. I was in a trance. Looking back, how did I trust myself enough to go through that?! Would I have been able to see hazards on the road? Stopped cars? I was able to control speed pretty well. I was a bit tired too. But I did Montarey to San Jose in one go, and stopped - my body basically forced me to stop and rest. I don’t entirely know how I feel about this. I think it was mostly safe, as it was just on the interstate and I was on the left-ish lane. I also think I could actually mostly see - what I see in my memory now, hours after the fact, doesn’t seem quite right - feels filtered. It really was just an intense trance. I don’t mind interstate riding.
I should strap my bag down and not carry it! I think that will definitely help my back hurt a lot less. I had 2 bottles of water, laptop and ipad - I think that’s a lot. I should consider a smaller bag too.
A more comfortable seat would also help. My butt was pretty sore.
Physical strength is an important component of safety, and I must get back to the gym. Particularly, while turning I could feel my core was not strong enough several times, and I would like more trust in my muscle strenghts.
I lane-split a little! Just for parts of CA-17S, when in otherwise almost stopped traffic. I need to learn way more about it.
I should understand trail braking better. I think my understanding of it right now is dangerously flawed, as it contributes to me slowing down excessively in turns. Could get rear ended.
Really need a windshield. I can hardly even look at my mirrors in the interstate, because I can’t bring my torso up at all.
I drifted into next lane almost in one turn I think because I wasn’t braking and I was tired. I was trying to ‘push through’ and at some point even didn’t want to break in San Jose. WTF was that, compromising safety for challenge! No way.
I need to leave earlier so I can have more time at my target spaces. In fact, I should look for nice target spaces first and then travel - flip my currnet model a little. I don’t think I need to just haul miles anymore. Past that stage now.
The bananas in my backpack were smushed to death and the inside of my backpack is a banana smoothie. Would not recommend

TODOs

Concrete action items!

Always eat before a ride
Get a better helmet. I’ve spent months looking for the right one, time to look again :(
Start going through Champ U that I already bought
Look for another intermediate mototrcycling course I can take.
Plan time at conclusion of journey better.
Get a windshield for the bike.
Experiment with getting luggage off my back

Shortcut to Show / Hide Terminal in Visual Studio Code

2022-01-06

code vscode customization

144 words 1 min read

I wanted a single keyboard shortcut that would let me switch between the built-in terminal and the editor in vscode. I couldn’t find any, so I made a short macro using the VSCode Macros Extension

First, you define your macros in settings.json. You can open up vscode settings with Cmd+,, search for macros, and edit this under Macros: List.


    "macros.list": {
        "showTerminal": [
            "workbench.action.terminal.toggleTerminal",
            "workbench.action.terminal.focus",
            "workbench.action.toggleMaximizedPanel"
        ],
        "hideTerminal": [
            "workbench.action.toggleMaximizedPanel",
            "workbench.action.terminal.toggleTerminal"
        ]
    }

Then you can bind a keyboard shortcut to it. Open Preferences: Open Keyboard Shortcuts (JSON) from the command palette, and add this:

    {
        "key": "cmd+t",
        "command": "macros.showTerminal",
        "when": "!terminalFocus"
    },
    {
        "key": "cmd+t",
        "command": "macros.hideTerminal",
        "when": "terminalFocus"
    },

showTerminal gets called when you are focused anywhere in the editor except your terminal, and hideTerminal gets called when you are focused on your terminal. This gives me the ’toggle’ behavior I want.

Disabling JupyterLab extensions on your z2jh JupyterHub installations

2021-12-15 318 words 2 mins read

Sometimes you want to temporarily disable a JupyterLab extension on a JupyterHub by default, without having to rebuild your docker image. This can be very easily done with z2jh’s singleuser.extraFiles, and JupyterLab’s page_config.json

JupyterLab’s page_config.json lets you set page configuration by dropping JSON files under a labconfig directory inside any of the directories listed when you run jupyter --paths. We just use singleuser.extraFiles to provide this file!

singleuser:
  extraFiles:
    lab-config:
      mountPath: /etc/jupyter/labconfig/page_config.json
      data:
        disabledExtensions:
          jupyterlab-link-share: true

This will disable the link-share labextension, both in JupyterLab and RetroLab. You can find the name of the extension, as well as its current status, with jupyter labextension list.

jovyan@jupyter-yuvipanda:~$ jupyter labextension list
JupyterLab v3.2.4
/opt/conda/share/jupyter/labextensions
        jupyterlab-plotly v5.4.0 enabled OK
        jupyter-matplotlib v0.9.0 enabled OK
        jupyterlab-link-share v0.2.4 disabled OK (python, jupyterlab-link-share)
        @jupyter-widgets/jupyterlab-manager v3.0.1 enabled OK (python, jupyterlab_widgets)
        @jupyter-server/resource-usage v0.6.0 enabled OK (python, jupyter-resource-usage)
        @retrolab/lab-extension v0.3.13 enabled OK

This is extremely helpful if the same image is being shared across hubs, and you want some of the hubs to have some of the extensions.

singleuser.extraFiles can be used like this for any jupyter config, or generally any config file anywhere. For example, here’s some config that culls idle running kernels, and shuts down notebooks after 60m of inactivity:

singleuser:
  extraFiles:
    culling-config:
      mountPath: /etc/jupyter/jupyter_notebook_config.json
      data:
        NotebookApp:
          # shutdown the server after no 30 mins of no activity
          shutdown_no_activity_timeout: 1800

        # if a user leaves a notebook with a running kernel,
        # the effective idle timeout will typically be CULL_TIMEOUT + CULL_KERNEL_TIMEOUT
        # as culling the kernel will register activity,
        # resetting the no_activity timer for the server as a whole
        MappingKernelManager:
          # shutdown kernels after 30 mins of no activity
          cull_idle_timeout: 1800
          # check for idle kernels this often
          cull_interval: 60
          # a kernel with open connections but no activity still counts as idle
          # this is what allows us to shutdown servers
          # when people leave a notebook open and wander off
          cull_connected: true

Personal stance on 'crypto' and 'web3'

2021-11-18 636 words 3 mins read

This post is inspired by a slack conversation with Ryan Abernathey and Sarah Gibson.

It is very important to me that we don’t exist in a world where the Internet is just centralized and controlled by a few huge, hypercapitalist players. I’ve written about ways the open source community can help here (by not tying ourselves to a single provider), and helped draft the right to replicate policy to put it into effect at 2i2c. This topic is important to me - I don’t want a future where step 1 of learning to code is something extremely vendor specific.

Theoretically, this is very well within the scope of the ‘web3’ movement (quite different from the web 3.0 movement which has mostly died out). I would love for that to be the case, but unfortunately I’ve found almost nothing about the ‘crypto’ or ‘web3’ communities attractive. I’m going to try write down why in this post.

Most of this isn’t unique thought - others have written about these with much more citations, and deeper analysis than I have. I encourate you to seek those out and read them. This is also not an exhaustive list, but a bare minimum.

Environmental impact of proof of work

Difficult for me to get past the environmental impact of most proof of work systems. Any renewable energy they use is energy that could be used to take more fossil fuels off the grid. I don’t want proof of work systems to switch to renewable energy - I want them to stop existing.

The rich get richer

In the economic models of almost all ‘crypto’, growing inequality is a feature, not a bug. This is outright formalized in Proof of Stake systems, and is de-facto true in most Proof of Work systems. I don’t really want to live in a world that is even more unequal than the one I live in now. Fixes to this would be fundamental to any possible positive outcome from the ‘crypto’ community.

“Distributed Ledger” is not a solution to many problems

The buzzwordiness of ‘blockchain’, ‘crypto’ and ‘web3’ mean they’re now being forcefit into problems they are very poor solutions for. Every time I see the word ‘blockchain’, I mentally replace it with ‘distributed ledger’, and see if it can still be a good fit for the given problem. Most often, it does not. Problems often have to be reframed in terms of ’transactions’ to have a blockchain solve them, and I think this reframing is often fundamentally problematic - life isn’t transactional. When all you have is a distributed ledger, everything looks like a problem of incentives.

So any time something purports to solve a problem that is not use as a currency using a blockchain, consider carefully the problem it is solving, and seeing if the solutions it is proposing actually solve the problem in a way that is generally usable given latency and throughput consideration.

Let’s evaluate the tech on a case-by-case basis

The fediverse is quite cool. I think of IPFS as the spiritual successor to bittorrent in many ways (yes, I know they consider themselves to be web3). There’s a lot of really cool conceptual work happening in removing ourselves from being beholden to a couple of major corporations that doesn’t necessarily require buying into a speculative, environmentally disastrous property bubble. I want to evaluate each tech that might be solving a problem I find important, but if it has brought in strongly to the ‘crypto’ or ‘web3’ hype, it just has a much bigger bar to clear to prove that it just isn’t a speculative ponzi scheme.

I can’t wait for the huge amounts of money, wasteful proof of work and the hype train to die out, so we can pick up the useful tech that comes out of this.

The fastest way to share your notebooks - announcing NotebookSharing.space

2021-11-16 1042 words 5 mins read

Sharing notebooks is harder than it should be.

You are working on a notebook (in Jupyter, RStudio, Visual Studio Code, whatever), and want to share it quickly with someone. Maybe you want some feedback, or you’re demonstrating a technique, or there is a cool result you want to quickly show someone. A million reasons to want to quickly share a notebook, but unfortunately there isn’t a quick enough and easy enough solution right now. That’s why I built notebooksharing.space, focused specifically on solving the problem of - “I have a notebook, I want to quickly share it with someone else”.

Ryan Abernathey captures the current frustration, and lays out a possible glorious future for what a ‘share a notebook I am working on’ workflow might look like. NotebookSharing.space is a start in tackling part of this. I highly recommend reading this thread.

Someone on Twitter asked me to share a notebook. I'm frustrated that in late 2021 we still don't have a simple, open-science-approved way to do this. A 🧵 about some of the options (cc @choldgraf, @yuvipanda) ... /1 https://t.co/mnd3K68JgV
— Ryan Abernathey (@rabernat) October 8, 2021

As the goal is to have the fastest way to upload your notebook and share it with someone, there is no signup or user accounts necessary. Just upload your notebook, get the link, and share it however you want. Notebook links are permalinks - once uploaded, a notebook can not be changed. You can only upload a new notebook and get a new link.

The upload is a bit slow in the video demos here because I’m sitting on a hammock in a remote beach, but should be much faster for you.

You can upload your notebook easily via the web interface at notebooksharing.space:

Once uploaded, the web interface will just redirect you to the beautifully rendered notebook, and you can copy the link to the page and share it!

Or you can directly use the nbss-upload commandline tool:

On Macs, you can pipe the output of nbss-upload to pbcopy to automatically put the link in your clipboard. Here is the example notebook that I uploaded, so you can check out how it looks.

A jupyterlab extension to streamline this process is currently being worked on, and I’d appreciate any help. I’d also love to have extensions for classic Jupyter Notebook, RStudio (via an addin), Visual Studio Code, and other platforms.

Provide feedback with collaborative annotations

When uploading, you can opt-in to have collaborative annotations enabled on your notebook via the open source, web standards based hypothes.is service. You can thus directly annotate the notebook, instead of having to email back and forth about ’that cell where you are importing matplotlib’ or ’that graph with the blue border’. This is one of the coolest features of notebooksharing.space.

You can annotate the notebook demoed in the video here. Note that you need a free hypothes.is account to annotate, but not to read.

Annotations are opt-in to limit unintended abuse. Enabling an unrestricted comments section on every notebook you upload is probably a terrible idea.

Private by default

By default, search engines do not index your notebooks - you have to opt-in to making them discoverable while you are uploading the notebook. This way, only those you share the link to the notebook with can view it. But be careful about putting notebooks with secret keys or sensitive data up here, as anyone with a link can still view it - just won’t be able to discover it with search engines.

RMarkdown and other formats are supported

It has always been important to me that the R community is treated as a first class citizen in all the tools I build. Naturally, NotebookSharing.space supports class support for R Notebooks as well as R Markdown files experimentally. R Notebooks produced by RStudio are HTML files, and will be rendered appropriately, including outputs (see example). RMarkdown (.rmd) files only contain code and markdown, and will be rendered appropriately (see example) ). If you enable annotations, they will work here too! I would love for more feedback from the R community on how to make this work better for you. In particular, an RStudio Addin that lets you share with a single shortcut key from RStudio would be an amazing project to build.

If you work primarily in the R community, I’d love to work with you to improve support here. I know there are bugs and rough edges, and would love for them to get better.

Internally, the wonderful jupytext project is used to read various notebook formats, so anything it supports is supported on NotebookSharing.space. This means .py files and .md files produced by Jupytext or Visual Studio code will also be rendered correctly, albeit without outputs as those are not stored in these files.

Next steps?

I want extensions that let you publish straight from wherever you are working - JupyterLab, classic Jupyter Notebook, RStudio and VS Code. That should speed up how fast you can share considerably. Any help here is most welcome!

I also want users to interactively run the notebooks they find here easily. This involves some integration with mybinder.org most likely, where you (or the notebook author) can specify which environment to launch the notebook into. Wouldn’t it be wonderful to have a ‘Make interactive’ button or similar that can immediately put the notebook back into an interactive mode?

Tweets from here on in Ryan’s twitter thread sell this vision well.

But neither of these options has a link to execution.

However, I made the notebook on a cloud-based @ProjectJupyter Hub run by @2i2c_org, so I know the exact environment it was run in. The environment was built automatically by https://t.co/vf7xPeQwqv 4/
— Ryan Abernathey (@rabernat) October 8, 2021

Feedback

I love feedback! That’s why I spend my time building these. Open an issue or tweet at me. As with all my projects, this is community built and run - so please come join me :)

Inspiration

Growing up on IRC, pastebin services are part of life. In 2018, GitHub stopped supporting anonymous gists

so sharing a notebook with someone became a lot more work. NotebookSharing.space hopefully plugs that gap. The excellent rpubs.com is also an inspiration!

Announcing the nbgitpuller Link Generator browser extensions

2021-11-16 504 words 3 mins read

(Leave comments or discuss this post on the jupyter discourse)

nbgitpuller is my most favorite way to distribute content (notebooks, data files, etc) to students on a JupyterHub. The student mental model is ‘I click a link, and can start working on my notebook’, which is as close to ideal as we have today. That is possible since all the information required for this workflow is embedded in the link itself - so it can be distributed easily via your pre-existing communication channel (like email, course website, etc), rather than requiring your students to use yet another tool.

However, creating these links often been a bit clunky and error prone. The current link generator is pretty awesome, but requires a lot of manual copy pasting, and is prone to errors. Particularly problematic was the GitHub switching of the default branch from master to main, which really caused problems for many instructors.

To make life easier, I’ve now written a browser extension that lets you create these links straight from the GitHub interface!

On the GitHub page for files, folders and repositories, it adds an ’nbgitpuller’ button.

On clicking this, you can enter a JupyterHub URL and the application you want to use to open this file, folder or repository. Then you can just copy the nbgitpuller URL, and share it with your students!

nbgitpuller popover

The JupyterHub URL and application you choose are remembered, so you do not need to enter it over and over again.

How to install?

On Mozilla Firefox

I ❤️ Mozilla Firefox, and you can install the extension there easily from the official addons store. You’ll also get automatic updates with new features and bug fixes this way.

On Google Chrome

On Google Chrome / Chromium, I have submitted the extension to the Chrome Web Store - but apparently there is a manual review process, and it can take weeks. In the meantime, you can install it with the following steps

Download the .zip version of the latest release of the extension. You want the file named nbgitpuller_link_generator-<version>.zip.
Extract the .zip file you downloaded.
In your Google Chrome / Chromium, go to chrome://extensions.
Enable the Developer Mode toggle in the top right. This should make a few options visible in a new toolbar.
Select Load Unpacked, and select the directory into which the downloaded .zip file was extracted to. This directory should contain at least a manifest.json file that was part of the .zip file.

Other browsers

The extension is written as a WebExtension, so should work easily in other browsers - Safari, Edge, etc. However, since I do not use them myself, I don’t have instructions on how to add them there. However, if you do use those browsers, I’d love contributions on how to install the extension there!

Contribute!

The extension was a overnight hack, and it made me very happy to get it shipped. There is a lot of room for improvement - I would love for you to provide feedback and contribute in any way possible on GitHub

Learning R

2021-05-14

code

267 words 2 mins read

I’m paying money to learn datascience with R via this John Hopkins course on Coursera. Hopefully I can spend about 4h a week on it.

Why?

I try to have a ‘T’ shaped set of competencies - some competence in a lot of areas, and deeper knowledge in some. Having some competence in many areas lets you notice where a particular skill can be very helpful in solving a problem, and either apply it yourself or bring in someone who can. Interesting problems are found in intersections, and learning this will give me access to more intersections :)

It is also an extremely useful skillset - I am sure I can apply it to more problems than I can apply (for example) cloud computing knowledge.

I’ve been building tools for data scientists for a while now, but since I don’t actually know much about data science itself my effectiveness is limited.

Why this course?

I went to coursera.org, typed ‘data science’ and this was the first that showed up :D

Why R?

I already know Python as a programming language, which sometimes makes it difficult for me to learn data science via it. Many courses targetted at people with my level of skill in data science / stats also teach some python alongside, and I often found that distracting.

With my JupyterHub contributor hat on, I think it’s extremely important that R is a first class citizen in all the teaching & research tools I build. Getting some experience using it will help in this goal.

Why this blog post?

Just as a form of external accountability.