Access dask-kubernetes clusters running on a cloud from your local machine

You can run your Jupyter Notebook locally and easily connect to a remote dask-kubernetes cluster on a cloud-based Kubernetes cluster with the help of kubefwd. This notebook shows you an example of how to do so. While this example is a Jupyter Notebook, the code will work in any local Python environment - a REPL, an IDE (like VS Code), or just plain ol’ .py files.

The latest executable version of this notebook can be found in this repository.

Credit

This work was sponsored by Quansight ❤️

Create & setup a Kubernetes cluster

You need a working Kubernetes cluster that is configured correctly. If kubectl get ns runs successfully, your cluster is working and connected, and you are good to go.

Install & run kubefwd

kubefwd lets you access services in your cloud Kubernetes cluster as if they were local, with a clever combination of ol’ school /etc/hosts hacks & fancy Kubernetes port-forwarding. It requires root on Mac OS & Linux, and should theoretically work on Windows too (I haven’t tested it).

Once you have installed it, run it in a separate terminal.

sudo kubefwd svc -n default -n kube-system

If you’ve created your own namespace for your cluster, use that instead of default. The kube-system namespace is required until this issue is fixed.

If the kubefwd command runs successfully, we’re good to go!

Install libraries we’ll need

In addition to dask-kubernetes, we’ll also need numpy to test our cluster with dask arrays.

%pip install numpy dask distributed dask-kubernetes

Setup dask-kubernetes configuration

Normally, the pod template would come from an external configuration file. We keep it in the notebook here to make this more self-contained.

POD_SPEC = {
    'kind': 'Pod',
    'metadata': {},
    'spec': {
        'restartPolicy': 'Never',
        'containers': [
            {
                'image': 'daskdev/dask:latest',
                'args': [
                    'dask-worker',
                    '--death-timeout', '60'
                ],
                'name': 'dask',
            }
        ]
    }
}
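If you’d rather keep the pod template in an external file as mentioned above, dask-kubernetes can load it directly - when we create the cluster below, from_yaml would replace from_dict. A sketch, assuming the spec above is saved as worker-spec.yml:

from dask_kubernetes import KubeCluster

# Equivalent to the from_dict call below, with the template in a YAML file
cluster = KubeCluster.from_yaml('worker-spec.yml', deploy_mode='remote')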

Create a remote cluster & connect to it

We create a KubeCluster object with deploy_mode='remote'. This runs the scheduler as a pod on the cluster, so worker <-> scheduler communication is easy & efficient. kubefwd lets us communicate with this remote scheduler, so we can pretend we are actually on the remote cluster.

If you are using kubectl to watch the objects created in your namespace, you’ll see a service and a pod created for this. kubefwd should also list a log line about forwarding the service port locally.

from dask_kubernetes import KubeCluster

cluster = KubeCluster.from_dict(POD_SPEC, deploy_mode='remote')

Create some workers

We have a scheduler in the cloud; now it’s time to create some workers in the cloud! We create 2, and can watch with glee in kubectl as the worker pods come up.

All scaling methods (adaptive scaling, using the widget, etc) should work here.

cluster.scale(2)
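For example, adaptive scaling would look something like this - the bounds here are made-up numbers:

# Let the cluster scale itself between 0 and 10 workers based on load
cluster.adapt(minimum=0, maximum=10)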

Run some computation!

We test our cluster by doing some trivial calculations with dask arrays. You can use any dask code here as you normally would, and it will run on the cloud Kubernetes cluster. This is especially helpful if you have large amounts of data in the cloud, since the workers would then be really close to where the data is.

You might get warnings about version mismatches. This is ok for the demo; in production you’d probably build your own docker image with pinned versions.

from dask.distributed import Client
import dask.array as da

# Connect Dask to the cluster
client = Client(cluster)

# Create a large array and calculate the mean
array = da.ones((1000, 1000, 1000))
print(array.mean().compute())  # Should print 1.0
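If you’d rather check for version mismatches explicitly instead of relying on the warnings, distributed’s client can do that - with check=True this raises an error on mismatch:

# Compare package versions across client, scheduler & workers
client.get_versions(check=True)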

Cleanup dask cluster

When you’re done with your cluster, remember to clean it up to release the resources!

This doesn’t affect your Kubernetes cluster itself - you’ll need to clean that up manually.

cluster.close()

Next steps?

Lots more we can do from here.

Ephemeral Kubernetes Clusters

We can wrap Kubernetes cluster creation in some nice Python functions, letting users create Kubernetes clusters just-in-time for running a dask-kubernetes cluster, and tear them down when they’re done. Users can thus ‘bring their own compute’ - since the clusters will be in their own cloud accounts - without the complication of understanding how the cloud works. This, I think, is where it would differ from the wonderful dask-gateway project.

Remove kubefwd

kubefwd isn’t strictly necessary, and should ideally be replaced by a kubectl port-forward call that doesn’t require root. This should be possible with some changes to the dask-kubernetes code, so the client can connect to the scheduler via a different address (say, localhost:8974, since that’s what kubectl port-forward gives us) than the workers do (they need something like dask-cluster-scheduler-c12.namespace:8786, since that is the in-cluster address).
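The client side of that could look something like the sketch below. The port is just the example number from above, and this assumes the (not yet existing) dask-kubernetes support for separate client & worker addresses:

from dask.distributed import Client

# Hypothetical: with `kubectl port-forward` running in another terminal,
# forwarding local port 8974 to the scheduler's port 8786, the local
# client connects via localhost while workers use the in-cluster address.
client = Client('tcp://localhost:8974')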

Longer term, it would be great if we can get rid of spawning other processes altogether, if/when the python kubernetes client gains the ability to port-forward.

Integration with TLJH

I love The Littlest JupyterHub (TLJH). A common use case is that a group of users needs a JupyterHub, mostly doing work that’s well served by TLJH. However, sometimes they need to scale up to a big dask cluster for some work, but not for long. In these cases, I believe a combination of TLJH + ephemeral Kubernetes clusters is far simpler & easier to manage than running a full Kubernetes-based JupyterHub. In addition, we can share the conda environment from TLJH with the dask workers, completely removing the need for users to think about docker images or environment mismatches. This is a massive win, and merits further exploration.

???

I am not actually an end user of dask, so I’m sure actual dask users will have much more ideas. Or they won’t, and this will just end up being a clever hack that gives me some joy :D Who knows!

Check if an organization is using GSuite

I needed to find out if an organization was using GSuite for their email, so I could allow their users to log in to a JupyterHub I was setting up with Google Auth, which I already use in other places.

I could’ve just asked them, but where’s the fun in that? Instead, you can use dig to find out!

For example, if I were to test berkeley.edu, I can run:

$ dig -t mx berkeley.edu

This gives me a bunch of output, the important bits of which are:

;; ANSWER SECTION:
berkeley.edu.		242	IN	MX	5 alt1.aspmx.l.google.com.
berkeley.edu.		242	IN	MX	10 alt3.aspmx.l.google.com.
berkeley.edu.		242	IN	MX	10 alt4.aspmx.l.google.com.
berkeley.edu.		242	IN	MX	1 aspmx.l.google.com.
berkeley.edu.		242	IN	MX	5 alt2.aspmx.l.google.com.

This means that mail to anyone@berkeley.edu is routed via Google, so I know this organization is using GSuite for their users!
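If you want to script this check, here is a minimal sketch using the dnspython package (the resolve method is dnspython 2.x; older versions call it query). The google.com substring match is a heuristic, not a guarantee:

import dns.resolver  # pip install dnspython

def uses_gsuite(domain):
    # GSuite-hosted domains route mail through Google's MX servers
    answers = dns.resolver.resolve(domain, 'MX')
    return any('google.com' in str(r.exchange).lower() for r in answers)

print(uses_gsuite('berkeley.edu'))  # True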

On Being an 'Entitled User'

Entitled users burn out maintainers. While my own massive burnout wasn’t directly caused by entitled users, they didn’t help either. For a lot of other maintainers, they have been a primary source of burnout. This is an unfortunate reality of our time. There are a ton of stories from maintainers out there - see this for a recent example. I’m not going to rehash that.

However, I’ve recently found myself on the ‘other side’ of this coin. I exhibited the following symptoms:

  • Whining about lack of documentation. I discover useful nuggets from helpful people on chat and by reading through the code, but don’t actually contribute those to the docs.
  • Complaining about technology choices, without much actual experience in the domain.
  • When technology choices I favor were made, not actually putting my effort into them. I acknowledge that I don’t have to do this - I have little time too. But with whatever time I have, I wish I would do this rather than whine.
  • Hit-and-run engagement, where I only show up every few months whenever I am trying to solve a particular problem, complain / do things, then run away. Things seem to get more to my liking each time I come back, but I still seem to be complaining anyway. I also acknowledge that this is ok - I am burnt out too. But it does mean I can’t expect to find working with this code base easy. It also means that the team of people actually working on it only interacts with me when I’m in this mode, which I don’t like either.
  • Wading into emotionally charged situations based on events that had already happened in the project and that I wasn’t part of, trampling around slightly, and then withdrawing on recognizing I don’t have the emotional bandwidth to fully understand what was happening. This makes the situation worse for everyone, including myself.
  • There’s probably more behavior here that I don’t recognize.

In a lot of ways, I think my frustrations and criticisms are valid. However, there are very constructive ways to engage, and then there’s just useless whining that burns other people out. I would like to think I’ve generally picked the constructive way in most projects. But that has blinded me to my behavior in a few projects.

I’m going to try and change my behavior to match what I’d like a frustrated user to do in open source projects where I’m the maintainer.

  • I’ll contribute documentation (either as blog posts, issues, or doc PRs) when I find solutions to problems that I am frustrated by.
  • If I favor other technology choices, I’ll provide constructive criticism of why, and then STFU. Or, I’ll actually work on making the change. I will not continue to whine.
  • When I recognize an emotionally charged situation, I will limit myself to actions I can fully emotionally engage in. This is a limitation of my own emotional bandwidth, fueled by burnout and other things. I’m simply recognizing the limitation, and trying not to make anything worse.
  • I do have a right to whine, and I’ll do so in private to people not otherwise engaged in the project.
  • I will generally ask myself ‘how would I like a user to behave if I were the maintainer?’ and try to act that way.

To be clear, these are the things I want to change about myself based on my situation. This isn’t a prescriptive list for other people to follow.

To the wonderful hardworking people who are maintainers of any popular open source projects - thank you for putting up with me.

Note taking: Why bother?

Talking to Ankur Sethi made me think more clearly about why I want note-taking, bookmarking setups.

I am trying to stop feeling so overwhelmed by moving things out of my puny brain.

The solution to this is two fold.

  1. Commit to and do fewer things. Already working on this.
  2. Be more efficient in the things I am doing.

A few years ago I gave up on chasing efficiency gains because they were all so marginal (hello, maintaining a super customized .vimrc). But I think I threw out the baby with the bathwater there, by focusing on the wrong kinds of efficiency. So am trying again!

Note Taking II - Retaining consumed content

In conversation with Nirbheek about my blog post on note taking, he said something like ‘my system of knowledge has holes, and I lose track of stuff I learnt because of that’. This complemented my earlier quest - the outliner helped me structure how I thought and produced content, while a bookmarking system would help me consume and retain knowledge. Both need to be integrated, but I currently had only one half of the puzzle.

We ended up talking about ‘bookmarks’ and how he has a lot of them. I’ve felt the need for saving stuff I read in useful ways, but never used bookmarks because I never look at them again. It would be awesome to have a system of that sort - especially if it was integrated into my notetaking and todo list solutions.

We talked about some current solutions - Pratul’s meticulously tagged pinboard account, Ankur’s blog’s links category. All this was too much work for me, but I couldn’t quite articulate why. I’ve tried both these options before and couldn’t keep them up. Longevity was my goal here, so I kept looking.

Memex

This led us to Memex, an open source system (with a paid hosted version) that seemed to do everything we wanted. They make a particular point of not taking VC money, which is a big draw for me.

Nirbheek tried it out, and found it buggy / wanting. In particular, it only worked on half his bookmarks - the rest had rotted, or the import itself was buggy. This could be fixed with Internet Archive integration, but there were a few more bugs about data loss in their issue tracker that didn’t give me too much confidence. It does still look amazing, and I will try it out at some point soon - unlike Nirbheek, I don’t have a few thousand bookmarks to import.

However, it gave me the name Memex, which has a history behind it - an article by Vannevar Bush (Claude Shannon was his student) from 1945 called As we may think. This was an extremely fascinating read - I read it in the original layout, with fascinating ads by the side (men’s garters, for example).

Here is my annotated version, with the origins of the rest of this blog post at the bottom. It’s fairly short, so I suggest you read the original. I’m not going to rehash it, but just talk about what I think I need in my ‘bookmarking’ solution.

What is a bookmark?

Bookmark is a terrible word for what I want, but I can’t think of a better one right now. Bush defines the ‘memex’ as:

A memex is a device in which an individual stores all his books, records and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.

(Gendered pronouns in the source left in for faithful reproduction. I’d like to think that if this were written today, it would use gender neutral pronouns.)

So a ‘bookmark’ is an individual thing that I come across that I might possibly ever want to remember again. I can’t remember it all, so I want to toss it into a ‘bookmarking system’ that can be searched with ‘exceeding speed and flexibility’.

Based on this, I figured out my current criteria for looking at & trying out such systems to find one that meets my needs.

Search first

I want my interface to be a simple text box into which I can type things, and it’ll search the entire contents of everything I’ve ‘bookmarked’. This should be articles I’ve read, books, tweets, videos I’ve watched, podcasts I’ve listened to, my notes, etc. I don’t want to do any tagging or categorizing myself.

Tagging and categorizing remind me of dmoz and Yahoo Directories. Way too much work for me, and I’ll never look at them again.

Bush calls selection (what we would call ‘search’ today) the primary problem, and rails against indexing (IMO what we would call tagging / categorizing) as insufficient.

This search needs to be extremely fast, and very relevant.

Never forgets

I should have access to these ‘bookmarks’ all the time. This means cross-device sync (Android, iPad, Mac OS for me).

It also needs to actually archive all the content I throw at it. Save web pages fully. Download (and transcribe) videos / podcasts. Find some way to keep books in there. No link rot, no ‘shit this content is now behind a paywall’, no ‘damn, I wish I had my laptop with me’.

Hyperlinking

I should be able to connect bookmarks together. I guess this could be done with tags, but I want something better. Bush talks about being able to link ‘bookmarks’ to each other with a blurb about the link - this is what we would call a ‘hyperlink’ now.

My note taker can hyperlink between notes, but I should also be able to hyperlink to content I have saved, and to link together pieces of saved content that don’t contain links themselves.

Strong integration between this and my note taker is probably the way to go. Each piece of content should have highlights, and notes attached to them. These could contain hyperlinks easily. The highlights and notes should also be searchable.

Publishable

Bush talks about ‘trailblazers’ - people who find new content, link it together in useful ways, and publish it for others.

This is how I used Google Reader’s shared functionality. You could publish a link to all the content you had consumed and marked as ‘shared’, with notes as well. I would subscribe to other people’s shared feeds, and find useful content (and blogs) to follow.

RIP Google Reader. It continues to make me wary of relying on any of Google’s services.

Having a note taker has already drastically changed how I think and produce content. I feel integrating this into my life will have similar effects on how I consume content. Now that I have a better idea of what I want, I’ll spend some more time looking.

Credits

Thanks to Nirbheek, Ankur Sethi and Prateek for conversations and thoughts on this topic :)

Note Taking Part I

I have been trying to find a nice way to keep thoughts, ideas and notes outside my head for a few months now. I worked my way through many applications, trying to use the system each promotes as the basis for my own. Along the journey, I got a better sense of what was important to me in such an application. I’ll walk through the various applications I’ve tried, along with the criteria I’d developed by the end.

Evernote

First stop, the big one. I already had misgivings about it being from a large corporation, but I still paid for it and tested it out for a few months.

  • No markdown support. This immediately became a problem - I had no idea how much I’d come to expect markdown wherever I type. They have been asked for markdown support many times, but I don’t think it is ever going to happen. I realized that markdown support is a minimum requirement for me.
  • Conflict resolution when I edited the same note in multiple places was really terrible. Completely unusable. I have 3 devices I use regularly - an iPad Pro, a Macbook and my Android phone. Through this experience I discovered that proper cross-device syncing is very important to me as well.
  • I paid Evernote so I could work offline. I already have a lot of anxiety about not having internet access (a topic for another post), and didn’t want to lose access to my notes when offline. This didn’t work very well thanks to the shitty conflict resolution, but the process made me realize that I cared a lot about full offline availability.
  • I also hated the UI. It was clunky, had way too many options, and none of the options I really wanted.
  • I found no way to link notes together. I realized that my time at Wikimedia had made me very accustomed to hyperlinking as a way of writing, and Evernote had no clear affordances for it. So hyperlinking became important to me. I could also smile at the big impact Wikimedia has had on the way I think :D This was the beginning of me recognizing that a purely hierarchical setup might not work for me…

I noticed I was using Evernote less and less, and wandered the desert for a while.

OneNote

I had been using OneNote on and off for a while. It has some of the same problems Evernote has. But the absolute no-go for me was how it handled text notes on my laptop.

(Screenshot: OneNote being terrible at handling text notes)

This behavior killed it for me. It was too much mental load, and I hated the extra UI it brought. It had most of the problems Evernote did, and worse.

List applications

At this point, I read some parts of Getting Things Done again, trying to control my sprawling, anxiety-filled work life. It was useful, as always, but mostly to remind me to not use my brain to store data.

I then spent a bunch of time bouncing between different todo list apps, as a way to get stuff out of my brain and into a list. Learning about kinder todo lists was very helpful.

I tried a lot of these apps - Google Keep, Nirvana, Evernote / OneNote, TickTick. Nothing really clicked, but eventually I ended up on Todoist. The app was great, sync worked fine, and my flatmate also used it. Boom!

I was still looking for a note-taking app, which I considered quite distinct from todo list apps. I was also finding the fixed format of Todoist a bit limiting. I wanted each of my projects to have a criterion for ‘this is now done’, but there wasn’t an easy way to do that in Todoist. The importance given to due dates also nagged at me, and gave me anxiety.

Inkdrop

Looking for cross-device markdown note taking apps led me to Inkdrop. Built by a single indie dev, it has lots of plugins, and looked quite nice. I started using it and quite liked it in the beginning. It has a vim plugin, which is an absolute requirement for me if I’m going to be writing a lot of text (or so I thought). However, I found I wasn’t using it much. It had all the features I thought I wanted, but somehow it wasn’t quite fitting my mental model.

Outliners

At this point, I heard about Roam Research. It looked really neat, and was quite different from the other note taking apps. I had always dismissed org-mode for its emacs-centrism, but this made me reconsider. However, it was in closed beta, so I let it be. I’ll definitely revisit it, especially if it has a good offline story.

I looked around for alternatives, and discovered Workflowy. I’ll admit the list of endorsements on their home page was a big draw for me. However, when I tried to sign up, I got:

(Screenshot: Workflowy’s sign-up form rejecting my password)

My usual password length is 63 characters, and it totally failed that test. I also discovered they don’t support markdown, so that rules them out.

Dynalist

I discovered Dynalist as a Workflowy alternative. It has all the things I want, and I’ve been using it for a few days now. It’s been extremely awesome! The syncing is flaky, and I have some anxiety about losing data - but otherwise, the format itself has been great for me!

So what I’ve realized is that I don’t want a note taking app - I want an outliner. Outliners seem to match how my brain works much better than purely hierarchical note taking apps. I can take notes without having to worry about organizing them first, and then organize them afterwards as I see fit. Very low mental overhead between what is in my brain and what’s on the screen. I like it.

I hope I can replace Todoist with Dynalist too. I’ll experiment with that later. But for now, <3 Dynalist. This blog post was written in Dynalist, then edited in Typora. Not bad.

Evaluation criteria

In the end, it looks like I have these needs when it comes to outliners / note taking apps:

  • Markdown support, ideally with something like KaTeX or MathJax support

  • Cross-device sync across platforms (Mac, iPad, Android)

  • Full offline support, with sync that doesn’t lose data

  • Ability to hyperlink different parts of the system

I’ll look at Roam Research later, but for now, I’m enjoying having a note taking system that matches how my brain seems to think.

Credits

Lots of good conversations about this in various places with Steve Deobald, Pratul Kalia, Robla, Prateek, Ankur Sethi and many others who I am clearly forgetting.

Would love to hear about what you use, via Email or Twitter.

I miss blogging too

This person misses blogs a lot. So do I. I made many friends, built myself a platform, and expressed myself on my various blogs in ways I don’t feel I can anymore. Some of them are lost to time, but some are there if you know how to find them.

Part of it was where my life was - I had time, nobody physically nearby I jived with, and a lot of angst. But now, I have too little time, and too many (hah!) wonderful people I enjoy interacting with on a 1-1 basis, nearby or not. Plus, with the death of blogging platforms (Medium is a publishing platform, not a blogging one), content is harder to find. As the author points out,

The other day I searched for an hour and couldn’t find even one. They used to be endless. You’d just click on one you knew on Blogger and either click Blogger’s random blog button, or go to the sidebar of the blog you knew where they always had a list of blogs they liked, sometimes four or 5, sometimes 20 other blogs. And the same with Tumbler.

Technorati is dead, and there isn’t anything like it. Twitter, Facebook, Instagram, Medium, etc. are no replacement for long-form text insight into someone’s life. They have brought the internet experience to way more people, but I feel like they have taken something away from me.

I guess this will be one of those blog posts that end with me saying “I will try to blog more” and then there are no more posts for the next few years :)

Devlog 2018 12 26

Physical activity

I walked about 6 miles yesterday evening, after doing 3-4 miles each day for the 2-3 days before. The roads were empty, and it was lovely. I’ve listened to maybe 6-10 hours of Deborah Frances-White in the last week or so, split between The Guilty Feminist and Global Pillage. Next time, though, I’m going to try to listen to podcasts less & observe my surroundings more. I am doing this (plus PT) mostly to rehab my knee. I slept for 12h last night :)

Asyncio deadlock

I don’t have enough experience with the pitfalls of concurrent programming where you have to use synchronization techniques. I had my first deadlock today, and after a few minutes realized it was actually infinite recursion from a typo. I need to get a better theoretical understanding of both event-loop based async programming and synchronization methods. I’ve used them with threading & Java, but feel shaky in asyncio.

simpervisor now has 100% unit test coverage. But it combines asyncio, processes & signals - so that doesn’t give me as much confidence as it otherwise might. I’ll take it though :)

VSCode customization

I’ve been re-reading The Pragmatic Programmer (after initially reading it about 12 years ago). A lot of it still holds up, although some stuff is dated (the love for Broken Windows policing & perl). It reminded me that I hadn’t really spent much time customizing VSCode to be more productive in it, so I’m spending time today doing that.

I switch between vscode and my terminal quite a bit, mostly for git operations and running tests. I’m going to see if I can stay inside vscode comfortably for running pytest based tests.

It made me very sad that the one thing I wanted from the pytest integration in VSCode is something the maintainers aren’t actively working on - a shortcut to run the test the cursor is currently at. I also can’t seem to see test output directly.

I think I’ll be using the Terminal for pytest runs for now. I’m not even going to try with git.

If I was younger and not already full of projects to do, I’d have picked up and tried to make a PR for this. Boo time commitments.

Readiness check

I rallied late in the day & wrote some code around readiness checks in simpervisor. I don’t fully understand what I want it to do, but I’m going to look at what Kubernetes does & try to follow that. Since this is being written for nbserverproxy, I am also going to try porting nbserverproxy to simpervisor to see what kind of API affordances I’m missing. Primarily, I feel there should be a lock somewhere, but:

  1. I don’t have a unified theory of what needs locking & why
  2. I don’t know if I need a lock or another synchronization mechanism
  3. I don’t know how it’ll actually be used by the application

So designing by porting nbserverproxy seems right.
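To have something concrete to iterate on, here’s the rough shape I’m imagining - all names here are hypothetical, not simpervisor’s actual API:

import asyncio

async def wait_until_ready(check, interval=0.5, timeout=30):
    # Poll an async readiness check until it passes, kubernetes-probe style.
    # `check` is any coroutine function that returns a bool.
    deadline = asyncio.get_event_loop().time() + timeout
    while asyncio.get_event_loop().time() < deadline:
        if await check():
            return True
        await asyncio.sleep(interval)
    return False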

Devlog 2018 12 24

Gracefully exiting asyncio application

Continuing yesterday’s work on my simple supervisor library, I kept trying to propagate signals cleanly to child processes before exiting. I remembered that it isn’t enough to just propagate signals - you also have to actually reap the child processes. This meant waiting for the wait calls on them to return.

I had a task running concurrently that waits on these processes. So ‘all’ I had to do was make sure the application does not exit until these tasks are done. This turned out to be harder than I thought! After a bunch of reading, I recognized that I needed to wait for all pending tasks before actually exiting the application.

This was more involved than I thought. It also must be done at the application level rather than in the library - you don’t want libraries doing sys.exit, and definitely don’t want them closing event loops.

After a bunch of looking and playing, it looks like what I want in my application code is something like:

import asyncio

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(main())
    finally:
        # Wait for all still-pending tasks before closing the loop
        loop.run_until_complete(asyncio.gather(*asyncio.Task.all_tasks()))
        loop.close()

This waits for all tasks to complete before exiting, and seems to make sure all child processes are reaped. However, I have a few unresolved questions:

  1. What happens when one of the tasks is designed to run forever, and never exits? Should we cancel all tasks? Cancel tasks after a timeout? Cancelling tasks after a timeout seems most appropriate (see the sketch after this list).
  2. If a task schedules more tasks, do those get run? Or are they abandoned? This seems important - can tasks keep adding more tasks in a loop?
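Here’s a rough sketch of the timeout-then-cancel idea, replacing the finally block above - the 5 second grace period is a number I made up:

pending = asyncio.Task.all_tasks()
if pending:
    # Give running tasks a grace period to finish on their own
    done, still_pending = loop.run_until_complete(
        asyncio.wait(pending, timeout=5)
    )
    # Cancel the stragglers, then let them process their CancelledError
    for task in still_pending:
        task.cancel()
    loop.run_until_complete(
        asyncio.gather(*still_pending, return_exceptions=True)
    )
loop.close()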

I am getting a better handle on what people mean by ‘asyncio is more complicated than needed’. I’m going to find places to read up on asyncio internals - particularly how the list of pending tasks is maintained.

This series of blog posts and this EuroPython talk from Lynn Root helped a lot. So did Saúl Ibarra Corretgé’s (one of the asyncio core devs) talk on asyncio internals.

Testing

Testing code that involves asyncio, signals and processes is hard. I attempted to do so with os.fork, but decided that is super-hard mode and I’d rather not play. Instead, I write Python code that is spawned verbatim as a subprocess, and use stdout to communicate back to the parent process. The child process’ code itself is inline in the test file, which is terrible - I am going to move it to its own file.
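The shape of it is roughly this - a simplified stand-in, not the actual test code:

import signal
import subprocess
import sys
import textwrap

# Child code lives in a string for now; this is what should move to its own file
CHILD_CODE = textwrap.dedent("""
    import signal, sys

    def handler(signum, frame):
        print('got SIGTERM', flush=True)
        sys.exit(0)

    signal.signal(signal.SIGTERM, handler)
    print('ready', flush=True)
    signal.pause()
""")

proc = subprocess.Popen(
    [sys.executable, '-c', CHILD_CODE],
    stdout=subprocess.PIPE, universal_newlines=True,
)
assert proc.stdout.readline().strip() == 'ready'
proc.send_signal(signal.SIGTERM)       # propagate the signal
print(proc.stdout.readline().strip())  # -> got SIGTERM
proc.wait()                            # reap the child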

I also added tests for multiple signal handlers. I’ve been writing a lot more tests in the last few months than I was before. I credit Tim Head for a lot of this. It definitely gives me a lot more confidence in my code.

Devlog 2018 12 23

I have enjoyed keeping running logs of my coding work (devlogs) in the past, and am going to start doing those again now.

This ‘holiday’ season, I am spending time teaching myself skills I sort of know about but do not have a deep understanding of.

JupyterLab extension

I spent the first part of the day (before I started devlogging) finishing up a JupyterLab extension I started the day before. It lets you edit notebook metadata. I got started on it because I wanted to use Jupytext for my work on publishing mybinder.org-analytics.

TypeScript was easy to pick up coming from C#. I wish the Phosphor / JupyterLab code had more documentation though.

I ran into a bug: while following instructions to set up a JupyterLab dev environment, I somehow managed to delete my source code. Thankfully I got most of it back from a saved copy in VSCode. It was a sour start to the morning though.

I’ll get back on to this once the sour taste is gone, and hopefully the bug is fixed :)

asyncio: what’s next | Yuri Selivanov @ PyBay 2018

I’ve been trying to get a better handle on asyncio. I can use it, but I don’t fully understand it - I am probably leaving bugs everywhere…

This talk is from one of the asyncio maintainers. It gave me some impetus to push the default version of Python on mybinder.org to 3.7 :D I’m most excited about getting features from Trio & Curio into the standard library. It was good to hear that it’s not just me - nobody can quite figure out exception handling.

I discovered aioutils while searching around after this. I had copy-pasted code that theoretically does the same things as Group and Pool from aioutils, but I’ve no idea if it is right. I’ll be using this library from now on!

Processing Signals

I’m writing a simple process supervisor library to replace the janky parts of nbserverproxy. It should have the following features:

  1. Restart processes when they die
  2. Propagate signals appropriately
  3. Support a sense of ‘readiness’ probes (not liveness)
  4. Be very well tested
  5. Run on asyncio

This is more difficult than it seems, and I am slowly working my way through it. (1) isn’t too difficult.

(2) is a fair bit more difficult. atexit is useless since it doesn’t do anything with SIGTERM, so I need to manage my own SIGTERM handlers. However, this means there needs to be a centralish location of some sort that decides when to exit. This introduces global state, and I don’t like that at all. But unix signals are global, and maybe there’s nothing for me to do here.

I initially created a Supervisor class that holds a bunch of SupervisedProcesses, but it was still calling sys.exit. Since signals are global, I realized there’s no other real way to handle this, so I made a global handler setup too. This has the additional advantage of letting me remove handlers when a SupervisedProcess dies, avoiding memory leaks and the like. See the sketch below.
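A minimal sketch of what I mean by a global handler setup - the names are hypothetical, not simpervisor’s actual API:

import asyncio
import signal

# Global registry of callbacks, since unix signals are inherently global
_handlers = []

def add_handler(callback):
    _handlers.append(callback)

def remove_handler(callback):
    # Called when a SupervisedProcess dies, so dead processes don't leak
    _handlers.remove(callback)

def _dispatch(signum):
    for callback in list(_handlers):
        callback(signum)

loop = asyncio.get_event_loop()
for signum in (signal.SIGTERM, signal.SIGINT):
    loop.add_signal_handler(signum, _dispatch, signum)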

Testing this stuff is hard!

I also need to make sure I don’t end up with lots of races. I’m still writing concurrent code, even without threads. Gotta be careful, especially with signals thrown in. Although I guess once you get a SIGTERM or SIGINT, inconsistent state is not particularly worrisome.