For a few months now, I have been trying to find a nice way to keep thoughts, ideas and notes outside my head. I worked my way through many applications, trying to use the system each one promotes as the basis for my own. Along the journey, I got a better sense of what was important to me in such an application. I’ll walk through the various applications I’ve tried, along with the criteria I’d developed by the end.

## Evernote

First stop, big one. I already had misgivings about it being from a large corporation. I still paid for it, and tested it out for a few months.

• No markdown support. This immediately became a problem - I had no idea how much I’ve come to expect markdown wherever I type. Evernote has been asked for markdown support many times, but I don’t think it is ever going to happen. I realized that markdown support is a minimum requirement for me.
• Conflict resolution when I edited the same note in multiple places was really terrible - completely unusable. I have 3 devices I use regularly - an iPad Pro, a MacBook and my Android phone. Through this experience I discovered that proper cross-device syncing is very important to me as well.
• I paid Evernote so I could work offline. I already have a lot of anxiety about not having internet access (a topic for another post), and didn’t want to lose access to my notes whenever I was disconnected. This didn’t work well, thanks to the shitty conflict resolution, but the process made me realize that I cared a lot about full offline availability.
• I also hated the UI. It was clunky, had way too many options, and none of the options I really wanted.
• I found no way to link notes together. I realized that my time at Wikimedia had made me very accustomed to hyperlinking as a way of writing, and Evernote had no clear affordances for it. So hyperlinking became important to me. I could also smile at the big impact Wikimedia has had on the way I think :D This was the beginning of me recognizing that a purely hierarchical setup might not work for me…

I noticed I was using Evernote less and less, and wandered the desert for a while.

## OneNote

I had been using OneNote on and off for a while. It has some of the same problems Evernote has. But the absolute no-go for me was how it handled text notes on my laptop.

That behavior killed it for me. It was too much mental load, and I hated the extra UI it brought. OneNote had most of the problems Evernote did, and worse.

## List applications

At this point, I read some parts of Getting Things Done again, trying to control my sprawling, anxiety-filled work life. It was useful, as always, but mostly to remind me to not use my brain to store data.

I then spent a bunch of time bouncing between different todo list apps, as a way to get stuff out of my brain and into a list. Learning about kinder todo lists was very helpful.

I tried a lot of these apps - Google Keep, Nirvana, Evernote / OneNote, TickTick. Nothing really clicked, but eventually I ended up on Todoist. The app was great, sync worked fine, and my flatmate also used it. Boom!

I was still looking for a note-taking app, which I considered quite distinct from the todo list apps. I was also finding the fixed format of Todoist a bit limiting. I wanted each of my Projects to have a criterion for ‘this is now done’, but there wasn’t an easy way to do that in Todoist. The importance given to due dates was also nagging, and gave me anxiety.

## Inkdrop

Looking for cross-device markdown note taking apps led me to Inkdrop. It’s built by a single indie dev, has lots of plugins, and looked quite nice. I started using it and quite liked it in the beginning. It has a vim plugin, which is an absolute requirement for me if I’m going to be writing a lot of text (or so I thought). However, I found I wasn’t using it as much. It had all the features I thought I wanted, but somehow it didn’t quite fit the mental model I had.

## Outliners

At this point, I heard about Roam Research. It looked really neat, and was quite different from the other note taking apps. I had always dismissed org-mode for its emacs-centrism, but this made me reconsider that. However, it was in closed beta, so I let it be. I’ll definitely re-visit it, especially if it has a good offline story.

I looked around for alternatives, and discovered WorkFlowy. I’ll admit the list of endorsements on their home page was a big draw for me. However, when I tried to sign up, I ran into a password length error.

My usual password length is 63 characters, and WorkFlowy totally failed that test. I also discovered they don’t support markdown, so that rules them out.

## Dynalist

I discovered Dynalist as a WorkFlowy alternative. It has all the things I want, and I’ve been using it for a few days now. It’s been extremely awesome! The syncing is flaky, and I have some anxiety about losing data - but otherwise, the format itself has been great for me!

So what I’ve realized is that I didn’t want a note taking app - I wanted an outliner. Outliners seem to match how my brain works much better than purely hierarchical note taking apps do. I can take notes without having to worry about organizing them first, and then organize them afterwards as I see fit. Very low mental overhead between what is in my brain and what’s on the screen. I like it.

I hope I can replace Todoist with Dynalist too. I’ll experiment with that later. But for now, <3 Dynalist. This blog post was written in Dynalist, then edited in Typora. Not bad.

## Evaluation criteria

In the end, looks like I have these needs when it comes to outliners / note taking apps.

• Markdown support, ideally with something like KaTeX or MathJax support
• Cross-device sync across platforms (Mac, iPad, Android)
• Full offline support, with sync that doesn’t lose data
• Ability to hyperlink different parts of the system

I’ll look at Roam Research later, but for now, I’m enjoying having a note taking system that matches how my brain seems to think.

## Credits

Lots of good conversations about this in various places with Steve Deobald, Pratul Kalia, Robla, Prateek, Ankur Sethi and many others who I am clearly forgetting.

Would love to hear about what you use, via Email or Twitter.

This person misses blogs a lot. So do I. I made many friends, built myself a platform, and expressed myself on my various blogs in ways I do not feel like I can anymore. Some of them are lost to time, but some are there if you know how to find them.

Part of it was where my life was - I had time, nobody physically nearby I jived with, and a lot of angst. But now, I have too little time, and too many (hah!) wonderful people I enjoy interacting with on a 1-1 basis, nearby or not. Plus, with the death of blogging platforms (Medium is a publishing platform, not a blogging one), content is harder to find. As the author points out,

> The other day I searched for an hour and couldn’t find even one. They used to be endless. You’d just click on one you knew on Blogger and either click Blogger’s random blog button, or go to the sidebar of the blog you knew where they always had a list of blogs they liked, sometimes four or 5, sometimes 20 other blogs. And the same with Tumblr.

Technorati is dead, and there isn’t anything like it. Twitter, Facebook, Instagram, Medium, etc. are no replacements for long form text insight into someone’s life. They have brought the internet experience to way more people, but I feel like they have taken something away from me.

I guess this will be one of those blog posts that end with me saying “I will try to blog more” and then there are no more posts for the next few years :)

## Physical activity

I walked about 6 miles yesterday evening, after doing 3-4 miles each day for the 2-3 days before. The roads were empty, and it was lovely. I’ve listened to maybe 6-10 hours of Deborah Frances-White in the last week or so, split between The Guilty Feminist and Global Pillage. Next time though, I’m going to try to listen to podcasts less & observe my surroundings more. I am doing this (plus PT) mostly to rehab my knee. I slept for 12h last night :)

I don’t have enough experience with the pitfalls of concurrent programming where you have to use synchronization techniques. I hit my first deadlock today, and after a few minutes realized it was actually infinite recursion caused by a typo. I need to get a better theoretical understanding of both event loop based async programming and synchronization methods. I’ve used them with threading & Java, but feel shaky in asyncio.

simpervisor now has 100% unit test coverage. But it combines asyncio, processes & signals - so that doesn’t give me as much confidence as it might have otherwise. I’ll take it though :)

## VSCode customization

I’ve been re-reading The Pragmatic Programmer (after initially reading it about 12 years ago). A lot of it still holds up, although some stuff is dated (love for broken-windows policing & Perl). It reminded me that I hadn’t really spent much time customizing VSCode to be more productive, so I’m spending time today doing that.

I switch between vscode and my terminal quite a bit, mostly for git operations and running tests. I’m going to see if I can stay inside vscode comfortably for running pytest based tests.

It made me very sad that the only thing I wanted from the pytest integration in VSCode is something the maintainers aren’t actively working on - a shortcut to run the test the cursor is currently on. I also can’t seem to see test output directly.

I think I’ll be using the Terminal for pytest runs for now. I’m not even going to try with git.

If I were younger and not already full of projects to do, I’d have picked this up and tried to make a PR for it. Boo time commitments.

I rallied late in the day & wrote some code around readiness checks in simpervisor. I don’t fully understand what I want it to do, but I’m going to look at what Kubernetes does & try to follow that. Since this is being written for nbserverproxy, I am also going to try porting nbserverproxy to simpervisor to see what kind of API affordances I’m missing. Primarily, I feel there should be a lock somewhere, but:

1. I don’t have a unified theory of what needs locking & why
2. I don’t know if I need a lock or another synchronization mechanism
3. I don’t know how it’ll actually be used by the application

So designing by porting nbserverproxy seems right.

## Gracefully exiting asyncio application

Continuing yesterday’s work on my simple supervisor library, I kept trying to propagate signals cleanly to child processes before exiting. I remembered that it isn’t enough to just propagate signals - you also have to actually reap the child processes. This meant waiting for wait calls on them to return.

I had a task running concurrently that waits on these processes. So ‘all’ I had to do was make sure the application does not exit until these tasks are done. This turned out to be harder than I thought! After a bunch of reading, I recognized that I needed to wait for all pending tasks before actually exiting the application.

This was more involved than I expected. It also must be done at the application level rather than in the library - you don’t want libraries calling sys.exit, and you definitely don’t want them closing event loops.

After a bunch of looking and playing, it looks like what I want in my application code is something like:

```python
if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(main())
    finally:
        loop.close()
```


This waits for all tasks to complete before exiting, and seems to make sure all child processes are reaped. However, I have a few unresolved questions:

1. What happens when one of the tasks is designed to run forever, and never exit? Should we cancel all tasks? Cancel tasks after a timeout? Cancelling tasks after a timeout seems most appropriate.
2. If a task schedules more tasks, do those get run? Or are they abandoned? This seems important - can tasks keep adding more tasks in a loop?
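
One way to explore the first question is to cancel whatever is still pending before closing the loop, bounded by a timeout. This is a hedged sketch of that idea - the helper names are mine, not part of asyncio or simpervisor:

```python
import asyncio

async def shutdown(timeout=5):
    # Cancel every task except the one running this coroutine,
    # then wait (bounded by the timeout) for the cancellations to land
    pending = [t for t in asyncio.all_tasks() if t is not asyncio.current_task()]
    for task in pending:
        task.cancel()
    if pending:
        await asyncio.wait(pending, timeout=timeout)
    return len(pending)

async def runs_forever():
    # Stands in for a task that is designed to never exit
    while True:
        await asyncio.sleep(1)

async def main():
    asyncio.ensure_future(runs_forever())
    await asyncio.sleep(0.1)
    return await shutdown()

cancelled_count = asyncio.run(main())
print(f"cancelled {cancelled_count} pending task(s)")
```

A cancelled task receives a CancelledError at its next await point, so tasks that need cleanup can catch it - which is also why waiting after cancelling matters.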

I am getting a better handle on what people mean by ‘asyncio is more complicated than needed’. I’m going to find places to read up on asyncio internals - particularly how the list of pending tasks is maintained.

This series of blog posts and this EuroPython talk from Lynn Root helped a lot. So did a talk on asyncio internals from Saúl Ibarra Corretgé (one of the asyncio core devs).

### Testing

Testing code that involves asyncio, signals and processes is hard. I attempted to do so with os.fork, but decided that is super-hard mode and I’d rather not play. Instead, I wrote Python code verbatim that is then spawned as a subprocess, and used stdout to communicate back to the parent process. The child process’ code itself is inline in the test file, which is terrible. I am going to move it to its own file.

I also added tests for multiple signal handlers. I’ve been writing a lot more tests in the last few months than I was before. I credit Tim Head for a lot of this. It definitely gives me a lot more confidence in my code.

I have enjoyed keeping running logs of my coding work (devlogs) in the past, and am going to start doing those again now.

This ‘holiday’ season, I am spending time teaching myself skills I sort of know about but do not have a deep understanding of.

## JupyterLab extension

I spent the first part of the day (before I started devlogging) finishing up a JupyterLab extension I started the day before. It lets you edit notebook metadata. I got started since I wanted to use Jupytext for my work on publishing mybinder.org-analytics.

TypeScript was easy to pick up coming from C#. I wish the Phosphor / JupyterLab code had more documentation though.

I ran into a bug. While following instructions to set up a JupyterLab dev setup, I somehow managed to delete my source code. Thankfully I got most of it back thanks to a saved copy in vscode. It was a sour start to the morning though.

I’ll get back on to this once the sour taste is gone, and hopefully the bug is fixed :)

## asyncio: what’s next | Yuri Selivanov @ PyBay 2018

I’ve been trying to get a better handle on asyncio. I can use it, but I don’t fully understand it - I am probably leaving bugs everywhere…

From one of the asyncio maintainers. It gave me some impetus to push the default version of Python on mybinder.org to 3.7 :D I’m most excited about getting features from Trio & Curio into the standard library. It was good to hear that nobody can quite figure out exception handling - it’s not just me.

I discovered aioutils while searching around after this. I’ve copy-pasted code that theoretically does the same things as its Group and Pool, but I have no idea if it’s right. I’ll be using this library from now on!

## Processing Signals

I’m writing a simple process supervisor library to replace the janky parts of nbserverproxy. It should have the following features:

1. Restart processes when they die
2. Propagate signals appropriately
3. Support a sense of ‘readiness’ probes (not liveness)
4. Be very well tested
5. Run on asyncio

This is more difficult than it seems, and I am slowly working my way through it. (1) isn’t too difficult.
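
The core of (1) can be sketched in a few lines. This is a hedged illustration - the class and method names are mine, not simpervisor’s actual API:

```python
import asyncio

class SupervisedProcess:
    def __init__(self, *cmd):
        self.cmd = cmd
        self.running = True  # set to False to stop restarting

    async def start(self):
        self.proc = await asyncio.create_subprocess_exec(*self.cmd)
        # Watch for the process to exit, and restart it when it does
        asyncio.ensure_future(self._restart_when_dead())

    async def _restart_when_dead(self):
        await self.proc.wait()
        if self.running:
            await self.start()

async def demo():
    # A process that exits quickly, so we can observe it being restarted
    sp = SupervisedProcess('sleep', '0.1')
    await sp.start()
    first_pid = sp.proc.pid
    await asyncio.sleep(0.5)
    restarted = sp.proc.pid != first_pid
    # Stop supervising before cleaning up
    sp.running = False
    try:
        sp.proc.terminate()
    except ProcessLookupError:
        pass
    await sp.proc.wait()
    return restarted

restarted = asyncio.run(demo())
print(restarted)
```

Even this toy version hints at the hard parts: the watcher task, the flag to stop restarting, and the race between ‘the process died’ and ‘we asked it to die’.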

(2) is a fair bit more difficult. atexit is useless since it doesn’t do anything with SIGTERM, so I need to manage my own SIGTERM handlers. However, this means there needs to be a central-ish location of some sort that decides when to exit. This introduces global state, and I don’t like that at all. But unix signals are global, and maybe there’s nothing I can do about it.

I initially created a Supervisor class that holds a bunch of SupervisedProcess instances, but it was still calling sys.exit. Since signals are global, I realized there’s no other real way to handle this, and so I made the handler setup global too. This has the additional advantage of being able to remove handlers when a SupervisedProcess dies, avoiding memory leaks and such.
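
A sketch of what such a global handler setup could look like - this is a hypothetical miniature, loosely modelled on the approach described above, not simpervisor’s actual code. One real OS-level handler is installed, and individual callbacks are added & removed as processes come and go:

```python
import os
import signal

# Module-level registry: the one piece of global state
_handlers = []

def _dispatch(signum, frame):
    # Iterate over a copy, so callbacks can remove themselves safely
    for callback in list(_handlers):
        callback(signum)

def add_handler(callback):
    # Install the real OS-level handler only once
    if not _handlers:
        signal.signal(signal.SIGTERM, _dispatch)
    _handlers.append(callback)

def remove_handler(callback):
    # Removing handlers when a SupervisedProcess dies avoids leaking them
    _handlers.remove(callback)

# Demo: register a callback, then send ourselves a SIGTERM
received = []
add_handler(received.append)
os.kill(os.getpid(), signal.SIGTERM)
print(received)
```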

Testing this stuff is hard!

I also need to make sure I don’t end up with lots of races. I’m still writing concurrent code, even without threads. Gotta be careful, especially with signals thrown in. Although I guess once you get a SIGTERM or SIGINT, inconsistent state is not particularly worrisome.

This post is from conversations with Matt Rocklin and others at the PANGEO developer meeting at NCAR.

Today, almost all of ’the cloud’ is run by ruthlessly competitive hypercapitalist large scale organizations. This is great & terrible.

When writing open source applications that primarily run on the cloud, I try to make sure my users (primarily people deploying my software for their users) have the following freedoms:

1. They can run the software on any cloud provider they choose to
2. They can run the software on a bunch of computers they physically own, with the help of other open source software only

Ensuring these freedoms for my users requires the following restrictions on me:

1. Depend on Open Source Software with hosted cloud versions, not proprietary cloud-vendor-only software.

For example, I’ll use PostgreSQL over Google Cloud Datastore, and Kubernetes with autoscaling over talking to the EC2 API directly.

2. Use abstractions that allow swappable implementations anytime you have to talk to a cloud provider API directly.

Don’t talk to the S3 API directly; instead, have an abstract interface that defines exactly what your application needs, and then write an S3 implementation of it. Ideally, also write a minio / ceph / filesystem implementation, to make sure your abstraction actually works.
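
As a hedged sketch of what this looks like in practice (all names here are made up for illustration): the application defines only the operations it needs, and each backend implements that interface. The in-memory version doubles as the test of the abstraction itself.

```python
from abc import ABC, abstractmethod

class ObjectStore(ABC):
    """Exactly the operations the application needs - nothing more."""

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

class InMemoryStore(ObjectStore):
    """Swappable implementation for tests & for verifying the abstraction.
    An S3Store would implement the same two methods with boto3 calls."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data[key]

    def put(self, key, data):
        self._data[key] = data

store = InMemoryStore()
store.put('report.csv', b'a,b\n1,2\n')
print(store.get('report.csv'))
```

The application only ever sees ObjectStore, so switching from S3 to minio (or to a plain filesystem) is a configuration change, not a rewrite.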

These are easy to follow once you are aware of them, and provide good design tradeoffs for Open Source projects. Remember these are necessary but not sufficient to ensure some of your users’ fundamental freedoms.

I’m writing up monthly ‘work plans’ to plan what I’m trying to do each month, with a retrospective afterwards to see how much I got done. I work across a variety of open source projects with ambiguous responsibilities, so work planning isn’t very settled. This has proven quite stressful for everyone involved. Let’s see if this helps!

## JupyterCon

JupyterCon is in NYC towards the end of August, and it is going to set the pace for a bunch of stuff. I have 2.5-ish talks to give. Need to prepare for those and do a good job.

## Matomo (formerly Piwik) on mybinder.org

mybinder.org currently uses Google Analytics. I am not a big fan - it has troubling privacy implications, and we don’t get data as granular as we want. I am going to try deploying Matomo (formerly Piwik) and using that instead. We’ll run both together for a while and see how we like it! Matomo requires a MySQL database & is written in PHP - let’s see how this goes ;)

## The Littlest JupyterHub 0.1 release

The Littlest JupyterHub is doing great! I’ve done a lot of user tests, and the distribution has changed drastically over time. It’s also the first time I’m putting my newly found strong convictions around testing, CI & documentation into practice. You can already check it out on GitHub. I want to make sure we (the JupyterHub team) get out a 0.1 release in early August.

## Pangeo Workshop

I despair quite a bit about climate change and how little agency I seem to have around it. I’m excited to go to the PANGEO workshop in Colorado. I’m mostly hoping to listen & understand their world some more.

## Berkeley DataHub deployment

Aug 20-ish is when the next semester starts at UC Berkeley. I need to have a pretty solid JupyterHub running there by then. I’d like it to have good CI/CD set up in a generic way, rather than something super specific to Berkeley. However, I’m happy to shortcut this if needed, since there are already so many things on my plate haha.

## UC Davis

I’m trying to spend a day or two a month at UC Davis. Partially because I like being on Amtrak! I also think there’s a lot of cool work happening there, and I’d like to hang out with all the cool people doing all the cool work.

## Personal

On top of this, there are ongoing medical conditions to be managed. I’m getting Carpal Tunnel Release surgery sometime in October, so I need to make sure I do not super fuck up my hands before then. I’m also getting a cortisone shot for my back in early August to deal with sciatica. Fun!

## Things I’m not doing!

The grading related stuff I’ve been working on is going to the backburner for a while. I think I bit off far more than I can chew, so time to back off. I also do not have a good intuition for the problem domain since I’ve never written grading keys nor have I been a student in a class that got autograded.

## In conclusion…

Shit, I’ve a lot of things to do lol! I’m sure I’m forgetting some things here that I’ve promised people. Let’s see how this goes!

Inspired by conversations with Nick Bollweg and Matt Rocklin, I experimented with using conda constructor as the installer for The Littlest JupyterHub. Theoretically, it fit the bill perfectly - I wanted a way to ship arbitrary packages in multiple languages (Python & node) in an easy-to-install, self-contained way, didn’t want to make Debian packages, & wanted to use a tool that people in the Jupyter ecosystem were familiar with. Constructor seemed to provide just that.

I sort of got it working, but in the end ran into enough structural problems that I decided it isn’t the right tool for this job. This blog post is a note to my future self on why.

This isn’t a ‘takedown’ of conda or conda constructor - just a particular use case where it didn’t work out, and a demonstration of how little I know about conda. It probably works great if you are doing more scientific computing and less ‘ship a software system’!

## Does not work with conda-forge

I <3 conda-forge and the community around it. I know there’s a nice jupyterhub package there, which takes care of installing JupyterHub, node, and required node modules.

However, this doesn’t actually work: conda constructor does not support noarch packages, and JupyterHub relies on several noarch packages. From my understanding, more conda-forge packages are moving towards being noarch (for good reason!).

Looking at this issue, it doesn’t look like this is a high priority item for them to fix anytime soon. I understand that - they don’t owe the world free work! It just makes conda constructor a no-go for my use case…

## No support for pip

You can pip install packages in a conda environment, and they mostly just work. There are a lot of python packages on PyPI that are installable via pip that I’d like to use. constructor doesn’t support bundling these, which is entirely fair! This PR attempted something here, but was rejected.

So if I want to keep using packages that don’t exist in conda-forge yet but do exist in pip, I would have to make sure these packages and all their dependencies exist as conda packages too. This would be fine if constructor was giving me enough value to justify it, but right now it is not. I’ve also tried going down a similar road (cough debian cough) and did not want to do that again :)

## Awkward post-install.bash

I wanted to set up systemd units post-install. Right off the bat this should have made me realize conda constructor was not the right tool for the job :D The only injected environment variable is $PREFIX, which is not super helpful if you wanna do stuff like ‘copy this systemd unit file somewhere’. I ended up writing a small python module that does all these things, and calling it from post-install. However, even then I couldn’t pass any environment variables to it, making testing / CI hard.

## Current solution

Currently, we have a bootstrap script that downloads miniconda, & bootstraps from there to a full JupyterHub install. Things like systemd units & sudo rules are managed by a python module that is called from the bootstrap script.

This idea comes from brainstorming with Lindsey Heagy, Carol Willing, Tim Head & Nick Bollweg at the Jupyter Team Meeting 2018. Most of the good ideas are theirs! The name is inspired by a favorite TV series of one of my favorite people.

I really love the idea of JupyterHub distributions - opinionated combinations of components that target a specific use case. The Zero to JupyterHub distribution is awesome & works for most people. However, it requires Kubernetes - a distributed system with inherent complexities that are not worth it below a certain threshold.

This blog post lays out ideas for implementing a simpler, smaller distribution called The Littlest JupyterHub. The Littlest JupyterHub serves the long tail of potential JupyterHub users who have the following needs only.

1. Support a very small number of students (around 20–30, maybe 50)
2. Run on only one node, either a cheap VPS or a VM on their favorite cloud provider
3. Provide the same environment for all students
4. Allow the instructor / admin to easily modify the environment for students with no specialized knowledge
5. Be extremely low maintenance once set up & easily fixable when it breaks
6. Enforce memory / CPU limits for students

The target audience is primarily educators teaching small classes with Jupyter Notebooks. It should be an extremely focused distribution, with new feature requests facing higher scrutiny than usual. It has a legitimate chance of actually reaching 1.0 & being stable, requiring minimal ongoing upgrades!

## JupyterHub setup

JupyterHub + ConfigurableHTTPProxy run as standard systemd services. The systemd spawner is used - it is lightweight, allows JupyterHub restarts without killing user servers, & provides CPU / memory isolation.

Something like the First Use Authenticator plus a user whitelist might be good enough for a large number of users. New authenticators can be added whenever users ask for them.

The JupyterHub system is in its own root owned conda environment or virtualenv, to prevent accidental damage from users.

## User environment

There is a single conda environment shared by all the users. JupyterHub admins have write access to this environment, and everyone else has read access. Admins can install new libraries for all users with conda/pip. No extra steps needed, and you can do this from inside JupyterHub without needing to ssh.

Each user gets their own home directory, and can install packages there if they wish. systemdspawner puts each user server in a systemd service, and provides fine grained control over memory & cpu usage. Users also get their own system user, providing an additional layer of security & standardized home directory locations.

## Configuration

YAML is used for config - it is the least bad of all the currently available config languages, IMO. Ideally, something like the visudo command would exist for editing & applying this config: it would open the config file in an editor, let users edit it, and apply it only if it is valid. Advanced users can sidestep this and edit files directly. The YAML file is read and processed directly in jupyterhub_config.py. This simplifies things & gives us fewer things to break.
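
A hedged sketch of what ‘read the YAML directly in jupyterhub_config.py’ could look like - the file contents and key names here are invented for illustration, not the actual TLJH config schema:

```python
import yaml

# Stand-in for the contents of a config file on disk
config_text = """
users:
  admin:
    - alice
limits:
  memory: 1G
"""

# safe_load refuses arbitrary Python object construction,
# which matters for a root-owned config file
config = yaml.safe_load(config_text)

# In a real jupyterhub_config.py these would be applied to the `c`
# config object that JupyterHub provides
admin_users = set(config['users']['admin'])
memory_limit = config['limits']['memory']
print(admin_users, memory_limit)
```

Because the translation happens in plain Python in one place, validating the config before applying it (the visudo-style flow described above) is just a function call away.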

Backwards-compatible upgrading will be supported across one minor version only - so you can go from 0.7 to 0.8, but not straight from 0.7 to 0.9. Upgrades should not cause outages.

## Installation mechanism

Users run a command on a fresh server to install this distribution. This could use conda constructor (thanks to Nick Bollweig & Matt Rocklin for convincing me!) or debian packages (with fpm or dh-virtualenv). The user environments will be conda environments.

A `curl <some-url> | sudo bash` command is available on a nice looking website for the distribution, which users can copy-paste into their fresh VM. This website also has instructions for creating a fresh VM with popular cloud providers & VPS providers.

## Debuggability

All systems exist in a partially degraded state all the time. Good systems self-heal & continue to run as well as they can. When they can’t, they break cleanly in known ways. They are observable enough to debug the issues that cause 80% of the problems.

The Littlest JupyterHub should be a good system. Systemd captures logs from JupyterHub, user servers & the proxy. Strong validation of the config file catches fatal misconfigurations. Reboots actually fix most issues and never make anything worse. Screwed up user environments are recoverable.

We’ll discover how this breaks as users of varying skill levels use it, and update our tooling accordingly.

## But, No Docker?

Docker has been explicitly excluded from this tech stack. Building custom docker images & dealing with registries is too complex for most educators. A good distribution embraces its constraints & works well within them!

## Contribute

Are you a person who would use a distribution like this? We would love to hear from you! Make an issue on GitHub, tweet at me, or send me an email.

Recently I had to write some code that called the Kubernetes API directly, without any language wrappers. While there are pretty good reference docs, I didn’t want to construct all the JSON manually in my programming language.

I discovered that kubectl’s -v parameter is very useful for this! With this, I can do the following:

1. Perform the actions I need to perform with just kubectl commands
2. Pass -v=8 to kubectl when doing this, and this will print all the HTTP traffic (requests and responses!) in an easy to read way
3. Copy paste the JSON requests and template them as needed!

This was very useful! Being able to see the response bodies is also nice, since it gives you a good intuition for how to handle them in your own code.
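
The ‘copy paste and template’ step (3) ends up being mostly string templating once you’ve seen the URL and body kubectl sends. A hedged sketch - the namespace and pod spec below are made-up examples, not anything kubectl emitted:

```python
import json

def make_pod_request(namespace, name, image):
    # URL path shape as seen in kubectl -v=8 output for pod creation
    url = f"/api/v1/namespaces/{namespace}/pods"
    # Minimal pod manifest, templated from the copied JSON
    body = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {"containers": [{"name": name, "image": image}]},
    }
    return url, json.dumps(body)

url, body = make_pod_request("default", "test-pod", "nginx")
print(url)
```

From here, POSTing `body` to `url` on the API server (with whatever HTTP client your language has) is all a language wrapper would have done anyway.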

If you’re shelling out to kubectl directly in your code (for some reason!), you can also use this to figure out all the RBAC rules your code would need. For example, if I’m going to run the following in my script:

```bash
kubectl get node
```


and need to figure out which RBAC rules are needed for this, I can run:

```bash
kubectl -v=8 get node 2>&1 | grep -P 'GET|POST|DELETE|PATCH|PUT'
```


This should list all the API requests the code is making, making it easier to figure out what rules are needed.

Note that you might have to rm -rf ~/.kube/cache to ‘really’ get the full API requests list, since kubectl caches a bunch of API autodiscovery. The minimum RBAC for kubectl is:

```yaml
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
```