designing data intensive applications

I’ve been reading the Designing Data Intensive Applications book & am using this post to keep notes!

I’ve picked up ideas on scaling systems through the years, but never actually sat down to study them semi-formally. This seems like a great start!

It’s a pretty big book, and it’s gonna take me a while to go through it :) Will update these notes as I go! Trying to do a chapter a week!

Chapter 1: Defining all the things

The Internet was done so well that most people think of it as a natural resource like the Pacific Ocean, rather than something that was man-made. When was the last time a technology with a scale like that was so error-free? Alan Kay, in interview with Dr Dobb’s Journal (2012)

I keep forgetting what an amazing marvel the internet is and how intensely (and mostly positively, thankfully) it has affected my life. This is a good reminder! However, perhaps to people who haven’t had the privileges I’ve had, the Internet doesn’t feel like a natural resource? Unsure! Should ask them!

Lots of modern applications are data intensive, rather than CPU intensive.

Raw CPU power is rarely a limiting factor for these applications—bigger problems are usually the amount of data, the complexity of data, and the speed at which it is changing.

This has been borne out in the infrastructure I’ve been setting up to help teach people data science - RAM is often the bottleneck, not CPU (barring machine-learning type stuff, but they want GPUs anyway).

Common building blocks for data intensive applications are:

  1. Store data so that they, or another application, can find it again later (databases)
  2. Remember the result of an expensive operation, to speed up reads (caches)
  3. Allow users to search data by keyword or filter it in various ways (search indexes)
  4. Send a message to another process, to be handled asynchronously (stream processing)
  5. Periodically crunch a large amount of accumulated data (batch processing)

These do seem to cover a large variety of bases! I feel fairly comfortable in operating, using and building on top of some of these (databases, caches) but not so much in most (never used a search index, batch processing, nor streams outside of redis). Partially I haven’t felt an intense need for these, but perhaps if I understand them more I’ll use them more? I’ve mostly strived to make everything stateless - but perhaps that’s causing me to shy away from problems that can only be solved with state? /me ponders.

Boundaries around ‘data systems’ are blurring - Redis is a cache but can be a message queue, Apache Kafka is a message queue that can have durability guarantees, etc. Lots of applications also need more than can be done with just one tool (aka a ‘pure LAMP’ stack is no longer good enough). Applications often have the job of making sure different data sources are in sync. Everyone is a ‘data designer’, and everyone is kinda fucked.

The book then talks about 3 things that are most important to any software system.

Reliability

Means ‘continue to work correctly, even when things go wrong’. Things that go wrong are ‘faults’, and systems need to be ‘fault-tolerant’ or ‘resilient’. Can’t be tolerant of all faults, so gotta define what faults we’re tolerant of.

Fault isn’t failure - fault is when a component of the system ‘deviates from its spec’, failure is when the system as a whole stops providing the service the user wants. Can’t reduce chances of fault to zero, but can work on keeping faults from turning into failures.
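A tiny sketch of the fault / failure distinction, in Python. This is an illustration and not something from the book: a flaky dependency raises a fault, and a retry wrapper keeps it from becoming a failure for the caller. `TransientFault`, `call_with_retries` and `make_flaky_lookup` are made-up names, and retrying is only one of many fault-tolerance techniques.

```python
class TransientFault(Exception):
    """A fault we have explicitly decided to tolerate (hypothetical)."""


def call_with_retries(operation, attempts=3):
    """Run operation, retrying only on the faults we chose to tolerate.

    Any other exception propagates immediately: we are not tolerant of it.
    """
    last_fault = None
    for _ in range(attempts):
        try:
            return operation()
        except TransientFault as fault:
            last_fault = fault
    # Every attempt hit a fault, so the fault finally becomes a failure.
    raise RuntimeError(f"failed after {attempts} attempts") from last_fault


def make_flaky_lookup(faults_before_success):
    """Build a fake dependency that faults a fixed number of times."""
    remaining = [faults_before_success]

    def flaky_lookup():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise TransientFault("component deviated from its spec")
        return "ok"

    return flaky_lookup


if __name__ == "__main__":
    # Two faults happen, but the caller never sees a failure.
    print(call_with_retries(make_flaky_lookup(2), attempts=3))
```

The wrapper only catches the one fault type it was told about, which matches the idea of defining up front which faults you are tolerant of.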

Engineering is building reliable systems from unreliable parts.

Chaos monkeys are good, increase faults to find ways to reduce failure.

Hardware reliability - physical components fail. Nothing you can do about it. Fix it in software.

Hardware faults are usually not correlated - one machine failing doesn’t cause another machine to fail. To truly fuck shit up you need software - it can easily cause massive large scale failure! For example, a leap second bug! Or a runaway process that slowly kills every other process on the machine. One of the microservices that 50 of your microservices depend on is slow! Cascading failures! These bugs all lie dormant, until they suddenly aren’t and wreak havoc. The software makes some assumption about its environment, which is true until it isn’t. No quick solution to systematic software faults.

Human error is worst error. The book offers some suggestions on how to prevent these.

  1. Minimize opportunities for errors - make it easy to do the right thing. But if it’s too restrictive, people will work around it - tricky balance.
  2. Provide full featured sandboxes so people can fuck around without fucking shit up.
  3. AUTOMATICALLY TEST EVERYTHING so when a human does fuck up, they know!
  4. Set up undo functionality, so when human does fuck up, they can roll back!

Learn about telemetry from other disciplines that have been doing this shit for far longer than us. Relevant XKCD

Reliability isn’t just for nukes & aircraft & election systems (haha). Imagine someone loses a video of their kid’s first ever step because you didn’t care. Fucking up is human and we all do it - what is important is that we care.

Sometimes you gotta sacrifice reliability, but make sure that is an explicit & conscious decision. Actually throw away your prototypes! Put FIXMEs in your code. Take a shower. Make sure hacks look, feel and sound hacky!

Scalability

System’s ability to adapt to increased ‘load’ along some axes.

Load is described with various load parameters, which depend on the system (req/s? active users? etc).

Carefully define what this means for your application, and explain your reasoning. You might have to scale in some aspects but not in others.

Once you have the load parameters for your app defined, figure out what happens when you increase load parameters but keep system resources unchanged. After that, try to figure out how much resources need to be increased.

Throughput - number of things that can be done per second. Latency is time it takes to serve a request. These are common things we care about when we move load parameters up and down.

You shouldn’t think of these as single numbers, since they vary a fair bit. Think of these as probability distributions. Learn some statistics! Use percentiles, rather than ‘average’ or ‘mean’.
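A minimal sketch of why percentiles beat the mean, in Python. The response times are invented for illustration, and `percentile` uses the simple nearest-rank definition (other definitions interpolate and give slightly different numbers).

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of
    all samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical response times in milliseconds, with one very slow request.
response_times_ms = [12, 15, 11, 14, 13, 16, 12, 15, 900, 14]
mean = sum(response_times_ms) / len(response_times_ms)

if __name__ == "__main__":
    print("mean:", mean)
    for p in (50, 90, 99):
        print(f"p{p}:", percentile(response_times_ms, p))
```

With these numbers the mean is 102.2 ms, a latency that no request actually had, while the median is 14 ms and the 99th percentile is the 900 ms outlier. The percentiles describe the distribution; the mean hides it.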

High percentile latencies are especially important when you are a service that’s called by many other services - it can cascade down.
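A back-of-the-envelope sketch of that cascade, in Python. It assumes every backend call is slow independently with the same probability, which real systems rarely satisfy exactly, and the function name is made up.

```python
def chance_of_slow_request(backend_calls, slow_fraction):
    """Probability that at least one backend call is slow.

    Assumes each call is slow independently with probability slow_fraction.
    """
    if backend_calls < 0:
        raise ValueError("backend_calls must not be negative")
    if not 0 <= slow_fraction <= 1:
        raise ValueError("slow_fraction must be between 0 and 1")
    return 1 - (1 - slow_fraction) ** backend_calls


if __name__ == "__main__":
    # 1% of backend calls are slow (i.e. slower than the backend's p99).
    for calls in (1, 10, 100):
        print(calls, "calls:", round(chance_of_slow_request(calls, 0.01), 3))
```

Under that assumption, a request that fans out to 100 backends has roughly a 63% chance of hitting at least one call slower than the backend's p99, so the rare slow case becomes the common case for the user.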

No magic scaling sauce - architecture that can scale is different for each application. But there are general purpose building blocks, so worry a little less!

Maintainability

Always code as if the person who ends up maintaining your code is a violent psychopath who knows where you live.

Split into three major aspects.

Operability

Make it easy for people to operate your service! Help them monitor the health of the system, observe & debug problems, do capacity planning, keep the production environment stable, prevent single human points of failure (oh, only Chad knows about this system) and many other things!

Simplicity

Don’t make your software a big ball of mud. Take into account that new engineers will have to start working on your software, and they need to understand it quickly.

Use standard tools & approaches they have a higher likelihood of knowing - look around for standard tools before inventing your own!

Watch out for accidental complexity, and keep it to a minimum as much as possible. Abstractions are good, but abstractions are also leaky.

Evolvability

If your software is simple & has good abstractions, you can change it over time without wanting to pull all your hair out.

think os

Following a trail from a wonderful Julia Evans post led me to Allen Downey’s nice textbook manifesto. Also led me to the nice Think OS book, which seems like a super nice introduction to Operating System principles.

It is short enough (~100 pages) that I wanted to read through it. I’ve spent a good chunk of time absorbing how Operating Systems work by dint of diving into things and working through them, but it would be nice to get a refresher on the basics. There are clearly basic things I do not understand, and this seemed like a good way to explore.

This post is just a running series of notes from me reading it on a nice Saturday morning.

Stack vs Heap

This is something that has always bugged me. I’ve understood just enough of this by being burnt with pointers when writing C (and primitive types in the CLR, etc), but was lacking a deep understanding of wtf was going on. The fact that these are just process program segments (like text or data) was quite a revelation :D This stackoverflow answer was also quite nice.

One interesting thing for me to investigate later from the book is how this program:

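#include <stdio.h>
#include <stdlib.h>

int global;  /* lives in the static segment */

int main(void) {
  int local = 5;          /* lives on the stack */
  void *p = malloc(128);  /* points into the heap */

  /* %p expects a void *, so cast each address explicitly */
  printf("Address of main is %p\n", (void *)main);
  printf("Address of local is %p\n", (void *)&local);
  printf("Address of global is %p\n", (void *)&global);
  printf("Address of p is %p\n", p);

  free(p);
  return 0;
}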

produces the following output for the author:

Address of main   is 0x      40057c
Address of local  is 0x7fffd26139c4
Address of global is 0x      60104c
Address of p      is 0x     1c3b010

but for me,

Address of main   is 0x5598fc64c740
Address of local  is 0x7ffeacfaf75c
Address of global is 0x5598fc84d014
Address of p      is 0x5598fc85b010

The point of the program was to demonstrate that text (main), static (global) and heap (p) are near beginning of memory and stack (local) is towards the end. While on my laptop it does seem to be the case too, the ‘start’ seems to be much farther out than on the author’s computer. Need to understand why this is the case. I’ve vaguely heard of address randomization & other security measures in OS kernels - maybe related? For another day!
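The address randomization guess is a plausible one: addresses starting at 0x55... are what you typically see when the compiler builds a position-independent executable and the kernel loads it at a randomized base. A minimal Python sketch for checking the kernel side of this on Linux follows. `/proc/sys/kernel/randomize_va_space` is a real procfs file, while the helper names are made up for this example.

```python
from pathlib import Path

ASLR_SETTING = Path("/proc/sys/kernel/randomize_va_space")

# Meanings as described in the Linux kernel sysctl documentation.
ASLR_MODES = {
    0: "disabled",
    1: "stack, mmap base and vDSO randomized",
    2: "as 1, plus the heap (brk) randomized",
}


def describe_aslr(value):
    """Turn the numeric setting into a human-readable description."""
    return ASLR_MODES.get(value, f"unknown setting {value}")


def read_aslr_setting(path=ASLR_SETTING):
    """Return the kernel's ASLR setting, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    setting = read_aslr_setting()
    if setting is None:
        print("could not read", ASLR_SETTING)
    else:
        print(setting, "->", describe_aslr(setting))
```

Running the C program several times is another quick check: if the printed addresses change between runs, randomization is in play. Rebuilding it with gcc's `-no-pie` flag should put main and global back at low fixed addresses like the author's, though that is untested here.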

Bit twiddling

I continue to find it hard to care about bit twiddling. Most things do use it of course, but it seems to be abstracted away pretty well without leaking too much (except for things that have their own nuances, like floating point representations).

malloc

Nice link to a paper about a common malloc implementation. I also know there are other malloc implementations that programs use (such as jemalloc). Something for me to dive into when I’ve more time.

tbc

I didn’t have time to finish it all, unfortunately. But shall come back to it whenever I can!

learning selinux and apparmor

I am trying to understand SELinux and AppArmor, and collecting resources here as I learn.

SELinux for mere mortals (2014)

This was the first video I watched, and it helped me understand what SELinux does at a fundamental level. It’s probably useless in a container-filled world (where I doubt Fedora ships pre-configured SELinux rules for my containers), but it helped me think I understood types / labels, so that seems like a positive step?

The fact the presenter keeps saying things like ‘you being a good sysadmin, ssh into the server and edit the apache config file’ is freakin me out. If I’m constantly editing config files on servers manually that seems like a massive failure to me :D How times change!

Docker and SELinux (2014)

This one made a lot more sense to me as an answer to the following questions:

  1. Aren’t containers secure enough? (Partial answer)
  2. What does SELinux do for container security?

It’s convinced me that container -> host isolation and container <-> container isolation provided by SELinux is pretty simple and super useful, and should be turned on.

This talk also showed me this most wonderful coloring book that tries to explain SELinux. If this is all that is to SELinux, it seems pretty simple and useful (for the container use case).

Also, it looks like there are more recent versions of both these talks - I should look ’em up!

Securing Linux Applications with AppArmor (2007?)

This is me trying to understand AppArmor, which seems to have a lower base of support (just Ubuntu? Maybe SUSE, but idk anyone who uses SUSE) but is theoretically simpler (mostly file path based). The video seems to be shot with a potato, so the slides aren’t super clear - but the content is good enough to give me a super general overview.

The biggest thing against SELinux it talks about seems to be ‘SELinux is complex’, and not much else. I don’t know how much I buy that - but then again, I haven’t actually used SELinux anywhere :D

Unlike SELinux, I can actually see AppArmor rules on my local machine (since it is running Ubuntu). Seems fairly readable!

things to build

This is a running list of things I want to build!

There’s an analogous running list of things I want to learn. Things move between them :) I also have higher standards of documentation (other people should be able to use it) before marking these as complete.

  • kubernetes-login A helper to openssh that allows users to log in to a configurable user pod running on a kubernetes cluster. Should ideally support scp / sftp too. Helps get rid of SPOF login nodes

  • just-enough-containment A purely for-learning docker-ish container project written purely in python. Written for pedagogy and personal understanding rather than production use.

python gil resources

I was in a conversation about the Python GIL with friends a few days ago, and realized that my understanding of the specifics of the GIL problem was super hand-wavy & unstructured. So I spent some time collecting resources to learn more, and now have a better understanding!

Python’s Infamous GIL (Larry Hastings)

This was a great introduction to the history of the GIL, why it was necessary & reasons why getting rid of it is complicated.

Understanding the Python GIL (David Beazley)

This has wonderful visualizations that really helped me understand exactly why multi-threaded python behaves the way it does. For CPU-bound code, multithreading decreases performance, adding more cores decreases performance & disabling cores increases performance :) All of this made vague hand-wavy sense to me before, and makes much more concrete sense now.
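A rough way to see this for yourself, in the spirit of the CPU-bound countdown experiment from the talk. This is a sketch and not Beazley's exact code: it times the same amount of work done sequentially and split across two threads. The function names and the loop count are arbitrary choices.

```python
import threading
import time


def countdown(n):
    """Pure CPU-bound work: no I/O, so the thread never idles voluntarily."""
    while n > 0:
        n -= 1


def time_sequential(n):
    """Run the countdown twice, one after the other, in a single thread."""
    start = time.perf_counter()
    countdown(n)
    countdown(n)
    return time.perf_counter() - start


def time_threaded(n):
    """Run the same two countdowns in two threads at once."""
    threads = [threading.Thread(target=countdown, args=(n,)) for _ in range(2)]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 5_000_000
    print(f"sequential: {time_sequential(n):.2f}s")
    print(f"threaded:   {time_threaded(n):.2f}s")
```

On a CPython build with the GIL, the threaded run should be no faster than the sequential one, since only one thread can execute bytecode at a time. How much slower it gets depends on the Python version and the machine, so no numbers are claimed here.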

It isn’t easy to remove the GIL (Guido van Rossum)

A blog post from the BDFL of python, after yet another request to ‘just get rid of the GIL’.

It set the (pretty high) bar for inclusion of a GIL removal patch (that he makes clear he will not write) in Python:

I’d welcome a set of patches into Py3k only if the performance for a single-threaded program (and for a multi-threaded but I/O-bound program) does not decrease.

Not been met yet!

An Inside Look at the GIL Removal Patch of Lore (Dave Beazley)

There was an attempt in about 1999 to remove the GIL - the ‘freethreading’ patch. This is a wonderful analysis of that patch - what it tried to do, why it disappeared, what the performance costs of it were, etc. Something that really stood out to me and makes me feel not very hopeful about GIL removal in CPython was:

Despite removing the GIL, I was unable to produce any performance experiment that showed a noticeable improvement on multiple cores. Really, the only benefit (ignoring the horrible performance) seen in pure Python code, was having preemptible instructions.

This seems to be still true, even in the Gilectomy branch.

Gilectomy (Larry Hastings)

This is the only talk about a recent (~2016) GIL removal attempt.

It is amazing work, but doesn’t give me much hope. There have been no new commits to the public git repo for about 5 months now, so I am unsure what state it is in.

There’s probably many more - let me know if you know any, and I’ll update this when I find out more!

Gilectomy - 2017 (Larry Hastings)

PyCon 2017 just happened, and Larry Hastings gave another talk!

It seems to have had a lot of intense work done on it, and the wall clock time graph in it warms my heart! I’ve a little more hope now than I did after the 2016 talk :D

systemd simple containment for GUI applications & shells

I earlier had a vaguely working setup for making sure browsers, shells and other applications don’t eat all RAM / CPU on my machine with systemd + sudo + shell scripts.

It was a hacky solution, and also had complications when used to launch shells. It wasn’t passing in all the environment variables it should, causing interesting-to-debug issues. sudo rules were complex, and hard to do securely.

I had also been looking for an excuse to learn more Golang, so I ended up writing systemd-simple-containment or ssc.

It’s a simple golang application that produces a binary that can be setuid to root, and thus get around all our sudo complexity, at the price of having to be very, very careful about the code. Fortunately, it’s short enough (~100 lines) and systemd-run helps it keep the following invariants:

  1. It will never spawn any executable as any user other than the ‘real’ uid / gid of the user calling the binary.
  2. It doesn’t allow arbitrary systemd properties to be set, ensuring a more limited attack surface.

However, this is the first time I’m playing with setuid and with Go, so I probably fucked something up. I feel ok enough about my understanding of real and effective uids for now to use it myself, but not to recommend it to other people. Hopefully I’ll be confident enough to say that soon :)
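For reference, a small Python sketch of the real vs effective id distinction that the first invariant leans on. ssc itself is written in Go and this is not its code. The `os` calls are standard library functions on Unix, and the two helper names are made up.

```python
import os


def caller_identity():
    """Real uid/gid: who actually ran the program, even if it is setuid."""
    return os.getuid(), os.getgid()


def running_with_elevated_ids():
    """True when effective ids differ from real ones, as in a setuid binary."""
    return os.geteuid() != os.getuid() or os.getegid() != os.getgid()


if __name__ == "__main__":
    uid, gid = caller_identity()
    print("real uid/gid:     ", uid, gid)
    print("effective uid/gid:", os.geteuid(), os.getegid())
    print("elevated:         ", running_with_elevated_ids())
```

In a setuid-root binary the effective uid is 0 while the real uid is still the calling user, so spawning things only as the real uid / gid is what keeps the caller from gaining anyone else's identity. Linux ignores the setuid bit on interpreted scripts, so running this directly will normally show matching ids; it only illustrates the concept.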

By using a real programming language, I also easily get commandline flags for sharing tty or not (so I can use the same program for launching GUI & interactive terminal applications), pass all environment variables through (which can’t be just standard child inheritance, since systemd-run doesn’t work that way) & the ability to setuid (you can’t do that easily to a script).

I was sure I’d hate writing go because of the constant if err != nil checks, but it hasn’t bothered me that much. I would like to write more Go, to get a better feel for it. This code is too short to like a language, but I definitely hate it less :)

Anyway, now I can launch GUI applications with ssc -tty=false -isolation=strict firefox and it does the right thing. I currently have available -isolation=strict and -isolation=relaxed, the former performing stronger sandboxing (NoNewPrivileges, PrivateTmp) than the latter (just MemoryMax). I’ll slowly add more protections here, but just keep two modes (ideally).
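A Python sketch of how two fixed isolation modes could map onto a systemd-run command line. This is not ssc's actual Go code. The assumptions: strict is relaxed plus the two extra properties, the MemoryMax value is illustrative, and environment pass-through is left out. The systemd-run flags used (`-p`, `--uid`, `--gid`, `--tty`, `--quiet`) are real ones.

```python
import os

# Assumed mapping: the post names the properties but not their exact values.
ISOLATION_PROPERTIES = {
    "relaxed": ["MemoryMax=70%"],
    "strict": ["MemoryMax=70%", "NoNewPrivileges=yes", "PrivateTmp=yes"],
}


def build_systemd_run_argv(command, isolation="strict", tty=True,
                           uid=None, gid=None):
    """Build the argv for systemd-run without executing anything.

    Only the named modes are accepted, so callers cannot inject
    arbitrary systemd properties.
    """
    if isolation not in ISOLATION_PROPERTIES:
        raise ValueError(f"unknown isolation mode: {isolation!r}")
    if not command:
        raise ValueError("need a command to run")
    # Default to the real ids of whoever is calling us.
    uid = os.getuid() if uid is None else uid
    gid = os.getgid() if gid is None else gid

    argv = ["systemd-run", "--quiet", "--uid", str(uid), "--gid", str(gid)]
    if tty:
        argv.append("--tty")
    for prop in ISOLATION_PROPERTIES[isolation]:
        argv += ["-p", prop]
    return argv + list(command)


if __name__ == "__main__":
    print(build_systemd_run_argv(["firefox"], isolation="strict", tty=False))
    print(build_systemd_run_argv(["/bin/bash", "-i"], isolation="relaxed"))
```

Keeping the property lists in one table means adding a protection to a mode is a one-line change, and there is still no way to pass a property that is not in the table.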

My Gnome Terminal shell command is now ssc -isolation=relaxed /bin/bash -i and it works great :)

I am pretty happy with ssc as it exists now. Only thing I now want to do is to be able to use it from the GNOME launcher (I am using GNOME3 with gnome-shell). Apparently shortcuts are no longer cool and hence pretty hard to create in modern desktop environments :| I shall keep digging!

systemd gui applications

Update: There’s a follow-up post with a simpler solution now.

Ever since I read Jessie Frazelle’s amazing setup (1, 2, 3) for running GUI applications in docker containers, I’ve wanted to do something similar. However, I want to install things on my computer - not in docker images. So what I wanted was just isolation (no more Chrome / Firefox freezing my laptop), not images. I’m also not as awesome (or knowledgeable!) as Jess, so will have to naturally settle for less…

So I am doing it in systemd!

Before proceeding, I want to warn y’all that I don’t entirely know what I am doing. Don’t take any of this as security advice, since I don’t entirely understand X’s security model. Works fine for me though!

GUI applications

I started out using a simple systemd templated service to launch GUI applications, but soon realized that systemd-run is probably the better way. So I’ve a simple script, /usr/local/bin/safeapp:

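#!/bin/bash
# Run the given command in a transient systemd unit with CPU and memory caps.
# Note: --gid assumes a group with the same name as the user exists.
exec sudo systemd-run \
    -p CPUQuota=100% \
    -p MemoryMax=70% \
    -p WorkingDirectory="$(pwd)" \
    -p PrivateTmp=yes \
    -p NoNewPrivileges=yes \
    --setenv DISPLAY="${DISPLAY}" \
    --setenv DBUS_SESSION_BUS_ADDRESS="${DBUS_SESSION_BUS_ADDRESS}" \
    --uid "${USER}" \
    --gid "${USER}" \
    --quiet \
    "$@"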

I can run safeapp /opt/firefox/firefox now and it’ll start firefox inside a nice systemd unit with a 70% Memory usage cap and CPU usage of at most 1 CPU. There’s also other minimal security stuff applied - NoNewPrivileges being the most important one. I want to get ProtectSystem + ReadWriteDirectories going too, but there seems to be a bug in systemd-run that doesn’t let it parse ProtectSystem properly…

Also, there’s an annoying bug in systemd v231 (which is what my current system has) - you can’t set CPUQuotas over 100% (aka > 1 CPU core). This is annoying if you want to give each application 3 of your 4 cores (which is what I want). Next version of Ubuntu has v232, so my GUI applications will just have to do with an aggregate of 1 full core until then.

The two environment variables seem to be all that’s necessary for X applications to work.

And yes, this might ask you for your password. I’ll clean this up into a nice non-bash script hopefully soon, and make all of these better.

Anyway, it works! I can now open sketchy websites with scroll hijacking without fear it’ll kill my machine!

CLI

I wanted each tab in my terminal to be its own systemd service, so they all get an equitable amount of CPU time & can’t crash the machine by themselves with OOM.

So I’ve this script as /usr/local/bin/safeshell

#!/bin/bash
# Start an interactive bash inside a transient systemd unit with CPU and memory caps.
exec sudo systemd-run \
    -p CPUQuota=100% \
    -p MemoryMax=70% \
    -p WorkingDirectory="$(pwd)" \
    --uid yuvipanda \
    --gid yuvipanda \
    --quiet \
    --tty \
    /bin/bash -i

The --tty is magic here, and does the right things wrt passing the tty that GNOME terminal is passing in all the way to the shell. Now, my login command (set under profile preferences > command in gnome-terminal) is sudo /usr/local/bin/safeshell. In addition, I add the following line to /etc/sudoers:

%sudo ALL = (root) NOPASSWD:SETENV: /usr/local/bin/safeshell

This + just specifying the username directly in safeshell are both hacks that make me cringe a little. I need to either fully understand how sudo’s -E works, or use this as an opportunity to learn more Go and make a setuid binary.

To do

[ ] Generalize this to not need hacks (either with better sudo usage or a setuid binary)
[ ] Investigate adding more security related options.
[ ] Make these work with desktop / dock icons.

I’d normally have just never written this post, on account of ‘oh no, it is imperfect’ or something like that. However, that also seems to have gotten in the way of my ability to find joy in learning simple things :D So I shall follow b0rk’s lead in spending time learning for fun again :)

things to learn

Keeping a running list of things I want to learn!

There’s also a list of things I want to build.

  • How to use org mode properly? Should I use it for notes over markdown?
  • Develop a deep understanding of how networks work.
  • How do linux network namespaces work?
  • How to run GUI apps with systemd?
  • What exactly is a ‘tty’?
  • How does HTTP2 actually work?
  • How do X509 / TLS certificates work?
  • How to use cgroups directly?
  • Can I just use emacs terminals for all my terminal needs?
  • How does NFS work, and why is it so crappy?
  • How does ssh work?
  • How does mosh work?
  • How do contact lenses work? HOW DO LENSES WORK?
  • Can I simply run a local DNS recursor on my laptop for performance & blocking me from visiting the orange website?
  • What is SELinux? Why and how would I use it?
  • What is AppArmor? Why and how would I use it, over SELinux?
  • What is seccomp, and when/why/how would I use it?

If you know of resources that’ll help me learn these things, do let me know!

moving to hugo

I’m attempting to now blog at http://words.yuvi.in, using hugo rather than wordpress.

Over the last few years, IRC, Twitter & WhatsApp have ruined my public writing. I shall now slowly attempt to bring that back :)

liberal software

I ran into this thought-provoking talk when randomly attempting to relax this weekend. There’s a summary at LWN if you do not want to watch the talk - but as the LWN summarizer admits, the video definitely conveys things that are hard to capture in text.

The core takeaway for me is to think about:

what is the future of free and open-source software? The answer was: it has no future.

This seems somehow connected to ‘democratizing programming’, which I had earlier given a talk about. Somehow, it feels like there needs to be an update / rebirth of the GNU Freedoms for the world we live in.