Back from an awesome vacation. Too awesome to write about, even :) Suffice it to say, England has some really pretty places.
Some Android app work, and lots of monitoring work:
- Fixed bugs preventing the Wikipedia Android Alpha from building properly. Now it builds properly whenever there is a new commit. Hooray! This was primarily caused by me forgetting to give it lots of RAM (8G VMEM) to execute the mvn build commands (https://gerrit.wikimedia.org/r/#/c/159482/) and also not cleaning up previous .alpha subfolders (https://gerrit.wikimedia.org/r/#/c/159481/) – this causes a chain of .alpha.alpha.alpha.* subfolders, breaking the build.
- Added a patch to the Android alpha app itself that checks for updates every day or so and notifies you if there’s a new one. Was fairly trivial to write, although I was hoping to make it more seamless (i.e. download the apk myself and just pop it up for people to tap). Installing it currently takes 4 clicks; I should be able to bring that down to 2 at some point in the future if people care enough.
- Added a method to our check_graphite code that lets you individually check a bunch of metrics for thresholds (https://gerrit.wikimedia.org/r/#/c/159473/). This makes it much simpler to do icinga checks on a bunch of metrics that are all measuring the same thing but from different machines. BetaLabs and ToolLabs checks use this.
- Cleaned up a bunch of minor things with our check_graphite script. Also fucked up trying to replace all double quotes in it with single quotes for consistency – the replacement also hit the double quotes being used inside single-quoted strings, and caused all checks to fail. Fixed shortly afterwards by https://gerrit.wikimedia.org/r/#/c/159711/
- Added more monitoring for betalabs! Now checks for stale puppet runs (https://gerrit.wikimedia.org/r/#/c/159701/) and low space on the root partition (https://gerrit.wikimedia.org/r/#/c/159694/). All are green now, thanks to some work from bd808.
- Added monitoring for ToolLabs! Now checks for stale puppet runs, low space on root and /var, and puppet failure events (https://gerrit.wikimedia.org/r/#/c/159709/). Also checks for high sustained CPU usage (https://gerrit.wikimedia.org/r/#/c/159751/). Then spent some time (with help from scfc_de (whose nick I kept spelling as scfe_de until today)) cleaning up the puppet failures. They are all green now as well.
- Did a bunch of cleaning up work around the graphite role, removing the realm branching (https://gerrit.wikimedia.org/r/#/c/159759/). Ori says every time realm branching code is removed, an angel gets its wings, so well done there.
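For the curious, here is roughly what "individually check a bunch of metrics for thresholds" means. This is only a minimal sketch in Python, not the actual check_graphite code: the function names (check_series, check_many), the metric names and the "look at the latest non-null datapoint" rule are all assumptions made up for illustration, and it works on data that is already fetched rather than talking to Graphite. The only thing borrowed from the real world is the usual Nagios/Icinga plugin status codes.

```python
from typing import Dict, List, Optional, Tuple

# Standard Nagios/Icinga plugin status codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_series(points: List[Optional[float]], warn: float, crit: float) -> int:
    """Classify one metric series by its latest non-null datapoint (assumed rule)."""
    values = [p for p in points if p is not None]
    if not values:
        return UNKNOWN  # no data at all for this metric
    latest = values[-1]
    if latest >= crit:
        return CRITICAL
    if latest >= warn:
        return WARNING
    return OK


def check_many(
    series: Dict[str, List[Optional[float]]], warn: float, crit: float
) -> Tuple[int, Dict[str, int]]:
    """Check every metric on its own, then report the worst status overall."""
    per_metric = {name: check_series(pts, warn, crit) for name, pts in series.items()}
    if not per_metric:
        return UNKNOWN, per_metric
    # Severity order: CRITICAL, then WARNING, then UNKNOWN, then OK.
    for status in (CRITICAL, WARNING, UNKNOWN):
        if status in per_metric.values():
            return status, per_metric
    return OK, per_metric


if __name__ == "__main__":
    # Hypothetical metric names: same measurement, different machines.
    sample = {
        "host1.diskspace.root.percent": [50.0, 60.0, None],
        "host2.diskspace.root.percent": [80.0, 96.0],
    }
    overall, details = check_many(sample, warn=90, crit=95)
    print(overall, details)
```

The point of checking per metric instead of averaging everything together is that one full disk on one machine still trips the alert, even if every other machine is fine.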
Not a bad day, eh? I’ve been trying to wake up early, perhaps that is helping.