Back from an awesome vacation. Too awesome to write about, even :) Suffice it to say, England has some really pretty places.
Some Android app work, and lots of monitoring work
- Fixed bugs preventing the Wikipedia Android Alpha from building properly. Now it builds whenever there is a new commit. Hooray! This was primarily caused by me forgetting to give the `mvn` build commands enough RAM (8G VMEM) to execute (https://gerrit.wikimedia.org/r/#/c/159482/) and also not cleaning up previous `.alpha` subfolders (https://gerrit.wikimedia.org/r/#/c/159481/) – that caused a chain of `.alpha.alpha.alpha.*` subfolders, breaking the build.
- Added a patch to the Android alpha app itself that checks for updates every day or so and notifies you if there’s a new one. Was fairly trivial to write, although I was hoping to make it more seamless (i.e. download the apk myself and just pop it up for people to tap). It now requires 4 clicks to install; I should be able to bring that down to 2 at some point in the future if people care enough.
- Added a method to our `check_graphite` code that lets you individually check a bunch of metrics against thresholds (https://gerrit.wikimedia.org/r/#/c/159473/). This makes it much simpler to do icinga checks on a bunch of metrics that all measure the same thing but on different machines. The BetaLabs and ToolLabs checks use this.
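I don’t have the patch inlined here, but the shape of such a check is roughly this – a hypothetical Python sketch, not the actual `check_graphite` code: fetch each metric from Graphite’s render API and compare it against warning/critical thresholds, Icinga-style.

```python
# Hypothetical sketch of per-metric threshold checking (illustrative
# names; not the real check_graphite implementation).
import json
import urllib.request

def fetch_last_value(graphite_url, metric):
    """Fetch the most recent non-null datapoint for one metric from
    Graphite's render API."""
    url = f"{graphite_url}/render?target={metric}&from=-5min&format=json"
    with urllib.request.urlopen(url) as resp:
        series = json.load(resp)
    points = [v for v, _ in series[0]["datapoints"] if v is not None]
    return points[-1] if points else None

def check_values(values, warn, crit):
    """Compare each metric's latest value against thresholds.
    Returns an Icinga-style status: 0=OK, 1=WARNING, 2=CRITICAL."""
    status = 0
    problems = []
    for metric, value in sorted(values.items()):
        if value >= crit:
            status = 2
            problems.append(f"{metric}={value} (crit)")
        elif value >= warn:
            status = max(status, 1)
            problems.append(f"{metric}={value} (warn)")
    return status, "; ".join(problems) or "all metrics OK"
```

The point of checking each metric individually is that one bad machine trips the alert even when the combined average looks fine.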
- Cleaned up a bunch of minor things in our `check_graphite` script. Also fucked up trying to replace all double quotes in it with single quotes for consistency – it also replaced the double quotes used inside single-quoted strings, causing all checks to fail. Fixed shortly afterwards by https://gerrit.wikimedia.org/r/#/c/159711/.
- Added more monitoring for betalabs! Now checks for stale puppet runs (https://gerrit.wikimedia.org/r/#/c/159701/) and low space on the root partition (https://gerrit.wikimedia.org/r/#/c/159694/). All are green now, thanks to some work from bd808.
- Added monitoring for ToolLabs! Now checks for stale puppet runs, low space on root and /var, and puppet failure events (https://gerrit.wikimedia.org/r/#/c/159709/). Also checks for high sustained CPU usage (https://gerrit.wikimedia.org/r/#/c/159751/). Then spent some time (with help from scfc_de, whose nick I kept spelling as scfe_de until today) cleaning up the puppet failures. They are all green now as well.
- Did a bunch of cleanup work around the graphite role, removing the realm branching (https://gerrit.wikimedia.org/r/#/c/159759/). Ori says every time realm-branching code is removed, an angel gets its wings, so well done there.
Not a bad day, eh? I’ve been trying to wake up early, perhaps that is helping.
Missed DevLogging for a while.
Am in London now.
- Started using a spare Majestouch Ninja 2 instead of my regular Kinesis Advantage. It is way more portable, and my hand does not seem to hurt while using it (so far only about 4-5 hours). If this keeps up, I should be able to move from the much bulkier Kinesis to a similarly small keyboard. There’s still a little bit of discomfort, so I’ll probably want a very portable mechanical split keyboard. Sadly, I can’t seem to find any :( Maybe I should just build one with an Arduino :)
- Setup icinga checks for puppet failures and disk space issues on betalabs, and fixed a bunch of issues/docs in our icinga puppet code along the way. This still doesn’t properly work, since our implementation of `check_graphite` does not support wildcard metrics properly – it should check thresholds for each series individually, but it seems to check only across all the series combined, which is kinda useless. Should fix that soon by adding more features to it. Also might try out alternatives to icinga, since our icinga puppet code is a fuckball anyway.
- Fixed a couple more Quarry bugs. There’s still a random bug where `celery` seems to attempt to read data about a query run from mysql before the web process has committed it, which is theoretically impossible (I do a commit before sending the task to celery with the id), so I suspect some mysql fuckery. Will need to debug that sooner rather than later, and also consider moving to `postgres`. But then Quarry would have to deal with SQLite (for result storage), MySQL (for connecting to labsdb) and postgres (for local data), which sounds insanely complex. I also added CORS support to resultsets, and Magnus is playing with it (wooohhooooo!!!). I’m going to add more features to make it easier for people to use results from Quarry in their JS applications elsewhere. Should be fun.
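For what it’s worth, one common cause of “the worker can’t see a row the web app already committed” is the worker holding an old REPEATABLE READ snapshot on a long-lived MySQL connection. A hedged sketch of a worker-side guard for that (`QueryRun` and the session here are stand-ins, not Quarry’s actual model):

```python
# Sketch: retry a read until the row becomes visible. Under InnoDB's
# default REPEATABLE READ isolation, a long-lived connection's snapshot
# never sees newer commits until its own transaction ends.
import time

class QueryRun:  # placeholder for the actual mapped model
    pass

def load_query_run(session, run_id, attempts=5, delay=0.2):
    """`session` is assumed to be a SQLAlchemy-style session object."""
    for _ in range(attempts):
        run = session.query(QueryRun).get(run_id)
        if run is not None:
            return run
        # End the worker's transaction so the next read opens a fresh
        # snapshot that can see the web process's commit.
        session.rollback()
        time.sleep(delay)
    raise LookupError("QueryRun %s not visible after %d attempts" % (run_id, attempts))
```

If this is the culprit, lowering the isolation level to READ COMMITTED for the worker connection would also make the problem go away.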
- Finished videos for the first week of the Coursera Data Analysis and Statistical Inference class I’m taking. Started poking around R since the labs for that class are from R, should be fun.
- Chad has started his devlog. He does search stuff at Wikimedia and is a co-whiner about all things Java. Do check it out :)
Am away on ‘vacation’ till Wednesday, yay! :) Should disconnect well.
Chill weekend. Didn’t really do anything code related. Recovering from friday night party :)
Started reading Data + Design, which seems quite nice. Also starting a Coursera course on Data Analysis and Statistical Inference on Sep 1, should be fun.
Let’s see. I’m also going to attempt to include patch links wherever possible.
Might spend some more time with Hive over the next few days – figured out an approach for using it from Python, and it should be fun to do so!
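The approach itself isn’t spelled out above; one simple way to drive Hive from Python (and a hedged guess at the general shape, not necessarily the exact approach alluded to) is shelling out to the `hive` CLI and parsing its tab-separated output:

```python
# Sketch: run HiveQL via the hive CLI. Assumes `hive` is on PATH.
import csv
import io
import subprocess

def parse_hive_output(text):
    """Hive's CLI emits tab-separated rows by default."""
    return [row for row in csv.reader(io.StringIO(text), delimiter="\t")]

def run_hive(query):
    result = subprocess.run(
        ["hive", "-S", "-e", query],  # -S silences Hive's log chatter
        capture_output=True, check=True, text=True,
    )
    return parse_hive_output(result.stdout)
```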
Not very code heavy
- A couple of pull requests (#1 and #2) for the Atom Autosave plugin. One adds a preference to not autosave when you are explicitly closing a window / pane, and the other just sets the default for the ‘enabled’ preference. CoffeeScript isn’t too bad either! I should consider writing more plugins (I currently use Atom for CSS/JS/Puppet, should try other languages).
- Added CSV, TSV and JSON download options to Quarry. There’s another webinar tomorrow by the Grantmaking team, and J-Mo asked for it. Streaming TSV and CSV are implemented in a neat way; will write a blog post about it tomorrow.
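The “neat way” is presumably a generator-based streaming response; here’s a minimal sketch of the idea (illustrative names, not Quarry’s actual code):

```python
# Sketch: emit one encoded CSV/TSV line at a time, so arbitrarily large
# resultsets never have to be materialized in memory.
import csv
import io

def stream_rows(rows, delimiter=","):
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter)
    for row in rows:
        writer.writerow(row)
        yield buf.getvalue()
        buf.seek(0)
        buf.truncate(0)  # reuse the buffer for the next row
```

In Flask, a generator like this can be passed straight to `Response(stream_rows(rows), mimetype="text/csv")`, so rows stream out as they arrive from the database.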
- Started work on a ‘number of editors’ per country metric for WPDMZ, needs to be finished up.
I feel a bit exhausted (physically and mentally) from the intense coding over the last few weeks, might have a few chill days to recharge myself. I’m growing old! :(
- Moved labsbooks (described in yesterday’s devlog) to use a shared readonly IPython virtualenv maintained by me. Also installed a bunch of modules people might want to use (SciPy, NumPy, Pandas, PyTables, matplotlib). Am considering just installing IPython notebook globally via puppet and using that, since that’ll enable users to just use the system packages. However, the version of IPython notebook from Ubuntu is ancient, so that’s probably a non-starter.
- Have a basic version of the IPython publishing process working! Any toollabs user can create a notebook by:
- Creating a `~/notebooks` directory
- Doing a `chmod +x ~/notebooks`
- Doing a `chmod +x ~`
- Putting IPython notebooks into `~/notebooks`
- Going to `https://tools.wmflabs.org/notebooks/<user-shell-name>/<path-to-ipynb-file>`
Will have to do a bit more work before it can be considered ‘production grade’ (such as user pages, a nicer theme, etc.) BUT YAY GOOD START. It already caches the html output in Redis and invalidates based on the `mtime` of the file, so it should be pretty quick.
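The Redis caching could look roughly like this (a hedged sketch; function and key names are mine, not the actual implementation). Putting the mtime in the cache key means an edited notebook automatically misses the cache and gets re-rendered:

```python
# Sketch: serve rendered notebook HTML from a redis-like cache
# (anything with get/set), re-rendering only when the file changes.
import os

def rendered_html(cache, path, render):
    mtime = os.stat(path).st_mtime
    key = "notebook:%s:%s" % (path, mtime)  # mtime in the key => edits invalidate
    cached = cache.get(key)
    if cached is not None:
        return cached
    html = render(path)
    cache.set(key, html)
    return html
```

A real version would also want an expiry on the keys so stale entries for old mtimes don’t accumulate forever.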
- Made the ssh tunneling process for `labsbooks` purely python, without requiring `ProxyCommand`. This makes things simpler (and more portable!). I’ll need to work on securing this properly before I can publish it for broader use.
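A tunnel like this ultimately just pumps bytes between a local socket and a paramiko channel. A minimal stdlib sketch of that relay loop (not the actual labsbooks code; paramiko channels expose the same `recv`/`sendall`/`fileno` interface as sockets, so the same loop works for both ends):

```python
# Sketch: copy bytes both ways between two socket-like objects until
# one side closes, then tear the tunnel down.
import selectors

def relay(a, b, bufsize=4096):
    sel = selectors.DefaultSelector()
    sel.register(a, selectors.EVENT_READ, b)  # data holds the counterpart
    sel.register(b, selectors.EVENT_READ, a)
    try:
        while True:
            for key, _ in sel.select():
                data = key.fileobj.recv(bufsize)
                if not data:
                    return  # one side hit EOF; finally closes everything
                key.data.send(data)
    finally:
        sel.close()
        a.close()
        b.close()
```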
- Wrote an email to the analytics mailing list about making public the ‘edits per country’ data. I hope to make this publicly available with enough granularity that not just me but others can use this for fun research as well.
I’ve been using Atom for puppet stuff, PyCharm for Python and IntelliJ for Java stuff, and that seems to be doing ok. They all have decent Vim keybindings, and good replacements for other functionality – I might stick with Atom for a while to see how it goes :)
Back in Glasgow! It was actually not very cold today, only cold! Progress.
Started working on my long-abandoned `labsbook` project, which aims to make Tool Labs a first-class environment for people who want to run (and publish) IPython Notebooks while also being able to access the replica databases and the dumps. Doing this in a secure manner is kinda hard, but I think I have a neat solution that lets everyone run a personal IPython kernel on the Grid, access it from their local machine, and also publish it to the web from a standard location. So far, I’ve gotten my script to a point where it’ll set up an IPython environment for you if it doesn’t already exist, start the kernel if it isn’t already running, and tunnel the editing interface back to you! Things left to do include:
- Open up the browser when tunnel is open
- Find a sane way to kill a kernel that hasn’t been doing anything since forever
- Setup a shared IPython environment (just code, readonly) so people don’t have to set up their own environments every time (this is primarily a performance enhancement)
- Find a nice and simple way for IPython notebooks to be published. I’m currently thinking of a URL such as `tools.wmflabs.org/notebooks/<username>/<notebookname>` to display them, and an index at `tools.wmflabs.org/notebooks/`. This shouldn’t be too hard with appropriate permission munging.
I’m also using paramiko for this, which makes writing SSH-related code with Python a breeze. It even supports `ProxyCommand`! Blunders I’ve made while getting to this point include:
- Doing `jsub run.bash -mem 4G` instead of `jsub -mem 4G run.bash`, and wondering why my script kept getting killed with OOM.
- Trying to do a `pip install` on the Grid nodes (which don’t have build tools) instead of on `tools-dev`, and wondering why running it from the commandline works but running it via `jsub` does not.
- Wondering why my SSH tunnel kept dying, and trying to debug that without realizing that it was dying because the IPython process was OOMing thanks to my earlier `jsub` flag-ordering blunder.
- Thinking that user accounts (rather than tool accounts) cannot submit jobs to the grid, when the actual problem was that I had not set the execute bit on my script.
Once this is done (I suspect tomorrow), I’ll work on getting the data from my work with WPDMZ into a form good enough for publicizing (removing ways of de-anonymization), and then use iPython notebooks to make graphs! This should be fun :)
Source so far available on Github. Needs more work / documentation / cleaning up.
Was in Edinburgh again, missed writing it as I went along. Oh well.
- Ported the code for the Wikipedia Android app automated builds to Python; you can see it in action at http://tools.wmflabs.org/wikipedia-android-builds/ now. It lets you download the latest build, and notes the last successful build time. Good enough :) It was originally in bash, and porting it to Python allowed me to create a ‘fake’ API (just JSON blobs written to known locations in the file system). Next step is to write a helper app.
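The ‘fake’ API amounts to writing a JSON blob to a known path under the web root after each build, so the page can fetch static JSON instead of hitting a real service. A sketch (file names and fields here are illustrative):

```python
# Sketch: record the outcome of the latest build as a static JSON file.
import json
import time
from pathlib import Path

def write_build_status(webroot, success, apk_name=None):
    status = {
        "success": success,
        "checked_at": int(time.time()),  # unix timestamp of this run
        "apk": apk_name,
    }
    out = Path(webroot) / "api" / "latest.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(status))
    return out
```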
- The Atom experiment is coming along well. Am using it for most of my Puppet work these days. Should give LightTable another go as well.
In Edinburgh! I’ve finally stopped spelling it as Edinborough!
- Added ‘user group’ functionality to Quarry, and added a `sudo` user group that does what you would think it does. Will be assigned super, super sparingly.
- Found out that I have to explicitly specify the charset of the database when creating it, or MySQL will default to a stupid charset. Forced all tables and columns to utf8, and that seems to have fixed a bunch of unicode issues. Yay?
- Still facing occasional `MySQL server has gone away` errors with SQLAlchemy for the local MySQL instance, despite asking SQLAlchemy to recycle connections every hour or so. Reduced the recycle time to 10m; hopefully that helps.
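The recycle tweak itself is just engine configuration: SQLAlchemy’s `pool_recycle` discards pooled connections older than N seconds, so MySQL’s `wait_timeout` never gets a chance to kill one mid-request. A sketch (illustrative URL; not runnable as-is without a MySQL driver installed):

```python
from sqlalchemy import create_engine

engine = create_engine(
    "mysql://quarry:secret@localhost/quarry",  # illustrative credentials
    pool_recycle=600,  # recycle after 10 minutes, well under wait_timeout
)
```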
- Read Tony Hoare’s Turing Award speech from 1980, titled “The Emperor’s Old Clothes”. I think I should read more of these papers / speeches; they help keep perspective and ‘learn from history’. Warnings against complexity are a very common theme, and one I’ve also personally encountered many times. Recommend reading :)
- More DMZ work! Now running edits per country stats separated by mobile vs desktop for all countries for all wikipedias! EXCITING!
Woke up at 11AM again! w000t!
- Fixed a stupid bug in Quarry that made it fail if MySQL decided that a column you were selecting is a `Decimal`. Fixed this in time for…
- Helped out with a webinar from J-Mo for people to learn SQL and do research against Wikimedia stuff with Quarry! Went mostly without a glitch, so yay :) Oren asked for an API for Quarry; I’ll investigate ways to get that done.
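The `Decimal` failure above is most likely JSON serialization – `json.dumps` raises `TypeError` on `decimal.Decimal`. A sketch of the standard fix (assuming that was the failure mode; the actual patch isn’t shown here):

```python
# Sketch: supply a default= converter for types MySQL hands back that
# the json module can't serialize on its own.
import json
from decimal import Decimal

def json_default(obj):
    if isinstance(obj, Decimal):
        return float(obj)  # lossy but fine for display purposes
    raise TypeError("not JSON serializable: %r" % (obj,))

def dump_rows(rows):
    return json.dumps(rows, default=json_default)
```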
- Found that some queries against enwiki succeed on `s1.labsdb` but not on `s4.labsdb`. I attribute this to cache characteristics, and have switched Quarry to use s1 for now.
- Did more work on DMZ, getting in place a runner / processor infrastructure and sqlite backed intermediary storage so I can write simpler code to get the data I want. Super excited to see how this goes. It’ll be much easier for me to add new queries now.
- More work on DMZ! I had a late-night sprint and split edits by mobile and desktop (and was surprised to see how many fewer edits came from mobile). Next up is gathering population and internet-penetration data and then GO MAKE SOME GRAPHS, probably with iPython. I’m returning to making graphs for real after, what, 6 years or so? :) I was using C# and Excel then :) /nick ExcitedPanda
- Also made some progress on our makeshift Android CI-ish build system for the Wikipedia app. Should be done soon.