Yuvi Panda

JupyterHub | MyBinder | Kubernetes | Open Culture

PHP being insane, part 5832

Somehow: ( $a && $b ) || ( $b && $c ) || ( $a && $c )

Became: $a ? ( $b || $c ) : ( $b && $c )

Became: count( array_filter( array( $a, $b, $c ) ) ) >= 2

Became: "$a$b$c" > 1

God dammit PHP…

(from discussion among me, ^d, ori-l, bd808 and anomie on #mediawiki-core about how to represent ‘if at least 2 of three conditions are true’)
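To see why all four forms agree, here is a rough Python sketch that checks them against every combination of three booleans. The function names are made up for illustration, and `php_str` only imitates how PHP interpolates a boolean into a string (true becomes "1", false becomes ""); it assumes the inputs are real booleans.

```python
from itertools import product


def php_str(value: bool) -> str:
    """Imitate PHP string interpolation of a boolean: true -> "1", false -> ""."""
    return "1" if value else ""


def pairwise(a: bool, b: bool, c: bool) -> bool:
    return (a and b) or (b and c) or (a and c)


def ternary(a: bool, b: bool, c: bool) -> bool:
    return (b or c) if a else (b and c)


def counting(a: bool, b: bool, c: bool) -> bool:
    return sum([a, b, c]) >= 2


def concatenated(a: bool, b: bool, c: bool) -> bool:
    # "$a$b$c" can only be "", "1", "11" or "111".
    # The empty string is treated as 0 here; it is not greater than 1 either way.
    digits = php_str(a) + php_str(b) + php_str(c)
    return int(digits or "0") > 1


def all_forms_agree() -> bool:
    forms = (pairwise, ternary, counting, concatenated)
    return all(
        len({form(a, b, c) for form in forms}) == 1
        for a, b, c in product([False, True], repeat=3)
    )


print(all_forms_agree())
```

The last form only works because each true value contributes one "1" digit, so two or more of them always produce a number of at least 11.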

DevLog for Sun, Aug 25, 2013

I haven’t been writing many DevLogs of late. Time to fix that!

WLM Android App

Spent some time reviving the WLM Android App. Wasn’t too hard, and am surprised at how well it still runs :) Some work still needed to update the templates and other metadata to refer to WLM2013 rather than WLM2012 – but that should not be too hard. The fact that it is an issue at all is simply because I ripped out all the Campaign related APIs a few weeks ago with my UploadCampaign rewrite.

multichill was awesome in moving the Monuments API to Tool Labs – hence making it much faster! Initially we thought that the Toollabs DB was too slow for writes – but this turned out to be a mistake, since apparently the Replica Databases had slow writes, but tools-db itself was fine. There’s a bug tracking this now. Toollabs version of the API still seems much faster to me than Toolserver’s :)

UploadCampaigns API

MediaWiki sucks. Eeeew! Specifically, writing API modules: why can’t we just be happy and have everything be JSON? Sigh!

I’m adding a patch that allows UploadCampaigns to be queried selectively, rather than just via the normal page APIs. Right now, this only lets us filter by enabled status, but in the future it should also be able to filter on a vast array of other properties. Properties about geographic location come to mind as the most useful. That patch still has a good way to go before it can be merged (continuation support being the foremost missing piece), but it is getting there :)

The ickiest part of the patch is perhaps that it sends out raw JSON data as a… string. So no matter which format you are using on your client, you need to use a JSON parser to deal with the Campaigns data. This sort of makes sense, since that is how the data is stored anyway. Doesn’t make it any less icky, though!
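For a client, that means parsing twice: once for the API response, and once more for the campaign data embedded in it as a string. A minimal sketch of what that looks like, assuming a JSON response; the response shape and the field names (`campaigns`, `name`, `config`) are invented for illustration and are not the actual output of the patch.

```python
import json

# Hypothetical response: the campaign config arrives as a JSON *string*,
# not as a nested object. All field names here are illustrative.
raw_response = json.dumps({
    "campaigns": [
        {"name": "example-campaign", "config": json.dumps({"enabled": True})}
    ]
})


def parse_campaigns(response_text):
    """Return (name, config) pairs, decoding the embedded JSON string."""
    outer = json.loads(response_text)  # first parse: the API envelope
    campaigns = []
    for entry in outer["campaigns"]:
        config = json.loads(entry["config"])  # second parse: the raw string
        campaigns.append((entry["name"], config))
    return campaigns


print(parse_campaigns(raw_response))
```

A client asking for XML or any other format would still need the second `json.loads` step, which is the icky part.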

Not bad for a lazy Sunday, eh?

Update: After not being able to sleep, I also submitted a patch to make phpcs pass for UploadWizard, and also fought with the UploadCampaigns API patch to have it (brokenly?) support continuing. Yay?

Die ’80 cols or die!’ guidelines!

Python’s PEP 8 has just been changed to no longer insist on a hard 79-column limit! The new text says:

Aim to limit all lines to a maximum of 79 characters, but up to 99 characters is acceptable when it improves readability.

It would be nice to not have any such set limits at all, and just depend on programmers not being insane, but this is still an improvement!

Sprinkling some Douchebaginess in code

After being frustrated at Java’s lack of a generic ‘callback’ type, I created this interface:

    public interface ContributionUploadProgress {
        void onUploadStarted(Contribution contribution);
        boolean isJavaAPieceOfShit();
    }

And then randomly threw this around (where onComplete implements ContributionUploadProgress):

    assert onComplete.isJavaAPieceOfShit();

This, of course, is trivial to fix with an IDE. Should be more fun with a dynamic language :)

(And yes, I removed that code before committing)

Game of Life in APL

Ran across a video explaining building Game of Life in APL. Pretty awesome, and surprisingly – quite understandable too!

I should learn J at some point, I think. My problem with trying to learn such languages is that I find it hard to find something to build that’s not a mere academic exercise, and I still do not have a solution as such. Just ‘try harder’ doesn’t really work. Perhaps invent some sort of wonderful number crunching idea and then implement it?

Bikeshedding on Gerrit #1

First there was this.

I was bored, and so there was this. That of course, brought a swift end to the bikeshed. I’m only slightly ashamed.

Attempting to Secure Redis in a Multi Tenant environment

When running Redis in a shared cluster/hosting environment (such as Wikimedia Tool Labs, on which I’ve been having fun doing a lot of work), you would want to try to provide at least some guarantee of isolation for your keys from everyone else’s keys. Since Redis doesn’t do ACLs, this is problematic.

This can be solved in a couple of ways:

Run a Redis instance for each user

This is simple enough to do – each user runs their own Redis instance, and has full access to it. Security is handled by setting a secret password, and running redis-server as the user in question. Boom, secure!

This doesn’t really scale with a large number of users, because they each have less memory to work with now. Also, making users who just want to run their tools deal with keeping their Redis instance up and running isn’t really good. Having the sysadmins be responsible for users’ Redis instances is… not going to work :) This would also require all the Redis instances to run on one box and/or have a separate cluster just for them, which isn’t good either.

Add ACL support to Redis

Not happening, because I’m not good enough to do that yet :P But more realistically, it won’t ever happen, since this will probably add a lot of overhead for what is arguably an edge case.

Build a small server that sits in front of Redis

Such a server would simply authenticate incoming requests via some mechanism (Keystone perhaps), and then enforce ACLs. It will have to speak the exact same protocol as Redis, since users should be able to use any library that connects to Redis. This isn’t too hard: just replace the password functionality of the Redis protocol to take in both a username and password (or token, or some other method of auth).

I discarded this too. It will still affect performance, which will now be limited by how fast this server runs, and that is definitely not faster than Redis. It will also be hard to maintain, since I’ll have to completely mimic Redis’ protocol and make sure it is kept up to date. Debugging protocol issues with random client libraries is not my idea of fun. Another major disadvantage is that I would now be writing auth code, and I don’t think handrolling auth code is a good idea, ever.

Security By Obscurity!

This is what we finally settled on :D It sounds horrible by the title, but I think it is Good Enough™.

Since Redis is a key value store at heart, you can do anything once you know the key. So, if an ‘attacker’ doesn’t know the key, there isn’t much they can do. So it can be considered SecureEnough for our purposes if we can make it so that other users can not find out or guess your keys.

We essentially did so with the following:

  1. Disable all Redis commands that let users list all / many keys.
  2. Have users use a random and long key prefix for all their keys.

(1) prevents someone from just listing all keys to find something interesting. (2) prevents people from brute-forcing or guessing keys. Since all code run on Tool Labs must be open source, guessing the key names a tool uses is super easy. With a ‘secret’ prefix in front, knowing those key names is useless. This also prevents accidental key overwrites from different tools using a common key name.

Disabling commands is easy to do with Redis’ rename-command config directive. I added support for it to Wikimedia’s Redis puppet module, and then it was simple enough to configure a specific instance to disable the ‘list keys’ type commands, along with the administrative and destructive ones. That’s the following commands:

  1. CONFIG
  2. FLUSHALL
  3. FLUSHDB
  4. KEYS
  5. SHUTDOWN
  6. SLAVEOF
  7. CLIENT
  8. RANDOMKEY
  9. DEBUG

After going through the list of Redis commands, I am guessing this is going to be GoodEnough to prevent key listing. (Note: if there’s more that I’m missing, please, please let me know).
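In redis.conf, renaming a command to the empty string disables it entirely. As a rough sketch (this is not how the puppet module does it, just a way to show the resulting config lines), the list above can be rendered like this; `render_rename_lines` is a made-up helper name.

```python
# The commands listed above, in the same order.
DISABLED_COMMANDS = [
    "CONFIG", "FLUSHALL", "FLUSHDB", "KEYS", "SHUTDOWN",
    "SLAVEOF", "CLIENT", "RANDOMKEY", "DEBUG",
]


def render_rename_lines(commands):
    """Return redis.conf lines that disable each command.

    Renaming a command to "" removes it from the running server.
    """
    return "\n".join(f'rename-command {command} ""' for command in commands)


print(render_rename_lines(DISABLED_COMMANDS))
```

Each output line has the form `rename-command KEYS ""`, one per disabled command.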

We also tell people to use a secure prefix that’s at least 64 bytes long, saved in a file that is only readable by that user. Generating that is as simple as:

    openssl rand -base64 64

That should be long enough to be hard to brute force, even with Redis being as fast as it is.
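On the tool side, using the prefix could look something like the sketch below. It generates the prefix with Python’s `secrets` module instead of openssl, stores it in an owner-only file, and prepends it to every key. The function names, the file name and the `:` separator are all assumptions made for illustration; the resulting string is what you would pass to whichever Redis client library you use.

```python
import os
import secrets
import tempfile


def create_prefix_file(path, num_bytes=64):
    """Write a fresh random prefix to `path` with owner-only permissions."""
    prefix = secrets.token_urlsafe(num_bytes)  # 64 random bytes as URL-safe text
    # O_EXCL refuses to overwrite an existing prefix file;
    # 0o600 means owner read/write only (on POSIX systems).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(prefix)
    return prefix


def load_prefix(path):
    """Read the stored prefix back."""
    with open(path) as handle:
        return handle.read().strip()


def prefixed(prefix, key):
    """Build the real Redis key from the secret prefix and a readable name."""
    return f"{prefix}:{key}"


# Small demo in a throwaway directory.
with tempfile.TemporaryDirectory() as directory:
    prefix_path = os.path.join(directory, "redis-prefix")
    create_prefix_file(prefix_path)
    key = prefixed(load_prefix(prefix_path), "pageviews")
    print(key.endswith(":pageviews"))
```

The tool’s source code can stay public, since only the readable part of the key (`pageviews` here) appears in it.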

Problems

The major problem with this is, of course, the fact that humans are involved :) I’ve heard “I do not care about my keys, do not need security” a fair number of times already. The fact that the prefix generation is optional means that there will be people who do not use prefixes, and it will work for them for a (probably) long time, until it doesn’t, and they have no idea why. This is personally acceptable to me, since they have been made aware of the risks beforehand.

Fun

This has now been deployed on toollabs for a month or so, and I’ve already written a couple of fun tools using it (as have other people). We had a patched memcached server that we’ll kill in a few weeks, so people who used memcached before are also migrating to Redis. And I was able to do all this without even having root! This is mostly thanks to the fact that we try to keep all our configuration in puppet (Wikimedia’s Puppet repository), for both our production cluster and for everything else. So I could re-use our production Redis module, make changes to it, and build the new solution, all while being vetted by ‘proper’ ops people (whom I dearly love and respect). Building infrastructure in such a collaborative manner is a lot of fun, and I think I’m hooked.

Autosuggest for Tamil in Android Keyboard

Screenshot of Tamil autocomplete

Screenshot of Autosuggest working in Tamil on my variant of the Android keyboard. I’ve been working on this on and off for the last few months (this is also my final year project in college). I’ve ported all of the jQuery.IME languages to work on the keyboard natively.

Now experimenting with Autosuggest. It only needs an appropriate dictionary now and we’ll be good to go. Currently I’m getting Autosuggest working only on Tamil, but the plan is to get it to work for all Indian languages. The method I am using (convert everything back to Latin chars, then do autocorrect / autosuggest) means it should be trivial to extend to any transliteration based input method.
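A toy sketch of that idea, under heavy assumptions: suggestions are looked up by the Latin characters typed so far, and only the results are shown in Tamil. The tiny dictionary and its Latin spellings are invented for illustration and are not the output of any real transliteration scheme; the keyboard itself does not work from this code.

```python
# Hypothetical dictionary: Latin spelling -> Tamil word.
DICTIONARY = {
    "amma": "அம்மா",
    "appa": "அப்பா",
    "anbu": "அன்பு",
}


def suggest(latin_buffer, dictionary=DICTIONARY, limit=3):
    """Return up to `limit` Tamil words whose Latin spelling starts with the input."""
    typed = latin_buffer.lower()
    matches = sorted(key for key in dictionary if key.startswith(typed))
    return [dictionary[key] for key in matches[:limit]]


print(suggest("ap"))
```

Because the matching happens entirely on Latin strings, swapping in a dictionary for another transliterated language would not change the lookup code.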

Wheeee! This is very much a simple ‘celebratory’ post. Will blog actual details soon.