Paper notes: ‘The impact of syntax colouring on program comprehension’

I’ve recently started reading more academic papers, and thought it’d be useful to write notes about them and publish them as I go along! This one is for ‘The impact of syntax colouring on program comprehension’.

  • I was amazed at the amount of prior research it is citing. Why have I not been reading these for the last 10 years of my life?
  • Apparently it is ok to report findings with a sample size of 10 people. I do not know how I feel about this.
  • A lot of thought clearly went into the design of the experiment, which is quite nice, and surprisingly different from environments I’ve worked in where product managers designed ‘experiments’.
  • To avoid datatype-related confusion, a uniform variable naming scheme was adopted in the tasks. For example, integers were named x, y, etc. and lists were named list1, list2, etc. As someone pretty used to Python, I would have found this annoying – but I’m curious what effect identifier names have on program comprehension. It also reminded me I haven’t written any code in a more strongly typed language in a while (I don’t think Java counts).
  • They used the Solarized colour scheme, which has a lot of fans, although I’ve never been one.
  • Lots of self-reporting for ‘programming proficiency’. This is the ‘we give up!’ answer to measuring programming proficiency, I guess :)
  • We gathered data from 10 graduate computer science students at the University of Cambridge. This too seems fairly common, but I’ve no idea whether studying such a homogeneous group of students affects the results at all.
  • They also discarded data from 3 of the students because they wore glasses and the eye-tracking hardware could not really deal with that. So this entire paper is based on data from 7 students doing one particular course at one particular university.
  • We use the Shapiro-Wilk test to establish normality. We use the Wilcoxon signed rank test (WSRT) for paired nonparametric comparisons. I know some of these words!
  • As the data was not normally distributed, a 2-way ANOVA could not be used to investigate the interaction of experience with highlighting on task times. I know most of the words, but still cannot make sense of this sentence.
  • Currently feeling very illiterate, but am sure this is just a feeling that will pass.
  • The median difference in task completion time was 8.4s in favour of highlighting. To my untrained brain, that does not seem like much.
  • The presence of syntax highlighting significantly reduces task completion time, but the magnitude of this effect decreases as programming experience increases – this is their primary conclusion, which I can totally believe. But would I have believed it if they had come to a different conclusion? Would they have published it if they had? Would they have reached the same conclusion with more data? I don’t know academia well enough to say.
  • I wonder if there has been research into richer forms of syntax highlighting – not just keyword based ones, but more contextual. Perhaps based on types (autodetected?), or scope, or usage frequency, or source, or whatever.
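
Since I didn’t know what the Wilcoxon signed rank test actually computes, here’s a minimal sketch I wrote to find out, assuming paired timings (each participant timed both with and without highlighting). The timings are made-up numbers, not the paper’s data, and signed_rank_statistic is a hypothetical helper that only computes the statistic; for an actual p-value you’d reach for something like scipy.stats.wilcoxon.

```python
from statistics import median


def signed_rank_statistic(without, with_):
    """Wilcoxon signed-rank statistic for paired samples.

    Returns (w_plus, w_minus): the sums of ranks of the positive and
    negative differences. Zero differences are dropped, and tied absolute
    differences share the average of the ranks they span.
    """
    diffs = [a - b for a, b in zip(without, with_) if a != b]
    # Indices of diffs ordered by absolute size, so position p gets rank p + 1.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical task times in seconds for 7 participants (NOT the paper's data).
without_highlighting = [60.0, 52.5, 71.0, 48.0, 55.0, 66.0, 50.0]
with_highlighting = [50.0, 45.0, 62.0, 49.0, 47.0, 58.0, 50.0]

w_plus, w_minus = signed_rank_statistic(without_highlighting, with_highlighting)
paired_diffs = [a - b for a, b in zip(without_highlighting, with_highlighting)]
print("W+ =", w_plus, "W- =", w_minus, "W =", min(w_plus, w_minus))
print("median paired difference:", median(paired_diffs), "s")
```

A small W means the differences lean consistently in one direction; the test then asks how unlikely that would be if highlighting made no difference. Because it only uses ranks, it doesn’t need the normality that the Shapiro-Wilk test ruled out.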

Overall, I enjoyed reading it – good paper! Thought-provoking in some ways, though it could’ve aimed higher, I suppose. I hope they continue doing good work!

DNS servers, localhost and asynchronous code

localhost is always 127.0.0.1, right? Nope, it can also be ::1 if your system only has IPv6 (apparently).

Asking a DNS server for an A record for localhost should give you back 127.0.0.1, right? Nope – it varies wildly! 8.8.8.8 gives me an NXDOMAIN, which means it tells you straight up THIS DOMAIN DOES NOT EXIST! Which is true, since localhost isn’t a domain. But if you ask the same thing of any dnsmasq server, it’ll tell you localhost is 127.0.0.1. Other servers are all over the place – I found one that returned an NXDOMAIN for AAAA but 127.0.0.1 for A (which is pretty odd, since an NXDOMAIN makes most software treat the domain as nonexistent and not attempt other lookups). So localhost and DNS servers don’t mix very well.

But why is this a problem in general? Most DNS resolution happens via the gethostbyname libc call, which reads /etc/hosts properly, right? The problem is that some popular software is completely asynchronous (cough_nginx_cough), so it does not use gethostbyname (since that’s synchronous) and instead queries DNS servers directly (and asynchronously). This works perfectly well until you try to hit localhost and it tells you ‘no such thing!’.

I should probably file a bug with nginx to have them read /etc/hosts as well, and in the meantime work around it by passing 127.0.0.1 to nginx rather than localhost.
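
The fix I have in mind for nginx amounts to checking hosts-file entries before going out to DNS. Here’s a minimal Python sketch of that lookup order, not nginx’s actual code (nginx is written in C); parse_hosts, resolve and nxdomain_dns are all hypothetical names, and the hosts text is inlined rather than read from /etc/hosts so it runs anywhere. On glibc systems, the order gethostbyname follows comes from the hosts: line in /etc/nsswitch.conf (typically files before dns).

```python
import ipaddress


def parse_hosts(text):
    """Map each hostname in hosts(5)-format text to its list of addresses."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        addr, *names = line.split()
        try:
            ipaddress.ip_address(addr)  # skip lines that don't start with an IP
        except ValueError:
            continue
        for name in names:
            table.setdefault(name.lower(), []).append(addr)
    return table


def resolve(name, hosts_text, dns_lookup):
    """Try hosts entries first; only query DNS if the name isn't listed."""
    addrs = parse_hosts(hosts_text).get(name.lower())
    if addrs:
        return addrs
    return dns_lookup(name)


SAMPLE_HOSTS = """\
127.0.0.1   localhost
::1         localhost ip6-localhost ip6-loopback
# 10.0.0.5  old-box
"""


def nxdomain_dns(name):
    # Stand-in for a server like 8.8.8.8 answering NXDOMAIN for localhost.
    return []


print(resolve("localhost", SAMPLE_HOSTS, nxdomain_dns))
```

With the hosts entries consulted first, localhost resolves to both the IPv4 and IPv6 loopback addresses no matter what the DNS server thinks.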

How did your Thursday go?

DevLog for 21-30 Dec 2015

Clearly I missed an entire week. I need to build a better system to make this easier…

Random notes.

  • Kicked out NFS from the Tool Labs proxies (with 1). Yay! This hopefully explains the lockup of tools-proxy-01 last night, maybe? It’s been restarted since, and I hope to no longer have instances randomly locking up on me. Infrastructure standards of 2009, here we come! :D I’ve also removed NFS from tools-redis, and migrated them to Jessie as well.
  • Fixed up all the races in how Kubernetes workers are set up with 2
  • Another instance is ‘stuck’ again. Sigh. AAAAAAAAAAAAAAAAAAAAAAAAA. Paravoid helped debug this, tracking it down to NFS client issues in the 4.2 kernel (see phab). I moved the k8s nodes back to a working 3.19 kernel (after filing an issue about the other 3.19 kernel package I tried that didn’t work).
  • Moved the tools proxies to 4.2 (lol?) after finding huge ksoftirqd spikes on them. Let’s see if that improves things.
  • I split up the individual components of PAWS, and have a working nbserve in there now! Exciting times. Need to fix up nbserve to use traitlets for config.
  • Big Tool Labs outage (again!). Some tool accidentally sent about 12 million job requests and crashed gridengine’s underlying backing store (BerkeleyDB). We reset to a clean slate after many hours (thanks Coren!) and things are mostly back up now. I’m reading through the Berkeley DB reference manual now.
  • Persistence failed for ores’s redises again, mostly because vm_overcommit was turned off. Fixed in the core redis module so it does not happen again.
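
For future me: assuming vm_overcommit above refers to the vm.overcommit_memory sysctl, the Redis connection is that background saves (BGSAVE, AOF rewrites) fork the server process, and Redis logs a startup warning when the setting is 0, recommending 1 instead. Here’s a minimal sketch of that check; check_overcommit is a hypothetical helper, not what the redis module change actually does, and it only reads /proc when run directly on Linux.

```python
from pathlib import Path

OVERCOMMIT_PATH = Path("/proc/sys/vm/overcommit_memory")

# Linux vm.overcommit_memory modes (see the kernel's overcommit-accounting docs).
MODES = {
    0: "heuristic overcommit",
    1: "always overcommit",
    2: "strict accounting (never overcommit)",
}


def check_overcommit(raw):
    """Return (description, ok_for_redis) for an overcommit_memory value.

    Redis recommends mode 1 so the fork() behind background saves is not
    refused when the parent process is large.
    """
    mode = int(raw.strip())
    return MODES.get(mode, "unknown mode"), mode == 1


if __name__ == "__main__" and OVERCOMMIT_PATH.exists():
    description, ok = check_overcommit(OVERCOMMIT_PATH.read_text())
    print(f"vm.overcommit_memory: {description}")
    if not ok:
        print("Redis background saves may fail; consider sysctl vm.overcommit_memory=1")
```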

DevLog for 2015 Dec 20

Probably going to take it easy and chill. Already sent a trivial PR up though.

Also saw the old devlog mailing list and had happy memories. Clearly I need to bring something like that back, but I don’t think mailing lists are the best medium. More thinking!

Ended up learning some Tornado and wrote nbserve to serve rendered notebooks and static files in a configurable way. I should refactor PAWS to have separate jupyterhub, proxy and nbserve pods tomorrow. Also need to test for path traversal attacks and whatnot. Also, coroutine-based programming is a lot easier than I had originally thought! woo!
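
On the path traversal front, the usual check is to resolve the requested path and make sure it still sits under the directory being served. A minimal sketch, assuming a plain filesystem root and a request path that has already been URL-decoded; safe_join and the notebooks root are hypothetical, not part of nbserve or Tornado.

```python
import os


def safe_join(root, requested):
    """Return the real path of `requested` under `root`, or None if it escapes.

    realpath() collapses '..' segments and follows symlinks, so the check is
    made against the file the OS would actually open.
    """
    real_root = os.path.realpath(root)
    # Strip leading slashes so an absolute request stays relative to root.
    candidate = os.path.realpath(os.path.join(real_root, requested.lstrip("/")))
    if os.path.commonpath([real_root, candidate]) != real_root:
        return None
    return candidate


ROOT = os.path.join(os.getcwd(), "notebooks")  # hypothetical serving root
print(safe_join(ROOT, "user/analysis.ipynb"))
print(safe_join(ROOT, "../../etc/passwd"))  # escapes the root
```

Comparing with os.path.commonpath rather than a plain string prefix avoids treating a sibling directory like notebooks2 as being inside notebooks.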

DevLog for 2015 Dec 19

  • Today looks like a day of finding, reporting and fixing bugs in WikiChatter. I had made a stupid mistake yesterday that meant not all of the Teahouse pages were being parsed, and I immediately started running into bugs. Have reported (and fixed!) two (1 and 2), and ran into another bug in mwparserfromhell itself. I’m sure I’ll find more.
  • Another bug in WikiChatter! 3
  • Perils of digging through archives – you find Wikipedians giving relationship advice.
  • Should work on https://github.com/jupyter/jupyterhub/issues/17 in some form over the next few weeks.

DevLog for 2015 Dec 18

Haven’t done these in a while; let’s see if I can get back on the wagon!

  • Discovered the WikiChatter library (thanks to @halfak!), and using that in my Teahouse analysis notebook. Far better than writing my own parser and fighting with that. Lets me get on with the actual fun stuff I wanted to do (which is the actual analysis)
  • Learning about pandas, checking out matplotlib, bokeh and wordcloud libraries to use in the analysis. Have included matplotlib and bokeh (and with it, pandas and numpy) in the default libraries list for PAWS, and also fixed permissions so users can pip install stuff themselves too.
  • For context, I’m trying to do an analysis of the English Wikipedia Teahouse questions archive, mostly as a way of showing off what PAWS makes possible.
  • Also spent a good chunk of the day regretting previous life decisions. All temporary, however – nothing irreversible was done, which is wonderful. Should figure out how to reduce the likelihood of similar events happening in the future.
  • Docker build times on my machine are pathetic, both because of slow network (USA! USA! USA!) and cheap laptop. Need to find a proper solution to this soon.

“Things”

I’m writing this post in an attempt to catalog the things I own, so I can evaluate whether I really need them and get rid of the ones I don’t.

  1. 15″ rMBP
  2. Kinesis Keyboard
  3. Apple Trackpad
  4. Moto X
  5. Kindle
  6. Nexus 7 (To be returned to the WMF)
  7. Broken Nexus 4 (To be backed up and then… something)
  8. iPod Touch
  9. Earphones (Soundmagic E10)
  10. Headphones (AudioTechnica ATH M50)
  11. Battery Pack #1
  12. Battery Pack #2
  13. MiFi (US Only)
  14. Multi USB Charger + 6 USB Cables
  15. Bluetooth Speaker (JBL Flip 2)
  16. Assorted Medication (in several loose covers, need to consolidate)
  17. Toiletries (Emergency Soap, Toothbrush, Toothpaste, Hair Gel, Shampoo, Conditioner)
  18. Velcro Rolls
  19. Box of Leaves
  20. Octopus
  21. Letter in Envelope
  22. Universal plug convertor
  23. Physical paper notebook
  24. Nailcolor
  25. Glasses + backup glasses
  26. Small Green Foldable Bag
  27. Pens
  28. Wallet with assorted currencies and cards
  29. Raspberry Pi
  30. Assorted USB chargers (accumulated from various devices)
  31. Wrist Straps (left and right)
  32. Beard Trimmer

Clothes:

  1. 14 Underpants
  2. 23 Socks (not 19 pairs – I had given up on pairing socks a long long time ago)
  3. 8 T Shirts
  4. 3 pairs of cargo shorts
  5. 1 pair of jeans
  6. 3 Jackets of varying thickness
  7. 1 Down Jacket
  8. 1 Scarf
  9. 1 pair of thermal underclothes
  10. 2 Towels (1 slightly fluffy, 1 microfiber)

I’ll try and keep this list updated.

Decluttering actions:

  1. I gave away assorted USB power chargers and plug convertors – I now have a universal plug that should be good enough, and 3 USB power adaptors of various sizes. I should probably trim down the number of cables I have too.