Paper notes: ‘The impact of syntax colouring on program comprehension’

I’ve recently started reading more academic papers, and thought it’d be useful to write notes about them and publish them as I go along! This one is for The impact of syntax colouring on program comprehension

I was amazed at the amount of prior research it is citing. Why have I not been reading these for the last 10 years of my life?
Apparently it is ok to report findings with a sample size of 10 people. I do not know how I feel about this.
The fact that there’s a large amount of thought put into the design of the experiment is quite nice, and surprisingly different from environments I’ve worked in the past where product managers designed ‘experiments’
To avoid datatype-related confusion, a uniform variable naming scheme was adopted in the tasks. For example, integers were named x, y, etc. and lists were named list1, list2, etc.. As someone pretty used to Python, I would have found this annoying – but I’m curious what the effect of identifier names is in program comprehension. It also reminded me I haven’t written any code in a stronger typed language in a while (I don’t think Java counts)
They used Solarized Color Scheme, which has a lot of fans although I’ve never been one.
Lots of self-reporting for ‘programming proficiency’. This is the ‘we give up!’ answer to measuring programming proficiency, I guess :)
We gathered data from 10 graduate computer science students at the University of Cambridge. This too seems fairly common, but I’ve no idea if such an un-diverse group of student group being studied affects the results at all?
They also discarded data from 3 of the students because they wore glasses and their eye-tracking hardware could not really deal with that. So this entire paper is from data from 7 students doing one particular course from one particular university.
We use the Shapiro-Wilk test to establish normality. We use the Wilcoxon signed rank test (WSRT) for paired nonparametric comparisons. I know some of these words!
As the data was not normally distributed, a 2-way ANNOVA could not be used to investigate the interaction of experience with highlighting on task times I know most of the words, but still can not make sense of this sentence.
Currently feeling very illiterate, but am sure this is just a feeling that will pass.
. The median difference in task completion time was 8.4s in favour of highlighting. To my untrained brain, that does not seem that much to me.
The presence of syntax highlighting significantly reduces task completion time, but the magnitude of this effect decreases as programming experience increase – this is their primary conclusion, which I can totally believe. But would I have believed it if they had come to a different conclusion? Would they have published it if it had? Would they have if there was more data? I don’t fully understand / know Academia enough to know.
I wonder if there has been research into richer forms of syntax highlighting – not just keyword based ones, but more contextual. Perhaps based on types (autodetected?), or scope, or usage frequency, or source, or whatever.

Overall, I enjoyed reading it – good paper! Thought provoking in some forms, but could’ve aimed higher, I suppose. I hope they continue doing good work!

Contents