Non-ASCII Characters in HTTP Headers
I was debugging an issue at work today where a (generated) file refused to download in Chrome, but the same URL worked just fine with wget. I remembered reading in the HTTP spec that HTTP headers can only contain lower ASCII, so when wget mangled the output file’s name, the problem was obvious – the file name, which was going out in a response header, contained a character that wasn’t in lower ASCII (an accented A). Chrome had borked on encountering it, while wget soldiered on. Using iconv to strip non-ASCII characters from the file name on the server side fixed the issue.
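For illustration, here is a rough Python sketch of the same idea. It is not the iconv-based fix I actually used; the function names, the `download` fallback, and the assumption that the name ends up in a `Content-Disposition` header are all made up for the example.

```python
import unicodedata


def ascii_filename(name: str, fallback: str = "download") -> str:
    """Return a 7-bit ASCII version of `name` (hypothetical helper)."""
    # Decompose accented characters, e.g. "Á" becomes "A" plus a combining accent.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop everything outside ASCII, including the combining marks.
    stripped = decomposed.encode("ascii", "ignore").decode("ascii")
    # Remove control characters and characters that would break a quoted value.
    cleaned = "".join(
        c for c in stripped if c.isprintable() and c not in '"\\'
    ).strip()
    # If nothing survives (e.g. a fully non-Latin name), use the fallback.
    return cleaned or fallback


def content_disposition(name: str) -> str:
    """Build a header value; assumes the file is served as an attachment."""
    return f'attachment; filename="{ascii_filename(name)}"'


if __name__ == "__main__":
    print(content_disposition("Ábaco.pdf"))
```

This loses information (the accent is gone, and names with no ASCII equivalent collapse to the fallback), which was an acceptable trade-off for my case but may not be for yours.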
Moral of the story? Read the RFCs! The HTTP one, in particular, is remarkably readable, and you should read it if you’re doing non-trivial web development.
P.S.: If I had had time, I’d have gone around testing this in several user agents and documenting their behavior (and possibly submitting bug reports) – but
Author yuvipanda
LastMod 2011-08-13