Something to think about the next time you are tempted to think
that you can get queries for free. While both HTTP and XML
provide mechanisms for defining encoding, support in widely
deployed implementations is much better in XML than in straight
HTTP.
URIs seem to be converging on UTF-8, albeit at an excruciatingly
slow pace. Don't leave this to chance - if you are defining a
GenerativeNaming scheme today, make this explicit.
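To make "explicit" concrete, here is a minimal sketch in Python of what pinning a naming scheme to UTF-8 might look like. The helper names `encode_segment` and `decode_segment` are made up for illustration; the only real API used is `urllib.parse.quote` and `unquote` from the standard library.

```python
from urllib.parse import quote, unquote


def encode_segment(text: str) -> str:
    """Percent-encode one URI segment, always going through UTF-8.

    Hypothetical helper: the encoding is named explicitly rather
    than left to a library or platform default.
    """
    return quote(text, safe="", encoding="utf-8")


def decode_segment(segment: str) -> str:
    """Reverse encode_segment, rejecting bytes that are not UTF-8.

    errors="strict" makes a non-UTF-8 segment raise
    UnicodeDecodeError instead of being silently replaced.
    """
    return unquote(segment, encoding="utf-8", errors="strict")


if __name__ == "__main__":
    print(encode_segment("café"))        # caf%C3%A9
    print(decode_segment("caf%C3%A9"))   # café
```

A segment escaped from some other encoding (say `caf%E9`, which is ISO-8859-1) fails loudly in `decode_segment` rather than turning into garbage, which is the point of not leaving this to chance.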
If you are defining a protocol based on HTTP POST, encourage the
use of the charset parameter on the Content-Type header.
Require it if you can.
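As a rough sketch of the sending side, the following builds (but does not send) a POST whose Content-Type carries a charset parameter. The function name `build_ping`, the `example.org` URL, and the field names are assumptions for illustration, not part of any particular protocol, and whether a given receiver honors the parameter is up to that receiver.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_ping(url: str, fields: dict) -> Request:
    """Build a form-encoded POST that declares its charset.

    Hypothetical helper: the body is percent-encoded from UTF-8,
    and the header says so instead of leaving the receiver to guess.
    """
    # urlencode returns an ASCII-only str once values are escaped.
    body = urlencode(fields, encoding="utf-8").encode("ascii")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded; charset=utf-8",
    }
    return Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    # Illustrative URL and field names only.
    req = build_ping("http://example.org/ping", {"title": "café"})
    print(req.get_header("Content-type"))
    print(req.data)
```

The same string is used for both the encoding step and the header, so the two cannot drift apart.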
Anne van Kesteren: Trackbacks, Queries, and Encoding - I wonder if pingback solves this...
I wonder how that got here... My link log doesn't even send pings out. I just read the Pingback specification [link] and it seems that isn't the optimal solution either.
From what I have heard, trackback is really bad. It can invalidate weblogs [link], and it can't handle encoding well either. Someone should come up with a new format that addresses those aspects (encoding, validating, excerpt). Of course, it would be nice if it were in some way compatible with trackback, so that people can easily implement support for it.
I have sent numerous requests to web site owners in Korea to define the charset (including when publishing pages in English, since some Latin chars in Korean fonts are not correctly displayed in iso- and utf), to no avail. They seem to think that because their target market is Korea and Koreans, their browsers will be set to display Korean by default, so no problem... :-(
dda, I've actually had some success in the past persuading web site owners in Japan to declare a charset, which is gratifying. Meanwhile, if it makes you feel any better, at least one (presumably) Korean site owner is pursuing the issue of Trackbacks and character encoding that Sam raised here, though Six Apart hasn't replied yet.
Jacques Distler: It turns out that by design it is rather hard for a string of bytes to be valid utf-8, unless that string is pure US-ASCII, in which case it doesn't much matter which encoding you presume....
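The observation above suggests a cheap heuristic for a receiver that has been given no charset: try a strict UTF-8 decode and see whether it succeeds. A minimal sketch in Python follows; the function name `looks_like_utf8` is made up here, and this is a guess at how one might apply the idea, not a description of any existing trackback implementation.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8.

    Pure US-ASCII also returns True, which is harmless since
    ASCII bytes mean the same thing under UTF-8.
    """
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(looks_like_utf8("café".encode("utf-8")))       # True
    print(looks_like_utf8("café".encode("iso-8859-1")))  # False
    print(looks_like_utf8(b"cafe"))                      # True
```

A True result is strong evidence rather than proof, and a False result only says the bytes are in some other encoding without saying which.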