Unicode Enabled Trackbacks
I've changed my weblogging software to send trackbacks in UTF-8, and to try to respect the charset, when one is specified, on the trackbacks it receives.
This involved four changes.
Outbound, I changed autoping.py to encode the title and excerpt parameters:
arg['title'] = title.encode('utf-8')
arg['excerpt'] = body.encode('utf-8')
And added the content-type header, thus:
request.add_header("Content-type",
"application/x-www-form-urlencoded; charset=utf-8")
Inbound, I changed post.py to determine the charset:
charset=cgi.parse_header(fs.headers['content-type'])[1].get('charset','utf-8')
And then made use of this charset when parsing the data:
try:
    return unicode(value, charset)
except:
    return value
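For readers on Python 3, where `unicode()` and the `cgi` module are no longer available, here is a minimal sketch of the same outbound and inbound steps. It is not the code from autoping.py or post.py: the function names `build_ping`, `charset_of`, and `parse_ping` are made up for illustration, only the title and excerpt fields shown above are sent, and undecodable bytes are replaced rather than returned raw as in the snippet above.

```python
import codecs
from email.message import Message
from urllib.parse import parse_qs, urlencode
from urllib.request import Request

FORM_TYPE = "application/x-www-form-urlencoded"


def build_ping(ping_url, title, excerpt, charset="utf-8"):
    """Outbound: build (but do not send) a trackback ping in `charset`."""
    fields = {"title": title, "excerpt": excerpt}
    # urlencode percent-encodes each value's bytes in the chosen charset.
    body = urlencode(fields, encoding=charset).encode("ascii")
    request = Request(ping_url, data=body, method="POST")
    request.add_header("Content-type", "%s; charset=%s" % (FORM_TYPE, charset))
    return request


def charset_of(content_type, default="utf-8"):
    """Inbound: read the charset parameter, defaulting when absent or unknown."""
    header = Message()
    header["Content-Type"] = content_type
    charset = header.get_param("charset", default)
    try:
        codecs.lookup(charset)  # reject charsets Python has no codec for
    except (LookupError, TypeError):
        return default
    return charset


def parse_ping(body, content_type):
    """Inbound: decode form fields using the sender's declared charset."""
    charset = charset_of(content_type)
    text = body.decode("ascii", "replace")  # urlencoded bodies are ASCII
    fields = parse_qs(text, encoding=charset, errors="replace")
    return {name: values[0] for name, values in fields.items()}
```

Feeding the body and header from `build_ping` straight into `parse_ping` gives a quick local round trip for any charset Python has a codec for, without touching the network.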
I've also written a small test driver that can be used to verify that a server handles the character set correctly.
It certainly would be understandable for servers today to not respect the charset parameter, but I am curious to hear back if any outright fail to process the trackback at all if the charset parameter is present.
I would also welcome trackbacks from any server that uses one of the less common character sets listed in this table.
i18n test
Iñtërnâtiônàlizætiøn. There, I said it. Copy-and-pasted from Firefox, looks ok in this HTML form, let's see what it does further down the line. Sam noted in comments some problems with material sent via trackback (via Planet RDF - not sure how the...
Excerpt from Raw at
Anne van Kesteren : Unicode Enabled Trackbacks - I think we need a solid standard, instead of working around the bugs Trackback has...
Excerpt from HotLinks - Level 1 at