intertwingly

It’s just data

Unicode Enabled Trackbacks


I've changed my weblogging software to send trackbacks in utf-8, and to try to respect the charset, if specified, on trackbacks received.

This involved four changes.

Outbound, I changed autoping.py to encode the title and excerpt parameters:

arg['title'] = title.encode('utf-8')
arg['excerpt'] = body.encode('utf-8')

And added the content-type header, thus:

request.add_header("Content-type",
  "application/x-www-form-urlencoded; charset=utf-8")
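For readers on Python 3, the two outbound changes above might be sketched roughly as follows. This is not the code in autoping.py: the function name `build_trackback_request` and the example URLs are hypothetical, and the field names simply follow the usual trackback parameters.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_trackback_request(ping_url, title, excerpt, url):
    """Build (but do not send) a trackback ping whose body is UTF-8."""
    fields = {"title": title, "excerpt": excerpt, "url": url}
    # urlencode percent-encodes each str value using the given encoding,
    # so the resulting body is plain ASCII.
    body = urlencode(fields, encoding="utf-8").encode("ascii")
    request = Request(ping_url, data=body, method="POST")
    # Declare the charset so the receiver knows how to decode the fields.
    request.add_header(
        "Content-type",
        "application/x-www-form-urlencoded; charset=utf-8",
    )
    return request


# Hypothetical usage; pass the result to urllib.request.urlopen to send it.
req = build_trackback_request(
    "http://example.com/trackback/1", "Café", "naïve", "http://example.org/post"
)
```

Building the request separately from sending it makes it easy to inspect the encoded body and header before anything goes over the wire.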

Inbound, I changed post.py to determine the charset:

charset=cgi.parse_header(fs.headers['content-type'])[1].get('charset','utf-8')

And then made use of this charset when parsing the data:

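try:
  return unicode(value,charset)
except (UnicodeError, LookupError, TypeError):
  # undecodable bytes, unknown charset, or a non-string value:
  # fall back to the raw value
  return value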
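The inbound side could look something like this in Python 3. It is a sketch under assumptions, not the code in post.py: the helper names are hypothetical, the body is assumed to be percent-encoded ASCII, and the fallback to latin-1 is my substitution for returning the raw value.

```python
from email.message import Message
from urllib.parse import parse_qsl


def charset_from_content_type(header_value, default="utf-8"):
    """Return the charset parameter of a Content-Type header, if any."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_param("charset", default)


def decode_trackback(body, content_type):
    """Decode a form-encoded trackback body using the declared charset."""
    charset = charset_from_content_type(content_type)
    text = body.decode("ascii")  # assumption: body is percent-encoded ASCII
    try:
        pairs = parse_qsl(text, encoding=charset, errors="strict")
    except (UnicodeDecodeError, LookupError):
        # Wrong or unknown charset: latin-1 maps every byte, so it never fails.
        pairs = parse_qsl(text, encoding="latin-1")
    return dict(pairs)


fields = decode_trackback(
    b"title=Caf%E9",
    "application/x-www-form-urlencoded; charset=iso-8859-1",
)
```

As in the original, a sender that declares no charset is treated as utf-8, and a declared charset that turns out to be wrong does not cause the trackback to be rejected.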

I've also written a small test driver that can be used to verify that a server handles the character set correctly.

It certainly would be understandable for servers today to not respect the charset parameter, but I am curious to hear back if any outright fail to process the trackback at all if the charset parameter is present.

I would also welcome trackbacks from any server that uses a less common character set that happens to be listed in this table.