It’s just data

Unicode Enabled Trackbacks

I've changed my weblogging software to send trackbacks in utf-8 and to try to respect the charset, if one is specified, on the trackbacks it receives.

This involved four changes.

Outbound, I now encode the title and excerpt parameters as utf-8:

arg['title'] = title.encode('utf-8')
arg['excerpt'] = body.encode('utf-8')

I also added a Content-Type header, thus:

  "application/x-www-form-urlencoded; charset=utf-8")

Inbound, I now determine the charset:


And then made use of this charset when parsing the data:

  return unicode(value,charset)
  return value
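
The two inbound steps could be sketched roughly as follows. This is an illustration under stated assumptions, not the code running on this weblog: it is written for Python 3, the function names are hypothetical, the default of iso-8859-1 is my assumption for requests that name no charset, and it falls back to iso-8859-1 (which can decode any byte sequence) where the fragment above returns the value untouched.

```python
from email.message import Message
from urllib.parse import unquote_to_bytes

def get_charset(content_type, default='iso-8859-1'):
    """Pull the charset parameter out of a Content-Type header value."""
    if not content_type:
        return default
    msg = Message()
    msg['Content-Type'] = content_type
    # get_content_charset() returns the lowercased charset, or None if absent.
    return msg.get_content_charset() or default

def decode_value(value, charset):
    """Decode raw bytes with the declared charset, falling back if it fails."""
    try:
        return value.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name or invalid bytes: assumed fallback.
        return value.decode('iso-8859-1')

def parse_trackback(body, content_type):
    """Parse an ASCII form-encoded body into a dict of decoded strings."""
    charset = get_charset(content_type)
    fields = {}
    for pair in body.split('&'):
        if not pair:
            continue
        name, _, raw = pair.partition('=')
        # '+' means space in form encoding; %XX escapes become raw bytes.
        # Field names are assumed to be plain ASCII and are left as-is.
        fields[name] = decode_value(unquote_to_bytes(raw.replace('+', ' ')), charset)
    return fields

fields = parse_trackback(
    'title=Caf%C3%A9&excerpt=hello+world',
    'application/x-www-form-urlencoded; charset=utf-8')
```

Decoding only after the percent-escapes have been turned back into bytes keeps the charset handling in one place, so a bad or unknown charset degrades to a readable guess rather than a rejected ping.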

I've also written a small test driver that can be used to verify that a server handles the character set correctly.

It certainly would be understandable for servers today not to respect the charset parameter, but I am curious to hear whether any fail outright to process a trackback when the charset parameter is present.

I would also welcome trackbacks from any server that uses a less common character set that happens to be listed in this table.