It’s just data

Producing Well Formed XML with Rails

I set out to look at how I could robustly handle i18n in my Rails Weblog implementation, and ended up somewhere completely different: ensuring that Weblog produces well-formed XML.

As described previously, atom.rxml uses Ruby’s XML builder. I was going to look into enhancing the escaping function to handle utf-8, iso-8859-1, and windows-1252 for both element and attribute values when I noticed that escaping is applied only to element values, not to attribute values.

Perhaps this is best explained by example:

code                            output
@xml.title('1<2')               <title>1&lt;2</title>
@xml.title('AT&T')              <title>AT&amp;T</title>
@xml.title('&amp;')             <title>&amp;amp;</title>
@xml.a(:title => '1<2')         <a title="1<2"/>
@xml.a(:title => 'AT&T')        <a title="AT&T"/>
@xml.a(:title => '"x"')         <a title=""x""/>

This is either a case of “everybody knows” that the XML builder expects pre-escaped attribute values, or an oversight. If the former, then I expect a lot of people who build podcast feeds are producing XML that is not well-formed whenever a URI contains multiple query parameters, because the & that separates the parameters ends up in the attribute value unescaped.
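To make the podcast case concrete, here is a small sketch in plain Ruby. The URL and the `BARE_AMPERSAND` pattern are made up for illustration; the pattern only looks for an ampersand that does not begin an entity or character reference, which is a rough check and not a full well-formedness test.

```ruby
# Hypothetical enclosure URL with two query parameters.
url = 'http://example.com/audio.mp3?show=12&episode=3'

# Rough check: an ampersand that does not start an entity reference
# (such as &amp;) or a character reference (such as &#38; or &#x26;).
BARE_AMPERSAND = /&(?!(?:[A-Za-z_][\w.-]*|#\d+|#x\h+);)/

# What you get when the attribute value is written out as-is.
raw_attr = %(<enclosure url="#{url}"/>)

# What you get when the ampersand is escaped first.
escaped_attr = %(<enclosure url="#{url.gsub('&', '&amp;')}"/>)

puts raw_attr
puts escaped_attr
puts raw_attr.match?(BARE_AMPERSAND)      # true: an XML parser would reject this
puts escaped_attr.match?(BARE_AMPERSAND)  # false
```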

If it is indeed an oversight, then it is one that is easily correctable, even locally, given that classes tend to be “open” (i.e., modifiable) at runtime.
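As a rough illustration of what such a local correction can look like, the sketch below uses a made-up `TinyBuilder` class as a stand-in for the real builder (it is not the builder's actual code, and the method names `tag`, `escape`, and `attr_value` are assumptions). The class is first defined with the same behavior as in the table above, then reopened to escape attribute values.

```ruby
# Hypothetical stand-in for a builder that escapes element text
# but writes attribute values out unchanged.
class TinyBuilder
  def tag(name, text = nil, attrs = {})
    attr_text = attrs.map { |k, v| %( #{k}="#{attr_value(v)}") }.join
    return "<#{name}#{attr_text}/>" if text.nil?
    "<#{name}#{attr_text}>#{escape(text)}</#{name}>"
  end

  private

  # Escape the characters that are special in element content.
  def escape(text)
    text.to_s.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;')
  end

  # Original behavior: attribute values are assumed to be pre-escaped.
  def attr_value(value)
    value.to_s
  end
end

# Reopen the class at runtime and replace only the attribute handling.
class TinyBuilder
  private

  # Escape as for element content, plus the double quote that
  # delimits the attribute value.
  def attr_value(value)
    escape(value).gsub('"', '&quot;')
  end
end

b = TinyBuilder.new
puts b.tag(:title, '1<2')
puts b.tag(:a, nil, :title => '1<2')
puts b.tag(:a, nil, :title => 'AT&T')
puts b.tag(:a, nil, :title => '"x"')
```

One consequence of a fix like this is that callers who were already pre-escaping attribute values would now see them double-escaped, just as `@xml.title('&amp;')` does for element content.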

Here’s an attr_escape_fix.rb with a few tests.