It’s just data

DocGems

"A Blogger Code of UnProfessional Ethics" and "Wetware, the Killer App." Both are must-reads.

P.S.  Doc's RSS feed is not valid XML.  Something about a reference to an undefined entity 'ouml'.
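For readers who want to reproduce the complaint, here is a minimal sketch in Python. The sample strings are invented for illustration and are not taken from Doc's feed, and `is_well_formed` is a hypothetical helper name. It assumes the feed declares no DTD, in which case only the five entities predefined by XML (lt, gt, amp, quot, apos) may be referenced by name.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Invented samples: the first uses the HTML entity name, the second
# uses the equivalent numeric character reference.
named = "<item><title>D&ouml;c</title></item>"
numeric = "<item><title>D&#246;c</title></item>"

print(is_well_formed(named))
print(is_well_formed(numeric))
```

The named form is rejected because nothing declares 'ouml', while the numeric reference needs no declaration.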


Dave had briefly posted a request for an RSS feed for Doc Searls, and I spent about an hour creating some XSLT to get Doc's OPML file into RSS format. By the time I was done, so was Dave.

Anyway, I'm looking at this now. This looks like a case where encoding HTML markup comes back to bite.

To get the XSLT processor to handle this, the input has to be proper XML. To get proper XML, I have to decode the entities in the OPML element's "text" attribute. But some of those entities are &lt; and &gt;.

So, if I leave it as it is, it isn't valid XML because of the undefined "ouml" entity. If I decode all the entities, it isn't valid XML because of the angle brackets.

So, now I'll first go through the string, look for &lt; or &gt;, encode that as &amp;lt; or &amp;gt;, and then decode the entities later. An extra two lines of Perl (though I could make it one, if I really, really wanted to).
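The two steps described above might look roughly like this. It is a sketch in Python rather than the Perl mentioned, not the code actually used. The name `protect_then_decode` and the sample string are hypothetical, and protecting &quot; and &amp; as well is an assumption that goes beyond the two entities named above, made because the text lives inside a quoted attribute.

```python
import html
import re
import xml.etree.ElementTree as ET

# Entities that must stay escaped inside an XML attribute value.
# Including quot and amp is an assumption beyond the lt/gt named above.
PROTECTED = re.compile(r"&(lt|gt|quot|amp);")


def protect_then_decode(raw_text):
    """Double-escape the markup entities, then decode everything once."""
    # Step 1: &lt; becomes &amp;lt; so the decode pass peels only one layer.
    protected = PROTECTED.sub(r"&amp;\1;", raw_text)
    # Step 2: &ouml; becomes a literal character; &amp;lt; falls back to &lt;.
    return html.unescape(protected)


# Hypothetical attribute value, invented for illustration.
raw = "D&ouml;c &lt;b&gt;Searls&lt;/b&gt;"
fixed = protect_then_decode(raw)

# The result can now sit in an attribute and still parse as XML.
outline = ET.fromstring('<outline text="' + fixed + '"/>')
print(outline.get("text"))
```

Python's `html.unescape` makes a single pass over the string, which is why the double-escaped entities survive as ordinary escaped ones instead of turning into raw angle brackets.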

Posted by Mark A. Hershberger at

Oh, and don't forget the &quot; entities, either.

Posted by Mark A. Hershberger at

Re: "though I could make it one if I really really wanted to". I think you just summed up the essence of Perl.

Posted by Mark at
