Overriding xml:base

2007-01-22T13:02:26Z

It looks like Don Park has been doing some interesting things with images, providing downloads, and linking to examples.

Unfortunately, the way he has constructed his feeds tends to make these things difficult to access for people who read his site through a feed reader.

Feed	Reader
RSS 2.0	Bloglines	GoogleReader
Atom 1.0	Bloglines	GoogleReader

With Atom, the fix would be as simple as adding xml:base="http://www.docuverse.com/blog/" on the feed element itself. With RSS 2.0, the required fix would be somewhat more involved.
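To make the Atom case concrete, here is a minimal sketch using only the Python standard library. It resolves each relative atom:link href against the xml:base in scope, falling back to the URI the feed was fetched from when no xml:base is present. The feed location http://feeds.example.com/donpark.xml and the path images/demo.png are hypothetical; only the base http://www.docuverse.com/blog/ comes from the text above. It handles link elements only, not relative URIs embedded in HTML content.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
ATOM_LINK = "{http://www.w3.org/2005/Atom}link"


def resolve_links(feed_xml, document_uri):
    """Return every atom:link href, resolved against the xml:base in scope."""
    resolved = []

    def walk(elem, base):
        # An xml:base attribute may itself be relative, so it is resolved
        # against the base inherited from the enclosing element.
        base = urljoin(base, elem.get(XML_BASE, ""))
        href = elem.get("href")
        if elem.tag == ATOM_LINK and href is not None:
            resolved.append(urljoin(base, href))
        for child in elem:
            walk(child, base)

    # With no xml:base anywhere, the document's own URI is the base.
    walk(ET.fromstring(feed_xml), document_uri)
    return resolved


# Hypothetical feeds: identical except for xml:base on the feed element.
WITHOUT_BASE = (
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    '<entry><link href="images/demo.png"/></entry></feed>'
)
WITH_BASE = (
    '<feed xmlns="http://www.w3.org/2005/Atom" '
    'xml:base="http://www.docuverse.com/blog/">'
    '<entry><link href="images/demo.png"/></entry></feed>'
)

FEED_URI = "http://feeds.example.com/donpark.xml"  # hypothetical location
print(resolve_links(WITHOUT_BASE, FEED_URI))  # ['http://feeds.example.com/images/demo.png']
print(resolve_links(WITH_BASE, FEED_URI))     # ['http://www.docuverse.com/blog/images/demo.png']
```

The single attribute on the feed element is enough because xml:base is inherited by every descendant, which is why the Atom fix is a one-line change.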

My prior experience with Don is that he expects those who are so inclined to work around the problems he creates by ignoring the relevant specifications. Accordingly, I am testing a fix for Planet Venus that allows xml_base to be overridden on a per-feed basis.
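Venus is configured through an INI-style file in which each subscribed feed gets its own section, so a per-feed override naturally lives there. The sketch below assumes the option is spelled xml_base, as in the text; the feed URIs and the helper name xml_base_override are hypothetical, and this only illustrates how such a value might be looked up, not how Venus itself reads its configuration.

```python
import configparser

# Hypothetical Venus-style configuration: [Planet] holds global settings,
# and each subscribed feed gets a section named after its URI.
CONFIG = """
[Planet]
name = Example Planet

[http://feeds.example.com/donpark.xml]
name = Don Park
xml_base = http://www.docuverse.com/blog/

[http://feeds.example.com/other.xml]
name = Someone Else
"""


def xml_base_override(config_text, feed_uri):
    """Return the per-feed xml_base override, or None if none is set."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if not parser.has_section(feed_uri):
        return None
    return parser.get(feed_uri, "xml_base", fallback=None)


print(xml_base_override(CONFIG, "http://feeds.example.com/donpark.xml"))
print(xml_base_override(CONFIG, "http://feeds.example.com/other.xml"))
```

Feeds without the option fall through to None, so only the misbehaving feed is affected and everyone else keeps the base their feed declares.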

The nature of the fix is fairly invasive: by default, the Universal Feed Parser takes care of a number of details around sanitizing HTML and resolving relative URIs. I’ve modified the parser to allow these features to be switched off. Venus will then go back later (after possibly adjusting a number of elements in the feed) and call into the same internal routines that the Feed Parser itself uses to resolve relative URIs and sanitize HTML.
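The ordering matters more than the mechanics, so here is a rough sketch of the second phase using only the standard library. It is not the Feed Parser's own resolver: it is a simplified stand-in that rewrites only href and src attributes (the Feed Parser treats more attributes as URIs), performs no sanitization, and silently drops HTML comments. The helper names and the sample markup are hypothetical. What it does show is that relative URIs get resolved only after a per-feed override, if one exists, has replaced the base the feed declared.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin

# Simplified: only these attributes are treated as URI-valued here.
URI_ATTRS = {"href", "src"}


class RelativeURIResolver(HTMLParser):
    """Re-emit an HTML fragment with href/src values made absolute."""

    def __init__(self, base):
        super().__init__(convert_charrefs=True)
        self.base = base
        self.out = []

    def _attrs(self, attrs):
        parts = []
        for name, value in attrs:
            if value is None:  # bare attribute such as <input disabled>
                parts.append(" " + name)
                continue
            if name in URI_ATTRS:
                value = urljoin(self.base, value.strip())
            parts.append(' %s="%s"' % (name, escape(value, quote=True)))
        return "".join(parts)

    def handle_starttag(self, tag, attrs):
        self.out.append("<%s%s>" % (tag, self._attrs(attrs)))

    def handle_startendtag(self, tag, attrs):
        self.out.append("<%s%s />" % (tag, self._attrs(attrs)))

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Character references were decoded by the parser; re-escape them.
        self.out.append(escape(data, quote=False))


def resolve_relative_uris(fragment, base):
    resolver = RelativeURIResolver(base)
    resolver.feed(fragment)
    resolver.close()
    return "".join(resolver.out)


def postprocess_content(fragment, declared_base, override_base=None):
    """Phase two: runs after parsing, once any per-feed override is known."""
    return resolve_relative_uris(fragment, override_base or declared_base)


sample = '<p><a href="demo.zip">download</a> <img src="images/x.png" alt="x" /></p>'
print(postprocess_content(sample, "http://feeds.example.com/donpark.xml",
                          "http://www.docuverse.com/blog/"))
```

Had the parser already resolved the links during phase one, the override would arrive too late: the URIs would already be absolute, and wrong.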

Given this, I’m testing it locally on my own setup first. If it seems stable, I’ll push it out for others to use.

Update: Don has fixed his Atom 1.0 feed, patched a single entry in his RSS 2.0 feed, and set things up so that this particular problem with his RSS 2.0 feed will hopefully not recur. Meanwhile, the fix looks stable, and if this keeps up, I will push it out later this afternoon. Splitting the sanitization logic out makes it easier for me to make progress towards replacing sgmllib with html5lib.