It’s just data

Overriding xml:base

It looks like Don Park has been doing some interesting things with images, providing downloads, and linking to examples to look at.

Unfortunately, the way he has constructed his feeds tends to make these things difficult to access for people who read his site through a feed reader.

Feed      Reader
RSS 2.0   Bloglines, Google Reader
Atom 1.0  Bloglines, Google Reader

With Atom, the fix would be as simple as adding xml:base="http://www.docuverse.com/blog/" on the feed element itself.  With RSS 2.0, the fix required would be somewhat more involved.
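To make the Atom half of that concrete, here is a minimal sketch of how a consumer might resolve a relative link against xml:base, using only the Python standard library. The feed contents, the entry path, and the retrieval URI are made up for illustration; only the xml:base value comes from the text above. The helper name resolved_links is likewise hypothetical, and the sketch only looks at xml:base on the feed element, not on nested elements.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# Hypothetical feed: the entry link is relative, so it only points at the
# right place once a base URI is known.
feed_xml = """<feed xmlns="http://www.w3.org/2005/Atom"
      xml:base="http://www.docuverse.com/blog/">
  <entry>
    <link href="examples/identicon.png"/>
  </entry>
</feed>"""


def resolved_links(xml_text, retrieval_uri):
    """Return every atom:link href, resolved against the feed-level base."""
    root = ET.fromstring(xml_text)
    # With no explicit xml:base, the base defaults to the URI the document
    # was retrieved from; an explicit xml:base is itself resolved against it.
    base = urljoin(retrieval_uri, root.get(XML_BASE, ""))
    return [urljoin(base, link.get("href")) for link in root.iter(ATOM + "link")]


links = resolved_links(feed_xml, "http://example.org/feeds/atom.xml")
```

Dropping the xml:base attribute from the same feed makes the link resolve against the retrieval URI instead, which is exactly how relative links end up pointing at the wrong host.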

My prior experience with Don is that he expects those who are so inclined to work around the problems he creates by ignoring the various specifications. Accordingly, I am testing out a fix for Planet Venus that allows xml_base to be overridden on a per-feed basis.

The nature of the fix is fairly invasive: by default, the Universal Feed Parser takes care of a number of details, including sanitizing HTML and resolving relative URIs.  I’ve modified the parser to allow these features to be turned off.  Venus will then go back later, after possibly adjusting a number of elements in the feed, and call into the same internal routines that the Feed Parser itself uses to resolve relative URIs and sanitize HTML.
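The shape of that "parse raw first, resolve later" flow can be sketched roughly as follows. This is not the Feed Parser's or Venus's actual code: the resolver class, the effective_base helper, and the dictionary-style per-feed configuration are all hypothetical stand-ins, and the sketch covers only relative URI resolution (sanitization is left out, and comments and doctype declarations are simply dropped).

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin

# Simplified assumption: a real resolver handles many more URI-valued attributes.
URI_ATTRS = {"href", "src"}


class RelativeURIResolver(HTMLParser):
    """Re-emit an HTML fragment with href/src resolved against a base URI."""

    def __init__(self, base):
        super().__init__(convert_charrefs=True)
        self.base = base
        self.out = []

    def _emit(self, tag, attrs, close=""):
        parts = [tag]
        for name, value in attrs:
            if value is None:  # bare attribute such as "disabled"
                parts.append(name)
                continue
            if name in URI_ATTRS:
                value = urljoin(self.base, value)
            parts.append('%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (" ".join(parts), close))

    def handle_starttag(self, tag, attrs):
        self._emit(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit(tag, attrs, " /")

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def resolve_fragment(html, base):
    """Resolve relative URIs in content that was deliberately left untouched."""
    parser = RelativeURIResolver(base)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


def effective_base(feed_config, declared_base):
    """Hypothetical per-feed override: a configured xml_base wins."""
    return feed_config.get("xml_base", declared_base)


# Step 1: content arrives with relative URIs still intact.
raw = '<p><a href="examples/a.html">demo</a> <img src="/img/b.png"></p>'
# Step 2: pick the base, letting per-feed configuration override the feed.
base = effective_base({"xml_base": "http://www.docuverse.com/blog/"},
                      "http://feeds.example.org/donpark")
# Step 3: only now resolve the relative URIs.
fixed = resolve_fragment(raw, base)
```

The point of deferring step 3 is that the base can still be changed after parsing; once the parser has resolved everything against the wrong base, there is no reliable way to undo it.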

Given this, I’m testing this out locally on my setup first.  If it seems stable, I’ll push it out for others to use.

Update: Don has fixed his Atom 1.0 feed, patched a single entry in his RSS 2.0 feed, and hopefully set things up so that this particular problem with his RSS 2.0 feed will not recur.  Meanwhile, the fix looks stable, and if this keeps up I will push it out later this afternoon.  Splitting the sanitization logic out also makes it easier for me to make progress towards replacing sgmllib with html5lib.


Ouch!

When I replied “I am more incline to conclude that Bloglines don’t consider the problem significant enough to fix. If it was, they would have fixed it by now”, I was talking about the subject in general and not for just my own problems. You were talking about a common RSS problem so my answer should be understood in that context which is this:

“people are just not using problematic characters in post titles frequently enough to make inconsistent title handling among RSS readers a showstopper.”

Sam, I respect your opinions, even when I disagree, but I think you tend to use other people’s mishaps as opportunity to pick bones over. I don’t mind being at the receiving end of it but, frankly, it doesn’t fit my image of you, a benevolent leader in the open source community, very well.

BTW, base URL problem was caused by TinyMCE which converts absolute URLs to relative URLs by default. Fixed.

Posted by Don Park at

s/tend to/sometime/

It just felt that way to me. Also, I would like to apologize for making it easy for you to misunderstand me. I have a nasty habit of saying things with multiple-meanings to see which interpretation the listener would choose, mainly because I believe the choice depends on the listener’s disposition.

Posted by Don Park at

I still object to characterizing the “problem characters in RSS 2.0 titles” issue to be something that the Bloglines team have not chosen to focus on.  On the contrary, I have ample evidence that they have spent a fair amount of time on this issue, but the problem is that the bug is not in their code.  Instead, the bug is in the RSS 2.0 specification.  And the Bloglines folks are doing the best job they can with what they have got.

By contrast, I believe that intelligent folks who produce feeds should be aware of these issues and, instead of trying to cast aspersions on the innocent, should make an effort to avoid the problem.

As to the image that you have created for me, perhaps you need to reassess.

Posted by Sam Ruby at

Sam, it’s true that I don’t know why Bloglines haven’t addressed the problem so I’ll take back that characterization if that means something to you.

As to whether they are doing the best job they can with what they got, I am willing to explore the problem with them to see if there is anything I can help them with to fix the problem.

Posted by Don Park at

I am willing to explore the problem with them

The bug is not in their code.

Posted by Sam Ruby at

Sam, if you want to promote Atom, just build killer apps that require Atom’s features to implement. Trashing RSS won’t stop people from using it.

Posted by Don Park at

Trashing?

My position is simple.  If you are comfortable living within the limitations of what the RSS 2.0 feed format supports interoperably, then by all means do so.  If those limitations are too confining, then don’t blame somebody else for your choice to use a format for something it was not designed to do.

Meanwhile, I’ve just pushed out the update to Venus that you inspired.  Venus is top-to-bottom designed around Atom 1.0, XHTML, and utf-8, but will consume and correct just about anything based on the best of breed libraries that are out there, and augmented by configuration information that the user provides.

It has extensive documentation.  You might find the architecture and normalization sections interesting.

Posted by Sam Ruby at

Yes, interesting stuff indeed.

Speaking of inspiring, your publishing of commenter’s IP address inspired me to apply identicon to IP address so we have an odd case of mutual inspiration here. Pardon me for trying to squeeze what humor I can out of this rather awkward situation.

Posted by Don Park at

just build killer apps that require Atom’s features to implement

An unambiguous content model is a killer feature.  You’re the only one here who doesn’t seem to realize that.

Posted by Mark at

On the subject of broken feeds, did you get my email from a couple of days ago Sam? Your comment feed is still not working for me. Actually, having just checked, your main feed seems to have the same problem. I don’t know if it’s you, me, or some crazy proxy in the middle, but something isn’t working right.

Posted by James Holderness at

Monday, January 22, 2007

Getting Visitor Ownership in a Cluttered Internet World Tags: rss web20 All Good Things… « François Schiettecatte’s Blog Tags: feedster rss web20 Sam Ruby: Overriding xml:base Tags: atom rss web20 xml Egress - RSS Reader for the PocketPC Tags:...

Excerpt from The RSS Blog at

Don:

I am willing to explore the problem with them to see if there is anything I can help them with to fix the problem.

You will find they cannot. But you are welcome to experience the exciting world of feed format geeks for yourself. Be warned though; you might not end up in the camp you expected. Just look at Rogers Cadenhead and James Robertson…

Sam:

y’know, <cite> and <blockquote cite> would be nice additions to your markup whitelist…

Posted by Aristotle Pagaltzis at

did you get my email from a couple of days ago Sam? Your comment feed is still not working for me.

Found it.  I’m not aware of any change on my end, but for now, I’ll disable RFC 3229 support.

y’know, <cite> and <blockquote cite> would be nice additions to your markup whitelist…

Why?  :-)

Posted by Sam Ruby at

Umm Sam is there supposed to be a xml:base on your feed element? It’s just I use Sage in Firefox to read my feeds and your links never work.

(Must get round to getting OpenID set up sigh).

Posted by Simon Proctor at

Simon: Sage has a bug.  The default for xml:base in effect in the absence of any explicit xml:base is the URI used to retrieve the document itself.  Bloglines and Google Reader get it right, as does Venus.

Posted by Sam Ruby at

I figured that was the case, it’s a shame because I do like using it. Funnily enough Firefox itself does it properly with live bookmarks.

Maybe I’ll go look at Google Reader again. Thanks.

Posted by Simon at
