Dare Obasanjo: So this morning I decided to write an RSS News aggregator.
My advice is to test it on Joe's and Shelley's feeds. This requires two simple, albeit a bit unconventional, rules: anything in the namespace of the DocumentElement is equivalent to the null namespace, and items can be either inside or outside of channels.
And then there are synonyms, e.g., dc:subject vs category...
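For the curious, here is a minimal sketch of how those two rules plus one synonym might look in Python with the standard library's ElementTree. The names `SYNONYMS`, `split_tag`, `normalize`, and `parse_items` are invented for illustration, and the synonym table holds only the dc:subject example from above; a real aggregator would need more than this.

```python
import xml.etree.ElementTree as ET

# Hypothetical synonym table: alternate element names mapped to one key.
DC = "http://purl.org/dc/elements/1.1/"
SYNONYMS = {"{%s}subject" % DC: "category"}

def split_tag(tag):
    """Split ElementTree's '{uri}local' tag form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag

def normalize(tag, root_ns):
    """Rule 1: the DocumentElement's namespace counts as the null namespace."""
    uri, local = split_tag(tag)
    if uri in ("", root_ns):
        return local
    return SYNONYMS.get(tag, tag)

def parse_items(xml_text):
    root = ET.fromstring(xml_text)
    root_ns, _ = split_tag(root.tag)
    items = []
    # Rule 2: root.iter() walks the whole tree, so items are found
    # whether they sit inside a channel or outside of it.
    for el in root.iter():
        if normalize(el.tag, root_ns) == "item":
            items.append({normalize(c.tag, root_ns): (c.text or "") for c in el})
    return items
```

With this, a feed whose root declares an unexpected default namespace and keeps its items outside the channel comes out looking the same as a plain one.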
I've got to work on making my RSS more difficult to parse. My XHTML and FOAF are both locally famous for their ability to break parsers, but so far nobody has complimented my ability to write hard-to-parse RSS.
If you're making a general list of test feeds, I'd add one that CDATA escapes description, too: I noticed last night that Feedreader displays the CDATA close tag (much better than when I first saw CDATA used, when pretty much everything either broke or refused to read it). Though Dare's no doubt using a real parser and won't ever notice the difference.
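A quick sketch of why a real XML parser won't notice the difference: an entity-escaped description and a CDATA-escaped one yield the same text, with no stray close marker. The sample strings and the `description` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The same markup, escaped two different ways (made-up sample data).
escaped = "<item><description>&lt;b&gt;Hello&lt;/b&gt; world</description></item>"
cdata = "<item><description><![CDATA[<b>Hello</b> world]]></description></item>"

def description(xml_text):
    """Return the text of <description> as the parser reports it."""
    return ET.fromstring(xml_text).findtext("description")

# Both forms decode to the same string; the CDATA markers are gone.
same = description(escaped) == description(cdata)
```

An aggregator that shows the CDATA close marker is most likely picking the feed apart with string matching instead of a parser.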
Phil, Tim Appnel's feed would be a good example of that.
However, not all is lost. Your RSS 2.0 feed (and my RSS 2.0 feed for that matter) are good examples as to why one can't ignore namespaces entirely. ;-)
Dunno if this is something we should be compensating for. Are XML documents allowed to have blank lines before the initial XML processing instruction? I was under the impression this had to be the absolute first thing in the document. But what do I know?
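For what it's worth, the XML 1.0 spec does require the XML declaration, when present, to be the very first thing in the document, so a conforming parser rejects a leading blank line. A small sketch to check that with Python's ElementTree follows; `is_well_formed` and the sample bytes are hypothetical, and whether an aggregator should strip the whitespace and retry is a policy call this doesn't settle.

```python
import xml.etree.ElementTree as ET

def is_well_formed(data):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

clean = b'<?xml version="1.0"?><rss version="2.0"/>'
blank_first = b"\n" + clean  # blank line before the XML declaration

strict_ok = is_well_formed(clean)                   # accepted
strict_bad = is_well_formed(blank_first)            # rejected by the parser
lenient_ok = is_well_formed(blank_first.lstrip())   # one way to compensate
```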
Tim's feed just broke my aggregator in two ways. :)
I assumed the RSS version number had to be a numeric value, which doesn't seem to be the case, since his feed validates fine.
His using an unexpected namespace for the RSS elements was also a breaker.
Dare, Tim's feed validates because the RSS validator expects the RSS elements to be in the namespace of the DocumentElement. Given the history of RSS and namespaces, this seems like a most sane approach.
Damn, you're right. It's a bug with rssfinder.py. It's not finding the RSS feed, or rather, it thinks the home page is the RSS feed. Probably because of the presence of the damn Trackback data, although I swear I fixed that bug months ago. *sigh* I don't have time to debug it. Just ignore my previous comments for now.
Dare, we don't bother validating version numbers. You could publish an RSS feed version="3.141592653589793/and/your/mother's/ugly" and it would validate. In fact, I think I'm going to go do that.
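Given that, one defensive option for an aggregator is to treat the version attribute as an opaque string and only attempt a numeric reading on the side. A rough sketch, where `rss_version` is an invented helper and not anything from the validator:

```python
import xml.etree.ElementTree as ET

def rss_version(xml_text):
    """Return (raw, numeric); numeric is None when version isn't a number."""
    raw = ET.fromstring(xml_text).get("version")
    try:
        numeric = float(raw) if raw is not None else None
    except ValueError:
        numeric = None  # keep going instead of rejecting the feed
    return raw, numeric

plain = rss_version('<rss version="2.0"/>')
silly = rss_version(
    '<rss version="3.141592653589793/and/your/mother\'s/ugly"/>'
)
```

The raw string is always available for display or logging, and the feed still loads when the numeric reading fails.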
On an unrelated note, it is truly scary that I still know pi to 15 digits after all these years. That was as far as my calculator would show me in high school, and (being the bored genius in the back of the room in the days before Internet access) I spent my time playing with my calculator and memorizing stupid shit. Did you know that 16435934 in hexadecimal spells FACADE? I always thought that was insanely cool.
Mark: once I fixed my whitespace before the XML declaration issue (a PHP include that I didn't actually need anymore that was outputting a blank line), we're back to your old friend, CDATA-escaped JavaScript. I take the CDATA out, the validator autodiscovers me, put it back in and it doesn't. I thought you fixed *that* bug months ago.
Sam: Tim's feed is a great parser-breaker, but since lots of things still don't do anything with content:encoded, for completeness we need someone who CDATA's HTML in description. Extra credit if they talk about HTML, so that they've also got entity-encoded stuff inside the CDATA section ;)
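To make the extra-credit case concrete, here is a sketch of a description that CDATA-wraps HTML which itself talks about HTML. The sample item is invented. The point is that the parser removes exactly one layer (the CDATA wrapper), and the entity references left inside belong to the HTML, so unescaping again would be a bug.

```python
import html
import xml.etree.ElementTree as ET

# Invented sample: HTML about HTML, wrapped in a CDATA section.
feed = (
    "<item><description><![CDATA[<p>Write &lt;em&gt; for emphasis, "
    "like <em>this</em>.</p>]]></description></item>"
)

# The parser strips the CDATA wrapper and leaves the contents untouched.
text = ET.fromstring(feed).findtext("description")

# Unescaping a second time turns the literal "&lt;em&gt;" into a real tag,
# which changes what the author wrote.
over_decoded = html.unescape(text)
```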