Mark Nottingham: How would we do this for RSS? I think it would
be relatively easy (and MUCH more lightweight). Get a bunch of
aggregator folks (virtually) together and decide what features
they're going to support - e.g., xhtml:body, how to interpret
markup inside description, how to prioritize different elements
that do the same thing, etc. Use
Jorgen's RSS schema, and modify it to make it easy to validate
a profiled feed.
I can help. For starters, here's a regex that the RSS validator uses for RFC 822 + Y2K formatted dates:
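The pattern itself does not appear in the text above, so what follows is not the validator's regex. It is only a minimal sketch of what such a check could look like in Python, assuming "Y2K formatted" means the RFC 822 date grammar with a four-digit year. The names `RFC822_Y2K_DATE` and `looks_like_rfc822_date` are made up for illustration.

```python
import re

# Illustrative pattern only: this is NOT the RSS validator's actual regex.
# Assumption: RFC 822 date-time grammar, but with a four-digit year required.
RFC822_Y2K_DATE = re.compile(
    r"""
    ^\s*
    (?:(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun),\s+)?    # optional day of week
    \d{1,2}\s+                                   # day of month
    (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+
    \d{4}\s+                                     # four-digit year
    \d{2}:\d{2}(?::\d{2})?\s+                    # hh:mm with optional :ss
    (?:UT|GMT|[ECMP][SD]T|[A-IK-Z]|[+-]\d{4})    # named, military, or numeric zone
    \s*$
    """,
    re.VERBOSE,
)


def looks_like_rfc822_date(value: str) -> bool:
    """Return True if value has the shape of an RFC 822 date with a 4-digit year."""
    return RFC822_Y2K_DATE.match(value) is not None


if __name__ == "__main__":
    for sample in ("Sat, 07 Sep 2002 00:00:01 GMT", "Sat, 07 Sep 02 00:00:01 GMT"):
        print(sample, "->", looks_like_rfc822_date(sample))
```

A check like this only looks at the shape of the string. It would still accept impossible dates such as a 31st of February, so a real tool would probably parse the value as well.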
I'm not keen on having more than one RSS overlord at a time, and tracking yet another splinter of RSS whose value I find dubious doesn't sound like how I want to spend my free time.
The innovations Torsten and I see in future versions of RSS Bandit have little to do with the base RSS spec but more than likely will be influenced by extensions such as your CommentAPI or complementary technologies and techniques like RSS autodiscovery.
Mark Nottingham can play RSS overlord if he wants, but I doubt I'll be involved. More than likely I'll just use you as my personal filter and implement whatever good ideas come out of your commentary or the links you provide [e.g. your links to Tim Bray's motivated me to fix how HTML encodings in descriptions are handled and the way relative links are treated by RSS Bandit].
From what I can see, all that is being attempted here is to collect up the best practices and perhaps to write up a few test cases so that people who write tools don't each have to rediscover the edge cases for themselves.
It is a little-known but potentially useful fact that the RSS validator
- is open source
- is liberally licensed (Python license, no GPL)
- is downloadable here: http://feeds.archive.org/validator/download/rssvalidator-latest.zip
- contains several hundred test cases, each of which is either a valid RSS feed or is invalid in a specific way
Not that it handles all the edge cases, but it's certainly easier than starting from scratch.
After adding the regexp suggested by Sam for RFC 822 format dates to the RSS 2.0 schema, I have come to the conclusion that I must be missing the point somewhere. I am using the following schema construct: <xs:element name="pubDate"...