It’s just data

RSS needs profiling

Mark Nottingham: How would we do this for RSS? I think it would be relatively easy (and MUCH more lightweight). Get a bunch of aggregator folks (virtually) together and decide what features they're going to support - e.g., xhtml:body, how to interpret markup inside description, how to prioritize different elements that do the same thing, etc. Use Jorgen's RSS schema, and modify it to make it easy to validate a profiled feed.
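One of Mark's examples, deciding how to prioritize different elements that do the same thing, is concrete enough to sketch. The Python below assumes a hypothetical profile in which an aggregator prefers `xhtml:body`, then `content:encoded`, then `description`. That order, `BODY_PRIORITY`, and `pick_body` are illustrative assumptions, not anything the aggregator authors have agreed on. The namespace URIs are the ones commonly used for these elements in RSS 2.0 feeds.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace URIs commonly used for these elements in RSS 2.0 feeds.
XHTML_NS = "http://www.w3.org/1999/xhtml"
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"

# Hypothetical profile: when an item carries its body in more than one
# element, take the first one present in this list.
BODY_PRIORITY = [
    f"{{{XHTML_NS}}}body",       # xhtml:body (inline XHTML markup)
    f"{{{CONTENT_NS}}}encoded",  # content:encoded (escaped HTML)
    "description",               # plain RSS 2.0 description
]


def pick_body(item: ET.Element) -> Optional[ET.Element]:
    """Return the highest-priority body element in an <item>, or None."""
    for tag in BODY_PRIORITY:
        element = item.find(tag)
        if element is not None:
            return element
    return None


SAMPLE_ITEM = """<item xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <title>Example</title>
  <description>Short &lt;b&gt;summary&lt;/b&gt;</description>
  <content:encoded>&lt;p&gt;Full &lt;b&gt;post&lt;/b&gt;&lt;/p&gt;</content:encoded>
</item>"""

body = pick_body(ET.fromstring(SAMPLE_ITEM))
print(body.tag)   # {http://purl.org/rss/1.0/modules/content/}encoded
print(body.text)  # <p>Full <b>post</b></p>
```

Picking the element is only half of it: a real profile would also have to say how the winning element's content is interpreted, for example whether the text of `description` is treated as escaped HTML or as plain text, which is exactly the kind of decision Mark proposes the aggregator authors make together.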

I can help. For starters, here's the regex the RSS validator uses for RFC 822 dates, extended for Y2K to accept four-digit years:

^(((Mon)|(Tue)|(Wed)|(Thu)|(Fri)|(Sat)|(Sun)), *)?\d\d? +((Jan)|(Feb)|(Mar)|(Apr)|(May)|(Jun)|(Jul)|(Aug)|(Sep)|(Oct)|(Nov)|(Dec)) +\d\d(\d\d)? +\d\d:\d\d(:\d\d)? +(([+-]?\d\d\d\d)|(UT)|(GMT)|(EST)|(EDT)|(CST)|(CDT)|(MST)|(MDT)|(PST)|(PDT)|\w)$
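The pattern is easy to try out. Here is a minimal Python sketch, assuming Python's `re` dialect accepts it unchanged (the validator itself is written in Python, but the surrounding validator code is not shown here). The names `RFC822_DATE` and `looks_like_rfc822` are hypothetical.

```python
import re

# The date pattern above, split across adjacent string literals for
# readability; Python joins them into one pattern. ^ anchors at the start,
# and $ anchors at the end (Python's $ also tolerates one trailing newline).
RFC822_DATE = re.compile(
    r"^(((Mon)|(Tue)|(Wed)|(Thu)|(Fri)|(Sat)|(Sun)), *)?\d\d?"
    r" +((Jan)|(Feb)|(Mar)|(Apr)|(May)|(Jun)|(Jul)|(Aug)|(Sep)|(Oct)|(Nov)|(Dec))"
    r" +\d\d(\d\d)? +\d\d:\d\d(:\d\d)?"
    r" +(([+-]?\d\d\d\d)|(UT)|(GMT)|(EST)|(EDT)|(CST)|(CDT)|(MST)|(MDT)|(PST)|(PDT)|\w)$"
)


def looks_like_rfc822(value: str) -> bool:
    """Return True if value has the shape the pattern accepts.

    Only the shape is checked: impossible values such as 31 Feb or
    hour 99 still match.
    """
    return RFC822_DATE.match(value) is not None


for sample in [
    "Mon, 02 Jun 2003 09:39:21 GMT",  # weekday, four-digit year, named zone
    "02 Jun 03 09:39 +0000",          # no weekday, two-digit year, numeric offset
    "2003-06-02T09:39:21Z",           # ISO 8601 style: rejected
    "Mon, 02 Jun 2003 09:39:21",      # missing zone: rejected
]:
    print(f"{sample!r}: {looks_like_rfc822(sample)}")
```

Because the pattern only checks shape, `looks_like_rfc822("Sun, 31 Feb 2003 23:59 EST")` is also True. Turning a matching string into an actual timestamp is a separate step; Python's `email.utils.parsedate_to_datetime` is one standard-library option.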

I'm not keen on having more than one RSS overlord at a time, and tracking yet another splinter of RSS whose value I find dubious isn't how I want to spend my free time.

The innovations Torsten and I see in future versions of RSS Bandit have little to do with the base RSS spec; more than likely they will be influenced by extensions such as your CommentAPI or by complementary technologies and techniques like RSS autodiscovery.

Mark Nottingham can play RSS overlord if he wants, but I doubt I'll be involved. More than likely I'll just use you as my personal filter and implement whatever good ideas come out of your commentary or the links you provide [e.g. your links to Tim Bray's motivated me to fix how HTML encodings in descriptions are handled and how relative links are treated by RSS Bandit]

Posted by Dare Obasanjo at

From what I can see, all that is being attempted here is to collect the best practices and perhaps write up a few test cases, so that people who write tools don't each have to rediscover the edge cases for themselves.

I certainly plan to participate.

Posted by Sam Ruby at

It is a little-known but potentially useful fact that the RSS validator

- is open source
- is liberally licensed (Python license, no GPL)
- is downloadable here: http://feeds.archive.org/validator/download/rssvalidator-latest.zip
- contains several hundred test cases, each of which is either a valid RSS feed or is invalid in a specific way

Not that it handles all the edge cases, but it's certainly easier than starting from scratch.

Posted by Mark at

Social Software Alliance Wiki

Posted by Michael Fagan at

Pattern Restrictions on xsd:dateTime

Adding in the regexp suggested by Sam for RFC-822 format dates to the RSS 2.0 schema, I have come to the conclusion that I must be missing the point somewhere. I am using the following schema construct: <xs:element name="pubDate"...

Excerpt from TheArchitect.co.uk - Jorgen Thelin's weblog at
