Really Simple Syndication
By Sam Ruby, September 2, 2002.
From Netscape's RDF Site Summary (RSS) 0.9 official DTD, proposed:
RSS is an XML/RDF vocabulary for describing metadata about websites, and enabling the display of "channels" on the "My Netscape" website.
This document explores the basic concepts behind the various XML grammars which were derived from this base and makes suggestions as to directions for their further evolution.
Let's start this journey by looking at a humble RSS file. In fact, let's look at mine as produced by two different tools, first Radio, then blosxom. There are some differences (primarily additional elements that few aggregators render), but the essential structure is the same:
Channels have items. Both channels and items have (optionally, it turns out) a title, link, and description.
That's pretty much all you need to get up and running. Everything else is gravy. You could write one by hand, but the truth of the matter is that the overwhelming majority of RSS feeds are written by programs and for programs.
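To make the "channels have items" structure concrete, here is a minimal sketch in Python that writes such a feed using only the standard library. The function name `build_minimal_feed`, the sample titles, and the example.com URLs are assumptions made up for illustration; the `rss` root element and `version="0.92"` value are one possible choice, not something this essay prescribes.

```python
import xml.etree.ElementTree as ET


def build_minimal_feed(title, link, description, items):
    """Build a bare-bones feed: one channel holding zero or more items."""
    # Root element name and version value are illustrative assumptions.
    rss = ET.Element("rss", {"version": "0.92"})
    channel = ET.SubElement(rss, "channel")
    for tag, text in (("title", title), ("link", link), ("description", description)):
        ET.SubElement(channel, tag).text = text
    for entry in items:
        item = ET.SubElement(channel, "item")
        # Item fields are optional, so emit only the ones that were supplied.
        for tag in ("title", "link", "description"):
            if entry.get(tag):
                ET.SubElement(item, tag).text = entry[tag]
    return ET.tostring(rss, encoding="unicode")


# Hypothetical channel and item, used only to exercise the function.
feed_xml = build_minimal_feed(
    "Example Weblog",
    "http://example.com/",
    "A hypothetical channel used for illustration",
    [{"title": "First post", "link": "http://example.com/1"}],
)
print(feed_xml)
```

The item in this example has no description, and the sketch simply leaves that element out rather than writing an empty one.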
Now let's take a look at a specification. Here's the recently proposed 0.94 spec. It describes a lot more elements than I described above. Here's a summary of the ever growing number of elements defined in the core specifications. RSS 1.0 bucks the trend. Instead of an ever growing number of elements in the core specifications, extensions are provided by modules. In fact, there is even a proposed module capturing the additions made by 0.91. The roadmap for 0.94 indicates that successors to this version may consider a similar approach, but for now, additions continue to be made to the core.
The change notes for 0.94 indicate that <image> is now optional. This surprised me. Looking at my RSS feeds, including the one produced by Radio, I don't see an image element. I am not aware of any aggregator having a problem with my Radio newsfeed, so I can only conclude that this element was effectively always optional.
A slow but continuous rate of growth. Looking again at the change history for 0.94, it is clear that great pains have been taken to ensure that upward compatibility has been maintained. What this means is that if you author (by hand or by program) an RSS feed precisely according to the 0.92 specification, your investment will be protected and your feed can be consumed by any aggregator designed for 0.94.
As I said previously, RSS feeds are typically written by programs and for programs. There are a number of programs designed to consume RSS feeds. These programs are known as aggregators. Now, let's take a moment and look at compatibility from the consumer's perspective.
RSS 0.91, RSS 0.92, and RSS 0.94 do not make any claims about backwards compatibility. What this means is that if you are writing an aggregator (or merely using one), there are no guarantees that your investment will be protected. However, things are not as dire as this may seem. Given the excellent record on upward compatibility, it appears that one can safely assume that the following changes can be expected: required elements may be made optional, limits will be lifted, and new elements will be added. The good news is that if there is an element that you are looking for, its meaning won't change. Given these observations, it is possible to cope with change after a fashion. Namespaces would be better. And they may be coming, just not now. At least not in 0.94.
Now let's look at the version attribute, present at the top of, for example, RSS 0.92 feeds. Who is this data intended for? I have no research to back this up, but it would seem to me that most consumers of RSS feeds would ignore this attribute, for two reasons, both stated previously. First, one can't assume that the data that follows is valid with respect to such specifications. Second, there will in all likelihood be other versions of RSS specifications, perhaps not even written yet, that have to be dealt with by the same aggregator.
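The coping strategy from the last two paragraphs can be sketched as a consumer that looks only for the elements it knows, treats each of them as optional, skips anything it does not recognize, and never reads the version attribute. This is a hedged illustration, not how any particular aggregator works; `read_items`, `KNOWN_FIELDS`, and the `someFutureElement` tag are hypothetical names invented for the example.

```python
import xml.etree.ElementTree as ET

# The only elements this hypothetical consumer cares about.
KNOWN_FIELDS = ("title", "link", "description")


def read_items(feed_xml):
    """Extract the known fields from each item, tolerating anything else."""
    root = ET.fromstring(feed_xml)  # the version attribute is never consulted
    items = []
    for item in root.iter("item"):
        record = {}
        for tag in KNOWN_FIELDS:
            # findtext returns None when the element is absent, so a field
            # that stops being required simply comes back as None.
            record[tag] = item.findtext(tag)
        items.append(record)
    return items


sample = """<rss version="0.94">
  <channel>
    <title>Example</title>
    <item>
      <title>Hello</title>
      <someFutureElement>ignored</someFutureElement>
    </item>
  </channel>
</rss>"""

print(read_items(sample))
```

Because unknown elements are never looked up, an element added by a later version passes through without breaking this consumer.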
Now let's look at other differences between RSS 0.9/RSS 1.0 and RSS 0.91/RSS 0.92/RSS 0.94. Comparing the latest of each branch, one ends up with the following:
- The names of the outermost elements are both TLAs starting with the letter 'R'. Just different TLAs.
- Both support the essential <channel>, <item>, <title>, <link>, and <description> elements described above.
- Both also support <image> and <textInput> elements, which appear largely to be holdovers from 0.9.
- <item> elements appear within <channel> elements in 0.94, and appear alongside the <channel> in 1.0.
- 1.0 supports namespaces now; successors to 0.94 may do so in the future.
- 0.94 defines more elements in the core specification. When you include modules, 1.0 has more elements defined in total.
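For a consumer, the placement and namespace differences in the list above are the ones that matter most in practice. The sketch below, under stated assumptions, finds item titles whether items sit inside the channel or beside it, by comparing local element names and ignoring namespaces. The helper names `local_name` and `find_items` are hypothetical, and both sample documents are deliberately trimmed and are not complete, valid feeds under either specification.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a '{namespace}' prefix, if any, from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def find_items(feed_xml):
    """Return item titles whether items sit inside or beside the channel."""
    root = ET.fromstring(feed_xml)
    titles = []
    for element in root.iter():  # walks the whole tree, at any depth
        if local_name(element.tag) != "item":
            continue
        for child in element:
            if local_name(child.tag) == "title":
                titles.append(child.text)
    return titles


# Trimmed 0.94-style layout: the item is nested in the channel.
RSS_094_STYLE = """<rss version="0.94">
  <channel>
    <title>Example</title>
    <item><title>Inside the channel</title></item>
  </channel>
</rss>"""

# Trimmed 1.0-style layout: the item is a sibling of the channel.
RSS_10_STYLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.com/">
    <title>Example</title>
  </channel>
  <item rdf:about="http://example.com/1">
    <title>Beside the channel</title>
  </item>
</rdf:RDF>"""

print(find_items(RSS_094_STYLE))
print(find_items(RSS_10_STYLE))
```

Matching on local names is a shortcut that discards exactly the information namespaces are meant to carry, so it suits only the handful of shared core elements.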
So let's start with a clean slate and describe what I would like to see in an RSS 2.0 if I were made king of the world for a day and were free to make whatever changes I like. Of course, if I were made king of the world for a day, I would probably devote my time to other matters, but let's not digress here too much...
Before talking about futures, it helps to establish a framework of values.
From the very beginning of my career, I've been indoctrinated into the importance of backwards compatibility. Not just for producers, but also for consumers. As king, I would ensure that the next spec explicitly recognizes the importance of this from this point on.
Simplicity. I really L*O*V*E the new name for RSS 0.94. Really Simple Syndication. Unfortunately, this spec attempts to live up to this new name by adding still more attributes to the core, albeit optional ones.
Extensibility. As described in the RSS 1.0 design goals, and affirmed by the RSS 0.94 roadmap, developers should be able to add modules without interfering with each other's work. So this one no longer appears to be controversial.
For starters, I would like to see a return to simplicity. Remove from the core all elements except <channel>, <item>, <title>, <link> and <description>. And make every one of them optional except channel. This means that image and textInput would be placed into a "mod_rss09" module.
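As a rough illustration of what such a pared-down core would mean for existing feeds, the sketch below lists the un-namespaced elements in a document that fall outside the five core names. It assumes, as an interpretation rather than anything stated here, that module elements would be namespace-qualified and can therefore be skipped. The function name `outside_core` and the sample feed are hypothetical.

```python
import xml.etree.ElementTree as ET

# The five elements proposed for the core.
CORE = {"channel", "item", "title", "link", "description"}


def outside_core(feed_xml):
    """List un-namespaced elements below the root that are not in the core."""
    root = ET.fromstring(feed_xml)
    extras = []
    for element in root.iter():
        if element is root:
            continue  # the name of the outermost container is not checked
        tag = element.tag
        if tag.startswith("{"):
            continue  # namespaced, so assumed to belong to some module
        if tag not in CORE and tag not in extras:
            extras.append(tag)
    return extras


# Hypothetical feed mixing core elements, an <image>, and a namespaced element.
sample = """<rss version="0.94" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example</title>
    <image><url>http://example.com/logo.png</url></image>
    <item>
      <title>Hello</title>
      <dc:date>2002-09-02</dc:date>
    </item>
  </channel>
</rss>"""

print(outside_core(sample))
```

Elements reported by this check are the candidates that would need a home in a module under the proposal.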
Then add in an 0.91 module, with a key difference. I'd like to see UserLand host the document and have the RSS 2.0 modules list reflect this. This means that every recipient of a document containing these elements would provide attribution to UserLand, and would have the URL where they can find the human-readable description for any such elements. This should be repeated for 0.92 and 0.94. Simon Fell can host his own description for 0.93.
Of course, every 1.0 module would be a valid 2.0 module.
As to whether the items should be in or out of the channel, or the name of the outermost element which acts as a container... I would leave such important decisions to day two.
I actually don't want to slow down or derail the current 0.94 work. Let a thousand flowers bloom and all that. But it is helpful occasionally to revisit first principles. In this case, can every feature of 0.92 justify itself? If so, great, otherwise, perhaps at some point in the future it might be worth streamlining the core spec.
Meanwhile, RSS has grown considerably from its original humble beginnings as a "site summary" to a syndication format that enables people to communicate with people without significant investment in infrastructure and across both time and platform boundaries. Everyone involved, particularly Netscape, UserLand, and the RSS-DEV working group, deserves our gratitude.