It’s just data

Don Box's RSS profile

Don Box: Here's my suggestion for where to start. I won't even consider calling this a profile until consensus is reached over on Sam's site (he has comments, neither Dave nor I do)

Don, please update the last line of your proposal to point to this blog entry.  I'll comment on it below.


Don, I added dc:creator to items in my comments feed in support of NewsGator users.

The reason why I chose dc:creator instead of author is that I don't always have a valid email address.

Posted by Sam Ruby at

Profile RSS

Don,
Here are my opinions on your profile

1.) /rss/channel/item/title
  - sounds fair

2.)  /rss/channel/description
  - making it mandatory if no title is a good idea. The fact that both title and description may be missing is very irritating. I don't see why you think the description should have the same size restrictions as the title.

3.)  /rss/channel/item/pubDate
  - I hate RFC 822 dates. I suspect the .NET Framework's System.DateTime class doesn't support a number of the variants. However, since this is only subsetting RSS 2.0, I see no reason not to mention it, although I'd prefer it deprecated and replaced with dc:date.

 
4.)  /rss/channel/item/guid
  - this element is a stupid hack that should die an ignoble death.
 
5.)  /rss/channel/item/link
  - this is where the profile goes off the deep end. Not only do you imbue an existing element with its less commonly used semantics (linking to the citation instead of the site the feed originates from), but you also add a new mechanism for resolving relative links (xml:base), which again isn't directly supported on my primary platform of choice. If you want to see how to resolve relative URIs, read Tim Bray's post at http://www.tbray.org/ongoing/When/200x/2003/04/22/RSS-Problems
 
 
6.)  /rss/channel/item/comments
  - more xml:base

7.)  /rss/channel/item/category
  - What is meant by repeating xs:string? Do you mean category's content model is

element category { xs:string* }

or we will have

  element category(*)

inside an item?

8.)  /rss/channel/item/author
  - don't know what the common format for this is.

9.)  /rss/channel/item/xhtml:body
  - what happened to content:encoded, which is a lot more popular than xhtml:body?

Posted by Dare Obasanjo at

Sam, +1 on dc:creator vs. author.

To reduce churn, I won't apply any deltas to my profile until more traffic happens here.

Posted by Don Box at

The Netscape RSS 0.91 spec specified 100 characters for title and 500 characters for description.

Posted by Sam Ruby at

Dare,

Your comment on link seems odd.

HTML has the "a" element for hyper-linking. That's the established norm.

Dave coined the "link" element for RSS 2.0 - we're only clarifying its use in that context.

Posted by Don Box at

The 100/500 limits make sense. +1

Posted by Don Box at

Here are my thoughts on Don's profile:

/rss/channel/description - I assume you mean /rss/channel/item/description here.  I'd remove the "intended as excerpt only" - a huge number of feeds provide complete content here, and it'd be a shame to make a usage profile that says they're all out of date.  Then we would remove the 250 character thing as well.  I don't see anything wrong with providing (escaped) content here, although it's not ideal.

/rss/channel/item/pubDate - I agree with Dare, I hate 'em, you gotta write a bunch of code to parse 'em, but nevertheless I support this recommendation.  dc:date is fine too - I don't think there's a good reason for preference.  I'd recommend that the profile recommend GMT, however, since IIRC RFC822 only explicitly supports North American time zones, so users on other continents could not meet the recommendation.
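
To illustrate the GMT recommendation from the producer side, here is a minimal Python sketch (an editor's example, not part of anyone's proposal; `make_pub_date` is a hypothetical name). It assumes the caller passes a timezone-aware datetime and relies on the standard library's `email.utils.format_datetime`, which emits the RFC 822-style layout RSS 2.0 uses:

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def make_pub_date(dt):
    """Format an aware datetime as an RFC 822-style pubDate in GMT."""
    # Normalise to UTC first: usegmt=True requires tzinfo to be timezone.utc,
    # and it writes the zone as the literal "GMT" rather than "+0000".
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


# Example: 22:45 EDT (UTC-4) on 10 May 2003 is the next day in GMT.
edt = timezone(timedelta(hours=-4))
print(make_pub_date(datetime(2003, 5, 10, 22, 45, tzinfo=edt)))
```

Emitting only GMT sidesteps the question of which zone names a given consumer's parser recognises.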

guid and link - not sure I agree that @isPermaLink MUST be true, and that link should only be used to point to citation, but I'll reserve judgement until I see more discussion from others.

category - I agree here.  Dare, what this is saying is there can be multiple <category> elements within an item.
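
Reading the multiple-<category> interpretation in code: a minimal Python sketch using `xml.etree.ElementTree`, where `item_categories` and the sample item are hypothetical. It also picks up the optional `domain` attribute that RSS 2.0 allows on category:

```python
import xml.etree.ElementTree as ET

ITEM = """<item>
  <title>An RSS 2.0 Profile</title>
  <category>RSS</category>
  <category domain="http://example.org/topics">XML</category>
</item>"""


def item_categories(item):
    """Return (text, domain) for every <category> child, in document order."""
    # findall returns all matching direct children, so repeated
    # <category> elements are simply collected into a list.
    return [(c.text, c.get("domain")) for c in item.findall("category")]


print(item_categories(ET.fromstring(ITEM)))
```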

author vs. dc:creator - I agree with Sam, we can't always have an email; however, if we go with dc:creator, we should publish the recommended format if one does have an email address.

Posted by Greg Reinacker at

Hmm, on my pubDate comment above, I was referring to RFC822 dates (with the hate 'em, gotta parse 'em comment), not dates in general!  :-)

Posted by Greg Reinacker at

I agree with everyone on this thread about link/guid. I'd be happy to just have link PROVIDED we tightened up the citation-vs-permalink application of it.

Also, the need for an explicit citation link seems unnecessary given the ability to scrape out "a" tags from the content.

Posted by Don Box at

I too have no affection for RFC822 dates, although parsing them is trivial in .NET and Java AFAIK.

I'd be happy to drop pubDate for something ISO-based (e.g., dc:date).

Posted by Don Box at

Don,
HTML has a link element but that is neither here nor there. The established norm in the RSS world is for <link> to point to the original post. In fact, I can only think of one blog (http://radio.weblogs.com/0117167/) where <link> points to the cited page instead of the original post, and it is extremely irritating.

OK, I just looked, and it seems I have to go tweak my code to prioritize <guid> over <link>, which should get rid of my irritation at Sean & Scott.

Anyway, that doesn't change the fact that blogs like that are the exception, not the norm.

Posted by Dare Obasanjo at

Greg, do you buy the 100/500 balance Sam pointed out?

Posted by Don Box at

Dare, see http://www.intertwingly.net/blog/1394.html#c1052616173

In short, +1.

Posted by Don Box at

Sam, the 100/500 limit on title and description is ok as long as it remains SHOULD and doesn't become MUST.

Dare, +1 on content:encoded over xhtml:body.
Sam, +1 on dc:creator over author.
Don, +1 on dc:date over pubDate.

Please correct me if I am wrong, but everyone seems to agree that /rss/channel/item/link should be the perma-link for the item. If that is the case, then why take a relatively new element 'guid' and make it mandatory? Instead, if everyone agrees that link is a perma-link then just make it mandatory, and instead of making guid mandatory, just drop it.

Posted by joe at

Here are my comments:

- I agree with the guid/link comments, I'd remove explicit citation per Don's comment. 

- Not to introduce a tangential discussion, but if citations are really needed, shouldn't there be an ability to cite multiple links per item?  Again, this probably isn't needed in the first place.

- dc:date makes sense to me.

Posted by randy at

Why have a limitation on the length of titles in the profile if it is merely a SHOULD?

IMHO, the feeds will still be valid RSS 2.0 if they have titles of arbitrary length, but they shouldn't be considered conformant to the profile if they have excessively long titles.
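
Sam's distinction (still valid RSS 2.0, but not conformant to the profile) suggests a checker that reports length problems rather than rejecting the feed. A minimal Python sketch, assuming the 100/500 limits from RSS 0.91 and using hypothetical names:

```python
TITLE_MAX = 100        # Netscape RSS 0.91 limit cited in this thread
DESCRIPTION_MAX = 500  # likewise; whether it stays SHOULD or MUST is still open


def profile_warnings(title, description):
    """Report length-limit violations without treating them as fatal errors."""
    warnings = []
    if title is not None and len(title) > TITLE_MAX:
        warnings.append(f"title has {len(title)} characters (limit {TITLE_MAX})")
    if description is not None and len(description) > DESCRIPTION_MAX:
        warnings.append(
            f"description has {len(description)} characters (limit {DESCRIPTION_MAX})"
        )
    return warnings


print(profile_warnings("Short title", "x" * 501))
```

An empty list means "in profile"; anything else is a conformance note, and the feed is still processed as ordinary RSS 2.0.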

Posted by Sam Ruby at

Sam,
  Good point, it will still be valid RSS and making it a MUST will make the profile clearer.

Posted by joe at

I'll buy the 100 characters for title, but I'd still resist a 500 character limit for description - again, tons of tools generate long descriptions as their only content right now...and I'd hate for them to suddenly be out-of-profile when they've been doing it for ages.

And RFC822 dates are trivial to parse in .NET, except when they use time zones like EST or EDT...then you're on your own. :-(
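
For comparison outside .NET, Python's standard `email.utils.parsedate_to_datetime` already understands the North American zone names. A minimal sketch (`parse_pub_date` is a hypothetical helper; treating an unknown zone as UTC is an assumption, not something any spec says):

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def parse_pub_date(value):
    """Parse an RFC 822-style pubDate and normalise it to UTC."""
    # parsedate_to_datetime accepts numeric offsets as well as the zone
    # names UT, GMT, EST/EDT, CST/CDT, MST/MDT and PST/PDT.
    dt = parsedate_to_datetime(value)
    if dt.tzinfo is None:
        # "-0000" or an unrecognised zone name yields a naive datetime;
        # assuming UTC here is a policy choice, not part of RFC 822.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


print(parse_pub_date("Sat, 10 May 2003 22:45:00 EDT"))
```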

Posted by Greg Reinacker at

"And RFC822 dates are trivial to parse in .NET, except when they use time zones like EST or EDT...then you're on your own. :-("

Another who has felt my pain. I'm still of two minds whether to create my own date parsing class to support the many variants of RFC 822 or whether the right approach is to encourage producers to use dc:date.

Posted by Dare Obasanjo at

+1 to Joe's comments in http://www.intertwingly.net/blog/1394.html#c1052619146

Posted by Simon Fell at

+1 on content:encoded over xhtml:body.  Requiring well-formed XHTML is just too much of a burden on content producers.

+1 on dc:creator over author.  Mandatory email addresses are spambait, and the RSS 2.0 spec is quite clear that author, if present, MUST include an email address.

+1 on dc:date over pubDate.  ISO dates are just as easy for producers to create and easier for most consumers to parse/sort/deal with.

Posted by Mark at

Greg - you seem to think of "out of profile" as if it were a fatal error.  I see it as possibly not following best practices.

I suspect that at the core of this is the belief that description, content:encoded, and xhtml:body are merely synonyms.  This appears to be the NewsGator view of the world.

IMHO, for longer entries, description should be something that could be downloaded to a blackberry and xhtml:body (for those who believe in the well-formed web), and content:encoded (for those who don't) are for the full content when it differs from the description.

Posted by Sam Ruby at

Despite Dare having quoted me on relative links, I now disagree with him.  Just allow the use of xml:base on any RSS element, that way it becomes the producer's responsibility to say what they mean, and xml:base is falling-off-a-log easy to implement.  So I think I agree with Don, only I'd go further; xml:base anywhere.
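
A rough sketch of what "xml:base anywhere" asks of a consumer, in Python with `xml.etree.ElementTree` and `urllib.parse.urljoin`. The function name, the feed URL used as the starting base, and the sample feed are hypothetical; only `link` and `comments` are resolved here:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree exposes xml:base under the reserved XML namespace URI.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def resolve_links(elem, base, found=None):
    """Collect absolute URLs for <link> and <comments>, honouring xml:base."""
    if found is None:
        found = []
    # xml:base may itself be relative, so resolve it against the inherited base.
    base = urljoin(base, elem.get(XML_BASE, ""))
    if elem.tag in ("link", "comments") and elem.text:
        found.append(urljoin(base, elem.text.strip()))
    for child in elem:
        resolve_links(child, base, found)
    return found


FEED = """<rss version="2.0">
  <channel xml:base="http://example.org/blog/">
    <item xml:base="2003/05/">
      <link>entry1.html</link>
      <comments>entry1.html#comments</comments>
    </item>
  </channel>
</rss>"""

print(resolve_links(ET.fromstring(FEED), "http://example.org/index.rss"))
```

The base is inherited down the tree and only changes where an element carries xml:base, which is the "producer says what they mean" property Tim describes; the document URL is used only when no xml:base is in scope.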

I'm fine with the rest of Don's version, except that simplifying item/description leaves a hole: virtually every OS now has an embeddable doohickey for rendering HTML, and it seems a pity to keep aggregators from prettying up the excerpts.  On the other hand, we don't want to lose well-formedness.

So I don't disagree with Don, but I don't want to lose prettified excerpts either.  None of html:body or html:div or content:encoded feel quite right.  I'm perfectly willing to provide an excerpt of my article, make sure it's well-formed, and decorate it with a bit of HTML, and it seems that RSS really ought to let me.

Posted by Tim Bray at

Before we go too far down the "I like parsing this element" road, we should probably focus on what a profile can actually do for us. Unless someone decides to write a "Sam, Don, Dare, and nobody else" aggregator, you don't get to stop parsing RFC822 dates and you don't get to completely ignore the link element; all you get is a little more certainty about what some elements mean. By saying that description is plain text, you let those of us who think that way talk about markup in description, because once you've identified a feed as profile-compliant, you know that you need to escape < and > before you hand it to an HTML renderer. Since you can't control whether or not people use link to point in or out, all the profile can do is reinforce the priority that you should already be using to find a permalink for an item: guid with isPermaLink first, link second. And you can know how to handle relative URLs. That's it. It's not going to cure every RSS problem in every RSS feed, just make a few of them a little easier to interpret.
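
Phil's priority (guid with isPermaLink first, link second) can be sketched in a few lines of Python. `find_permalink` and the sample item are hypothetical; the default of "true" for a missing isPermaLink attribute comes from the RSS 2.0 spec:

```python
import xml.etree.ElementTree as ET


def find_permalink(item):
    """Return the item's permalink: guid first, then link, else None."""
    guid = item.find("guid")
    if guid is not None and guid.text:
        # RSS 2.0 makes isPermaLink optional with a default of "true".
        if guid.get("isPermaLink", "true").strip().lower() == "true":
            return guid.text.strip()
    link = item.find("link")
    if link is not None and link.text:
        return link.text.strip()
    return None


item = ET.fromstring(
    '<item><guid isPermaLink="false">urn:example:1</guid>'
    "<link>http://example.org/2003/05/10/post</link></item>"
)
print(find_permalink(item))  # the guid is not a permalink, so <link> is used
```

Note that the fallback cannot tell whether link points at the post or at a cited page; only the profile's promise about link removes that doubt.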

As to dc:date - I really don't care, since you've still got to parse them both, but if you use dc:date I'd suggest using a little more precise profile than RSS 1.0 does, since <dc:date>2002</dc:date> is a fully valid RSS 1.0 item date.
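
One possible "more precise profile" for dc:date, sketched in Python: accept only full W3CDTF timestamps with an explicit zone and reject looser forms such as `2002`. The exact pattern and both function names are assumptions for illustration, not part of RSS 1.0 or any proposal here:

```python
import re
from datetime import datetime, timezone

# Hypothetical strict subset of W3CDTF: full date, hours and minutes,
# optional seconds (no fractions) and a mandatory zone designator.
STRICT_W3CDTF = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(?::\d{2})?(?:Z|[+-]\d{2}:\d{2})"
)


def parse_dc_date(value):
    """Parse a dc:date under the strict subset; reject forms like "2002"."""
    value = value.strip()
    if not STRICT_W3CDTF.fullmatch(value):
        raise ValueError(f"dc:date outside the strict profile: {value!r}")
    # Older versions of datetime.fromisoformat do not accept a trailing "Z".
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value).astimezone(timezone.utc)


def make_dc_date(dt):
    """Produce a W3CDTF timestamp in UTC from an aware datetime."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(make_dc_date(parse_dc_date("2003-05-10T22:45-04:00")))
```

Because every accepted value carries a zone, the parsed results are aware datetimes that sort correctly against each other.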

Posted by Phil Ringnalda at

Tim,

I assumed xml:base would be allowed anywhere, which is why I referred to XML Base, not the specific attribute. I'll make this more explicit in the next round.

Honestly, I'd be happy to allow markup in most elements (title, description).  I too want it to be real well-formed markup, not escaped HTML 3.2.  I toyed with defining description (and probably title) according to the Inline pseudo-type in XHTML, but was concerned about breakage.

DB

Posted by Don Box at

Phil,

See my response to Tim wrt embedded markup.

How would you feel about defining description and title according to the Inline pseudo-type in XHTML?

DB

Posted by Don Box at

Sam - after reading your comment about description vs. content entries, and thinking about it a bit, you got me - I agree now.  I'll buy the idea of description being a shorter version of the content, when another content element is available.

Posted by Greg Reinacker at

On version 0.03

1. Why are the length limits a good idea?  I'm not challenging, just asking, I don't know the history.

2. What do you mean by XMLBase?  If I don't recognize the term, I'm sure some others don't either.  If xml:base is included everywhere, the profile should make it clear that it is authoritative, it may not be compulsory to deal with relative URIs, but if you do, here's where you get the base from, nowhere else.

3. Should be specific about xhtml:body; does it imply that it contains the full text of the item?  I kind of hope not, but am highly prejudiced due to regularly being an author of 1000-word-plus entries.

Posted by Tim Bray at

I think if you allow child elements in link and description (assuming that's what "Inline psuedo-type" means), you'll definitely break existing tools.

On Don's version 0.03:

1. What do you guys think about adding the <guid> element?  It's the only sure way (so far) to uniquely identify a post that is being modified over time.

2. creator - again, I'd like to define a recommended format in the case where an email address is included.

Posted by Greg Reinacker at

Tim,

The length limits trace back to the early days of RSS. I can easily live without them.

I was referring to XML Base (http://www.w3.org/TR/xmlbase). Sorry for dropping the space.

As for how much goes in the xhtml:body, let me get my kids to bed. That's a longer story :-)

DB

Posted by Don Box at

RSS Profiling Wiki

Don, Sam, Ben, Mena and others have started blogging about a profile of RSS. I don't think blogs are the...

Trackback from mnot's weblog

at

The problem with both xhtml:body and content:encoded is that they're a hassle to deal with if all you want is text. Please don't assume the only consumers of RSS feeds are desktop newsreaders. I think a strong argument can be made that -- in the spirit of placing the least demands on a client -- the /rss/channel/item/description should only contain plaintext. I give a definite +1 on this. (This shouldn't preclude people from being generous and providing xhtml:div and content:encoded.) (Further, I'd argue that most OSes don't have an embeddable doohickey for displaying HTML when you begin considering devices like PDAs, older cellphones, BlackBerries, pagers, etc.)

The size limits on description and title don't make much sense. Leave that up to clients. If a client only wants the first 500 characters it'll only take the first 500 characters. You don't have any control over this anyways.

On the one hand, I agree with Dare that /rss/channel/item/guid is a tremendous hack and further, violates good web architecture, but on the other hand I think it's necessary for RSS to be used in many interesting domains. I'd go further and relax the restrictions on guid--it should just be an xsd:string. It's not fair to assume that the system generating an RSS feed will be capable of providing permalinks, nor should we assume that RSS feeds will always be retrieved over HTTP. So I'd like to see guid be demoted to a string and let the isPermaLink attribute be the decider.

I'd actually prefer pubDate to dc:date. As Phil points out, there are greater restrictions on pubDate, which makes it easier to handle.

I'm not sure exactly what xml:base gains you but since it is trivial to check for and allows for greater flexibility, why not.

On another note, what is the point of defining optional elements in the profile? I was under the impression that the profile would enforce the bare minimum that an RSS provider must provide? If something is optional it shouldn't be in the profile. That said, I'd get rid of category and make author required. Comments doesn't make sense for sites other than weblogs eg RSS describing some machine on the network or RSS describing the NYTimes. It should go.

I'm not sure, but I'm thinking link should probably go. People should put permalinks in the guid element if they have them. If HTTP isn't being used to access the RSS feed then link doesn't even make much sense.

Posted by Bo at

I blogged a separate but related issue, if we agree on a profile do we seek official blessing?

http://www.tbray.org/ongoing/When/200x/2003/05/10/RSS-std

Posted by Tim Bray at

Don - just say no to markup in title and description. It's practically the only clear, pure benefit to a profile. I quite often want to use angle-brackets in an entry title, or in a description, but because of the ambiguity in the RSS, I can't. Sign a few aggregators onto a profile that says they are plain text, and escapes them before they go to an HTML renderer, and I'll say screw the rest.
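
What "plain text, escaped before it goes to an HTML renderer" means for a consumer, as a minimal Python sketch (the function name and sample title are hypothetical). The XML parser already decodes `&lt;` to `<`, so under a plain-text profile the aggregator has to escape it again:

```python
import html
import xml.etree.ElementTree as ET


def title_for_html(title_elem):
    """Treat the parsed title as plain text and escape it for an HTML renderer."""
    # After parsing, "&lt;" has become a literal "<". Under a plain-text
    # profile that "<" is text, not markup, so it is re-escaped here.
    return html.escape(title_elem.text or "", quote=False)


print(title_for_html(ET.fromstring("<title>Generics: List&lt;T&gt; &amp; more</title>")))
```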

Bo - you're solving a different problem with guid than the profile is: by requiring that isPermaLink be true, it's sidestepping the very widespread confusion about whether link should be a permalink or a link to the thing discussed by the post. And don't forget, this is a profile, not the spec. Can't produce it? Don't. The whole world will still be trying to do its best to render the whole of RSS; things that know about the profile will just have a better chance of getting it right.

Posted by Phil Ringnalda at

RSS 2.0 Getting Standardized?

It looks as if RSS is getting standardized. Well, the process seems to be just getting underway in a semi-organized fashion. This post by Sam Ruby is the best starting point and will lead you to all the pertinent info. Also: A Proposal: RSS for...

Excerpt from MovableBLOG at

RSS Profile Feedback.

Good initial thoughts from Don Box and Ben Trott amongst others. I really think before this goes further an important issue needs to be addressed. At what point does the specification stop and extensible modules begin? I offer my thoughts....

Trackback from tima thinking outloud.

at

My take on standardization: http://www.gotdotnet.com/team/dbox/default.aspx?key=2003-05-11T06:51:33Z

In short, I'm happy to see some folks finish this chat over in OASIS (imo, the best work has been happening over there lately - see RNG).

I'd prefer to see a few more details worked out here before the multi-thousand dollar door charge gets tacked on, but I'm hoping there's not much more left to work out.

Posted by Don Box at

Phil,

I completely agree WRT no markup in title. There's strong precedent both with HTML's title and mail's Subject header.

As for description, if back-compat weren't an issue, I'd prefer to have an element called "abstract" or "synopsis" or "teaser" and allow XHTML Inline markup and clearly define its relationship to xhtml:body (or content:encoded if need be).  That way there's no preconceived notion of what the thing is for.

However, I didn't think this particular science experiment was to start from scratch - that would be too easy :-)

DB

Posted by Don Box at

1) In RSS 0.91 title was a required element of item along with link. Instead of title being required if no description and optional otherwise, I suggest this profile revert back to the rules set in the 0.91 spec.

For example, "Don Box has proposed as profile..." derived from a description is not nearly as helpful to me in a summary view as a proper title like "An RSS 2.0 Profile." If you are the type of person who writes many small entries and doesn't want to title them, you could use something like "tima thinking outloud. May 11 2003 20:12 -5:00." (Blog name and timestamp.)

2) If you are going to include comments, then why exclude a trackback tag? It seems to me that this should be a namespace.

Posted by Timothy Appnel at

Timothy,

I agree that title should be mandatory - you're the second person to call for it.  Are there significant communities of users who can't produce a title element?

As for trackback, I believe it's not in either RSS 1.0 or RSS 2.0 core, so I was hoping to leave it out for now.

DB

Posted by Don Box at

Required titles bad. Seems like we've done this often enough that I shouldn't need any more ;)

If you as an aggregator have a display which requires a title, and there isn't one, and you like rss/channel/title plus some date format, use it. If my display format does not require a title, but you  as a producer feel forced to choose some non-title-title, then I don't have any way of knowing the difference between your plug-ugly date title or repetitive first n words title and an actual title that conveys information. If you can fake a title as a producer, then a consumer can fake it the same way, but if you fake it, a consumer can't ignore it.

If you choose blog name plus datetime, then in an aggregator that displays

blog name datetime
title
entry

your feed looks like crap.

If you choose first n words, then in an aggregator that displays

[bold]title[/bold] entry

your feed looks like crap. I've used both, and subscribed to both, and they're both ugly and annoying. Not requiring title breaks backward compatibility with RSS 0.91 parsers that never imagined anyone would omit the title, and yet Mr. Backward-Compatibility did it anyway. That's significant.
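
Phil's point that the consumer, not the producer, should fake a missing title can be sketched as a display-time fallback. A minimal Python example; the function name and the date format are hypothetical choices:

```python
from datetime import datetime


def display_title(item_title, channel_title, published):
    """Use the item's own title, or synthesise one at display time."""
    if item_title and item_title.strip():
        return item_title.strip()
    # The fallback lives in the aggregator, so a display that doesn't
    # need titles can still tell real titles from missing ones.
    return f"{channel_title} {published:%Y-%m-%d %H:%M}"


print(display_title(None, "tima thinking outloud", datetime(2003, 5, 11, 20, 12)))
```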

Posted by Phil Ringnalda at

Don: 'As for description, if back-compat weren't an issue, I'd prefer to have an element called "abstract" or "synopsis" or "teaser" and allow XHTML Inline markup and clearly define its relationship to xhtml:body (or content:encoded if need be).  That way there's no preconceived notion of what the thing is for.'

+1

Posted by Tim Bray at

As Ben Trott has done, I think the starting emphasis should be on the data model rather than the syntax. Once the entities and relationships found in RSS have been identified and described, only then express the syntax. A subset is bound to bring in some elements of backwards non-compatibility. Starting with a data model will mean that time isn't wasted trying to maintain compatibility with artifacts that are just anachronistic hacks.

Unless this is to be a completely new third spec muddying the waters still more, I think effort should be made to ensure compatibility with RSS 1.0. In particular, the entities and relationships defined in the RSS data model should be mapped to resources and properties in the RDF model.

An example of equivalent feeds given in 'profile' and RSS 1.0 would probably suffice.

guid should go - URIs are globally unique ids

use dc:date, W3CDTF

Clarification required : what is the description about? 'guid'/permalink or (in the RSS 1.0 sense) about the target URI?

If XML from other namespaces can be inlined (such as xhtml) then it should be made absolutely clear 1. what is permissible and 2. how an agent should interpret it
Again, I think the way to avoid ambiguity is to define this in terms of RSS 1.0.

Posted by Danny Ayers at

Just seen version 0.03. Looking good.
Why not bite the bullet now, and give all the items a namespace?

Posted by Danny Ayers at

Just to add a general point about why RFC-822 format dates (such as pubDate) should be avoided at all costs: it is impossible to declare these fields as having type=dateTime in an XML Schema, so they can only be specified as strings with a pattern restriction. 

Clearly this loses something in translation, not least because things like XForms and MS InfoPath can then not automatically use a "date picker" for that field and would need the user to type in the exact format manually.

See some details of all the fun I had investigating this problem here:
http://www.thearchitect.co.uk/weblog/archives/2003/04/000142.html

Posted by Jorgen Thelin at

Just to chime in one last time before the thread wanders off...

What I'd really like to see is the following:

RSS Core: The absolute bare minimum that an RSS provider must provide. This profile would assume the least about the context in which the RSS is being consumed. No title. A plain text description. pubDate. guid with the isPermaLink attribute required. If isPermaLink is true then it must be a URI to the originating site. If isPermaLink is false it may be any kind of GUID. (URIs are a proper subset of GUIDs). Required author email address.

This may not seem like much but you'd be surprised. This profile would be great boon to developers using RSS for low-end mobile/embedded devices, streaming RSS over non-HTTP (eg Jabber, Event Systems), and doing machine-to-machine RSS.

On top of this core profile we'd define further profiles, each a strict superset of the one below it. The purpose of these profiles would be to codify the best practices of RSS and serve as clear guides to producers and consumers. This is in the spirit of what Phil said: we want people to use elements that'll actually get used.

For example, you might consider two further profiles. An RSS:Basic profile would throw in things like title, link, content:encoded, xhtml:body, author, i18n and some other stuff. These are the kinds of RSS documents you'd expect from web-based/non-weblog sources eg newspapers (NYTimes, CNET, Wired), open-source projects, web forums where the RSS data is passed through HTTP.

Finally, on top of the basic profile would be an RSS:Weblog profile that would explicitly assume that the producer is a weblog and the consumer is a rich client for weblogs. Here you would throw in things like comments, trackback, category, topic maps, FOAF, blogRoll, geoURL, and special stuff for sites like Daypop and Technorati. Here the possibilities are endless.

Finally, it'd be nice if there was a quick and dirty way to tell what profile a feed supported. This could be done with something as trivial as a processing instruction.
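
A sketch of the processing-instruction idea, in Python with `xml.dom.minidom`. The PI target `rss-profile` and the value `RSS:Core` are hypothetical; no spec defines them:

```python
from xml.dom import minidom

PROFILE_PI = "rss-profile"  # hypothetical PI target, not standardised anywhere


def declared_profiles(xml_text):
    """Return the data of every document-level <?rss-profile ...?> PI."""
    doc = minidom.parseString(xml_text)
    # PIs outside the root element become direct children of the Document;
    # the <?xml ...?> declaration is not reported as a PI node.
    return [
        node.data
        for node in doc.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
        and node.target == PROFILE_PI
    ]


FEED = """<?xml version="1.0"?>
<?rss-profile RSS:Core?>
<rss version="2.0"><channel><title>Example</title></channel></rss>"""

print(declared_profiles(FEED))
```

A consumer that finds no such PI simply falls back to treating the document as plain RSS 2.0.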

These profiles aren't hard and fast specifications (though they could be enforced with a validator) they're simply a mechanism to promote the use of RSS wherever data needs to be syndicated. The idea is we get profiles that build on top of one another and that please the majority of people.

I'd also suggest this approach cuz I  get the feeling an RSS:Core and RSS:Basic profile could be produced relatively quickly and painlessly but an RSS:Weblog profile would need a lot of work and thought and communication.

Anyways, just my two cents.

Posted by Bo at

Hmm, I'd say less is more here - at most just have an RSS:Core and RSS:Weblog.

I would strongly suggest that they were hard and fast specs - if people choose to ignore them, fair enough, but if they claim compliance then a compliant reader or whatever should work 100%
If the syntax follows that of RSS 2.0, then DTD-level validation (at least) should be a requirement.

There's no point in trying to put everything into RSS:Weblog - the possibilities are endless, so it wouldn't make sense to offer any standard. Perhaps just Core + FOAF + Blogroll?

URIs are the universal id of the web, is there any need to allow any other?

Changing the range of an element (guid) according to the value of an attribute is pretty unconventional, and causes problems with validation.

(I don't know about URIs being a 'proper subset' of GUIDs - which particular specification of GUIDs are you referring to?).

Posted by Danny Ayers at

An RSS 2.0 Profile

It was obviously a busy night in RSS Standardization land last night: Don Box has created a proposed RSS 2.0 Profile definition. Sam Ruby is collating comments on this proposal on his site. Meanwhile Mark Nottingham has set up a Wiki on his site to...

Trackback from TheArchitect.co.uk - Jorgen Thelin's weblog

at

Let's take just a quick step back so we can really progress: Let's define what we're defining here. +1 RSS:Core Profile for Weblogs. There seems to be much confusion about what we're trying to accomplish, and when you couple that with all our enthusiasm, this thread will reach 400 comments by the end of the day.

Posted by Christian Romney at

Collected links

Lots of interesting things the last couple of days: The new features of Java 1.5 are being outlined. This time...

Trackback from protocol7

at

As regards a standards org, I wouldn't be very happy with OASIS; the main reason for doing this is political, not technical, and OASIS has no name-recognition value.  They provide a good place to work out the technical details, but they don't provide much in the way of a weapon to threaten recalcitrant vendors; the latter is what we need.

Posted by Tim Bray at

description stripped of markup - Even though I treat description as a teaser, not a body, I like that the teaser can be markup-enriched. I could live without that. But will most people want to?

content:encoded vs xhtml:body - Can the profile embrace both? It's certainly a burden on most users to produce xhtml:body, but I'd like to encourage its use. Otherwise where's the incentive to make xhtml authoring easier?

Posted by Jon Udell at

Don: It's not that any tool cannot produce a title, it's that some authors do not want to and believe that title should be optional. When I wrote my RSS Feed quality article I took some flak for suggesting a title be mandatory. While this makes little difference technically, it does from a user's standpoint. Titles are important to the scalability of data -- heads, decks and leads. The solution for those who do not want to write their own title is quite easy -- auto-generate one based on blog name and timestamp of the entry.

I agree TrackBack should not be in the core. It was late and I was trying to make a point that comments should not be in the core, but an extension module.

Moving to other parts of the discussion...

+1 on establishing a context in which we are working. Even in retrospect. There seems to be a certain assumption to what we are trying to achieve here -- we've been talking about these issues for a REALLY long time. They may not be the right ones though. I see themes of simplification of the format, providing a simple core, predictability, and moving optional tags into modules.

+1 on the other Tim's suggestion not to standardize through OASIS. I have not been impressed by OASIS and I'm skeptical of that organization's effectiveness.

Posted by Timothy Appnel at

Just a reminder that even though many non-English weblogs state a specific encoding in their RSS feeds - either in the XML or RSS tag; I myself use both - it still seems to be common practice to use no explicit encoding but escape characters in titles, descriptions, and so forth with the &#NN; or &#NNNN; syntax.
Should this, then, be abandoned in favor of always specifying encoding="UTF-8"?

Posted by Rainer Brockerhoff at

RE: Don Box's RSS profile

Rainer,
I believe the default encoding used by XML processors is supposed to be UTF-8, although a quick glance at the XML 1.0 spec doesn't turn up a place where it says so explicitly; there are one or two places where it is heavily hinted.

Message from Dare Obasanjo at

Maintenance

I've spread myself too thin.  Inspired by Tantek's "What to do with things to do", I have decided to prune....

Trackback from dive into mark

Rainer,

The use of character references is (mostly) orthogonal to the selection of text encoding (e.g., UTF-8, ISO-8859-1).

I find myself using character references even when saving a file as UTF-8, mainly because it's easier to remember the character code (e.g., &#160;) than how to get a particular text editor to emit the right sequence for non-breaking space.
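To illustrate Don's point that character references are orthogonal to the file's encoding, here is a small Python sketch (standard library only; the documents are invented examples). A decimal reference, a hex reference, and a literal U+00A0 all come out of the parser as the same character, whatever encoding the bytes were saved in.

```python
import xml.etree.ElementTree as ET

NBSP = "\u00a0"  # the non-breaking space Don mentions (&#160;)

samples = {
    # decimal reference, file declared and saved as UTF-8
    "decimal ref, UTF-8": b'<?xml version="1.0" encoding="UTF-8"?><t>a&#160;b</t>',
    # hexadecimal reference, no declaration (so UTF-8 is assumed)
    "hex ref, undeclared": b"<t>a&#xA0;b</t>",
    # decimal reference, file declared as ISO-8859-1
    "decimal ref, ISO-8859-1": b'<?xml version="1.0" encoding="ISO-8859-1"?><t>a&#160;b</t>',
    # the literal character, encoded as UTF-8 bytes (0xC2 0xA0)
    "literal, UTF-8": ("<t>a" + NBSP + "b</t>").encode("utf-8"),
}

# After parsing, the application sees identical text in every case.
parsed = {label: ET.fromstring(raw).text for label, raw in samples.items()}
for label, text in parsed.items():
    print(f"{label:25} {text == 'a' + NBSP + 'b'}")
```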

SOAP already took its lumps for prohibiting XML 1.0 constructs (e.g., DTDs, PIs). I'm not anxious to repeat that experience anytime soon.

DB

Posted by Don Box at

On OASIS vs. the world, I haven't been impressed by any standards org for a while.

I was especially unimpressed by OASIS until James Clark and Murata Makoto chose OASIS for Relax NG - RNG is pretty stunning and I doubt that the "two from each vendor" approach prevalent at the W3C could have produced anything nearly as well designed.

For me at least, RNG made OASIS a lot more attractive. It also seems like the least intrusive way for a handful of people to quickly get agreement.

As to Tim's comment about name recognition value, this one is pretty much moot. The IETF and W3C have published more than their fair share of specs that no one cares about. Same for OASIS. What matters is adoption, which, as RSS has proven, doesn't require the Good Housekeeping stamp of approval.

All of this said, if I'm the lone OASIS sympathizer, then IETF it is.

Just don't make me author in that blasted hard-line-break format :-)

Posted by Don Box at

RE: Don Box's RSS profile

I'd like to echo Don's thoughts about OASIS, name recognition and adoption. I was about to write practically the same thing in response to Tim's email but it seems Don beat me to the punch.

Message from Dare Obasanjo at

Hmm, I'd say less is more here: at most, just have an RSS:Core and an RSS:Weblog.

I would strongly suggest that they be hard-and-fast specs: if people choose to ignore them, fair enough, but if they claim compliance, then a compliant reader (or whatever) should work 100%.
If the syntax follows that of RSS 2.0, then DTD-level validation (at least) should be a requirement.

There's no point in trying to put everything into RSS:Weblog - the possibilities are endless, so it wouldn't make sense to offer any standard. Perhaps just Core + FOAF + Blogroll?

URIs are the universal id of the web; is there any need to allow any other?

Changing the range of an element (guid) according to the value of an attribute is pretty unconventional, and causes problems with validation.

(I don't know about URIs being a 'proper subset' of GUIDs - which particular specification of GUIDs are you referring to?).

Posted by Danny Ayers at

Don: When might a 0.04+ version be published? When will we move on to the elements outside of item?

Just curious while there is somewhat of a break in the action.

Posted by Timothy Appnel at

If we feel that we need a standards body, then I vote for IETF.

+1 for xhtml:body.

Posted by Ted Leung at

Timothy, titles are great for users. But in cases where for any reason the producer cannot/does not want to provide them, your easy solution that they should still be mandatory and the producer should auto-generate a title from blog name and timestamp is IMHO a classic example of premature optimization. A missing title could just as easily be auto-generated by the consumer, usually with better results (after all, the producer cannot know a priori what language and date/time format the consumer will be using, for example). Also, in your proposal there is no way for a consumer to distinguish a "real" title from an auto-generated one, so the consumer is forced to use it even if it could potentially have generated a better one itself (unless you're going to introduce an attribute isAutoGenerated or something...)

Mandatory titles are at least as silly/annoying as RSS 1.0's rdf:Seq :-)

Posted by Ingve at

RSS: The Neverending Story

Hi, Evil Twin here! I'm not sure why, but the very mention of RSS tends to bring me out of my quiet corner, where I sit filing my nails while Burningbird does her thing. So, while she's off cleaning house and trying to get the next episode of that...

Trackback from Burningbird

Real-time Simple Standardisation

These are busy days in the RSS community. Dave Winer thinks Microsoft is going to fuck all of us and......

Trackback from Gotzeblogged

RSS Roundup

Craziness going on in the RSS world.  As usual, a good bit of it is happening over at Sam Ruby's blog: RSS Profile Don Box's RSS Profile RSS Identity [See also Ben Trott's entry]...

Excerpt from Matt Croydon::postneo at

Aggregators Revisited

It seems everyone built an aggregator this year but none of them solve my problems. My subscriptions are not portable. Sure almost every aggregator imports and exports OPML but none of them give me much control over the process. More importantly...

Excerpt from matt.griffith at

Ingve: So if I'm following what you are saying, a consumer knows the content better than the producer and should generate its own titles?

Posted by Timothy Appnel at

Here's my interpretation of the pulse of this list wrt profile 0.03:

1 Keep title optional.

2 Ambivalence about embedded markup inside description/title, but slight leaning towards prohibition on embedded markup in those elements.

3 Ambivalence about length restrictions on title and description, especially the latter.

4 General (but not unanimous) desire to keep guid element out of profile and stick with link.

5 General desire to go with xsd:dateTime-based date element.

6 General happiness with dc:creator over RSS/2.0 author, but some desire for guidelines on format (this so calls out for structured XML...).

7 No comments on comments :-)

8 Split on XHTML:body vs. content:encoded.

So, assuming a 0.04 profile were to (a) relax the length restriction on description and (b) provide some guidance about how to structure the creator element, are we done with the item portion of the profile modulo xhtml:body vs. content:encoded?

Posted by Don Box at

Sam Ruby: Don Box's RSS profile

Fascinating and important RSS standardization discussion.<quote>Don Box: Here's my suggestion for where to start. I won't even consider calling this a profile until consensus is reached over on Sam's site (he has comments, neither Dave nor I...

Excerpt from Roland Tanglao's Weblog at

Does anyone know why Dave has been absent here?

I think it was his blog entry that kicked this profiling stuff into gear.

It seems odd that he wouldn't want to participate given that he asked for the community to rally behind a profile.

Posted by Don Box at

Don: I'm not sure about calling the item modulo "done."

My own personal viewpoints on your assessment notwithstanding, I think before this goes further an important issue needs to be addressed. *At what point does the specification stop and extensible modules begin?*

This is not to halt or derail these proceedings, but I think it's important for making informed decisions about whether something is done.

Posted by Timothy Appnel at

Don - looks ok to me, except I'd rather not see markup inside the title element.  Count me not ambivalent on that one. :-)

Posted by Greg Reinacker at

+1 on "title SHOULD NOT contain markup"

+1 on "description SHOULD NOT contain markup".  Producers that currently produce entity-encoded full-text descriptions can put this markup, verbatim, in content:encoded instead.  Current generation aggregators already support this.
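A rough sketch of what Mark suggests, in Python with the standard library: the full markup goes verbatim into content:encoded (namespace http://purl.org/rss/1.0/modules/content/), and description gets a plain-text version. build_item is a hypothetical helper, and the regex tag stripper is a crude stand-in for illustration, not a safe HTML sanitizer.

```python
import html
import re
import xml.etree.ElementTree as ET

CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"
ET.register_namespace("content", CONTENT_NS)  # serialize with the usual "content:" prefix

def build_item(title, html_body):
    """Hypothetical producer helper: markup-free description, full HTML in content:encoded."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    # Crude tag removal, then entity decoding and whitespace collapsing (illustration only).
    plain = html.unescape(re.sub(r"<[^>]+>", "", html_body))
    ET.SubElement(item, "description").text = " ".join(plain.split())
    # Markup stored verbatim as text; the serializer escapes <, > and & on output.
    ET.SubElement(item, f"{{{CONTENT_NS}}}encoded").text = html_body
    return item

item = build_item("Hello", "<p>Some <b>bold</b> text &amp; more.</p>")
print(ET.tostring(item, encoding="unicode"))
```

Current aggregators that already read content:encoded get the full text, while tools that only look at description see no markup at all.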

Posted by Mark at

+1 on optional title.
+1 on keeping markup out of title and description
+1 on xsd:dateTime.
+1 on dc:creator over author.
+1 on content:encoded over xhtml:body. Nothing against xhtml:body but it can't be required or the profile will be irrelevant.

As far as guid & link go; I was all for dropping guid but Bo makes a good point [1]. What about the non-HTTP case?

[1] http://www.intertwingly.net/blog/1394.html#c1052627462

Posted by Matt Griffith at

Timothy, of course in general the producer knows the content best, but both your own example and my reply was about the case where the author did not provide a title. Your "easy" suggestion in this case was that the producer (I guess that would be the author herself or her RSS-generating software) "auto generate one based on blog name and timestamp of the entry". In this specific case the consumer can auto-generate a better title than the producer, but if the producer has already auto-generated an inferior title, the consumer is stuck with it.

Posted by Ingve at

Ingve: I'm still not following your reasoning. How is the consumer generating a better title? Code? What "formula" would you suggest the consumer using to generate a better title? Wouldn't this vary depending on the feed and its content?

The point of my suggestion was to make it easier for users (read: people who can't code or don't want to bother) and improve scanability of information.

Posted by Timothy Appnel at

Regarding Bo's concern: reading the RSS 2.0 spec, the guid element with isPermaLink does not require HTTP. It only says it should be a permanent link suitable for use in a web browser. It's perfectly legitimate today for the guid to point to some NNTP server where articles never expire, for example. I'm fairly confident that URIs would still be useful even if HTTP were to become obsolete.

Posted by Adam Fitzpatrick at

Timothy, first: I totally agree that titles are a good thing and should always be provided for the benefit of the user. In some cases, where the feed author does not provide one, this will require a synthetic title.

Your suggestion is that the producer auto-generates the title based on the algorithm "Blog name and timestamp", yielding for example "tima thinking outloud. May 11 2003 20:12 -5:00." That might "improve scanability of information" if you are a geek and English is your first language, but for users with other date/time conventions it will stick out like a sore thumb. (I believe Phil described it as "plug-ugly date title" in an earlier comment.)

In this case, if the title is missing from the feed, the consumer (RSS reading app/aggregator) would be able to auto-generate a better title using your own algorithm (it knows the blog name and item timestamp from the RSS feed, and it knows about the reader's language/date/time settings.)

If you look at the flow from Author->Producer->Consumer->Reader, a title should always be available when we get to the Reader stage. By moving the auto-generating step from Producer to Consumer, we get a better (at the very least localized) title and save bandwidth on every RSS feed transmission.

(Yes, this would be done with code. In your case by the producer/RSS generator, in my case by the consumer/RSS aggregator. Aggregators already have code to deal with title-less items and/or code to format a timestamp using local conventions.)
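Ingve's alternative, sketched in Python: the aggregator synthesizes a title only at display time, in the reader's own time zone and date format, and flags it as synthetic so real titles stay distinguishable. The function name, parameters and output format are assumptions for illustration, not part of any profile.

```python
from datetime import datetime, timedelta, timezone

def display_title(item_title, channel_title, published,
                  reader_tz=None, date_format="%Y-%m-%d %H:%M"):
    """Return (title, is_synthetic). Nothing is written back into the feed."""
    if item_title and item_title.strip():
        return item_title.strip(), False  # a real, author-supplied title always wins
    # Convert to the reader's zone (if known) and format with the reader's conventions.
    when = published.astimezone(reader_tz) if reader_tz is not None else published
    return f"{channel_title}, {when.strftime(date_format)}", True

posted = datetime(2003, 5, 11, 20, 12, tzinfo=timezone(timedelta(hours=-5)))
print(display_title(None, "tima thinking outloud", posted, reader_tz=timezone.utc))
print(display_title("Don Box's RSS profile", "Sam Ruby", posted))
```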

Again, this only applies to the specific case where the author does not/cannot/will not provide a title:

The author does not have to make up a fake title. The producer software does not have to be updated to always generate a fake title. Every single RSS transfer saves bandwidth due to lack of fake titles.

When the title is generated by the consumer, English-speaking geeks would get the exact same title as you proposed, but save a little bandwidth. Non-English/Non-Geeks would get better titles and save a little bandwidth.

In your case, consumers are also precluded from doing even more advanced things (like your own suggestion about generating different missing titles on a per feed basis) since there is no way to tell the difference between an auto-generated title and a real one.

Posted by Ingve at

Tim: my aggregator displays the blog name, the date, next line is the title if any, then the item. With your autogenerated titles in the feed, I get

tima thinking aloud May 10, 2003
tima thinking aloud May 10, 2003
This is a post.

Fine, I switch to using first n words in bold at the start of the item, then I subscribe to someone who sticks those in as fake titles, and get

Don, please update the last... Don, please update the last line of your proposal to point to this blog entry.  I'll comment on it below.

In what possible way is that better than not putting in a fake title, allowing my aggregator to decide which sort of fake title best suits its display? Answer: it's not. If you just either do real titles or leave the element out, then for the first style I can use first n (or nothing), and for the second style I can use blog name (or nothing), and for a 3PA I can use whichever pleases me more. But if you put lying metadata in a feed without telling me you are lying, especially if it's lying metadata that's just directly derived from other true metadata, you have devalued your feed. Feeds without titles exist, and have since 0.92 in December of 2000. Any consuming application that doesn't have any way to deal with that is broken, and messing things up for the rest of us to cater to them is a bad, bad thing. Please don't do it, and don't encourage others to do it.

Posted by Phil Ringnalda at

I think we need to answer the following questions in order:
1. What are we defining here, and what does that mean for tool implementers and end users? Is our philosophy 100% backward compatibility? (I think it should be!)
2. Will we continue this work in this 86+ comment thread on Sam's blog or in a standards body?
2A. If on Sam's blog, would he be good enough to start a clean thread where we can proceed with a bit of order? (Sorry if I seem obsessive-compulsive about order, but I believe a little structure would do some good here.)
2B. If a standards body, which one?
3. What are the items up for discussion? Only items included in the RSS 2.0 spec + <content:encoded> / <xhtml:body>? Namespaces? etc...

IMHO, meetings, even virtual ones, work better with an agenda...

WRT <xhtml:body>: while I agree with the principle of well-formedness, I don't always have easy access to the data source for my feed! Consider this scenario (one from the real world): I produce an RSS feed for Continental Airlines Vacations. However, the data in the feed comes from disparate sources. I cannot guarantee the well-formedness of these sources, so I have had to use <content:encoded> rather than <xhtml:body>. Granted, with a little work I could convert the malformed content into well-formed content. That is not the issue; we can do nearly anything with enough time and effort. My concern is that this might not live up to the spirit of RSS: Really Simple Syndication. I will probably end up implementing <xhtml:body> if that is what we decide on, but how many others will? Again, this goes back to philosophy...

Posted by Christian Romney at

+1 on no fake titles.

Posted by Greg Reinacker at

The biggest problem with RSS

It has been pleasing to see the weekend discussion on RSS.  An author of an RSS aggregator has indicated that the "main value adds I get out of RSS are not the core RSS 2.0 specs but modules added by various third parties like xhtml:body, ...

Pingback from Sam Ruby: The biggest problem with RSS

Christian, re: (2A) don't worry, I'll create plenty more blog entries.  Feel free to comment on the ones you like.

Posted by Sam Ruby at

re: fake titles... any suggestions on how the validator should test for these?

Posted by Sam Ruby at

I don't think we necessarily need to test for them; we just shouldn't make the title element required.  If we keep it optional, then publishers aren't compelled to create a fake title.

Posted by Greg Reinacker at

Greg, here is how I see it.  The RSS validator to date has picked the most liberal interpretations it can.  If it finds some element that can reasonably be interpreted as valid in some version of some specification that calls itself RSS, then the element is considered valid.

Now I would like to go in the other direction: provide guidance in the form of warnings.  A feed may be valid, but if it doesn't comply with some profile or isn't as useful as it possibly could be to the widest possible audience, then I would like to provide useful feedback on this.

Title, link, and description used to be required, and had explicit length limits.  Tools that depended on these were broken by subsequent revisions of the RSS specifications.  Some have adapted, others cope poorly, and some simply have given up.

An example of the types of advice I would like to give is that if you have a feed where the content has contributions by multiple authors, please indicate that with the appropriate element.  You yourself asked me to do that with my comments feed.

Required?  No, your feed is still valid without it.  But best practices would indicate that it should be done.

Similarly for short, human created, html free titles.  The best RSS feeds out there have them.
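A sketch of the kind of advisory check Sam describes, in Python: it returns warnings rather than errors, so a feed that trips them is still valid RSS. The 100-character figure is the title limit quoted from the draft profile later in this thread; the function name, regex and messages are made up for illustration.

```python
import re

TITLE_MAX = 100  # limit taken from the draft profile; assumed here
MARKUP = re.compile(r"</?[A-Za-z][^>]*>")  # rough "looks like an HTML tag" test

def title_warnings(title):
    """Return advisory messages for one item title; [] means nothing to report."""
    if title is None or not title.strip():
        return ["item has no title; short, human-written titles make feeds easier to scan"]
    warnings = []
    if len(title) > TITLE_MAX:
        warnings.append(f"title is {len(title)} characters; the profile suggests at most {TITLE_MAX}")
    if MARKUP.search(title):
        warnings.append("title appears to contain HTML markup")
    return warnings

for t in ["RSS Roundup", None, "<b>Bold</b> move", "x" * 120]:
    print(repr(t)[:20], title_warnings(t))
```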

Posted by Sam Ruby at

I think the debate on title has reached a point of diminishing returns. I don't think we are all that far apart really. Let me wrap up a few things and move on.

I don’t support “fake titles” or “lying metadata.” My suggestion for auto-generating a title was a reply I have used for those who are adamantly opposed to using titles in their feeds despite the time-tested benefits to their consumers. There are also some instances where a descriptive title is not feasible, such as a feed from CSV2RSS or a comments feed from MT.

I believe that including descriptive titles makes feeds easier to scan, easier to grok and simply more useful. In my own feeds I write (sometimes unsuccessfully) descriptive titles for each entry. There are significantly more 0.91 feeds than 1.0 and 2.0 combined, according to the latest syndicate stats. Title was required in 0.91, and I think with good reason. In debating this point I hoped that this profile would return to reinforcing this generally accepted best practice. (One reason I wished the context in which these design decisions were/are being made had been defined up front, or at all.)

If I’m in the minority on this then so be it. Let’s continue on with the productive conversation. I think we are making great progress so far.

Posted by Timothy Appnel at

Can You Feel the Web Shifting?

Don proposes a profile for RSS. Discussion ensues over at Sam's blog, here and here. I just love watching the process evolve in real time - that's the greatest thing about blogs, in my opinion. ...

Excerpt from CraigBlog at

> 7 No comments on comments :-)

I guess you missed my comment on comments Don. ;)

I don't think comments should be in the profile. If the comments element is included, that would mean that elements such as trackback:ping should be added to be consistent.

I am of course working under the presumption that we are trying to develop a simple, minimal core that makes extensive use of modules. We of course have not established that this is a design goal, though.

Posted by Timothy Appnel at

Tim - do you feel that RSS feeds which contain trackback:ping are more useful than RSS feeds which do not?

I am looking for specific tests that can be added to the RSS validator which will make RSS feeds more useful to all.

Any suggestions?

Posted by Sam Ruby at

Makes sense, Sam...I keep shifting into the mindset of writing a spec, and I have to remember we're making recommendations instead.

That said, I wholeheartedly support saying that an item SHOULD have a title.  And I would say that title MUST NOT contain HTML markup, as I think we've all agreed.

Posted by Greg Reinacker at

I think feeds that include trackback:ping (when TrackBack is in use) are as useful as those that include comments. So I would assert that if you are going to include one in the profile, you should include the other, or you should leave both of them out. (Adding trackback:ping to an RSS feed could make auto-discovery much easier and more elegant than embedded RDF in HTML comments.) This can become a slippery slope, because then you could reason that a lot of other tags should be included.

As you know Sam, I have a stated bias for only a handful of core, required elements and the rest in modules. I am of the mind that the URI for comments should be expressed in a module with an element that is a bit more flexible for other uses, like a pointer to a comments feed.

As for other suggestions, I’ll have to think some more. Sometime ago, I wrote an entry of suggestions for the RSS validator. I’m pleased to see that most of them have general consensus approval here.

As for detecting fake titles, that’s hard. Perhaps detecting "ineffective titles" that are likely to have been auto-generated would be easier and get you 80% of what you want to achieve. Do all titles begin with the same string/x characters? (e.g. tima thinking outloud) Are any item titles repeated in the feed? (e.g. Don Box’s RSS Profile; while I think this is a minor issue overall, following this comment feed breaks blagg unless you've heavily modified it like I have.) These are just two checks that leap to mind.
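Tim's two checks are easy to sketch in Python with the standard library: flag titles that repeat within a feed, and flag a long prefix shared by every title. suspicious_titles is a hypothetical helper, the 10-character threshold is an arbitrary assumption, and both are heuristics that will misfire on some legitimate feeds.

```python
import os
from collections import Counter

def suspicious_titles(titles, min_prefix=10):
    """Heuristic warnings for titles that look auto-generated (hypothetical helper)."""
    warnings = []
    repeated = sorted(t for t, n in Counter(titles).items() if n > 1)
    if repeated:
        warnings.append(f"repeated titles: {repeated}")
    if len(titles) > 1:
        # os.path.commonprefix compares plain strings character by character.
        shared = os.path.commonprefix(titles).strip()
        if len(shared) >= min_prefix:
            warnings.append(f"every title starts with {shared!r}")
    return warnings

print(suspicious_titles(["tima thinking outloud. May 10", "tima thinking outloud. May 11"]))
print(suspicious_titles(["Maintenance", "RSS Roundup", "Aggregators Revisited"]))
```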

Posted by Timothy Appnel at

I'd also like to avoid things like this in the future:
Dublin Core Spec - http://dublincore.org/documents/dces/
Dublin Core Namespace URI - http://purl.org/dc/elements/1.1/

Ideas???

Posted by Christian Romney at

Christian,
I'm not sure what you are talking about. Avoid them how?

What to place at the end of a namespace URI is a perma-thread on XML-DEV, WWW-TAG and a number of other mailing lists and has generated probably thousands of mail messages with no generally accepted resolution in sight. I'd advise not trying to open that particular can of worms.

Posted by Dare Obasanjo at

Really Something to See

This weekend saw a flurry of activity in the RSS space from a lot of people... I'm not even going to try to recap everything,......

Trackback from Incessant Ramblings

RSS Profile Design Considerations: A Conversation Starter.

A very productive discussion of an RSS profile has continued throughout the weekend and into this morning. Many have enthusiastically dived in, myself included. I still maintain that an important consideration has not been discussed and needs to be:...

Trackback from tima thinking outloud.

Dare,
The problem I was referring to is the spec being defined in one place (http://dublincore.org/documents/dces/) and the namespace declaration taking you to something like this, where all you get is "The Dublin Core Element Set v1.1 namespace providing access to its content by means of an RDF Schema". My point is that a schema is not a spec, and, for me, namespace URIs for modules and such should point to a spec document, or at the very least to a page with a link to a spec document. Every other module listed here seems to conform to this practice. Maybe I'm just picky?

Posted by Christian Romney at

RSS Profile

As interesting as the discussion on the RSS profile has been I can't see it achieving Dave's goal, the SOAP BDG profile, as useful as it was, didn't stop "interops with Microsoft" being the key SOAP interop goal....

Trackback from Simon Fell > Its just code

Like I said, this is a topic that has generated thousands of emails in discussions in various XML related mailing lists with no resolution in sight.

I believe I blogged about this in the past at http://www.kuro5hin.org/story/2003/2/5/11349/85355

Posted by Dare Obasanjo at

Dare, Your point is well taken. I guess overall, there's no solution to this. :-( I would like to see whatever becomes of this profile/spec/thing we're all commenting on to be specified, however - whether or not a namespace of some sort is born from it.

Posted by Christian Romney at

Jon,

See http://www.gotdotnet.com/team/dbox/spoutletex.aspx?key=2003-05-13T10:21:55Z on ways to get there from here vis-a-vis content:encoded->xhtml:body.

DB

Posted by Don Box at

Tim Bray suggests RSS standardization

In response to a great discussion at Sam Ruby’s site about Don Box’s proposed RSS 2...To see more, visit the permalink in your browser or get a news aggregator that supports xhtml:body...

Excerpt from Tommy Williams at

After the flurry of activity this weekend, it feels like the momentum is starting to peter out. Say it ain't so. Does anyone have a solid idea of where we stand with this RSS profile?

Posted by Christian Romney at

Why do people persist in re-inventing the wheel?

SQL, HTTP and HTML are crappy protocols/languages, but we use them because they are standards, not because they are good standards. People have tried to improve on them, but with little success.

The RSS 2.0 standard is great as described by Winer, http://backend.userland.com/rss. The RSS standard has been debated to death, another round of debates will only weaken the standard. We should ratify the 2.0 standard as is and get onto writing real applications.

This is the same problem that is killing SOAP. More and more debate and more standards only delay our real goal: to write killer Web Services. Yes, I stole that paragraph from Don Box.

Debate makes it difficult to move projects forward that are based on the standards. How can I get funding for a project that might be obsolete in the next release of the standard?

Yes, I understand that not everybody was included in the debate of the 2.0 standard. Life isn't fair. I'm not too happy with 2.0 either, but let's move on.

Last, this is not as much a shot at the extensions mentioned on this page, but rather the modifications of the meaning/types/naming of existing elements.

MHO

The Real Geek
iBLOGthere4iM

Posted by Randy Charles Morin at

Another thought. I agree completely with the standardization initiative.

I'm an application developer. I need standards to live by.

Posted by Randy Charles Morin at

Randy: first you say RSS 2.0 is great, then you say a few paragraphs later that you're not happy with it.  I'm confused.

Nothing in RSS 2.0 has ever become obsolete.  Every element that was there in the beginning, is still there (even if there are new-and-improved ways to do the same thing, better, for example via namespaces).  Nothing is even officially deprecated.  There was actually some heated debate about this at the time, but there it is: nothing is deprecated, and nothing ever will be.

No one here is trying to rewrite the meaning of existing elements, or create new elements out of thin air.  No one is rewriting the existing specs, we're just talking about best practices.  Surely as an application developer, you can appreciate the importance of this.

Also, Dave Winer asked for this discussion, explicitly.  Clarifying and codifying best practices like this may help the community survive the onslaught of powerful newcomers (like Google and Microsoft).

Finally, and this has nothing to do with your argument at all, there's a typo in your blog's tagline, which makes you look kind of sloppy.

Posted by Mark at

Maybe you could help me and point out the typo.

Posted by Randy Charles Morin at

Mark,
Limiting the title and description element content length and making the guid and/or link mandatory is going to invalidate a lot of existing software. True or false?

Posted by Randy Charles Morin at

I found the "typo" :)

Posted by Randy Charles Morin at

Randy, can I answer that one?

Feeds with arbitrary length titles and descriptions and feeds without either a guid or a link are reported as valid RSS by the RSS validator.  Nobody has proposed changing that.

Title, description, and link were required in 0.91, and both title and description had specified length limitations.  Software written to depend on one or more of these characteristics may have difficulty with RSS feeds written to later versions of the specification.  Some adapted.  Some cope poorly, and others have given up.

All we plan to do is to document that your RSS feeds will be of maximum usefulness if they adhere to the original guidelines.  If they don't, the feeds will still be marked as valid, but some informational messages will be provided that you can choose to heed or ignore.

Posted by Sam Ruby at

Sam,
Then why does it state MUST BE <= 100 instead of SHOULD BE? And why does it say mandatory on the link element? I think this is exactly what is being proposed. Ignore pubDate in favor of dc:date. I could go on.

Posted by Randy Charles Morin at

Sam,
By the way, I appreciate your comments. Thanks. I might be in the dark here, but I'm looking for the light.

Posted by Randy Charles Morin at

Randy - it is a profile.  A restricted subset.  Compliance with the profile will not be mandated.  The validator may issue warnings based on deviations from accepted practices, but that's about it.

For clarity, it makes sense for the profile to be unambiguous.  This helps authors of validators immensely.  If the profile said "SHOULD BE <= 100" the best that the validator can do if faced with a feed with a 256 character title is to say that it MAY BE in compliance with the profile.

I would prefer to label such a feed as valid RSS but not in compliance with the profile.  And like has been done with the RSS validator to date, we will provide helpful information as to why this is important and what changes need to be made to bring such feeds in compliance.

Posted by Sam Ruby at

How is it a restricted subset if it includes elements outside of the current RSS set, like dc:date?

And to quote Dave Winer:
"We could establish a profile of RSS 2.0 and implement strict compliance with that profile in the major blogging tools."

Is dc:date part of RSS 2.0? If not, then this is no subset. Doesn't "strict compliance" imply that compliance is mandated?

Posted by Randy Charles Morin at

Feeds which include dc:date are valid RSS 2.0.

FYI: I agree with a comment made in another thread... i.e., I have serious doubts that we could "implement strict compliance" to the simple requirement of well formed XML "in the major blogging tools".  IMHO, the best we can achieve is to provide a means by which people can validate compliance.

Posted by Sam Ruby at

Feeds that include hi:goodbye are also valid RSS 2.0. Is hi:goodbye part of RSS 2.0? No. And neither is dc:date.

Why don't we evaluate compliance to the existing specs 0.9 to 2.0, rather than a new profile (a.k.a. spec to be)?

Posted by Randy Charles Morin at

I think it's been explained previously, but in my lay terms, a profile is a specialized application/clarification/distillation of a particular standard for a specific purpose. Maybe thinking of it more along the lines of a schema than a standard is more accurate, ymmv. UML profiles might also be a half-decent concrete example.

Posted by Grant Carpenter at

Here goes:

pubDate: RFC-822 +1

However, I have a suggestion that I'd like you all to consider: shouldn't ALL dates be expressed in GMT?  After all, localization doesn't take place until the feed is rendered to the user. If I am in London and I am reading a post from (local time) 08:45 (GMT), do I care that the author was actually writing it at 4:45 AM in their home town?  No, I probably don't.  What I do care about (most likely) is that it was posted 15 minutes ago, making it "fresh and new".  Isn't this why many of us turn the TV news off and read blogs (not to mention the quality of the content)?

I suggest we even consider RFC-1123 (I might be misunderstanding the scope here, but this does include RFC-822 for date/time).  If we require GMT, then parsing dates and localizing them is simple.  Many platforms can deal with UTC offsets, can they not?
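A minimal Python sketch of normalizing an RFC 822 pubDate to GMT, rendered in the xsd:dateTime form that Don's summary leans toward. It relies on the standard library's email.utils.parsedate_to_datetime; pubdate_to_utc is a hypothetical name, and treating the "-0000" (unknown zone) case as UTC is an assumption made for the sketch.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime

def pubdate_to_utc(value):
    """Convert an RFC 822 date string to an xsd:dateTime string in UTC ("Z")."""
    dt = parsedate_to_datetime(value)
    if dt.tzinfo is None:
        # parsedate_to_datetime returns a naive datetime for "-0000"; assume UTC here.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

print(pubdate_to_utc("Sun, 11 May 2003 20:12:00 -0500"))  # local evening, next day in GMT
print(pubdate_to_utc("Mon, 12 May 2003 08:45:00 GMT"))
```

With every date in GMT, the consumer only has to apply the reader's own offset at display time.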

I truly believe that much of the pushback regarding RFC-822 comes from two angles:

* Parsing - This argument seems to come more from platforms/languages where native date management is not supported.  In other words, there have been many functions written to brute-force parse the dates as strings, without first recognizing them as a native date/time type.  This crowd could use some good native date support.

* Schema validation - Ok, this one is tough, and I do not have a deep enough understanding of the issues to speak intelligently on it, so I won't.

Backwards compatibility: Ok, we need to "fix" RSS before billions of terabytes of human knowledge and experience are glued to the mess that currently stands as the "spec".  I'm sure that's why we're having this discussion.  Backwards compatibility should not be a deterrent to fixing the mess we have now.  There will be plenty of opportunity (both commercial and open-sores) to provide RSS converters (think "HTML-Tidy" for RSS). 

We should consider the experience of the contributors without weighing their recommendations against "what is".  There's a reason we still use the term "legacy" and there will always be someone out there supporting the "underdog".  Keep moving...

Namespaces:  As with any open XML document format, namespace usage should not be restricted.  If I choose to include a namespace and elements of that namespace in my RSS feed, then I should be allowed to do so, provided that I also ensure that I am compatible with the base spec.  If an aggregator doesn't know about that namespace, then it should ignore it, that's acceptable in my eyes.  The idea that namespaces should somehow be "forbidden" or controlled strikes me as incredibly draconian.  It's not called "Extensible" for nothing.  I vote we retain the Extensible nature of XML within RSS.
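Here is a rough Python sketch of the "ignore what you don't know" behavior described above, using the standard library's ElementTree. Core RSS 2.0 item elements are read, and anything in a foreign namespace, such as the hi:goodbye element mentioned earlier in the thread, is skipped rather than treated as an error. The `http://example.com/hi` namespace URI and the set of known element names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical extension namespace standing in for any element an aggregator doesn't know.
FEED = """<rss version="2.0" xmlns:hi="http://example.com/hi">
  <channel>
    <title>Example</title>
    <item>
      <title>Hello</title>
      <link>http://example.com/hello</link>
      <hi:goodbye>unknown extension</hi:goodbye>
    </item>
  </channel>
</rss>"""

# Assumed subset of core RSS 2.0 item elements this reader understands.
KNOWN = {"title", "link", "description", "pubDate", "guid"}


def read_items(xml_text):
    """Collect core RSS 2.0 item fields, silently skipping namespaced extensions."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        fields = {}
        for child in item:
            # ElementTree spells namespaced tags as "{uri}local"; core RSS 2.0 tags have none.
            if child.tag.startswith("{") or child.tag not in KNOWN:
                continue  # ignore what we don't understand instead of rejecting the feed
            fields[child.tag] = (child.text or "").strip()
        items.append(fields)
    return items


print(read_items(FEED))  # [{'title': 'Hello', 'link': 'http://example.com/hello'}]
```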

xhtml:body (or anything else) +1
content:encoded -1

PLEASE do not encode anything!  Haven't we learned our lesson from HTML?  How much content created by the human race (forget the commercial .com craziness that shouldn't have happened)  was lost because it had some type of encoding that others could not understand?

Here is my vision of the future aggregator:  A Google-like machine that collects ALL available XML-Formatted human thoughts (what we now know as RSS) and allows us to search that knowledge.  ENCODED CONTENT WILL MAKE THE RESULTS USELESS.  As a machine, I don't care about pretty, I care about content.  However, I do believe that markup should be allowed in the XHTML namespace because that can be removed or rendered, depending on the aggregator users' taste.
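To illustrate the searchability argument, this small Python sketch (ElementTree, standard library only) pulls plain text out of an item that carries the same invented sentence both as xhtml:body and as content:encoded; the namespace URIs are the usual XHTML and RSS content-module ones. Markup inside xhtml:body is still XML that an indexer can strip, while content:encoded arrives as one escaped string whose tags look like ordinary text until a second, HTML-aware parse is run.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
CONTENT = "http://purl.org/rss/1.0/modules/content/"

# Invented item carrying the same content in both forms.
ITEM = f"""<item xmlns:xhtml="{XHTML}" xmlns:content="{CONTENT}">
  <xhtml:body><xhtml:p>Markup is for <xhtml:em>organization</xhtml:em>.</xhtml:p></xhtml:body>
  <content:encoded>&lt;p&gt;Markup is for &lt;em&gt;organization&lt;/em&gt;.&lt;/p&gt;</content:encoded>
</item>"""


def searchable_text(item_xml):
    item = ET.fromstring(item_xml)
    body = item.find(f"{{{XHTML}}}body")
    encoded = item.find(f"{{{CONTENT}}}encoded")
    # xhtml:body is real XML: the markup can be dropped and only the words kept.
    from_body = "".join(body.itertext()).strip()
    # content:encoded is a single escaped string: without a second parse, the tags are just text.
    from_encoded = (encoded.text or "").strip()
    return from_body, from_encoded


print(searchable_text(ITEM))
```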

"Oh, no!  Extra work.  You mean I have to validate the input to make sure the feed is not broken?"  YES, you are a developer, aren't you?

Imagine a home robot that was speech-enabled that could access this blog universe in response to a question.  A machine that could wander and "learn" from the human experience while you were at work (I have images of E.T. drinking beer and watching Sesame Street).  RSS and XML make this possible.  Encoded markup does not contribute to anything but aesthetics.  Markup isn't just for layout or appearance.  Markup is for organization, classification, searchability.  Let's not forget that as we move forward.

Thanks for your time.

ME

Posted by Michael Earls at

RE: Don Box's RSS profile

As I have posted, GMT dates would be a BIG step forward...

Message from TorstenR at

Markup at the crossroads - getting there from here

One possible path...

Excerpt from Don Box's Spoutlet at

An RSS 2.0 Profile

Do not implement this...

Excerpt from Don Box's Spoutlet at

RSS Profile Comments Now Live

Thanks Sam...

Excerpt from Don Box's Spoutlet at

Spoutlet with a face lift

Don Box has a new look. What's most amusing is that he is now compliant with RSS Autodiscovery, which means that I now have some time shifted automatic excerpts. This one in particular appears to be an amusing cap to a lengthy thread. ...

Pingback from Sam Ruby: Spoutlet with a face lift

Response to Grant Carpenter's comments. Sorry it took so long. I understand your comments, but Don has suggested standardizing the profile.

http://www.gotdotnet.com/team/dbox/default.aspx?key=2003-05-11T06:51:33Z

"Assuming a profile is attainable without bloodshed, I'd be pleased as punch if someone wanted to push this through some vendor-neutral org to avoid having people perma-linking to someone's personal blog."

By the way, I agree with standardizing RSS, but I think we should settle on RSS 2.0.

MHO

Posted by Randy Charles Morin at

Randy, the key value add to me of RSS 2.0 over prior releases of RSS is the addition of namespaces.  This is what allows a thousand flowers to bloom.  From what I understand, your proposal is that we roll back that.  If that is your position, then I simply don't agree.

Posted by Sam Ruby at

My eyes are a bit blurred catching up, but I agree entirely on blooming namespaces - I prefer content:encoded to xhtml:body, but I'd rather have the choice...

pubDate RFC-822 ?

Urrgh, on several counts. DC is already standard, why not use it. RFC-822 is obsolete and (as 2822) is harder to implement than W3CDTF (ISO 8601 subset).
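Here is a minimal Python sketch of why W3CDTF is the easier target: its full date-time form maps almost directly onto `datetime.fromisoformat`, and the resulting instant compares equal to the same moment written as an RFC-822 pubDate. It handles only the complete date-time form with a zone designator; reduced-precision W3CDTF values (year only, date only) are deliberately out of scope. The timestamp is borrowed from the key in Don's URL quoted above.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_w3cdtf(value):
    """Parse the full date-time form of W3CDTF, e.g. 2003-05-11T06:51:33Z."""
    # Older Pythons' fromisoformat rejects a trailing "Z", so spell it as an explicit offset.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        raise ValueError("W3CDTF date-times must carry a time zone designator")
    return parsed.astimezone(timezone.utc)


dc_date = parse_w3cdtf("2003-05-11T06:51:33Z")
pub_date = parsedate_to_datetime("Sun, 11 May 2003 06:51:33 GMT")
print(dc_date == pub_date)  # True: aware datetimes compare as instants
```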

GMT? Went with decimalisation. See UTC.

Posted by Danny at

Sam, I'm completely in favor of vendor-specific or multi-vendor extensions contained in namespaces. I'm against including those extensions in the standard.

Posted by Randy Charles Morin at

Aggregators Revisited

It seems everyone built an aggregator this year but none of them solve my problems. My subscriptions are not portable. Sure almost every aggregator imports and exports OPML but none of them give me much control over the process. More importantly -...

Excerpt from matt.griffith at

State of RSS

This is a document I'm writing to describe the current state of RSS. This is in response to a community attempt to describe a non-standard RSS Profile. This document describes RSS as it is today, discussing the most used specifications...

Excerpt from iBLOGthere4iM at

Thanks Danny, I learned something.

+1 for DC date.

However, I'd still like to suggest that we promote the use of UTC instead of the localized time.  Again, I think localization should happen at the client, not the server.  This almost begs the introduction of a "home" TZ (or UTC offset).
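If feeds carry UTC, client-side localization reduces to applying the reader's offset and subtracting timestamps. The Python sketch below assumes a hypothetical per-reader UTC offset setting, standing in for the "home" TZ idea, and uses invented timestamps. It renders the local clock time plus the "posted N minutes ago" freshness mentioned earlier in the thread.

```python
from datetime import datetime, timedelta, timezone


def render_for_reader(published_utc, reader_offset_hours, now_utc):
    """Localize a UTC timestamp for one reader and report how fresh it is."""
    # Hypothetical per-reader preference: a fixed UTC offset in hours.
    reader_tz = timezone(timedelta(hours=reader_offset_hours))
    local = published_utc.astimezone(reader_tz)
    age_minutes = int((now_utc - published_utc).total_seconds() // 60)
    return f"{local:%H:%M} local, posted {age_minutes} minutes ago"


published = datetime(2003, 5, 11, 8, 45, tzinfo=timezone.utc)
now = datetime(2003, 5, 11, 9, 0, tzinfo=timezone.utc)
print(render_for_reader(published, 0, now))   # 08:45 local, posted 15 minutes ago
print(render_for_reader(published, -4, now))  # 04:45 local, posted 15 minutes ago
```

The server never needs to know where the reader is; the same UTC value serves every time zone.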

Posted by Michael Earls at

GUID +1

Can we please define a REAL GUID that works as a GUID should (not as a URL).

I think it would be nice to have a GUID on each channel as well as item so that multiple items and channels can be included in a single feed and each one can be addressed and worked with easily.

My blog system currently uses a namespace to provide this functionality, and the system uses the real GUID (@id) for everything from story retrieval to comment relativity.

I think it would be a good time to suggest that we keep <guid>, but we define that it should be a real GUID (and not a link), since we have determined (earlier in the thread/discussion) that links should be a link to the item (permaLink).
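Here is a rough Python sketch of this proposal: `link` carries the permalink, while `guid` carries an opaque random UUID, flagged with RSS 2.0's `isPermaLink="false"` attribute so readers don't try to dereference it. The URL is a placeholder, and using `uuid4` rather than another UUID version is an assumption.

```python
import uuid
import xml.etree.ElementTree as ET


def make_item(title, permalink):
    """Build an RSS 2.0 item whose link is the permalink and whose guid is an opaque UUID."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = permalink
    # isPermaLink="false" tells readers not to treat the guid as a URL.
    guid = ET.SubElement(item, "guid", isPermaLink="false")
    guid.text = str(uuid.uuid4())
    return item


item = make_item("Hello", "http://example.com/2003/05/11/hello")
print(ET.tostring(item, encoding="unicode"))
```

Because the guid is random, it has to be generated once when the item is created and stored with it; regenerating it on every feed build would make every item look new to aggregators.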

Posted by Michael Earls at

Sam Ruby: Don Box's RSS profile

Fascinating and important RSS standardization discussion. "Don Box: Here's my suggestion for where to start. I won't even consider calling this a profile until consensus is reached over on Sam's site (he has comments, neither Dave nor I..."

Excerpt from Roland Tanglao: KLogs at

Blogging standards "upside down"

Yesterday I was commenting on the proposals being made to unify the API for remote access to weblogs. Today I come across the proposal to unify and simplify the headline-exchange format. The folks at SixApart are proposing...

Excerpt from Desarrollo de Blogalia at

RSS Core Profile DRAFT 1

This is a slightly more formalized write-up of a proposed profile for RSS that has been discussed and brought up again. Rather than just produce an iteration of what Don Box wrote up, I thought I'd take a step back to codify some of the design...

Trackback from tima thinking outloud.

Sam Ruby: Don Box's RSS profile

Fascinating and important RSS standardization discussion. "Don Box: Here's my suggestion for where to start. I won't even consider calling this a profile until consensus is reached over on Sam's site (he has comments, neither Dave nor I..."

Excerpt from Roland Tanglao: WebCMS at

Reading, Random RSS links

I try catching up on reading of what’s happening in RSS World. Mark Pilgrim: How to consume RSS safely, Mark’s little prank showing RSS exploit and advice on 10 HTML tag stripping. Some background: RSS Validator ContainsScript, Minimize...

Excerpt from yowkee essential at

Sam Ruby: Don Box's RSS profile

Fascinating and important RSS standardization discussion. "Don Box: Here's my suggestion for where to start. I won't even consider calling this a profile until consensus is reached over on Sam's site (he has comments, neither Dave nor I..."

Excerpt from Roland Tanglao: XML at

Prophecy

I just wanted to point out that my prophecy that this RSS profile initiative would turn into a specification is getting closer by the day. ...

Excerpt from iBLOGthere4iM at

Evolution of Atom

Two months ago, Don Box presented us with an RSS 2.0 profile. I was immediately concerned that this would bring confusion to the blogosphere and put a hold on funding of new blog related projects. Two months later, this initiative now exist as...

Excerpt from iBLOGthere4iM at

RSS Profiles for Weblogs

There's been plentiful buzzing over the weekend about creating an RSS profile for weblogs. Movable Type's Ben Trott posted his brainstorm to create a profile specific to weblogs. On the list not to be missed are Tim Bray's RSS and the S-word,...

Trackback from Brainstorms and Raves

Sam Ruby: Don Box's RSS profile

Fascinating and important RSS standardization discussion....

Trackback from Roland Tanglao's Weblog

On Euphemisms: XML Web Services & SOA

Chris Sells recently complained that a recent interview of Don Box by Mary Jo Foley is "a relatively boring interview" because "Mary Jo doesn't dig for any dirt and Don doesn't volunteer any". He's decided to fix this by proposing an ...

Pingback from Dare Obasanjo aka Carnage4Life - On Euphemisms: XML Web Services & SOA

Atom 1.0 Released

Tim Bray: It’s cooked and ready to serve. There are a couple of IETF process things to do, but this draft (HTML version) is essentially Atom 1.0. Randy: I see everybody has already commented on it. I’ll run down the best of the comments here. Don...

Excerpt from The RSS Blog at

Don Box’s RSS profile

Dear Prof. Don,

I am a Chinese university student who will soon graduate. My tutor asked me to write a thesis about converting Delphi components to COM components.
For various reasons, there is very little material on this subject available in China, so I am having some trouble at present.
You are an expert on COM; would you please do me a favor by providing some material and giving me some advice?
I will be very grateful to you for your help.

  Yours,
  Tianguo Yan

Posted by Tianguo Yan at
