It’s just data

xhtml in rss 2.0

Don uncaves.  ;-)

Accordingly, I've converted my rss 2.0 feed from <content:encoded> to the more bandwidth and xpath friendly <xhtml:body>.  It looks like gotdotnet and blogx users will soon follow. Hopefully the owners of the wellformedweb and w3future weblogs will take notice.

The updated feed is valid, and it uses namespaces in exactly the way that rss 2.0 and xhtml intend.  I've tested it with radio and syndirella.


Sjoerd Visscher approves, but suggests <xhtml:div> instead.  Any objections?

Posted by Sam Ruby at

The choice of xhtml:body was intentional.  The goal was to convey that "this is the content - no need to HTTP GET another entity unless you're looking for fancy styling."

Just blasting an xhtml:div into an item seemed less explicit as to why it was there.

If y'all would be happy coining yet another element name (e.g., realcontent) in some new namespace, then great.  Otherwise, I think xhtml:body is the right choice.

Posted by Don Box at

I've converted my rss 2.0 feed from <content:encoded> to the more bandwidth and xpath friendly <xhtml:body>. It looks like gotdotnet and blogx users will soon follow. Hopefully the owners of the wellformedweb and w3future weblogs will...

Excerpt from Sjoerd Visscher's weblog at

I'm using Syndirella, but it doesn't seem to be displaying exactly right.

" I've converted my rss 2.0 feed from <content:encoded> to the to the more bandwidth and xpath friendly <xhtml:body>."

when viewed on the web became

"I've converted my rss 2.0 feed from to the to the more bandwidth and xpath friendly ."

in my copy of Syndirella.

Note the loss of anything in <>.

Posted by tamaracks at

tamaracks: thanks!

There are two things going on here.  First is that syndirella doesn't (yet) know how to handle xhtml:body.  Second, it is falling back to the description which I had not properly encoded.

Now fixed!  Thanks for noticing!

Posted by Sam Ruby at

RSS and the RESTian Dilema

There we were thinking we had some agreement on the RSS format and had achieved some convergence and stability at the 2.0 level, then this happens! [more]

Trackback from TheArchitect.co.uk - Jorgen Thelin's weblog

at

Personally, I don't disagree with this change, but it does raise some interesting questions around whole-document versioning and format standardization.

Posted by Jorgen Thelin at

My policy with parsing RSS has always been to parse everything, and then run through whatever I got looking for things I recognize in descending order of preference ("is there a dc:date? that's my date. no? is there a pubdate?..."), rather than saying "this is RSS 2, so unless there's a pubdate there's no date at all", so I don't see any real problem with saying "is there an xhtml:body? how about content:encoded? oh well, I'll take description."

But the xpath users better come up with some fun and pretty implementations, to make up for the hassle I'm going through changing all my sax parsers to do something completely different when I see start and end tags while xhtml:body is open. It may be just code, but quite a lot of it is just code I didn't write, and only half understand.

Posted by Phil Ringnalda at

I, for one, welcome our new RSS overlords.

Posted by Mark at

I agree with Don on the xhtml:body vs. xhtml:div thing.  Using the body tag, IMHO, conveys more semantic meaning than using a div in the middle of nowhere.

Posted by Greg Reinacker at

Look here.

Posted by Don Box at

BlogX 1.0 is updated... i'll post new bits in the morning, but http://www.simplegeek.com is publishing xhtml:body in my RSS feed

Posted by Chris Anderson at

XHTML in RSS... for BlogX

<xhtml:body> support in RSS... thanks to Don and Sam for the prodding... Expect a public rev 20 of BlogX with this feature tomorrow... ...

Excerpt from simplegeek at

Don, you have <body> tags in your <description> elements. That doesn't seem right.

Furthermore, I thought the whole point of RSS was simple syndication. Having description, content:encoded or xhtml:body tags, doesn't make things simpler. And there aren't any precedence rules either.

(BTW, I like the spellcheck;)

Posted by Breyten at

Breyten: Don already said that he would try to get his descriptions properly encoded by Monday.  The problem with description is that there is no documentation as to what proper encoding is.  Some people don't encode it, others encode it once, and some encode it multiple times.

content:encoded is better documented, but it renders the structure of the comment opaque.  Much of what makes the web work is that every bit of structure that people can tolerate putting into their data is fully exposed.

xhtml:body is a step forward.  But it isn't for everyone.  In particular, it won't work if the content is not well-formed XML.

As for precedence rules, these will emerge.  RSS is not controlled by the W3C and these elements were proposed and implemented by different people.  In this case, I think the order is clear: xhtml:body then content:encoded then description.

Posted by Sam Ruby at
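
The precedence order described above (xhtml:body, then content:encoded, then description) can be sketched roughly as follows. This is a minimal illustration using Python's standard library, not code from any aggregator mentioned in this thread; the function name pick_content is hypothetical, and the content module namespace URI is assumed to be the one published for the RSS 1.0 content module.

```python
import xml.etree.ElementTree as ET

# Namespace URIs. The XHTML one is quoted elsewhere in this thread; the
# content module URI is an assumption (the RSS 1.0 content module).
XHTML = "http://www.w3.org/1999/xhtml"
CONTENT = "http://purl.org/rss/1.0/modules/content/"


def pick_content(item):
    """Return (source, text) for an RSS <item>, preferring richer markup."""
    body = item.find(f"{{{XHTML}}}body")
    if body is not None:
        # Literal XML: serialize the children rather than decoding a string.
        inner = (body.text or "") + "".join(
            ET.tostring(child, encoding="unicode") for child in body
        )
        return "xhtml:body", inner

    encoded = item.find(f"{{{CONTENT}}}encoded")
    if encoded is not None:
        # The XML parser has already undone one level of escaping.
        return "content:encoded", encoded.text or ""

    description = item.find("description")
    if description is not None:
        return "description", description.text or ""

    return None, ""
```

A caller would run this over each item in the channel and keep only the first form it finds, which also covers feeds that put a separate abstract in description.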

xhtml:body

xhtml:body... [more]

Trackback from ScottW's ASP.NET WebLog

at

Quick links

Cables To Go - Laptop to IDE Hard Drive Adapter at myXtech NSLog(); - Bush the Dolt Newsday......

Excerpt from 0xDECAFBAD at

I concur on the precedence rules that Sam mentions - NewsGator 1.1 will use xhtml:body, then content:encoded, then description, in that order.

Posted by Greg Reinacker at

XHTML in RSS

Interesting and a must have for RssComponents: XHTML in the body. Sam seems to develop a syndirella clone that supports blog posting, so the result may be a similar tool like OExpress/NNTP but for RSS. Should have a closer look how they incorporate...

Excerpt from torstens .NET blog at

I'm with Greg wrt precedence, however, some feeds (mine for example) actually put an abstract into the description that is distinct from the content.

That stated, most people don't care about this subtlety so "let them eat cake!"

Posted by Don Box at

Please reconsider using xhtml:div instead of xhtml:body. If you're going to use existing vocabularies, you're going to have to play by the rules of that vocabulary. Semantics is nice, but this is XML and how elements should be used is described by the schema. xhtml:body only allows block-like elements, like div, p, ul... So if your RSS file contains <xhtml:body>Click <a href="...">here</a></xhtml:body>, then this element does not validate according to the xhtml schema.

Posted by Sjoerd Visscher at

Personally, I'm ok with only block-like elements within the xhtml:body.  If you want small snippets of text, put it into <description>.  If you want to have rich xhtml markup, then follow the rules and wrap in <p> or <div> or something.

To me, the restriction is worth it to have more obvious and understandable semantics.

Posted by Greg Reinacker at

I've updated my rss2 feeds to have both a body and an unnamed div.

I'm on the fence on this one.  I agree that usages of body should follow the schema.  I certainly can update the rss validator to enforce this.

If this means that everybody who uses xhtml:body will do what I did and always have both a body and a div, then this seems silly.

Posted by Sam Ruby at

XHTML in rss 2.0

Sam Ruby: xhtml in rss 2.0 I've converted my rss 2.0 feed from <content:encoded> to the more bandwidth... [more]

Trackback from Jim Mangan's Weblog

at

Tag du jour: <xhtml:body>. Now what?

...Sam himself updates his own feed to <xhtml:body>, saying it's "more bandwidth friendly" than <content:encoded>, which probably won't be true if all internal tags must also contain the xhtml: prefix, as some argue...... [more]

Trackback from Solipsism Gradient

at

Yes, yes, let's do both.  Let's all do both.  I'd help add to the hellish confusion, but I'm stuck on HTML 4.  Then again, HTML is just a few regular expressions away from XML, right?

Posted by Mark at

I stand by the use of xhtml:body even though short posts will need an innocuous <div> or <p>. I looked at a lot of feeds before pulling the trigger and many, many blog entries are multi-paragraph and naturally have <p> or <div> children anyway.

If people are really torqued about the use of xhtml:body, then we should define a NEW element whose content model is identical to <div> but whose (new) name would convey "this is the content of the damn entry in XHTML 1.0 transitional!!" 

Just slamming a <div> element under item gives me the willies.

Posted by Don Box at

I've now updated my rss2 feeds to insert a <div> element only when it is necessary to make a valid <xhtml:body>.  As Don points out, in many cases this isn't necessary.

Posted by Sam Ruby at
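
The "wrap only when necessary" rule discussed above might look something like the sketch below. It is an illustration only, not the code behind any feed in this thread. The helper names (needs_wrapper, ensure_block_content) are hypothetical, and BLOCK_TAGS is an assumed subset of block-level element names based on XHTML 1.0 Strict; a real validator would consult the schema itself.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

# Assumption: an illustrative set of block-level names, not the full schema.
BLOCK_TAGS = {
    "p", "div", "ul", "ol", "dl", "pre", "hr", "blockquote", "address",
    "fieldset", "table", "form", "h1", "h2", "h3", "h4", "h5", "h6",
}


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def needs_wrapper(body):
    """True if body has bare text or non-block children directly under it."""
    if (body.text or "").strip():
        return True
    for child in body:
        if local_name(child.tag) not in BLOCK_TAGS:
            return True
        if (child.tail or "").strip():
            return True
    return False


def ensure_block_content(body):
    """Move everything under a single xhtml:div, but only when required."""
    if not needs_wrapper(body):
        return body
    wrapper = ET.Element(f"{{{XHTML}}}div")
    wrapper.text = body.text
    for child in list(body):
        body.remove(child)
        wrapper.append(child)  # each child's tail text travels with it
    body.text = None
    body.append(wrapper)
    return body
```

Multi-paragraph entries pass through untouched, so the extra div only shows up on short, inline-only posts.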

NewsGator 1.1 Released!

NewsGator 1.1 has been released! This is a significant release...... [more]

Trackback from Greg Reinacker's Weblog

at

RSS 2.0

Sometime soon I'm going to convert my RSS feed to version 2.0. I want to know what the right tags to put my content in are, so I'm keeping track of this bit on XHTML in RSS 2.0 from Sam Ruby. ...

Excerpt from Keith's Weblog at

XML Schema for RSS 2.0

I looked around for an XML Schema definition for RSS 2.0, so I could post some examples and ideas on extending the core specification based on some of the discussions over the weekend through Don Box's and Sam Ruby's weblogs. I was somewhat...

Excerpt from TheArchitect.co.uk - Jorgen Thelin's weblog at

Wherefore flyeth baby and bathwater?

Sam Ruby and other notables are replacing the content:encoded elements in their RSS 2.0 feeds with xhtml:body. From the point... [more]

Trackback from Raw Blog

at

I like it.

It may be a while before I support it, as I am currently in the middle of refactoring my implementation of RESTLog to make better use of Cheetah, also adding a unit test suite. Once that work is complete it should be simple to add support for xhtml:body. Then I have to add it to Aggie, then to Pamphlet...Uh-oh, I'm beginning to think I have spread myself too thin. Just a little...

Posted by joe at

Joe - if you are refactoring restlog anyway, have you taken a look at Gary Burd's mombo?

Many hands make light work...

P.S.  Suggestion: Aggie first.

Posted by Sam Ruby at

I just looked at it again and I like it. Gets me out of the business of maintaining a whole CMS when I am only really interested in the RESTLog interface. I'll finish off my last changes to RESTLog to have a final release then begin migrating to mombo.

I particularly like the EntryStore abstraction as I get to keep my native file format.

Posted by joe at

Pointers From a Weekend Offline

I spent most of the weekend offline.  Here are a bunch of things that I have been keeping tabs on but haven't had a chance to look into: Cool freshmeat releases Highlight (source highlighter) 2.0b-6.  I've been happy with GNU...

Excerpt from Matt Croydon::postneo at

content -> here

Sam Ruby gamely responded to my comments re. inserting xhtml in RSS : Update your RSS 1.0 feed, and I'll... [more]

Trackback from Raw Blog

at

Ideagraph screenshot

Phil Ringnalda says: I'm much less interested in who will produce the data, and more interested in who will actually... [more]

Trackback from Raw Blog

at

In brief: 1 April 2003

The Register's got a new RSS 1.0 feed, and it validates.  Table-like CSS layouts.... [more]

Trackback from dive into mark

at

To add support for xhtml:body in Aggie RC5, add the following to RssExtractors.xml:

1. A namespace declaration under extractors/namespaces: "<namespace><prefix>xhtml</prefix><uri>http://www.w3.org/1999/xhtml</uri></namespace>"

2. An extractor declaration under extractors/properties: "<property><name>description</name><path>xhtml:body</path><owner>RssItem</owner><variant>All</variant></property>"

That's it.

(Hopefully, Sam, this will not wreak havoc in your site. I would've posted on my weblog, only Radio went south on me again, and I wanted people to give this a try...)

Posted by Ziv Caspi at

Ziv: I don't know enough about the internals of Aggie to comment, but be careful: the contents of <xhtml:body> are meant to be XML which is to be taken literally, not a string which is to be XML decoded.

For example, strings like "&amp;" should be left as is.

Posted by Sam Ruby at

Ziv,
  I did try it and it doesn't work, since we're extracting the element's Text and not the element's InnerXml. Maybe add one more flag to the <property> element?

Posted by joe at

.NETWeblogs.com Aggregated Feed Update

.NETWeblogs.com Aggregated Feed Update... [more]

Trackback from ScottW's ASP.NET WebLog

at

xhtml:body for rss2

Seems to be the thing. I've implemented it for now here, but I'm not checking it into CVS yet. It's still hackish, with the extra <div></div> wrapped around all entries and fixate_url not being ...

Pingback from DevBlog :: xhtml:body for rss2

at

Sam/Joe: I'm afraid I don't understand the comment. The RSS file is a well-formed XML file, so any literal "&amp;" string in it stands for a single ampersand (disregarding the edge-cases for the moment). Are XML vocabularies allowed to override this rule, or am I missing something?

Posted by Ziv Caspi at

Ziv: Compare the content:encoded and xhtml:body elements in my rss 1.0 and rss 2.0 feeds respectively.  In the former, "<" becomes "&lt;".  In the latter, "<" remains as is.

The same thing should be true of "&amp;".  If you see it in the stream it actually represents a single "&" conceptually, but it will need to be encoded back into a "&amp;" when you put it into the output HTML.

Net: what you really want to do is a byte for byte copy of the characters between the <xhtml:body> and </xhtml:body> tags.

Posted by Sam Ruby at
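
The difference between the two encodings discussed above can be seen with a small hand-built item. The sample markup below is made up for illustration (it is not taken from any real feed), and the sketch uses Python's standard library rather than anything from Aggie.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

CONTENT = "http://purl.org/rss/1.0/modules/content/"
XHTML = "http://www.w3.org/1999/xhtml"

# Hypothetical item carrying the same paragraph in both forms.
item = ET.fromstring(
    f'<item xmlns:xhtml="{XHTML}" xmlns:content="{CONTENT}">'
    "<content:encoded>&lt;p&gt;AT&amp;amp;T&lt;/p&gt;</content:encoded>"
    "<xhtml:body><xhtml:p>AT&amp;T</xhtml:p></xhtml:body>"
    "</item>"
)

# content:encoded: one level of decoding yields a *string of markup*,
# which still contains "&amp;" and must be parsed again to get structure.
markup_from_encoded = item.find(f"{{{CONTENT}}}encoded").text

# xhtml:body: the parser hands back real elements; the text node holds a
# single conceptual "&" and has to be re-escaped when written out as HTML.
paragraph = item.find(f"{{{XHTML}}}body/{{{XHTML}}}p")
text_from_body = paragraph.text
html_from_body = f"<p>{escape(text_from_body)}</p>"
```

Both routes end at the same HTML, but only the content:encoded route needs a second parse, and only the xhtml:body route needs re-escaping on output.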

Sam says, "what you really want to do is a byte for byte copy of the characters between the <xhtml:body> and </xhtml:body> tags."

Similarly, one can take the <xhtml:body> fragment and preserve it as a DOM, then write the DOM back out as XML or HTML as needed (which would also correct for where a byte-for-byte copy wouldn't include the necessary namespace declarations).

If one is already using a DOM for the Comment API, xhtml:body is already a node in the tree.  If one is using SAX, you'll have to look for <xhtml:body> specifically and trap all the start/end callbacks to create the DOM, until the closing </xhtml:body>.

Some kind of flag in the markup, "this is literal XML", would make that easier than having every application keep a current list of what fields may be literal XML.

Posted by Ken MacLeod at
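
For SAX users, trapping the start/end callbacks while xhtml:body is open could be sketched like this. It is a rough illustration using Python's xml.sax and ElementTree's TreeBuilder, not code from any parser mentioned here; the names BodyCollector and collect_bodies are hypothetical.

```python
import io
import xml.sax
import xml.etree.ElementTree as ET
from xml.sax.handler import ContentHandler, feature_namespaces

XHTML = "http://www.w3.org/1999/xhtml"


def clark(name):
    """Turn a SAX (uri, localname) pair into ElementTree's '{uri}local'."""
    uri, local = name
    return f"{{{uri}}}{local}" if uri else local


class BodyCollector(ContentHandler):
    """Builds a small tree for each xhtml:body; ignores everything else."""

    def __init__(self):
        super().__init__()
        self.bodies = []
        self._builder = None  # set only while an xhtml:body is open
        self._depth = 0

    def startElementNS(self, name, qname, attrs):
        if self._builder is None:
            if name != (XHTML, "body"):
                return
            self._builder = ET.TreeBuilder()
        self._depth += 1
        attrib = {clark(key): value for key, value in attrs.items()}
        self._builder.start(clark(name), attrib)

    def endElementNS(self, name, qname):
        if self._builder is None:
            return
        self._builder.end(clark(name))
        self._depth -= 1
        if self._depth == 0:  # the closing </xhtml:body>
            self.bodies.append(self._builder.close())
            self._builder = None

    def characters(self, content):
        if self._builder is not None:
            self._builder.data(content)


def collect_bodies(xml_text):
    """Parse a feed with SAX and return one Element per xhtml:body."""
    handler = BodyCollector()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))
    return handler.bodies
```

Because the collected tree keeps namespace URIs on every element, writing it back out does not depend on declarations that sat outside the fragment in the original feed.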

Ken - you are correct.  An InfoSet preserving XML to XML transformation is what we are looking for.  In particular, there is a list of items that need not be preserved.

As to the added flag: my opinion is that it doesn't belong in the markup any more than the schema that says that an item has a title, link, and description belongs in the markup.  Beyond the simple mustUnderstand semantics, you know what you are looking for and know the syntax rules for skipping over the rest.

Posted by Sam Ruby at

random

the following are links i found at work today that i'd like to follow up on... but didn't have an email client setup on the machine, so i figured i'd...... [more]

Trackback from bish

at

In brief: 2nd April 2003

Bruce Eckel has a blog: ok, it's old news now, but Bruce Eckel has a blog. Some interesting things in the posts, especially bits from the opening of a not specified Python Conference (or a Python Conference I miss the specification of). And on...

Excerpt from Through the Blogging-glass at

(My humble opinion doesn't weigh much, but anyway...)

If the point of using RSS is to deliver content, and/or abstract it from the presentation layer, then WHY OH WHY using <xhtml:whatever> instead of <content:whatever>?

First content abstraction problem: enclosing content in an xhtml namespace tag seems to imply that the content is well-formed XHTML.
Now imagine for example Mark Pilgrim using xhtml:body in his RSS 2.0 feed, when his entries are definitely not going to be well-formed XHTML since his weblog itself is HTML 4.01. Should Mark enclose his entries in html:body to imply that they are well-formed HTML? Should he deliver a version that is converted to XHTML in order to fit in xhtml:body?

Now a second problem: not every source of content is XHTML/HTML. There can possibly be content encoded in Flash (or let's be crazy, MPEG or WAV), that happens to provide an RSS version of its entries. How would xhtml:body relate to the entries there?

--

Am I the only one thinking this is really complicating matters for a debatable gain in functionality/semantics?

Wouldn't extending the "content:" module be a simpler task?
After all, wouldn't something along the lines of <content:encoded type="application/xhtml+xml"> carry a better meaning of the content of the entry?
This way, one could post entries in whichever way one likes: Mark could use type="text/html" for example.

--
Excuse the ranting, but really sometimes to the external eye it looks a lot like you (we?) are complicating (y)our RSS feeds just for the heck of it, with not much use outside of pure novelty...

Posted by michel v at

Michel: I guess we have different perspectives on what is simple.

<content:encoded> remains an option for content that is not well-formed.  It is considerably more verbose, less readable, and less easy to parse, but it still exists.

Posted by Sam Ruby at

Of course, that'll teach me not to post after so many hours of straight work (not!). I'll add the encoded/not-encoded field to Aggie once I complete transforming its RSS-to-object-graph engine (the part that reads RssExtractors) into something more general than just RSS.

Posted by Ziv Caspi at

Again with the relative URLs

Phil is right.  The specs are silent on this.  In fact, many people believe that relative links should be resolved relative to the feed itself, not the <channel><link> element's value.  Spec issues aside,... [more]

Trackback from Sam Ruby

at

I spent most of today knee deep in RSS, writing an aggregator for a project at work. It has been quickly becoming apparent that "Really Simple Syndication" is anything but! There are currently three major (and goodness knows how many minor) ...

Pingback from Simon Willison: Archive for 4th April 2003

at

Working on RSS fragments with XOM

I am working on a RESTLog client, and specifically on the subsystem responsible for building RSS item fragments representing each weblog entry, and sending them to the server side of the application. As some of you may have noticed, I am using a new...

Excerpt from Through the Blogging-glass at

VerySharpReader

The best curb-appeal of any three-paned Windows RSS aggregator I've seen so far. ...

Excerpt from phil ringnalda dot com at

I recommend RSS 2.0

I encourage all bloggers, and especially Movable Type users, to upgrade their RSS to version 2.0. Movable Type users can copy the following RSS template code: <?xml version="1.0" encoding="<$MTPublishCharset$>"?>... [more]

Trackback from Már Örlygsson

at

Okay, I ‘fess up. I don’t get it.

Why should I bother using xhtml:body when I could just as well be using content:encoded ? What’s the inherent benefit?... [more]

Trackback from phil.wilson

at

How (and why) to include an xhtml:body in a Radio UserLand RSS feed

Sam Ruby and Don Box have both demonstrated valid RSS 2.0 feeds (Sam, Don) that include a <body> element, properly namespaced as XHTML. Quietly, last week, I joined the party. My primary feed now includes: ......

Excerpt from Jon's Radio at

A cautious welcome

I'm waiting to be convinced about the utility of XHTML-in-RSS, but it's easy enough to play along.... [more]

Trackback from phil ringnalda dot com

at

How (and why) to include an xhtml:body in a Radio UserLand RSS feed. Sam Ruby and Don Box have both demonstrated valid RSS 2.0 feeds (Sam, Don) that include a <body> element, properly namespaced as XHTML. Quietly, last week, I joined the party....

Excerpt from TIG's Corner at

The logic (or lack thereof) behind xhtml:body

When Sam started a small revolution last month by using <xhtml:body> in his rss-feed, he said this was because it was "more bandwidth and xpath friendly". I can see it's more xpath friendly (though I'm not sure why anyone would want to... [more]

Trackback from public virtual MemoryStream

at

The logic (or lack thereof) behind xhtml:body

The logic (or lack thereof) behind xhtml:body... [more]

Trackback from .NET Blog - Chris Frazier Style

at

Mike's Briefs

There's a good article on ExtremeTech on Krazy Keyboards. I have a great interest in new input devices as......

Excerpt from MikeShea.Net at

The battle for RSS

At SXSW I told Mena Trott that RSS 1.0 was dead or dying, because it was too complicated. Turns out I was partially wrong -- it's very much alive, but perhaps only because it's the default in Movable Type. Six Apart has signed up for the semantic...

Excerpt from Manton Reece at

Synderilla

I posted a new drop of Synderilla that's based on Dimtry's May 9 release, and adds support for gzip/deflate compression, xhtml:body based items and multiple plugins (both IBlogThis and IBlogExtension).... [more]

Trackback from Simon Fell > Its just code

at

Pingback from philweber.net : Phil Weber's Weblog

at

For reference, the necessary RDF tax for using xhtml:body in RDF is to add parseType="Literal":

  <xhtml:body rdf:parseType="Literal">content</xhtml:body>

I don't currently have a feed to show as an example, so hopefully someone else can try it out.

Posted by Ken MacLeod at

Synderilla

I posted a new drop of Synderilla that's based on Dimtry's May 9 release, and adds support for gzip/deflate compression, xhtml:body based items and multiple plugins (both IBlogThis and IBlogExtension). [Simon Fell] Well, here's the beauty of open...

Excerpt from deeje.com at

Logicola Diet New!

The seed of an idea (and a good collection of links, along the way) Logicola Diet A project to learn and have fun, working with technologies such as PHP, RSS, XSLT, and above all in areas that interest us and that we are passionate about, such as...

Excerpt from logicola at

Replacing RSS

I've been reading up on RSS for the last few weeks, as my sidebar will no doubt show. I'm not...... [more]

Trackback from Virtuelvis

at

Atom news

Both the Feed Validator and my ultra-liberal feed parser have been updated with unrelated changes.... [more]

Trackback from dive into mark

at

Escaped Markup Considered Harmful

While I am certainly sympathetic to that view, my current  leanings are simply that any such escaping is clearly identified as such.... [more]

Trackback from Sam Ruby

at

How (and why) to include an xhtml:body in a Radio UserLand RSS feed

" Sam Ruby and Don Box have both demonstrated valid RSS 2.0 feeds ( Sam , Don ) that include a... [more]

Trackback from EdTechPost

at

Life with nntp//rss

I've been using nntp//rss for over a month now, and overall, I'm pretty happy with it. I like it for the same reason people who live in Outlook love NewsGator. (I live in Outlook Express, because it lets me handle...... [more]

Trackback from philweber.net

at

Hi,

Does anybody have a sample application to generate and parse XHTML documents?

Regards,
chandra

Posted by Chandrasekar at

Sam Ruby: xhtml in rss 2.0

[link]...

Excerpt from del.icio.us/cbc/rss at

I’m a beginner compared to you guys but I have a question that I just started searching out an answer for and came across this site. Can href tags and other html tags be put into RSS 2.0 feeds? Do you know of a convenient list of what can go into a feed and what cannot?
Thanks.

Posted by John West at

I was looking for this for my own RSS feed

Found the answer at

RSS at Harvard Law
Syndication technology hosted by the Berkman Center
RSS 2.0 Specification

http://blogs.law.harvard.edu/tech/rss

In case someone else comes across this site searching for the same type of information.

Thanks - John West

Posted by John West at

XHTML in RSS

DeWitt Clinton: But what if you wanted to put something interesting inside a syndicated content feed? What if you wanted to put valid XHTML in a feed? You went through the trouble of writing XHTML, why should it be flattened to an opaque blob of...

Excerpt from The RSS Blog at

XHTML in RSS

DeWitt Clinton : But what if you wanted to put something interesting......

Excerpt from Real Geek at

Answer by Niko for Display HTML in RSS

I guess it’s possible from this link: [link] . Sounds like gibberish to me, but they seem to succeed in it. Something about . I don’t get it. Update : The W3C says: An item may also be complete in itself, if so,...

Excerpt from Display HTML in RSS - Stack Overflow at
