xhtml in rss 2.0
Don uncaves. ;-)
Accordingly, I've converted my rss 2.0 feed from <content:encoded> to the more bandwidth- and xpath-friendly <xhtml:body>. It looks like gotdotnet and blogx users will soon follow. Hopefully the owners of the wellformedweb and w3future weblogs will take notice.
The updated feed is valid, and it uses namespaces in exactly the way that rss 2.0 and xhtml intend. I've tested it with radio and syndirella.
The choice of xhtml:body was intentional. The goal was to convey that "this is the content - no need to HTTP GET another entity unless you're looking for fancy styling."
Just blasting an xhtml:div into an item seemed less explicit as to why it was there.
If y'all would be happy coining yet another element name (e.g., realcontent) in some new namespace, then great. Otherwise, I think xhtml:body is the right choice.
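A rough sketch of what "xpath friendly" buys a consumer, using Python's standard xml.etree.ElementTree (which supports a limited XPath subset). The sample feed and the body_paragraphs helper are hypothetical; only the XHTML namespace URI is real. Because the entry arrives as markup rather than as an escaped string, its paragraphs can be selected directly:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; the XHTML namespace URI is the real one.
FEED = """<rss version="2.0">
  <channel>
    <item>
      <title>xhtml in rss 2.0</title>
      <body xmlns="http://www.w3.org/1999/xhtml"><p>Don <em>uncaves</em>.</p></body>
    </item>
  </channel>
</rss>"""

NS = {"xhtml": "http://www.w3.org/1999/xhtml"}


def body_paragraphs(feed_xml):
    """Return, per item, the plain text of each <p> directly under xhtml:body."""
    root = ET.fromstring(feed_xml)
    result = []
    for item in root.findall("./channel/item"):
        body = item.find("xhtml:body", NS)
        if body is None:
            continue  # this item has no xhtml:body
        result.append(["".join(p.itertext()) for p in body.findall("xhtml:p", NS)])
    return result


print(body_paragraphs(FEED))  # [['Don uncaves.']]
```

Whether a feed writes an explicit xhtml: prefix on every tag or declares XHTML once as the default namespace on <body>, as above, makes no difference to a namespace-aware parser; the default-namespace form simply avoids repeating the prefix.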
Posted by Don Box at
I've converted my rss 2.0 feed from <content:encoded> to the more bandwidth and xpath friendly <xhtml:body>. It looks like gotdotnet and blogx users will soon follow. Hopefully the owners of the wellformedweb and w3future weblogs will...
Excerpt from Sjoerd Visscher's weblog at
I'm using Syndirella, but it doesn't seem to be displaying exactly right.
" I've converted my rss 2.0 feed from <content:encoded> to the to the more bandwidth and xpath friendly <xhtml:body>."
when viewed on the web became
"I've converted my rss 2.0 feed from to the to the more bandwidth and xpath friendly ."
in my copy of Syndirella.
Note the loss of anything in <>.
Posted by tamaracks at
tamaracks: thanks!
There are two things going on here. First is that syndirella doesn't (yet) know how to handle xhtml:body. Second, it is falling back to the description which I had not properly encoded.
Now fixed! Thanks for noticing!
Posted by Sam Ruby at
RSS and the RESTian Dilema
There we were thinking we had some agreement on the RSS format and had achieved some convergence and stability at the 2.0 level, then this happens!
Trackback from TheArchitect.co.uk - Jorgen Thelin's weblog at
Personally, I don't disagree with this change, but it does raise some interesting questions around whole-document versioning and format standardization.
Posted by Jorgen Thelin at
My policy with parsing RSS has always been to parse everything, and then run through whatever I got looking for things I recognize in descending order of preference ("is there a dc:date? that's my date. no? is there a pubdate?..."), rather than saying "this is RSS 2, so unless there's a pubdate there's no date at all", so I don't see any real problem with saying "is there an xhtml:body? how about content:encoded? oh well, I'll take description."
But the xpath users better come up with some fun and pretty implementations, to make up for the hassle I'm going through changing all my sax parsers to do something completely different when I see start and end tags while xhtml:body is open. It may be just code, but quite a lot of it is just code I didn't write, and only half understand.
Posted by Phil Ringnalda at
I agree with Don on the xhtml:body vs. xhtml:div thing. Using the body tag, IMHO, conveys more semantic meaning than using a div in the middle of nowhere.
Posted by Greg Reinacker at
BlogX 1.0 is updated... I'll post new bits in the morning, but http://www.simplegeek.com is publishing xhtml:body in my RSS feed
Posted by Chris Anderson at
XHTML in RSS... for BlogX
<xhtml:body> support in RSS... thanks to Don and Sam for the prodding... Expect a public rev 20 of BlogX with this feature tomorrow...
Excerpt from simplegeek at
Don, you have <body> tags in your <description> elements. That doesn't seem right.
Furthermore, I thought the whole point of RSS was simple syndication. Having description, content:encoded or xhtml:body tags, doesn't make things simpler. And there aren't any precedence rules either.
(BTW, I like the spellcheck;)
Posted by Breyten at
Breyten: Don already said that he would try to get his descriptions properly encoded by Monday. The problem with description is that there is no documentation as to what proper encoding is. Some people don't encode it, others encode it once, and some encode it multiple times.
content:encoded is better documented, but it renders the structure of the content opaque. Much of what makes the web work is that every bit of structure that people can tolerate putting into their data is fully exposed.
xhtml:body is a step forward. But it isn't for everyone. In particular, it won't work if the content is not well-formed XML.
As for precedence rules, these will emerge. RSS is not controlled by the W3C and these elements were proposed and implemented by different people. In this case, I think the order is clear: xhtml:body then content:encoded then description.
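The precedence order above (xhtml:body, then content:encoded, then description) can be sketched as a small lookup. This is a hypothetical helper, not code from any of the aggregators mentioned in this thread; it assumes the usual RSS content module namespace URI.

```python
import xml.etree.ElementTree as ET

NS = {
    "xhtml": "http://www.w3.org/1999/xhtml",
    "content": "http://purl.org/rss/1.0/modules/content/",
}


def item_content(item):
    """Pick an item's content: xhtml:body, then content:encoded, then description.

    Returns a (source, value) pair. For xhtml:body the value is the element
    itself, so callers keep the markup as a tree instead of a string.
    """
    body = item.find("xhtml:body", NS)
    if body is not None:
        return "xhtml:body", body
    encoded = item.find("content:encoded", NS)
    if encoded is not None:
        return "content:encoded", encoded.text or ""
    return "description", item.findtext("description", default="")


ITEM = """<item xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <description>An abstract</description>
  <content:encoded>&lt;p&gt;Full text&lt;/p&gt;</content:encoded>
</item>"""

source, value = item_content(ET.fromstring(ITEM))
print(source, value)  # content:encoded <p>Full text</p>
```

A reader that wants a separate abstract (some feeds put one in description, distinct from the full content) would read description directly rather than through this fallback.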
Posted by Sam Ruby at
Quick links
Cables To Go - Laptop to IDE Hard Drive Adapter at myXtech NSLog(); - Bush the Dolt Newsday......
Excerpt from 0xDECAFBAD at
I concur on the precedence rules that Sam mentions - NewsGator 1.1 will use xhtml:body, then content:encoded, then description, in that order.
Posted by Greg Reinacker at
XHTML in RSS
Interesting and a must-have for RssComponents: XHTML in the body. Sam seems to be developing a syndirella clone that supports blog posting, so the result may be a tool similar to OExpress/NNTP but for RSS. Should have a closer look at how they incorporate...
Excerpt from torstens .NET blog at
I'm with Greg wrt precedence, however, some feeds (mine for example) actually put an abstract into the description that is distinct from the content.
That stated, most people don't care about this subtlety so "let them eat cake!"
Posted by Don Box at
Please reconsider using xhtml:div instead of xhtml:body. If you're going to use existing vocabularies, you're going to have to play by the rules of that vocabulary. Semantics is nice, but this is XML and how elements should be used is described by the schema. xhtml:body only allows block-like elements, like div, p, ul... So if your RSS file contains <xhtml:body>Click <a href="...">here</a></xhtml:body>, then this element does not validate according to the xhtml schema.
Posted by Sjoerd Visscher at
Personally, I'm ok with only block-like elements within the xhtml:body. If you want small snippets of text, put them into <description>. If you want to have rich xhtml markup, then follow the rules and wrap it in <p> or <div> or something.
To me, the restriction is worth it to have more obvious and understandable semantics.
Posted by Greg Reinacker at
I've updated my rss2 feeds to have both a body and an unnamed div.
I'm on the fence on this one. I agree that usages of body should follow the schema. I certainly can update the rss validator to enforce this.
If this means that everybody who uses xhtml:body will do what I did and always have both a body and a div, then this seems silly.
Posted by Sam Ruby at
XHTML in rss 2.0
Sam Ruby: xhtml in rss 2.0 I've converted my rss 2.0 feed from <content:encoded> to the to the more bandwidth...
Trackback from Jim Mangan's Weblog at
Tag du jour: <xhtml:body>. Now what?
...Sam himself updates his own feed to <xhtml:body>, saying it's "more bandwidth friendly" than <content:encoded>, which probably won't be true if all internal tags must also contain the xhtml: prefix, as some argue......
Trackback from Solipsism Gradient at
Yes, yes, let's do both. Let's all do both. I'd help add to the hellish confusion, but I'm stuck on HTML 4. Then again, HTML is just a few regular expressions away from XML, right?
Posted by Mark at
I stand by the use of xhtml:body even though short posts will need an innocuous <div> or <p>. I looked at a lot of feeds before pulling the trigger and many, many blog entries are multi-paragraph and naturally have <p> or <div> children anyway.
If people are really torqued about the use of xhtml:body, then we should define a NEW element whose content model is identical to <div> but whose (new) name would convey "this is the content of the damn entry in XHTML 1.0 transitional!!"
Just slamming a <div> element under item gives me the willies.
Posted by Don Box at
I've now updated my rss2 feeds to only insert a <div> element when it is necessary to make a valid <xhtml:body>. As Don points out, in many cases this isn't necessary.
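Inserting a <div> only when it is needed could look roughly like this. It is a sketch under assumptions: BLOCK is a simplified, non-exhaustive list of the block-level elements that the XHTML 1.0 Strict content model (the one Sjoerd describes) allows directly inside body, and wrap_if_needed is a hypothetical name, not the code behind either feed.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

# Assumption: a simplified, non-exhaustive set of block-level elements that
# XHTML 1.0 Strict allows as direct children of <body>.
BLOCK = {"p", "div", "ul", "ol", "dl", "pre", "blockquote", "table", "form",
         "h1", "h2", "h3", "h4", "h5", "h6", "hr", "address"}


def local(tag):
    """Strip the '{namespace}' part from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def needs_div(body):
    """True if body holds bare text or any non-block child."""
    if body.text and body.text.strip():
        return True
    for child in body:
        if local(child.tag) not in BLOCK:
            return True
        if child.tail and child.tail.strip():
            return True  # text between block elements is not allowed either
    return False


def wrap_if_needed(body):
    """Move body's content into a single xhtml:div, but only when required."""
    if not needs_div(body):
        return body
    div = ET.Element(f"{{{XHTML}}}div")
    div.text, body.text = body.text, None
    for child in list(body):
        body.remove(child)
        div.append(child)  # the child's tail text travels with it
    body.append(div)
    return body


short = ET.fromstring(f'<body xmlns="{XHTML}">Click <a href="...">here</a></body>')
wrap_if_needed(short)
print([local(c.tag) for c in short])  # ['div']
```

Multi-paragraph entries, which already consist of <p> children, pass through unchanged.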
Posted by Sam Ruby at
NewsGator 1.1 Released!
NewsGator 1.1 has been released! This is a significant release......
Trackback from Greg Reinacker's Weblog at
RSS 2.0
Sometime soon I'm going to convert my RSS feed to version 2.0. I want to know what the right tags to put my content in are, so I'm keeping track of this bit on XHTML in RSS 2.0 from Sam Ruby. ...
Excerpt from Keith's Weblog at
XML Schema for RSS 2.0
I looked around for an XML Schema definition for RSS 2.0, so I could post some examples and ideas on extending the core specification based on some of the discussions over the weekend through Don Box's and Sam Ruby's weblogs. I was somewhat...
Excerpt from TheArchitect.co.uk - Jorgen Thelin's weblog at
Wherefore flyeth baby and bathwater?
Sam Ruby and other notables are replacing the content:encoded elements in their RSS 2.0 feeds with xhtml:body. From the point...
Trackback from Raw Blog at
I like it.
It may be a while before I support it, as I am currently in the middle of refactoring my implementation of RESTLog to make better use of Cheetah, also adding a unit test suite. Once that work is complete it should be simple to add support for xhtml:body. Then I have to add it to Aggie, then to Pamphlet... Uh-oh, I'm beginning to think I have spread myself too thin. Just a little...
Posted by joe at
Joe - if you are refactoring restlog anyway, have you taken a look at Gary Burd's mombo?
Many hands make light work...
P.S. Suggestion: Aggie first.
Posted by Sam Ruby at
I just looked at it again and I like it. Gets me out of the business of maintaining a whole CMS when I am only really interested in the RESTLog interface. I'll finish off my last changes to RESTLog to have a final release then begin migrating to mombo.
I particularly like the EntryStore abstraction as I get to keep my native file format.
Posted by joe at
Pointers From a Weekend Offline
I spent most of the weekend offline. Here are a bunch of things that I have been keeping tabs on but haven't had a chance to look into: Cool freshmeat releases Highlight (source hilighter) 2.0b-6. I've been happy with GNU...
Excerpt from Matt Croydon::postneo at
content -> here
Sam Ruby gamely responded to my comments re. inserting xhtml in RSS: Update your RSS 1.0 feed, and I'll...
Trackback from Raw Blog at
Ideagraph screenshot
Phil Ringnalda says: I'm much less interested in who will produce the data, and more interested in who will actually...
Trackback from Raw Blog at
In brief: 1 April 2003
The Register's got a new RSS 1.0 feed, and it validates. Table-like CSS layouts....
Trackback from dive into mark at
To add support for xhtml:body in Aggie RC5, add the following to RssExtractors.xml:
1. A namespace declaration under extractors/namespaces: "<namespace><prefix>xhtml</prefix><uri>http://www.w3.org/1999/xhtml</uri></namespace>"
2. An extractor declaration under extractors/properties: "<property><name>description</name><path>xhtml:body</path><owner>RssItem</owner><variant>All</variant></property>"
That's it.
(Hopefully, Sam, this will not wreak havoc in your site. I would've posted on my weblog, only Radio went south on me again, and I wanted people to give this a try...)
Posted by Ziv Caspi at
Ziv: I don't know enough about the internals of Aggie to comment, but be careful: the contents of <xhtml:body> is meant to be XML which is to be taken literally, not a string which is to be XML decoded.
For example, strings like "&amp;" should be left as is.
Posted by Sam Ruby at
Ziv,
I did try it and it doesn't work, since we're extracting the element's Text and not its InnerXml. Maybe add one more flag to the <property> element?
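In Python's ElementTree the same Text-versus-InnerXml distinction shows up as .text versus the serialized children. A minimal sketch of an InnerXml equivalent; inner_xml is a hypothetical helper, and registering XHTML as the default namespace is an assumption made only so the output has no ns0: prefixes. Each serialized child carries its own xmlns declaration, so the fragment stays self-describing when copied out.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
# Assumption: emit XHTML elements with a default xmlns rather than "ns0:".
ET.register_namespace("", XHTML)


def inner_xml(elem):
    """Return the markup between elem's start and end tags (like InnerXml)."""
    parts = [elem.text or ""]
    for child in elem:
        # tostring() includes the child's tail text, so nothing is dropped.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


item = ET.fromstring(
    '<item><body xmlns="http://www.w3.org/1999/xhtml">'
    '<p>Tom &amp; Jerry <a href="/x">here</a></p></body></item>'
)
body = item.find(f"{{{XHTML}}}body")
print(body.text)  # None: nothing precedes the first child element
print(inner_xml(body))
# <p xmlns="http://www.w3.org/1999/xhtml">Tom &amp; Jerry <a href="/x">here</a></p>
```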
Posted by joe at
.NETWeblogs.com Aggregated Feed Update
.NETWeblogs.com Aggregated Feed Update...
Trackback from ScottW's ASP.NET WebLog at
xhtml:body for rss2
Seems to be the thing. I've implemented it for now here, but I'm not checking it into CVS yet. It's still hackish, with the extra <div></div> wrapped around all entries and fixate_url not being ...
Pingback from DevBlog :: xhtml:body for rss2 at
Sam/Joe: I'm afraid I don't understand the comment. The RSS file is a well-formed XML file, so any literal "&amp;" string in it stands for a single ampersand (disregarding the edge-cases for the moment). Are XML vocabularies allowed to override this rule, or am I missing something?
Posted by Ziv Caspi at
Ziv: Compare the content:encoded and xhtml:body elements in my rss 1.0 and rss 2.0 feeds respectively. In the former, "<" becomes "&lt;". In the latter, "<" remains as is.
The same thing should be true of "&amp;". If you see it in the stream it actually represents a single "&" conceptually, but it will need to be encoded back into a "&amp;" when you put it into the output HTML.
Net: what you really want to do is a byte for byte copy of the characters between the <xhtml:body> and </xhtml:body> tags.
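The escaping difference can be checked with ElementTree on two hypothetical one-item payloads carrying the same sentence. Parsing removes one level of escaping from content:encoded and leaves an HTML string, while the text inside xhtml:body comes back as plain characters that must be escaped again on output:

```python
import xml.etree.ElementTree as ET
from html import escape

NS = {
    "xhtml": "http://www.w3.org/1999/xhtml",
    "content": "http://purl.org/rss/1.0/modules/content/",
}

# Hypothetical items carrying the same sentence both ways.
ENCODED = ('<item xmlns:content="http://purl.org/rss/1.0/modules/content/">'
           '<content:encoded>&lt;p&gt;Tom &amp;amp; Jerry&lt;/p&gt;</content:encoded>'
           '</item>')
LITERAL = ('<item><body xmlns="http://www.w3.org/1999/xhtml">'
           '<p>Tom &amp; Jerry</p></body></item>')

# content:encoded: parsing removes one level of escaping, leaving an HTML string.
html_string = ET.fromstring(ENCODED).findtext("content:encoded", namespaces=NS)

# xhtml:body: parsing yields elements; "&amp;" becomes a single "&" in the text.
p = ET.fromstring(LITERAL).find("xhtml:body/xhtml:p", NS)

print(html_string)                  # <p>Tom &amp; Jerry</p>
print(p.text)                       # Tom & Jerry
print(escape(p.text, quote=False))  # Tom &amp; Jerry (re-encoded for HTML output)
```

The escaped payload is also visibly longer in the stream than the literal one, which is the bandwidth side of the same trade-off.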
Posted by Sam Ruby at
Sam says, "what you really want to do is a byte for byte copy of the characters between the <xhtml:body> and </xhtml:body> tags."
Similarly, one can take the <xhtml:body> fragment and preserve it as a DOM, then write the DOM back out as XML or HTML as needed (which would also correct for where a byte-for-byte copy wouldn't include the necessary namespace declarations).
If one is already using a DOM for the Comment API, xhtml:body is already a node in the tree. If one is using SAX, you'll have to look for <xhtml:body> specifically and trap all the start/end callbacks to create the DOM, until the closing </xhtml:body>.
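For SAX users, trapping every start/end callback while inside xhtml:body and building a tree from them can be sketched with Python's xml.sax plus ElementTree's TreeBuilder. BodyCollector and collect_bodies are hypothetical names; the parser is put in namespace mode so the body is recognized by namespace URI rather than by prefix.

```python
import io
import xml.sax
import xml.etree.ElementTree as ET
from xml.sax.handler import feature_namespaces

XHTML = "http://www.w3.org/1999/xhtml"


def clark(name):
    """Turn a SAX (uri, localname) pair into ElementTree's '{uri}local' form."""
    uri, local = name
    return f"{{{uri}}}{local}" if uri else local


class BodyCollector(xml.sax.ContentHandler):
    """Rebuild every xhtml:body subtree as an ElementTree element."""

    def __init__(self):
        super().__init__()
        self.bodies = []
        self.builder = None  # active only while inside an xhtml:body
        self.depth = 0

    def startElementNS(self, name, qname, attrs):
        if self.builder is None and name == (XHTML, "body"):
            self.builder = ET.TreeBuilder()
        if self.builder is not None:
            self.depth += 1
            self.builder.start(clark(name), {clark(k): v for k, v in attrs.items()})

    def characters(self, content):
        if self.builder is not None:
            self.builder.data(content)

    def endElementNS(self, name, qname):
        if self.builder is not None:
            self.builder.end(clark(name))
            self.depth -= 1
            if self.depth == 0:  # just closed the xhtml:body itself
                self.bodies.append(self.builder.close())
                self.builder = None


def collect_bodies(feed_bytes):
    handler = BodyCollector()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(feed_bytes))
    return handler.bodies


feed = (b'<rss xmlns:xhtml="http://www.w3.org/1999/xhtml"><channel><item>'
        b'<xhtml:body><xhtml:p>Click <xhtml:a href="/x">here</xhtml:a></xhtml:p>'
        b'</xhtml:body></item></channel></rss>')
body, = collect_bodies(feed)
print("".join(body.itertext()))  # Click here
```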
Some kind of flag in the markup, "this is literal XML", would make that easier than having every application keep a current list of what fields may be literal XML.
Posted by Ken MacLeod at
Ken - you are correct. An InfoSet preserving XML to XML transformation is what we are looking for. In particular, there is a list of items that need not be preserved.
As to the added flag: my opinion is that it doesn't belong in the markup any more than the schema that says that an item has a title, link, and description belongs in the markup. Beyond the simple mustUnderstand semantics, you know what you are looking for and know the syntax rules for skipping over the rest.
Posted by Sam Ruby at
random
the following are links i found at work today that i'd like to follow up on... but didn't have an email client setup on the machine, so i figured i'd......
Trackback from bish at
In brief: 2nd April 2003
Bruce Eckel has a blog: ok, it's old news now, but Bruce Eckel has a blog. Some interesting things in the posts, especially bits from the opening of a not specified Python Conference (or a Python Conference I miss the specification of). And on...
Excerpt from Through the Blogging-glass at
(My humble opinion doesn't weigh much, but anyway...)
If the point of using RSS is to deliver content, and/or abstract it from the presentation layer, then WHY OH WHY using <xhtml:whatever> instead of <content:whatever>?
First content abstraction problem: enclosing content in an xhtml namespace tag seems to imply that the content is well-formed XHTML.
Now imagine for example Mark Pilgrim using xhtml:body in his RSS 2.0 feed, when his entries are definitely not going to be well-formed XHTML since his weblog itself is HTML 4.01. Should Mark enclose his entries in html:body to imply that they are well-formed HTML? Should he deliver a version that is converted to XHTML in order to fit in xhtml:body?
Now a second problem: not every source of content is XHTML/HTML. There can possibly be content encoded in Flash (or let's be crazy, MPEG or WAV), that happens to provide an RSS version of its entries. How would xhtml:body relate to the entries there?
--
Am I the only one thinking this is really complicating matters for a debatable gain in functionality/semantics?
Wouldn't extending the "content:" module be a simpler task?
After all, wouldn't something along the lines of <content:encoded type="application/xhtml+xml"> carry a better meaning of the content of the entry?
This way, one could post entries in whichever way one likes: Mark could use type="text/html" for example.
--
Excuse the ranting, but really sometimes to the external eye it looks a lot like you (we?) are complicating (y)our RSS feeds just for the heck of it, with not much use outside of pure novelty...
Michel: I guess we have different perspectives on what is simple.
<content:encoded> remains an option for content that is not well-formed. It is considerably more verbose, less readable, and less easy to parse, but it still exists.
Posted by Sam Ruby at
Of course, that'll teach me not to post after so many hours of straight work (not!). I'll add the encoded/not-encoded field to Aggie once I complete transforming its RSS-to-object-graph engine (the part that reads RssExtractors) into something more general than just RSS.
Posted by Ziv Caspi at
Again with the relative URLs
Phil is right. The specs are silent on this. In fact, many people believe that relative links should be resolved relative to the feed itself, not the <channel><link> element's value. Spec issues aside,...
Trackback from Sam Ruby at
I spent most of today knee deep in RSS, writing an aggregator for a project at work. It has been quickly becoming apparent that "Really Simple Syndication" is anything but! There are currently three major (and goodness knows how many minor) ...
Pingback from Simon Willison: Archive for 4th April 2003 at
Working on RSS fragments with XOM
I am working on a RESTLog client, and specifically on the subsystem responsible for building RSS item fragments representing each weblog entry, and sending them to the server side of the application. As some of you may have noticed, I am using a new...
Excerpt from Through the Blogging-glass at
VerySharpReader
The best curb-appeal of any three-paned Windows RSS aggregator I've seen so far. ...
Excerpt from phil ringnalda dot com at
I recommend RSS 2.0
I encourage all bloggers, and especially Movable Type users, to upgrade their RSS to version 2.0. Movable Type users can copy the following RSS template code: <?xml version="1.0" encoding="<$MTPublishCharset$>"?>...
Trackback from Már Örlygsson at
Okay, I ‘fess up. I don’t get it.
Why should I bother using xhtml:body when I could just as well be using content:encoded? What’s the inherent benefit?...
Trackback from phil.wilson at
How (and why) to include an xhtml:body in a Radio UserLand RSS feed
Sam Ruby and Don Box have both demonstrated valid RSS 2.0 feeds (Sam, Don) that include a <body> element, properly namespaced as XHTML. Quietly, last week, I joined the party. My primary feed now includes: ......
Excerpt from Jon's Radio at
A cautious welcome
I'm waiting to be convinced about the utility of XHTML-in-RSS, but it's easy enough to play along....
Trackback from phil ringnalda dot com at
How (and why) to include an xhtml:body in a Radio UserLand RSS feed. Sam Ruby and Don Box have both demonstrated valid RSS 2.0 feeds (Sam, Don) that include a <body> element, properly namespaced as XHTML. Quietly, last week, I joined the party....
Excerpt from TIG's Corner at
The logic (or lack thereof) behind xhtml:body
When Sam started a small revolution last month by using <xhtml:body> in his rss-feed, he said this was because it was "more bandwidth and xpath friendly". I can see it's more xpath friendly (though I'm not sure why anyone would want to...
Trackback from public virtual MemoryStream at
The logic (or lack thereof) behind xhtml:body
The logic (or lack thereof) behind xhtml:body...
Trackback from .NET Blog - Chris Frazier Style at
Mike's Briefs
There's a good article on ExtremeTech on Krazy Keyboards. I have a great interest in new input devices as......
Excerpt from MikeShea.Net at
The battle for RSS
At SXSW I told Mena Trott that RSS 1.0 was dead or dying, because it was too complicated. Turns out I was partially wrong -- it's very much alive, but perhaps only because it's the default in Movable Type. Six Apart has signed up for the semantic...
Excerpt from Manton Reece at
Synderilla
I posted a new drop of Synderilla that's based on Dimtry's May 9 release, and adds support for gzip/deflate compression, xhtml:body based items and multiple plugins (both IBlogThis and IBlogExtension)....
Trackback from Simon Fell > Its just code at
For reference, the necessary RDF tax for using xhtml:body in RDF is to add parseType="Literal":
<xhtml:body rdf:parseType="Literal">content</xhtml:body>
I don't currently have a feed to show as an example, so hopefully someone else can try it out.
Posted by Ken MacLeod at
Synderilla
I posted a new drop of Synderilla that's based on Dimtry's May 9 release, and adds support for gzip/deflate compression, xhtml:body based items and multiple plugins (both IBlogThis and IBlogExtension). [Simon Fell] Well, here's the beauty of open...
Excerpt from deeje.com at
Logicola Diet: New!
The seed of an idea (and a good collection of links, by the way) Logicola Diet A project to learn and have fun, working with technologies such as PHP, RSS, XSLT and, above all, in areas that interest us and that we are passionate about, such as the...
Excerpt from logicola at
Replacing RSS
I've been reading up on RSS for the last few weeks, as my sidebar will no doubt show. I'm not......
Trackback from Virtuelvis at
Atom news
Both the Feed Validator and my ultra-liberal feed parser have been updated with unrelated changes....
Trackback from dive into mark at
Escaped Markup Considered Harmful
While I am certainly sympathetic to that view, my current leanings are simply that any such escaping is clearly identified as such....
Trackback from Sam Ruby at
How (and why) to include an xhtml:body in a Radio UserLand RSS feed
"Sam Ruby and Don Box have both demonstrated valid RSS 2.0 feeds (Sam, Don) that include a...
Trackback from EdTechPost at
Life with nntp//rss
I've been using nntp//rss for over a month now, and overall, I'm pretty happy with it. I like it for the same reason people who live in Outlook love NewsGator. (I live in Outlook Express, because it lets me handle......
Trackback from philweber.net at
Hi,
Does anybody have a sample application to generate and parse XHTML documents?
Regards,
chandra
I’m a beginner compared to you guys, but I have a question that I just started searching for an answer to and came across this site. Can href tags and other html tags be put into RSS 2.0 feeds? Do you know of a convenient list of what can go into a feed and what cannot?
Thanks.
Posted by John West at
I was looking for this for my own RSS feed
Found the answer at
RSS at Harvard Law
Syndication technology hosted by the Berkman Center
RSS 2.0 Specification
http://blogs.law.harvard.edu/tech/rss
In case someone else comes across this site searching for the same type of information.
Thanks - John West
Posted by John West at
XHTML in RSS
DeWitt Clinton: But what if you wanted to put something interesting inside a syndicated content feed? What if you wanted to put valid XHTML in a feed? You went through the trouble of writing XHTML, why should it be flattened to an opaque blob of...
Excerpt from The RSS Blog at
XHTML in RSS
DeWitt Clinton: But what if you wanted to put something interesting......
Excerpt from Real Geek at
Sjoerd Visscher approves, but suggests <xhtml:div> instead. Any objections?
Posted by Sam Ruby at