FeedDiff for Roller
Yesterday, I had lunch with Dave Johnson. He asked me how hard it would be to add support for the RFC 3229 "feed" instance manipulation method to Roller. I said that I would take a look into it.
As an aside, I find designing caching logic some of the most interesting work. To properly design a cache, one must understand the overall solution and the desired policies, be able to make system-level trade-offs among performance, bandwidth, and storage, and be prepared to violate pretty much every software engineering principle related to encapsulation there is. The existing Roller pagecache implementation is no exception. It currently has to worry about things like language and security, and it even knows which changes affect feeds.
My suggestion is to initially not change any of this, but to define a new cache (leveraging the existing LRUcache implementation) specifically for current and prior versions of feeds. A rough sketch of a design for a filter follows:
    ETag = hash(feed)
    setHeader("ETag", ETag)
    if not mFeedCache.contains(ETag):
        mFeedCache.put(ETag, feed)
    if not getHeader("If-None-Match"):
        return full feed
    if parse(getHeader("If-None-Match")).contains(ETag):
        return status 304 (Not Modified)
    if getHeader("A-IM"):
        if parse(getHeader("A-IM")).contains("feed"):
            for (String tag: parse(getHeader("If-None-Match"))):
                if mFeedCache.contains(tag):
                    setStatus(226)
                    setHeader("Vary", "If-None-Match")
                    setHeader("IM", "feed")
                    getWriter().write(diff(feed, mFeedCache.get(tag), "entry"))
                    return
    return full feed
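For readers who want to experiment with the filter logic outside of Roller, here is a minimal Python sketch of the same decision sequence. It is not the Roller implementation: the names respond and parse_list, the plain dict used as the cache, the choice of MD5 as the hash, and the case-sensitive header lookup are all assumptions made for illustration. The diff function is passed in as a parameter so the sketch stays independent of any particular feed format.

```python
import hashlib

def parse_list(header_value):
    """Split a comma-separated header into bare tokens (hypothetical helper).

    Drops parameters after ';', a leading weak-validator prefix 'W/',
    and surrounding double quotes.
    """
    tokens = []
    for part in (header_value or "").split(","):
        token = part.split(";")[0].strip()
        if token.startswith("W/"):
            token = token[2:]
        token = token.strip('"')
        if token:
            tokens.append(token)
    return tokens

def respond(request_headers, feed, feed_cache, diff):
    """Return (status, response_headers, body) for one feed request."""
    # Index the current version of the feed by a hash of its content.
    etag = hashlib.md5(feed.encode("utf-8")).hexdigest()
    headers = {"ETag": '"%s"' % etag}  # HTTP entity tags are quoted strings
    feed_cache.setdefault(etag, feed)

    if_none_match = request_headers.get("If-None-Match")
    if not if_none_match:
        return 200, headers, feed

    tags = parse_list(if_none_match)
    if etag in tags:
        return 304, headers, ""  # client already has the current version

    if "feed" in parse_list(request_headers.get("A-IM")):
        for tag in tags:
            if tag in feed_cache:
                # A prior version is known: send only what changed since then.
                headers["Vary"] = "If-None-Match"
                headers["IM"] = "feed"
                return 226, headers, diff(feed, feed_cache[tag], "entry")

    # No usable prior version, or the client did not ask for deltas.
    return 200, headers, feed
```

A client that sends no conditional headers gets the full feed with status 200; one that repeats the returned ETag in If-None-Match gets 304; and one that sends an older, still-cached ETag together with "A-IM: feed" gets status 226 and whatever the supplied diff function produces.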
FeedDiff contains suggested implementations for diff, hash, and parse. Note: none of this logic is particularly Atom specific.
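To make the role of diff concrete, here is one way an entry-level diff could be sketched in Python. This is an illustration, not the FeedDiff code: it assumes that diff(feed, old, "entry") should return the current feed with every entry that already appeared in the older version removed, and the helpers local_name and fingerprint are hypothetical. Elements are matched by local name, so the same function works whether the entry element is called entry or item.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip an ElementTree '{namespace}' prefix, if any."""
    return tag.rsplit("}", 1)[-1]

def fingerprint(elem):
    """Structural key for an element: tag, attributes, text, children.

    Whitespace around text is ignored; text that trails child elements
    (mixed content) is not compared. Both are simplifications.
    """
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        (elem.text or "").strip(),
        tuple(fingerprint(child) for child in elem),
    )

def diff(feed_xml, old_xml, entry_name):
    """Return feed_xml minus the entries that were already in old_xml."""
    new_root = ET.fromstring(feed_xml)
    old_root = ET.fromstring(old_xml)
    seen = {
        fingerprint(e) for e in old_root.iter()
        if local_name(e.tag) == entry_name
    }
    # Walk a snapshot of the tree so removals do not disturb iteration.
    for parent in list(new_root.iter()):
        for child in list(parent):
            if local_name(child.tag) == entry_name and fingerprint(child) in seen:
                parent.remove(child)
    return ET.tostring(new_root, encoding="unicode")
```

Feed-level elements such as the title are kept in the result, so the response is still a well-formed document with a single root; only the unchanged entries are left out.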
Tim Bray has said that he would "write a bloody Internet Draft myself" for the Atom WG if nobody else does. Perhaps somebody in the rss-dev working group or RSS Advisory Board would consider doing likewise for these feed formats?
Danny: Consistency in approach would certainly be a good thing, but there are differences that need to be taken into account. For example, how should the <items> element be handled in RSS 1.0?
Posted by Sam Ruby at
Thanks Sam!
While I was enjoying a soccer double header with Alex and Linus playing back to back games, relaxing at Cupajoes, and rocking at a sold out Wilco show, Sam Ruby was doing my work for me. He dove right into the Roller source code and implemented the...
Excerpt from Blogging Roller at
Sam, is there some reason you're not supporting If-Modified-Since in your various implementations? There are still quite a number of clients that make such requests rather than using If-None-Match and ETags.
bob wyman
Posted by Bob Wyman at
Bob,
For Roller, I was looking for a simple index based on the content. Conceivably, Roller could support multiple feed formats, all having the same modification date, but having different content.
Also, I note that in the Roller implementation there is a significant usage of the synchronized keyword - guarding against simultaneous updates. I'm a bit concerned that relying on a Last-Modified field with the granularity of a second will result in lost updates.
The rss:items element is only really provided to give the item's server-defined 'natural' order, as doc order isn't significant. If the feeddiff approach was used then a first step would be to view the set of items as an ordered list, the order of which would follow that of rss:items. So the information is preserved and the simplest solution would be to ditch rss:items altogether.
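The granularity concern can be illustrated with a small Python sketch. The timestamps, feed contents, and helper names below are made up for the example: two versions of a feed are published within the same second, so they share one Last-Modified value, while a content hash still tells them apart.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime

def last_modified(timestamp):
    """Format a UTC datetime as an HTTP date, which has one-second resolution."""
    return format_datetime(timestamp.replace(microsecond=0), usegmt=True)

def etag(feed):
    """Content-based validator (MD5 is an arbitrary choice for this sketch)."""
    return hashlib.md5(feed.encode("utf-8")).hexdigest()

# Two hypothetical updates, 0.8 seconds apart, inside the same second.
first_time = datetime(2004, 11, 1, 12, 0, 0, 100000, tzinfo=timezone.utc)
second_time = datetime(2004, 11, 1, 12, 0, 0, 900000, tzinfo=timezone.utc)
first_feed = "<feed><entry>one</entry></feed>"
second_feed = "<feed><entry>two</entry><entry>one</entry></feed>"

# Same Last-Modified, different content: a client that fetched the first
# version and revalidates with If-Modified-Since may never see the second.
same_date = last_modified(first_time) == last_modified(second_time)
same_etag = etag(first_feed) == etag(second_feed)
```

Here same_date is true and same_etag is false, which is the lost-update case: the date alone cannot distinguish the two versions, but the content hash can.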
Ok, there's the question of back-compatibility, but without trying it, it's hard to tell how much breakage in practice this might cause. If there was a lot, then maybe every diff would have to have its own rss:items, but I don't think that would mean any major effort to implement on top of everything else.
Posted by Danny at
Among the breakage would be the fact that Firefox's Live Bookmarks wouldn't see any items for the feed: it, perhaps foolishly, thinks it should treat RDF as RDF, and without the items element there's no connection between a feed and individual items.
Posted by Phil Ringnalda at
"Among the breakage would be the fact that Firefox's Live Bookmarks wouldn't see any items for the feed: it, perhaps foolishly, thinks it should treat RDF as RDF"
(sigh) Everyone reaches enlightenment at their own pace.
Posted by Mark at
Thanks Phil, I'd forgotten the feed-item association part. There are other ways the association could be expressed, either with per-item properties (which also makes life easier for aggregation/republication) or with the data in a different document. Either approach could treat RDF as RDF. I suspect neither would do much good in the Live Bookmarks case (In fact exactly the same RDF containment semantics could be expressed per-item, though the Seq order that RSS layers on top would have to be expressed some other way).
I'm not sure Live Bookmarks is exactly a reference implementation of anything at present - in the PR it doesn't recognise RSS 2.0 guids so can't make sense of feeds like Winer's and Kottke's.
Hmm, the feed diff code is somehow going to have to deal with the fact that XML docs can't have multiple roots, either by creating per-diff roots or by otherwise dealing with the de-/re-constitution problem. Nah, on second thoughts it doesn't seem such a big deal; it shouldn't be very difficult to incorporate a per-diff rss:items block, inserted in the same part of the code as whatever would be used to reconstitute atom:feed's containment.
I don't know about enlightenment, there still seems to be a lot of fumbling in the dark going on.
Posted by Danny at
"I'm not sure Live Bookmarks is exactly a reference implementation of anything at present - in the PR it doesn't recognise RSS 2.0 guids so can't make sense of feeds like Winer's and Kottke's."
Sam fixed that bug already: [link]
Posted by Mark at
Harmony
While noseying around Lambda I spotted this: The Harmony project aims to develop implementation architectures and conceptual foundations for a broad class of synchronizers - programs that reconcile copies of replicated data after disconnected...
Excerpt from Planet RDF at
Looking good. But I would think it an opportunity missed if we had separate specifications for entry-oriented diffs/caching for each of the different formats. If ever there was a case for starting from the logical model, this is it. That model probably needs to be nothing more than an ordered list, containing entries from entry[n] to entry[n+1].
The boundaries of each entry would probably have to be defined per format, but your FeedDiff code shows how it could probably be generalised for most 'chunked' XML formats. It would be nice to have arbitrary RDF/XML covered too, but there it's likely to be tricky to figure out the entry delimiters. On that score I mailed a couple of RDF lists (I forgot rss-dev), hoping someone would notice how useful this could be for sync'ing of triplestores (especially when connectivity is intermittent - when the client has to initiate and push won't work).
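As a rough illustration of the ordered-list model with per-format entry boundaries, the following Python sketch maps the root element of a document to the element that delimits an entry and returns the entries as a list. The table and the function names are assumptions made for this example, not part of FeedDiff or of any specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: local name of the root element -> local name of an entry.
ENTRY_NAME_BY_ROOT = {
    "feed": "entry",  # Atom
    "rss": "item",    # RSS 2.0 and its relatives
    "RDF": "item",    # RSS 1.0
}

def local_name(tag):
    """Strip an ElementTree '{namespace}' prefix, if any."""
    return tag.rsplit("}", 1)[-1]

def entries(xml_text):
    """Return the entries of a feed as a list, in document order."""
    root = ET.fromstring(xml_text)
    entry_name = ENTRY_NAME_BY_ROOT[local_name(root.tag)]
    return [e for e in root.iter() if local_name(e.tag) == entry_name]
```

Once a feed is reduced to such a list, the diffing and caching logic no longer needs to know which format it came from. One caveat from earlier in this thread applies: for RSS 1.0 the document order is not significant, so a fuller version would order the list by the rss:items sequence rather than by position in the file.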
Exciting stuff anyhow.
Posted by Danny at