As an aside, I find designing caching logic some of the most
interesting work. To properly design a cache, one must
understand the overall solution and the desired policies, be able
to make system-level trade-offs of performance vs bandwidth vs
storage, and be prepared to violate pretty much every software
engineering principle related to encapsulation there is. The
existing Roller pagecache implementation is no exception. It
currently has to worry about things like language and security, and
even knows which changes affect feeds.
My suggestion is to initially not change any of this, but to
define a new cache (leveraging the existing
LRUcache implementation) specifically for current and prior
versions of feeds. A rough sketch of a design for a filter
follows:
ETag = hash(feed)
setHeader("ETag", ETag)
if not mFeedCache.contains(ETag):
    mFeedCache.put(ETag, feed)
if not getHeader("If-None-Match"):
    return full feed
if parse(getHeader("If-None-Match")).contains(ETag):
    return status 304 (Not Modified)
if getHeader("A-IM"):
    if parse(getHeader("A-IM")).contains("feed"):
        for (String tag: parse(getHeader("If-None-Match"))):
            if mFeedCache.contains(tag):
                setStatus(226)
                setHeader("Vary", "If-None-Match")
                setHeader("IM", "feed")
                getWriter().write(diff(feed,mFeedCache.get(tag),"entry"))
                return
return full feed
FeedDiff contains suggested implementations for diff, hash, and parse.
Note: none of this logic is particularly Atom specific.
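To make the control flow of the sketch above concrete, here is a minimal Python rendering of it as a pure function. This is not the FeedDiff or Roller code: the names `etag_for`, `parse_list`, `drop_known_entries` and `respond` are hypothetical, SHA-1 is an assumed choice of hash, the cache is a plain dictionary rather than an LRU cache, and entries are matched with a regular expression purely to keep the example short.

```python
import hashlib
import re

# Assumption: entries are delimited by <entry>...</entry> and never nest.
ENTRY = re.compile(r"<entry\b.*?</entry>", re.S)

def etag_for(feed):
    """Derive a quoted ETag from the feed content (hash choice is assumed)."""
    return '"' + hashlib.sha1(feed.encode("utf-8")).hexdigest() + '"'

def parse_list(header):
    """Split a comma-separated header value into trimmed tokens."""
    return [token.strip() for token in header.split(",") if token.strip()]

def drop_known_entries(feed, old_feed):
    """Return the feed with every entry already present in old_feed removed."""
    known = set(ENTRY.findall(old_feed))
    return ENTRY.sub(lambda m: "" if m.group(0) in known else m.group(0), feed)

def respond(feed, request_headers, feed_cache):
    """Return (status, response_headers, body) for one feed request."""
    etag = etag_for(feed)
    headers = {"ETag": etag}
    feed_cache.setdefault(etag, feed)          # remember this version

    if_none_match = request_headers.get("If-None-Match")
    if not if_none_match:
        return 200, headers, feed              # unconditional request

    tags = parse_list(if_none_match)
    if etag in tags:
        return 304, headers, ""                # client is up to date

    a_im = request_headers.get("A-IM")
    if a_im and "feed" in parse_list(a_im):
        for tag in tags:
            if tag in feed_cache:              # we still hold the client's version
                headers["Vary"] = "If-None-Match"
                headers["IM"] = "feed"
                return 226, headers, drop_known_entries(feed, feed_cache[tag])

    return 200, headers, feed                  # fall back to the full feed
```

Keeping the logic free of servlet types makes the four outcomes (200, 304, 226, and the 200 fallback) easy to exercise in isolation; a real filter would wrap this around the request and response objects.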
Looking good. But I would think it an opportunity missed if we had separate specifications for entry-oriented diffs/caching for each of the different formats. If ever there was a case for starting from the logical model, this is it. That model probably needs to be nothing more than an ordered list, containing entries from entry[n] to entry[n+1].
The boundaries of each entry would probably have to be defined per format, but your FeedDiff code shows how it could probably be generalised for most 'chunked' XML formats. It would be nice to have arbitrary RDF/XML covered too, but there it's likely to be tricky to figure out the entry delimiters. On that score I mailed a couple of RDF lists (I forgot rss-dev), hoping someone would notice how useful this could be for sync'ing of triplestores (especially when connectivity is intermittent - when the client has to initiate and push won't work).
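As a rough illustration of that logical model, the Python sketch below reduces a feed to an ordered list of serialized entries, with the entry boundary looked up per format, and then diffs two such lists. The `ENTRY_NAMES` table and the function names are hypothetical, only element-delimited formats are covered, and arbitrary RDF/XML is deliberately left out because its entry delimiters are the hard part.

```python
import xml.etree.ElementTree as ET

# Assumed per-format entry boundaries, keyed by a made-up format label.
ENTRY_NAMES = {"atom": "entry", "rss": "item"}

def local_name(tag):
    """Strip an ElementTree '{namespace}' prefix, if any."""
    return tag.rsplit("}", 1)[-1]

def entries(xml_text, fmt):
    """Return the feed's entries, in document order, as serialized strings."""
    wanted = ENTRY_NAMES[fmt]
    root = ET.fromstring(xml_text)
    return [
        ET.tostring(el, encoding="unicode").strip()
        for el in root.iter()
        if local_name(el.tag) == wanted
    ]

def new_entries(current, previous):
    """Entries of `current` absent from `previous`, keeping their order."""
    seen = set(previous)
    return [entry for entry in current if entry not in seen]
```

Once both versions are plain lists, the diff itself no longer cares which format the entries came from; only `entries` needs format-specific knowledge.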
Danny: Consistency in approach would certainly be a good thing, but there are differences that need to be taken into account. For example, how should the <items> element be handled in RSS 1.0?
While I was enjoying a soccer double header with Alex and Linus playing back to back games, relaxing at Cupajoes, and rocking at a sold out Wilco show, Sam Ruby was doing my work for me. He dove right into the Roller source code and implemented the...
Sam, is there some reason you're not supporting If-Modified-Since in your various implementations? There are still quite a number of clients that make such requests rather than using If-None-Match and ETags.
For Roller, I was looking for a simple index based on the content. Conceivably, Roller could support multiple feed formats, all having the same modification date, but having different content.
Also, I note that in the Roller implementation there is significant usage of the synchronized keyword, guarding against simultaneous updates. I'm a bit concerned that relying on a Last-Modified field with the granularity of a second will result in lost updates.
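The granularity concern can be shown with a few lines of Python. This is only an illustration with made-up timestamps and feed bodies: two versions published half a second apart produce the same HTTP date, so a client validating on Last-Modified alone could not tell them apart, while a content-derived ETag (SHA-1 is an assumed choice) still differs.

```python
import hashlib
from email.utils import formatdate

def last_modified(timestamp):
    """Format an epoch timestamp as an HTTP date, which has one-second resolution."""
    return formatdate(timestamp, usegmt=True)

def content_etag(feed):
    """Quoted ETag derived from the feed bytes (hash choice is assumed)."""
    return '"' + hashlib.sha1(feed.encode("utf-8")).hexdigest() + '"'

# Hypothetical updates landing within the same second.
published_at = 1_100_000_000.2
first = ("<feed>version 1</feed>", published_at)
second = ("<feed>version 2</feed>", published_at + 0.5)

same_date = last_modified(first[1]) == last_modified(second[1])
same_etag = content_etag(first[0]) == content_etag(second[0])
print("same Last-Modified:", same_date)
print("same ETag:", same_etag)
```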
The rss:items element is only really provided to give the items' server-defined 'natural' order, as doc order isn't significant. If the feeddiff approach was used then a first step would be to view the set of items as an ordered list, the order of which would follow that of rss:items. So the information is preserved and the simplest solution would be to ditch rss:items altogether.
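That first step, viewing the items as a list ordered by rss:items, might look like the following Python sketch. The function name `ordered_items` is hypothetical, the code assumes the items are direct children of rdf:RDF, and it accepts both `rdf:resource` and an unqualified `resource` attribute on rdf:li, since feeds in the wild vary.

```python
import xml.etree.ElementTree as ET

RSS = "http://purl.org/rss/1.0/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def ordered_items(xml_text):
    """Return RSS 1.0 item elements in the order given by rss:items."""
    root = ET.fromstring(xml_text)
    # Index items by their rdf:about URI; document order is ignored.
    by_uri = {
        item.get("{%s}about" % RDF): item
        for item in root.findall("{%s}item" % RSS)
    }
    seq_path = "{%s}channel/{%s}items/{%s}Seq/{%s}li" % (RSS, RSS, RDF, RDF)
    order = [
        li.get("{%s}resource" % RDF) or li.get("resource")
        for li in root.findall(seq_path)
    ]
    # Items not listed in the Seq are dropped; unknown URIs are skipped.
    return [by_uri[uri] for uri in order if uri in by_uri]
```

The resulting list carries the same ordering information as the Seq, which is what would let a diff work on entries alone.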
Ok, there's the question of back-compatibility, but without trying it, it's hard to tell how much breakage in practice this might cause. If there was a lot, then maybe every diff would have to have its own rss:items, but I don't think that would mean any major effort to implement on top of everything else.
Among the breakage would be the fact that Firefox's Live Bookmarks wouldn't see any items for the feed: it, perhaps foolishly, thinks it should treat RDF as RDF, and without the items element there's no connection between a feed and individual items.
"Among the breakage would be the fact that Firefox's Live Bookmarks wouldn't see any items for the feed: it, perhaps foolishly, thinks it should treat RDF as RDF"
(sigh) Everyone reaches enlightenment at their own pace.
Thanks Phil, I'd forgotten the feed-item association part. There are other ways the association could be expressed, either with per-item properties (which also makes life easier for aggregation/republication) or with the data in a different document. Either approach could treat RDF as RDF. I suspect neither would do much good in the Live Bookmarks case (In fact exactly the same RDF containment semantics could be expressed per-item, though the Seq order that RSS layers on top would have to be expressed some other way).
I'm not sure Live Bookmarks is exactly a reference implementation of anything at present - in the PR it doesn't recognise RSS 2.0 guids so can't make sense of feeds like Winer's and Kottke's.
Hmm, the feed diff code is somehow going to have to deal with the fact that XML docs can't have multiple roots, either by creating per-diff ones or by otherwise dealing with the de-/re-constitution problem. Nah, on second thoughts it doesn't seem such a big deal; it shouldn't be very difficult to incorporate a per-diff rss:items block, inserted in the same part of the code as whatever would be used to reconstitute atom:feed's containment.
I don't know about enlightenment, there still seems to be a lot of fumbling in the dark going on.
"I'm not sure Live Bookmarks is exactly a reference implementation of anything at present - in the PR it doesn't recognise RSS 2.0 guids so can't make sense of feeds like Winer's and Kottke's."
While noseying around Lambda I spotted this: The Harmony project aims to develop implementation architectures and conceptual foundations for a broad class of synchronizers—programs that reconcile copies of replicated data after disconnected...