AtomCaching describes techniques for distributed caching and notification of resources, particularly Atom-enabled resources. These techniques are possible solutions for the ScalableAtomspace.
Prior Art:
-
BitTorrent -- the IRC Strawman below is similar in principle to BitTorrent
IRC Strawman
The atom-syntax mailing list has an RSS feed hosted by DeveloperDude at http://www.kbcafe.com/iBLOGthere4iM/ListToRss.aspx which links to articles in the mail list archive at http://imc.org/atom-syntax/mail-archive/maillist.html
The mail list archive page updates roughly once an hour, as does the corresponding RSS feed.
This IRC strawman for the atom-syntax list is hosted at irc://irc.freenode.net/atom-syntax
The IRC Strawman "fetch pattern" works like this:
-
Either the feed provider or a seed notifies the IRC channel that the feed has updated
-
Seeds download the feed and any identified changed resources within a given period -- let's say one minute.
-
Clients wait a given period, let's say two minutes, from the time of notification, then fetch the feed and resources from a random current seed.
A seed is a host that volunteers to mirror content from the publisher. A client is a typical feed reader/aggregator.
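The client side of the fetch pattern above can be sketched as follows. This is only an illustration under stated assumptions: the function name `plan_client_fetch` and the constants are hypothetical, the two-minute delay is the example value from the text, and the page does not say what a client should do when it knows no seeds, so the sketch simply returns `None` for the seed in that case.

```python
import random
from datetime import datetime, timedelta

# Example periods taken from the text; both are "let's say" values, not fixed.
SEED_WINDOW = timedelta(minutes=1)    # seeds mirror the feed within this period
CLIENT_DELAY = timedelta(minutes=2)   # clients wait this long after a notice


def plan_client_fetch(notified_at, seeds, rng=random):
    """Return (earliest fetch time, chosen seed) for one update notice.

    `seeds` is a list of mirror base URIs the client currently knows.
    The seed is None when the list is empty (behaviour the page leaves open).
    """
    fetch_at = notified_at + CLIENT_DELAY
    seed = rng.choice(seeds) if seeds else None
    return fetch_at, seed
```

Because the client delay is longer than the seed window, seeds that mirror promptly should already hold the new content when clients arrive.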
The IRC Strawman "seed pattern" works like this:
-
A seed that volunteers to mirror current resources publishes its availability to the channel once every given period -- say 10 minutes.
-
Clients remember up to 10 or 20 recent seed notices
-
Clients should remove seeds either
-
when receiving a 500 status or repeated 400 statuses (say, three repeats)
-
after a given period, say 40 minutes
-
Seeds should not post an availability notice until they themselves have mirrored at least the feed and its immediate resources.
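A client's bookkeeping for the seed pattern might look like the sketch below. The class `SeedList` and its method names are hypothetical. It also makes assumptions the page does not settle: "500" and "400" are read as the 5xx and 4xx status classes, a fresh availability notice resets a seed's error count, and the limits use the example values from the text (20 seeds, 40 minutes, three repeats).

```python
import time

MAX_SEEDS = 20        # "up to 10 or 20 recent seed notices"
SEED_TTL = 40 * 60    # seconds; "say 40 minutes"
MAX_400S = 3          # "say, three repeats"


class SeedList:
    """Tracks recently announced seeds and drops stale or failing ones."""

    def __init__(self, clock=time.time):
        self._clock = clock
        # base URI -> [time of last notice, count of 4xx responses]
        self._seeds = {}

    def notice(self, base_uri):
        """Record an availability notice, keeping only the most recent seeds."""
        # Re-insert so that dict order reflects notice recency.
        self._seeds.pop(base_uri, None)
        self._seeds[base_uri] = [self._clock(), 0]
        while len(self._seeds) > MAX_SEEDS:
            del self._seeds[next(iter(self._seeds))]

    def report_status(self, base_uri, status):
        """Remove a seed on a 5xx status or after repeated 4xx statuses."""
        entry = self._seeds.get(base_uri)
        if entry is None:
            return
        if status >= 500:
            del self._seeds[base_uri]
        elif status >= 400:
            entry[1] += 1
            if entry[1] >= MAX_400S:
                del self._seeds[base_uri]

    def current(self):
        """Return seeds whose last notice is no older than SEED_TTL."""
        cutoff = self._clock() - SEED_TTL
        for uri in [u for u, (seen, _) in self._seeds.items() if seen < cutoff]:
            del self._seeds[uri]
        return list(self._seeds)
```

Since seeds re-announce roughly every 10 minutes, a 40-minute expiry tolerates a few missed notices before a seed is forgotten.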
Notices and Mirroring
When a publisher knows, or a seed notices, that the feed has updated, it posts a message:
updated http://example.org/feed
Where the URI is the feed URI. URIs are used to allow a channel to host several mirroring sites, and for clients to track sites they are interested in.
When a seed has a populated cache, it should announce its availability by posting:
available http://example.com/mirrors/example.org/ http://example.org/feed
The first URI is the base URI of the mirror. To request a mirrored item, clients append a URL-encoded version of the requested resource's URI to the base URI of the mirror.
The second URI is the feed being hosted, which also provides a list of the individual resources being hosted. The second URI may include just the protocol+host portion of the URL, in which case clients can expect that the mirror is mirroring all recent resources from that host (i.e., multiple feeds).
Note: it may be better to use HTTP proxying techniques in a "real" implementation, in which case the first URI would be the protocol, host, and port of the proxy and requested URIs would go in the protocol request.
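The two notice formats and the mirror URL construction described above can be sketched as follows. The function names `parse_notice` and `mirror_url` are hypothetical, and the page does not define "url-encoded" precisely; this sketch assumes the whole resource URI is percent-encoded, including ":" and "/".

```python
from urllib.parse import quote


def parse_notice(line):
    """Parse an 'updated' or 'available' channel message; None if neither."""
    parts = line.split()
    if len(parts) == 2 and parts[0] == "updated":
        return {"type": "updated", "feed": parts[1]}
    if len(parts) == 3 and parts[0] == "available":
        return {"type": "available", "mirror": parts[1], "feed": parts[2]}
    return None


def mirror_url(mirror_base, resource_uri):
    """Append the percent-encoded resource URI to the mirror's base URI."""
    # safe="" makes quote() escape ":" and "/" as well (an assumption here).
    return mirror_base + quote(resource_uri, safe="")
```

For example, with the notice `available http://example.com/mirrors/example.org/ http://example.org/feed`, a client would request the feed from `http://example.com/mirrors/example.org/http%3A%2F%2Fexample.org%2Ffeed`.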
HTTP Proxy Strawman
In the HTTP Proxy technique, caching proxy hosts would notify the publisher that their proxies were available. At first this would be done manually by email, and later through some automated means.
The publisher would then include in their feed a <link> element for each proxy, with an attribute rel="cache". Subscribers would initially fetch the feed from the publisher, but then fetch from proxies from that point forward.
<feed>
<link rel="cache"
href="http://example.org:8188/"
title="Caching Proxy" />
<link rel="cache"
href="http://example.com:8000/"
title="Caching Proxy" />
...
