ScalableAtomspace


Scaling Atom: P2P and Cached Feeds.

  1. Scaling Atom: P2P and Cached Feeds.
    1. Proposals
    2. The syndicated blogosphere will reach 300 million feeds in 3 years.
    3. Feed payloads will grow 100 to 10,000 times.
    4. Each reader may consume 1000 feeds.
    5. Syndication Growth = Denial Of Service
    6. Discussion

Proposals

The syndicated blogosphere will reach 300 million feeds in 3 years.

We are still very early in the adoption of RSS feeds. Very few publishers. Even fewer readers. How will this change?

Assume growth.

In two years:

  1. Every blogger will publish a main feed.

  2. Each blog's category or topic will have a mirror feed.

  3. Every business system requiring a user ID will customize feeds for each user.

  4. Every major media outlet will drive traffic and affiliation by publishing feeds.

  5. Some consumers will add editorial value by blending existing feeds into new, focused feeds.

I assume AOL, Microsoft, Yahoo!, and Terra will turn on blogging tools in the next 18 months, and that 10% of the online community (70 million people) will become bloggers.

So, many feeds.

Feed payloads will grow 100 to 10,000 times.

TiVo users often record more programming than they can possibly watch (assuming employment and sleep). This assures freedom and choice. There is every reason to believe that newsreader users will behave likewise.

Which brings us to bandwidth...

A picture is worth a thousand words. Literally.

If so, what are audio and video worth? Moblogging and photoblogs will only exacerbate the payload growth.

In bandwidth terms, text is nearly free over land lines. Images, sounds, and video will comprise a growing share of bandwidth costs.

Each reader may consume 1000 feeds.

We'll also grow in our ability to read them.

Newsreaders will help us filter and prioritize our reading.

So our capacity to follow more feeds will also grow by at least one to two orders of magnitude. Most people follow fewer than 100 feeds in their newsreaders now. I follow nearly 1000: 50 religiously, 200 regularly. But all of them are searchable on my hard drive, and they all pop up in a balloon when they update.

And we don't have useful filters now. When the tools start to do more, the number of feeds consumed per reader will grow.

Syndication Growth = Denial Of Service

Let's assume that for every blogger there are two non-bloggers reading. That puts feed readership at around 200 million. So you have 200 million readers, each probing a thousand feeds an hour for updates. That's 200 billion probes an hour. Don't get me started on how many terabytes of flow that represents.
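
As a rough back-of-the-envelope check on those numbers (assuming, purely for illustration, about a kilobyte per headers-only probe):

    # Back-of-the-envelope check on the polling load.
    # The 1 KB per probe is an assumed figure for a headers-only HTTP round trip.
    readers = 200_000_000        # estimated feed readers
    feeds_per_reader = 1_000     # feeds each reader probes per hour
    bytes_per_probe = 1_000      # assumed cost of one "anything new?" probe

    probes_per_hour = readers * feeds_per_reader
    terabytes_per_hour = probes_per_hour * bytes_per_probe / 1e12

    print(f"{probes_per_hour:,} probes/hour")      # 200,000,000,000
    print(f"~{terabytes_per_hour:,.0f} TB/hour")   # ~200 TB/hour, before any payloads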

Do people only probe hourly? How often would you check for the latest scores during the World Cup or Super Bowl? For election results? For your medical report? Some fraction of services must support updates at closer intervals.

What architectures will support this scale? Peer-to-peer (P2P) distribution and caching by intermediaries (communal aggregators) have helped other systems scale. Both add delays to distribution while absorbing the publisher's bandwidth costs and connections. There is no reason why we shouldn't apply both architectures to this problem.
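
To make the caching half concrete, here is a minimal sketch (in Python, with illustrative names and cache structure) of a communal aggregator using HTTP conditional GETs, so many local readers share one cached copy and an unchanged feed costs only headers:

    import urllib.error
    import urllib.request

    # Minimal sketch of a communal aggregator acting as a caching intermediary.
    # Many local readers share the one cached copy; the origin is only asked
    # whether anything changed (conditional GET via ETag / 304 Not Modified).
    cache = {}   # feed_url -> {"etag": ..., "body": ...}   (illustrative structure)

    def fetch_feed(feed_url: str) -> bytes:
        request = urllib.request.Request(feed_url)
        cached = cache.get(feed_url)
        if cached and cached.get("etag"):
            # Lets the origin answer 304 with headers only when the feed is unchanged.
            request.add_header("If-None-Match", cached["etag"])
        try:
            with urllib.request.urlopen(request) as response:
                cache[feed_url] = {"etag": response.headers.get("ETag"),
                                   "body": response.read()}
        except urllib.error.HTTPError as err:
            if err.code != 304:          # 304 means our cached copy is still current
                raise
        return cache[feed_url]["body"]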

So I ask the Echo community: what changes if:

  1. The physical location of the feed is not the feed's original source?

  2. A client must choose from among multiple sources of the same feed?

  3. The publisher, while not abdicating a feed's authoritative URL, wishes to redirect some or all consumers to any of a list of alternative locations? (Think mirrored downloads or bitstream)

  4. The copy of a feed file has been passed on ten times before you receive it?

P.S. I welcome challenges to my assumptions, estimates, and conclusions. While I'm pretty confident in the shape of this analysis, details matter.

Discussion

[AdamRice] Quite an interesting bundle of ideas to chew on.

[MichaelManley RefactorOk] Perhaps mirroring Atomic Feeds is an opportunity to start establishing a PKI infrastructure. If the managing editor or some other authority sign the feed as a whole, that could open up possibilities for mirroring of feeds without fear of the feeds themselves being compromised. On the initial subscription to a feed, the aggregator would connect to the feed originator and get the public key of the keypair used to sign the feed. The aggregator could pick up the feed from any mirror (or other distribution mechanism) and be reasonably assured that the feed had not been tampered with since the original publication by verifying the signature. Mirroring feeds with authentication, alongside whatever caching mechanism the transport provides, could mitigate bandwidth concerns for popular feeds (thinking of feeds distributed via bittorrent, for example). Also, should pointers to public keys be made part of the AutoDiscovery mechanism?
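
A minimal sketch of the sign-then-verify step Michael describes, using detached Ed25519 signatures from the Python cryptography package purely as an illustration; the actual key format, signature placement, and discovery mechanism are exactly what the spec would have to define:

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    # Publisher side: sign the feed document as a whole.
    signing_key = Ed25519PrivateKey.generate()
    public_key = signing_key.public_key()        # fetched once from the feed originator
    feed_bytes = b"<feed>...</feed>"             # the serialized feed payload
    signature = signing_key.sign(feed_bytes)     # detached signature shipped alongside

    # Aggregator side: the copy may arrive from any mirror; verify before trusting it.
    def feed_is_authentic(body: bytes, sig: bytes) -> bool:
        try:
            public_key.verify(sig, body)
            return True
        except InvalidSignature:
            return False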

[TomasJogin] To me, this sounds like a case of trying to solve problems that would be cool to have in a distant future. The web has been growing at incredible speed for five to ten years now. As the selection of weblogs and webpages grows, the chance that someone will handpick your weblog to subscribe to only shrinks.

[AdamRice] Yeah, Tomas is right. In the meantime, [WWW]Shrook is interesting.

[AsbjornUlsberg] What about providing an HTTP PUSH method for aggregators that don't want to be polled every nth second, and where authors (not readers) want control over when their article is updated all around the world? PUSH requires subscription and some process around it, but that will probably be required in a lot of cases and should be defined anyhow.
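
One possible shape for that push direction, sketched in Python with purely hypothetical names (there is no standard HTTP PUSH method; this only shows a publisher POSTing the new feed to callback URLs collected at subscription time):

    import urllib.request

    # Hypothetical publisher-side push: readers registered a callback URL when they
    # subscribed, and the publisher decides when the new feed goes out to them.
    subscribers = ["https://reader.example/atom-callback"]   # illustrative registry

    def push_update(feed_bytes: bytes) -> None:
        for callback in subscribers:
            request = urllib.request.Request(
                callback,
                data=feed_bytes,
                headers={"Content-Type": "application/atom+xml"},
                method="POST",
            )
            urllib.request.urlopen(request)   # author, not reader, controls the timing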


Original Author: PhilWolff in [WWW]Scaling Echo

CategoryArchitecture, CategoryModel, CategoryApi