It’s just data

Squeezing the left side of the toothpaste

Mark Baker: Take for example, a blog aggregator. If everybody had their own interface to their blog, aggregation would be an O(N^2) task. But luckily they don't, they all make their RSS available over HTTP GET, a very useful common abstraction. That, plus that RSS (all four versions 8-) is a common format, provides for O(N) integration complexity.

Key words in that last sentence are "all four versions".
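To put rough numbers on that argument, here is a minimal sketch of the counting involved. The function names and the model itself are my own illustration, not anything Mark wrote: it assumes that without a shared interface every party needs a bespoke adapter for every other party, and that with a shared interface each party needs one parser per format in circulation.

```python
def pairwise_adapters(n_parties: int) -> int:
    """Assumed model: every party writes a bespoke adapter for every other party."""
    return n_parties * (n_parties - 1)


def shared_interface_adapters(n_parties: int, n_formats: int = 1) -> int:
    """Assumed model: every party implements the shared interface once per format."""
    return n_parties * n_formats


if __name__ == "__main__":
    # Four formats, echoing the "all four versions" remark.
    for n in (10, 100, 1000):
        print(n, pairwise_adapters(n), shared_interface_adapters(n, n_formats=4))
```

Under these assumptions, four formats multiply the work by a constant, so the growth stays linear in the number of parties; it is the lack of any shared interface that makes it quadratic.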

Actually, that isn't so much the problem as all the weblogs out there that provide well-formed versions of their content, but no discernible schema. This gives rise to such tools as Cheesegrater and Portalizer.

To get a proper handle on this, we need to capture both parts of the interface: how the data is retrieved and what format it arrives in. And we need to use applicable standards whenever possible.


Yah, it's true that data format variation increases integration complexity for anything more than trivial no-op integration. But in practice, data formats don't vary as much as interfaces do. I'd say we're closer to O(N) than O(N log N) today.

What we really need to get to pure O(N) is an HTML for machines; a very general format capable of communicating varied semantics for a wide variety of chunks of representational state. Hmm, now where would one find one of these?

And FWIW, I think we can do better than O(N).

Posted by Mark Baker at

Just in terms of tools for aggregating and converting non-RSS feeds into RSS, I'd like to point out the little project I've been throwing together for the last few months:
http://www.davidjanes.com/blogosphere

Posted by David Janes at
