It’s just data

Agile Web 2.0 Development

There is an interesting discussion going on between Tim (Bray) and Tim (O’Reilly) over the use of the term Web 2.0.  I’m with Tim (O’Reilly) in that the term Web 2.0 is as relevant today as the term P2P was in 2001.  And I’m with Tim (Bray) in that the term Web 2.0 will likely be as relevant in 2009 as the term P2P is today.

But I will say that I like the term Web 2.0 much more than I like Tim (O’Reilly’s) previous attempt, namely the Internet Operating System, for reasons I’ll go into near the end of this mini-essay.

Note: Casual or non-technical readers may wish to skip ahead entirely to the conclusion.

Case study

The Atom Format Specification was recently declared ready to implement, so it clearly was time to update the feed validator.  With RSS, the tests were organized into three buckets (may, must, and should), based on what requirements could be inferred.  This ultimately turned out to be less than helpful.

With Atom 1.0, a fresh start was possible.  The first thing I did was to scan the spec for explicit RFC 2119 keywords like MUST and SHOULD, identifying each as a place where a test case was needed.  As I wanted others to participate, I placed this list on a wiki.
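The scan itself needn’t be anything fancy.  Here is a sketch of the idea: the keyword list comes straight from RFC 2119, but the function name and the shape of the output are mine, not anything from the actual validator.

```javascript
// Scan specification text for RFC 2119 keywords, reporting each
// occurrence as a candidate test case.  Longer keywords ("MUST NOT")
// are listed before their prefixes ("MUST") so they win the match.
const RFC2119 = /\b(MUST NOT|MUST|SHALL NOT|SHALL|SHOULD NOT|SHOULD|REQUIRED|NOT RECOMMENDED|RECOMMENDED|MAY|OPTIONAL)\b/g;

function findTestPoints(specText) {
  const points = [];
  specText.split('\n').forEach(function (line, i) {
    for (const m of line.matchAll(RFC2119)) {
      points.push({ line: i + 1, keyword: m[1], text: line.trim() });
    }
  });
  return points;
}
```

Each entry in the result identifies a line and a requirement level, which is exactly the raw material for a checklist of tests.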

After a few rounds of feedback, I needed a checklist so that I could keep track of which tests I had implemented so far.  After a few moments’ thought, I decided to put a link from each identified test to the test case that implemented it.  Here is a snapshot from early in that process.

Ultimately, that list became mostly filled out, and it was worthwhile to add a table of contents.

Eventually, the lists of tests grew.  Atom’s date and email formats are more strict than in prior feed formats.  Atom’s reference to the InfoSet makes explicit a number of assumptions that have built up over time with RSS.  There are proposals which make the handling of whitespace explicit.  Even so, the current number of Atom 1.0 tests is less than half of the ones that were written up for Atom 0.3.  This gap will not only be closed, but surpassed.

This led to replacing the Apache-generated directory listing with a checked-in Table of Contents, and to creating a header file that links each directory (example) to the associated section in the specification (example) and transforms the icon on each line into a hypertext link that launches the Feed Validator application itself.  Additionally, the output of the Feed Validator (example) links to on-line documentation (example).

The header file was the only place where I used any fancy Web 2.0 programming techniques.  Inspired by Mark Pilgrim’s BetterDir and Chris Heilmann’s Unobtrusive JavaScript, I wrote a small script that accompanies the Apache-generated output and causes the client to rewrite the page dynamically, inserting the links.
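The flavor of that technique can be sketched as follows.  The spec URL, the section numbers, and the directory names below are illustrative placeholders, not the validator’s actual data; the point is the shape: the page works as a plain directory listing, and the script, if it runs, merely decorates it.

```javascript
// Illustrative mapping from test directory names to spec sections.
// Both the URL and the section numbers are made-up placeholders.
const SPEC_URL = 'http://example.org/atom-spec';
const SECTIONS = {
  'entry_title': '4.2.14',
  'entry_id': '4.2.6'
};

// Pure helper: the spec link for a directory name, or null if unknown.
function specLinkFor(name) {
  const section = SECTIONS[name.replace(/\/$/, '')];
  return section ? SPEC_URL + '#section-' + section : null;
}

// Unobtrusive wiring: runs only in a browser, and only decorates the
// entries it recognizes.  Everything else is left exactly as served.
if (typeof document !== 'undefined') {
  window.onload = function () {
    const anchors = document.getElementsByTagName('a');
    for (let i = 0; i < anchors.length; i++) {
      const link = specLinkFor(anchors[i].textContent);
      if (link) {
        const spec = document.createElement('a');
        spec.href = link;
        spec.appendChild(document.createTextNode(' [spec]'));
        anchors[i].parentNode.insertBefore(spec, anchors[i].nextSibling);
      }
    }
  };
}
```

Because the decoration is additive, a client without JavaScript simply sees the unadorned listing.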

Arguably, this approach is a bit fragile.  If JavaScript is turned off, nothing will break, but the links won’t be added.  If I were to switch to a different web server, or even a different version of this web server, again nothing would break, but the links likely wouldn’t be added.  And if the formatting of the RFC were to change, my links into the document would likely revert to simply referencing the beginning of the document.

In the latter two cases, the trade-off between writing an entire application now versus the few minutes it would literally take to modify the script later is one that I am willing to make.

CASE tools

When I worked on federal government contracts in the 1980s, the traceability between specifications, implementation, test-cases, and documentation that I was able to effortlessly achieve here was something that we could only dream about.

A number of vendors produce so-called Computer Aided Software Engineering (CASE) tools that help with this.  In many ways, such tools do so much more than what I have done.  But in other meaningful ways, they can also be said to do so much less.

Let’s enumerate the tools I used in this development.  Can you spot the CASE tools?

Internet Operating System

I mentioned above that I did not particularly care for the term Internet Operating System.  Look at the composite application described above.  Where is the Central Processing Unit?  Job scheduler?  Swap partition?

My composite application can be simultaneously executed by a large number of users.  If you clicked on any of the links earlier in this essay, you too may have participated.  And, if you cared to, you could easily bookmark where you left off, enabling you to resume at any time.  All without a job scheduler or swap partition as such.

Instead, this composite application is a distributed state machine.  Small Pieces Loosely Joined, and all that.

The underlying fabric of all this is something I explored in Neurotransmitters.  Instead of progressing from small cells (a desktop operating system) to bigger cells (an Internet operating system), you see a progression to multi-cellular networks, where cells exchange data (either viruses or hormones).

Note: to be fair, Tim (O’Reilly) was referring to differing aspects when he coined the term Internet Operating System.  Aspects like identity – something that we are still struggling with today.

Web 2.0

I do agree with Tim (O’Reilly) that we are in the midst of a qualitative shift in our understanding of the Internet, but I see it as the third major iteration.  To help align this, I’ll use the programmer-standard convention of indexing starting with zero, as opposed to the marketing-standard convention of counting starting with one.

Web[0] is exemplified by static home pages, published in broadcast mode.  Many weblogs today continue to operate in this fashion.

Web[1] is exemplified by e-commerce shopping carts, enabling two-way interaction between businesses and consumers.  Comments and Trackbacks are in this category.

Web[2] is characterized by action-at-a-distance interactions and ad hoc integration.  By my putting a link here, your page rank is changed there.  A book I purchase today affects Amazon’s recommendations tomorrow.

I consider AJAX to be “merely” an optimization of an implementation detail of Web 1.0.  On the other hand, I consider the integration of Google Maps and craigslist to be very Web 2.0.  As are GreaseMonkey, Technorati, and even eBay and Wikipedia.

Clearly, some people got this whole “hypermedia as the engine of application state” thing long before everyone else did.