By Sam Ruby, April 5, 2002.
This essay explores future directions in distributed computing.
A while back, Dr. Steve Burbeck published The Tao of e-business services. Truth be told, I didn't fully appreciate it then. In fact, most of the pieces didn't make sense to me until I saw Steve present at last year's O'Reilly P2P and Web Services Conference. The following captures the key point that I took away from all of this.
If you sort biological organisms by size, you will see a point at which the strategy shifts from making larger cells to making more cells. Cells are surrounded by a membrane, which acts as a trust boundary. Cells communicate by two basic mechanisms.
In the first mechanism, the sender determines the action to be taken. The request penetrates the membrane and then employs the machinery within the cell to execute per the instructions contained within the message.
In the second mechanism, the receiver determines the action to be taken. The request matches a receptacle on the membrane and the cell uses this information to trigger biological processes.
In biological terms, the agent of the first mechanism is called a virus, and the agent of the second mechanism is called a hormone. In larger organisms, there is a strong preference for the second mechanism.
If you are reading this text, then you are experiencing a form of publish / subscribe. The sender of the message is unaware of who the recipient is, or even how the message itself gets from point A to point B.
If you subscribe to RSS feeds, then the connection is even more indirect. A document is generated. It contains no verbs, merely data. That data is placed on a server, and other servers may syndicate its content. You may be subscribed to a different server than the one upon which the data was originally placed. What you do with this data once you receive it is up to you.
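To make that concrete, here is a small sketch in Python of a subscriber consuming such a verb-free document. The feed content, titles, and URLs are invented for the example; the point is that the document carries only data, and the subscriber alone decides what to do with it.

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical RSS document: pure data, no verbs or instructions.
FEED = """\
<rss version="2.0">
  <channel>
    <title>Example Weblog</title>
    <item>
      <title>Cells and Membranes</title>
      <link>http://example.com/2002/04/05</link>
    </item>
  </channel>
</rss>"""

def item_titles(feed_xml):
    """Extract item titles; what the subscriber does with them is up to it."""
    root = ET.fromstring(feed_xml)
    return [item.findtext("title") for item in root.iter("item")]

print(item_titles(FEED))  # -> ['Cells and Membranes']
```

The same document could be syndicated through any number of intermediate servers without changing what the subscriber does with it.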
It is worth mentioning at this point that the characteristics of the overall flow that is achieved may be quite different from the characteristics of the underlying mechanisms used to achieve that result. As an example, every step of the way, the processing can be highly deterministic, synchronous, and request/response. But the end result is that the client doesn't know where the ultimate destination is or even when it might get there.
No REST for the Weary
Dr. Roy T. Fielding introduced the concept of REpresentational State Transfer (REST). This has been the subject of endless debate over countless mailing lists. Again, I don't feel qualified to even appreciate all of the arguments, but here are some of the key points.
Architectures based on REST have a small number of verbs (like, four) which apply to an inexhaustible supply of nouns (every possible URL). Much of the discussion to date has contrasted this model with traditional RPC models, which have an inexhaustible supply of both verbs and nouns.
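To illustrate the shape of this contrast, here is a toy sketch in Python (not HTTP itself): a fixed handful of verbs applied to an unbounded set of nouns. All names and URLs below are illustrative, not any real API.

```python
# A toy resource store: four verbs, any number of URL-shaped nouns.
resources = {}

def PUT(url, representation):
    """Store a representation under any noun (URL)."""
    resources[url] = representation

def GET(url):
    """Retrieve the current representation, if any."""
    return resources.get(url)

def DELETE(url):
    """Remove the resource."""
    resources.pop(url, None)

def POST(url, representation):
    """Hand the resource some data to process; here, append to a list."""
    resources.setdefault(url, []).append(representation)

# The same four verbs apply to an inexhaustible supply of nouns:
PUT("http://example.com/essays/1", "The Tao of e-business services")
print(GET("http://example.com/essays/1"))
```

An RPC interface would instead mint a new verb for every operation (getEssay, updateEssay, and so on), so both the verb set and the noun set grow without bound.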
More interesting to me is the following statement (in section 5.2.2): "All REST interactions are stateless. That is, each request contains all of the information necessary for a connector to understand the request, independent of any requests that may have preceded it." While Roy is known to be skeptical about the overuse of XML, he states that this constraint is independent of the protocol syntax. Not that this stops any of the seemingly endless debate, of course.
The idea of messages being entirely self-contained and routed across great distances to be interpreted by the recipient is the recurring theme that I want to highlight here.
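A minimal sketch of that theme, assuming a hypothetical "search" message format: the handler below consults nothing but the message itself, so any copy of the request, arriving by any route, yields the same answer.

```python
# A sketch of the stateless constraint: the server keeps no conversational
# state, so each message must carry everything needed to interpret it.
# The field names ("action", "query") are invented for this example.

def handle(message):
    """Interpret a request using only what the message itself contains."""
    # No session lookup, no memory of prior requests.
    action = message["action"]
    if action == "search":
        corpus = ["REST", "Linda", "RSS"]
        return [term for term in corpus if message["query"] in term]
    raise ValueError("unknown action: %r" % action)

# Any intermediary could route, delay, or replay this request unchanged.
print(handle({"action": "search", "query": "RE"}))  # -> ['REST']
```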
Another influential effort in this area comes from Yale University's Linda group. The overall concept is one of a shared space into which processes can place messages and from which other processes can retrieve them, optionally removing them in the process, and optionally blocking until a match appears. Again the concept is that there are a small number of verbs (six or fewer) which can apply to any data.
In the Linda coordination language, the sender of a message does not know anything about the recipients. The only communication is through the data itself. Recipients are determined by matches on content. Data is a simple sequence of strings and numbers. IBM TSpaces and Sun JavaSpaces continue this work, focusing on such aspects as security and richer objects respectively.
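Here is a minimal, non-blocking sketch of that idea in Python. Real Linda also offers blocking and eval forms, which are omitted here; the point is that recipients are selected purely by matching on the content of the data.

```python
# A toy Linda-style tuple space: senders and receivers never know about
# each other; the only coupling is through the data itself.

space = []

def out(tuple_):
    """Place a tuple into the shared space."""
    space.append(tuple_)

def _matches(tuple_, pattern):
    # None in a pattern is a wildcard; other values must match exactly.
    return len(tuple_) == len(pattern) and all(
        p is None or p == t for t, p in zip(tuple_, pattern))

def rd(pattern):
    """Read a matching tuple, leaving it in the space."""
    for t in space:
        if _matches(t, pattern):
            return t
    return None

def in_(pattern):
    """Read and remove a matching tuple ('in' is a Python keyword)."""
    t = rd(pattern)
    if t is not None:
        space.remove(t)
    return t

out(("temperature", "boston", 58))
print(rd(("temperature", None, None)))  # -> ('temperature', 'boston', 58)
```

Note that the reader asked only for "any temperature tuple"; it did not know, or need to know, who produced it.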
More recently, XMLSpaces have been introduced. They provide a richer set of constructs in a platform-independent way, and allow the leveraging of standards like digital signatures and XML Query. RogueWave has a "tech preview" of such an implementation.

Summary
I don't have a crystal ball. If I knew all the answers, I would retire rich now. But here are three themes I see as growing in importance over time.
I do believe that protocols are increasingly moving to XML as a canonical representation. This does not preclude alternative representations optimized for various usages; it just means that those that succeed will likely have clear mappings to and from a standardized XML representation.
Constraining the verb set does tend to make the protocol more understandable, scalable and secure. Accordingly, I see a movement from RPC encodings ("I have something I want you to do for me") to document literal encodings ("I have something to tell you").
Location independence is more than simply something that DNS provides. If you pick an open protocol because it allows you a choice of providers, you can defeat that purpose quickly by hard-coding references to a specific provider into your code.
In terms of a tangible recommendation, when designing an interface I recommend starting from a description of the data that you wish to exchange. Represent it as simply as possible as name/value pairs, using nesting when structure is important. When appropriate, wrap the results in an envelope and annotate it with routing and/or security information as a header. It is generally also helpful to capture the description of these messages in both human- and machine-readable terms, in WSDL, to aid in the development of receptors for your data.
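As a sketch of that recommendation, here is one way to assemble such a message in Python. The element names (Envelope, Header, routeTo) are invented for illustration, not drawn from any particular SOAP schema; the shape is what matters: pure data in the body, routing information in the header.

```python
import xml.etree.ElementTree as ET

def make_message(route_to, data):
    """Wrap nested name/value data in an envelope with a routing header."""
    env = ET.Element("Envelope")
    header = ET.SubElement(env, "Header")
    ET.SubElement(header, "routeTo").text = route_to
    body = ET.SubElement(env, "Body")

    def fill(parent, value):
        for name, v in value.items():
            child = ET.SubElement(parent, name)
            if isinstance(v, dict):
                fill(child, v)       # nesting where structure matters
            else:
                child.text = str(v)  # otherwise a simple name/value pair

    fill(body, data)
    return ET.tostring(env, encoding="unicode")

msg = make_message("http://example.com/inbox",
                   {"order": {"item": "widget", "quantity": 3}})
print(msg)
```

The body contains no verbs, merely data: "I have something to tell you," not "I have something I want you to do for me."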