It’s just data


I spent four of the last six work days in all-day meetings.  While the meetings were about other things (primarily GlueCode and Zend related, in case you were wondering), I saw several indications that the fundamentals of REST were not completely internalized.  Meanwhile, Don Box asked a closely related question.

The starting point is usually that somebody has an API that is intended to shield the developer from the inner workings of SOAP and perhaps another protocol or three.  The person is thinking about adding REST support (generally in the form of removing the requirement for a SOAP envelope and adding support for additional HTTP methods).  What can go wrong?

In a word, plenty.  But to fully explain why, it is helpful to start at the beginning.

Stock Quote

In Web Services, everybody seems to start with a stock quote example.  In so many ways, this is very, very wrong.  But instead of investing time in saying why this is, and thereby distracting from the point I am really trying to make, I’m going to go with it for a while, point out a few problems, then abandon it.  Think of the stock quote example as the solid rocket boosters for this essay.

OK, so without asking why, we start from the premise that you want a reliable source for stock quotes.  This requires somebody to publish stock quotes, for you to find the service, and then to bind to it.  It turns out that that is the hard part, not the SOAP machinery.

How do you go about finding a stock quote?  Let’s try Google.  If you feel lucky, you pick the first one, and sure enough in seconds you have a stock quote.

That’s OK, but suppose you want to automate getting that data into a program.  You view source, see that this data is enclosed in a <big> tag, and in a few minutes, you have a working program.

This highlights a few problems with the stock quote example.  People who use this as the basis for a Web Service example end up advocating the use of SOAP to solve a not particularly difficult problem.  More importantly, this ignores the more difficult problem, which is the pesky terms of service that tend to accompany such services.

In any case, the important point here is that the web is a valuable source of data.  Particularly from a SOA perspective.

Web as a Data Source

Now, let’s look at this from the other end.  You are publishing data out on the web.  It might not even be stock quotes.  And for some inexplicable reason, somebody out there finds it useful.  And even better, they can readily make use of it without requiring you to do any additional work.

You might even want to help.  You may even package up your data into one of the popular feed formats, complete with metadata.

Now what?  Well, I suppose they want updates.  What do they do?  Since what they have been doing is the only thing they can do without involving effort on your part, they do the only thing they can do.

They poll.

They fetch your page, once an hour, every hour, whether you updated it or not.

Soon, others do likewise.  One quickly becomes a dozen which becomes a hundred, then multiple tens of thousands.

Pelting your server.

Which eventually melts into a pool of molten silicon.


Ultimately, the solution may very well be to work over other protocols, and there are people working on enabling exactly that.

But meanwhile, let’s see how much mileage we can get out of HTTP.

In many cases, a more than adequate solution is to simply not return any data at all if nothing has changed.  One way to achieve that is for the server to track exactly what has been distributed to each client, which is both a significant burden and very impractical.

Another solution is for clients to return some additional information on every subsequent request.  This turns out to be very easy, but it does require servers to pass along the additional information in the first place, and for clients to retain and regurgitate this information on repeated requests.
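In HTTP terms, the additional information is typically a validator such as an ETag, which the client echoes back in an If-None-Match header.  What follows is a minimal sketch in Python of both halves, not a production implementation: `make_etag`, `serve`, and `PollingClient` are names made up for illustration, the round trip is simulated by a function call rather than a real network request, and the comparison ignores details such as weak validators and lists of tags.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a validator from the content; any stable fingerprint would do."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def serve(body: bytes, request_headers: dict):
    """Server side: answer 304 with an empty body if the client's validator still matches."""
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body


class PollingClient:
    """Client side: retain the validator and send it back on every later request."""

    def __init__(self):
        self.etag = None
        self.cached_body = None

    def poll(self, fetch):
        # `fetch` stands in for an HTTP GET: it takes request headers and
        # returns (status, response_headers, body).
        headers = {"If-None-Match": self.etag} if self.etag else {}
        status, response_headers, body = fetch(headers)
        if status == 200:
            self.etag = response_headers.get("ETag")
            self.cached_body = body
        # On 304 the previously cached copy is still current.
        return status, self.cached_body


# Hypothetical page content, polled twice without any change in between.
page = b"<big>42.00</big>"
client = PollingClient()
first = client.poll(lambda h: serve(page, h))   # full transfer
second = client.poll(lambda h: serve(page, h))  # validator matches, no body sent
```

The second poll costs the server a comparison and a handful of header bytes, but only because the client kept the validator and sent it back.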

This gets to the heart of the question I posed at the top of this essay: what do you do if you have an existing API that does not set up the expectation that its users need to participate in this part of the process?

And it gets worse.  Sending the full page only if it changed only works well if the page changes infrequently.  If there is a possibility of frequent small changes, then perhaps only sending the changes might be appropriate.  Up go the coordination costs across the API.  Note that these approaches are application or data format specific.
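As one illustration of sending only the changes, here is a hedged sketch for a feed-like list of entries.  The entry layout (`id` and `updated` fields), the sample dates, and the function names are assumptions made for this example; a real format would define its own rules, which is exactly the sense in which this is data format specific.

```python
from datetime import datetime, timezone


def entries_since(entries, last_seen):
    """Server side: keep only the entries updated after the client's marker."""
    if last_seen is None:  # first request: there is nothing to subtract
        return list(entries)
    return [e for e in entries if e["updated"] > last_seen]


def newest(entries, previous=None):
    """Client side: the marker to send back on the next request."""
    return max((e["updated"] for e in entries), default=previous)


# Hypothetical feed contents.
feed = [
    {"id": "a", "updated": datetime(2005, 7, 20, tzinfo=timezone.utc)},
    {"id": "b", "updated": datetime(2005, 7, 22, tzinfo=timezone.utc)},
]
marker = newest(entries_since(feed, None))  # first poll sees everything
feed.append({"id": "c", "updated": datetime(2005, 7, 25, tzinfo=timezone.utc)})
changes = entries_since(feed, marker)       # second poll sees only the new entry
```

Both sides now have to agree on what the marker means and how it travels, which is the coordination cost in question.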

Make it stop

At some point, enough is enough.  People are taking advantage of you, and you are within your rights to refuse service.  How do you go about doing exactly that?

One way to do that is with a temporary redirect.  But let’s go a step further.  You have a situation where you really want a given page to go away.  Luckily, the authors of the HTTP specification were prepared for this eventuality, and created a special status code for this: 410 Gone.

That’s nice, but as I pointed out that doesn’t do any good if the client silently ignores your request.  On the plus side, that post got noticed, and now — over a year later — I only get between 108 and 146 requests per day for these two feeds.
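For the client’s half of that bargain, a minimal sketch of code that does not ignore the server might look like the following.  The `Subscription` record and the `next_url` function are hypothetical names for this example, the URLs are placeholders, and only a handful of status codes are handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subscription:
    url: str
    active: bool = True


def next_url(sub: Subscription, status: int, location: Optional[str] = None) -> Optional[str]:
    """Update the stored subscription from a response status.

    Returns the URL to request next, or None if polling should stop.
    """
    if status == 410:                      # Gone: stop asking, permanently
        sub.active = False
        return None
    if status == 301 and location:         # moved permanently: remember the new address
        sub.url = location
        return sub.url
    if status in (302, 307) and location:  # temporary: follow once, keep the old address
        return location
    return sub.url                         # anything else: leave the subscription alone


# Placeholder addresses, for illustration only.
sub = Subscription("http://example.org/old.rss")
temporary = next_url(sub, 307, "http://example.org/busy.html")
permanent = next_url(sub, 301, "http://example.org/new.atom")
gone = next_url(sub, 410)
```

The point of keeping `active` on the stored record is that the decision survives the current run, so a scheduled job does not come back an hour later and ask again.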

And I’m just a weblogger.  What if I were an enterprise?

Bottom Line

None of these issues are hard.  None of these issues require much in terms of engineering to implement.  Instead, these are issues with how you architect an API (sometimes retroactively) in such a way as to allow HTTP-specific information to permeate through the barrier, and how you set up expectations in such a way that existing users of your API know what is expected of them.

At some point, you need to question the wisdom of having an API which abstracts away that which is important.

“how do you set up expectations in such a way that existing users of your API know what is expected of them?”

Perhaps one angle might be working on the common HTTP client libraries so that they default to respecting HTTP and require additional lines of code to disable that behaviour.

Using a predictable PHP example, if getting a result is as easy as:

// Fetch the page and pull the quote out of the <big> tag
$html = file_get_contents('');
if ( preg_match('/<big>(.*)<\/big>/',$html, $matches) ) {
    echo "Stock price is: {$matches[1]}<br>";
}

How do you convince someone to go and work with something else like this [link] and implement things like caching, respect for status codes etc? Might be better if file_get_contents() was primed by default to cache documents, send the right headers etc. unless I specifically tell it not to.

It’s a similar story with most HTTP client APIs I’ve used, perhaps the only exception being JavaScript’s XMLHttpRequest or something more extreme like win32com.client.Dispatch('internetexplorer.application')

Posted by Harry Fuecks at

Sam Ruby: REST vs API

Paul Hammond : Sam Ruby: REST vs API - At some point, you need to question the wisdom of having an API which abstracts away that which is important....

Excerpt from HotLinks - Level 1 at

“That’s nice, but as I pointed out that doesn’t do any good if the client silently ignores your request. ”

410s are cacheable, so some percentage of ignorant clients should be thwarted (in theory), but it’s hard to know how well the caches are helping you out. Setting explicit expiration headers might help.

Posted by Robert Sayre at

Hacks work today!

Hacks... <?php $html = file_get_contents(''); if ( preg_match('/<big>(.*)<\/big>/', $html, $matches) ) { echo "Stock price is: {$matches[1]}<br>"; } today, but not tomorrow....

Excerpt from Juice at

Apologies for the crass commercialism, but I feel that this is the perfect opportunity to pimp the first in my new line of “Dive Into Schwag” t-shirts:

[link] “Which part of 410 Gone didn’t you understand?”

BTW, you should update your spell checker to include “schwag”, a term which apparently has multiple meanings, all of which make me happy.

Posted by Mark at

Web Architecture Roundup

Some notes on recent activity by the web architecture regulars......

Excerpt from at


Thanks for the comment. Of course I was not referring to you :) One main difference is that we humans who use the browser to interact with the server know exactly what it means and know that we have to update bookmarks etc. But what do we do in the code is the question. Let’s put this another way: how would the guy doing Python/XHTML handle this? Suppose he is writing a cronjob that runs that code?

— dims

Posted by Davanum Srinivas at

Dims, I responded on your blog.

Posted by Sam Ruby at

Thanks Sam. I had one more question. I think of greasemonkey as the same as an Axis handler... So whose fault is this? [link] [link]

Is it the fault of the guys who are responsible for the web site? Is it the fault of the guys who wrote greasemonkey? Is it the fault of the guys who wrote the script? (Since we are questioning everything, is it a limitation of REST?) Note that this problem did not happen to a human who interacts with gmail via a web browser :)

— dims

Posted by Davanum Srinivas at

Dims, I think that is a different issue, though it is close to the one that I explore in my followup post.  The nearest Axis equivalent would be to have a jws file that inadvertently exposed an interface that you didn’t intend to; anybody who binds to the WSDL that the service produces would be adversely affected whenever a parameter is added, removed, or rearranged.

The reference to a “House of Cards” in the first link is very appropriate.  I routinely rely on cell phones, Google, and MapQuest, though none of them provide me with any guarantees whatsoever.

On the other hand, one should be able to interpret a content-type of application/atom+xml in an HTTP response as an assertion that the response conforms to a given schema, and there is specific code in the Feed Validator to check this assertion.

Posted by Sam Ruby at

I kind of see (I think!) what you are saying... let me think more about it.


Posted by Davanum Srinivas at


Sam Ruby is talking architecture, Mark Pilgrim is selling t-shirts. Seems appropriate.......

Excerpt from Randy Holloway Unfiltered at

Links for Monday, July 25th, 2005

I’ve gotten into the habit of simply creating a blog entry full of the interesting stuff that I run into each day. Some is blogged for sharing, others so that I can find it later for further review. From here on out I am going to institutionalize...

Excerpt from Jeff Barr's Blog at



GOOG will get you to your quote in one less step! ;)

Posted by Simon at

Web Services Link Dump

Sam Ruby: REST vs API - At some point, you need to question the wisdom of having an API which abstracts away......

Excerpt from Jeremy Smith's blog at

I’m using the REST API for this blog and I have a few questions. I wonder if you can answer them.

Posted by Jack at
