It’s just data


I spent four of the last six work days in all-day meetings.  While the meetings were about other things (primarily GlueCode and Zend related, in case you were wondering), I saw several indications that the basic fundamentals of REST were not completely internalized.  Meanwhile, Don Box asked a very much related question.

The starting point is usually that somebody has an API that is intended to shield the developer from the inner workings of SOAP and perhaps another protocol or three.  The person is thinking about adding REST support (generally in the form of removing the requirement for a SOAP envelope and adding support for additional HTTP methods).  What can go wrong?

In a word, plenty.  But to fully explain why, it is helpful to start at the beginning.

Stock Quote

In Web Services, everybody seems to start with a stock quote example.  In so many ways, this is very, very wrong.  But instead of investing time in saying why this is, and thereby distracting from the point I am really trying to make, I’m going to go with it for a while, point out a few problems, then abandon it.  Think of the stock quote example as the solid rocket boosters for this essay.

OK, so without asking why, we start from the premise that you want a reliable source for stock quotes.  This requires somebody to publish stock quotes, for you to find the service, and then to bind to it.  It turns out that that is the hard part, not the SOAP machinery.

How do you go about finding a stock quote?  Let’s try Google.  If you feel lucky, you pick the first one, and sure enough in seconds you have a stock quote.

That’s OK, but suppose you want to automate getting that data into a program.  You view source, see that this data is enclosed in a <big> tag, and in a few minutes, you have a working program.
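To make "a few minutes" concrete, here is a minimal sketch of that kind of scraper, using only the Python standard library.  The sample page and the names BigTagExtractor and extract_quote are made up for illustration; a real quote page will have its own markup, and its terms of service may not allow this at all.

```python
from html.parser import HTMLParser


class BigTagExtractor(HTMLParser):
    """Collects the text found inside each top-level <big> element."""

    def __init__(self):
        super().__init__()
        self._depth = 0      # how many <big> elements we are currently inside
        self.values = []     # one string per top-level <big> element

    def handle_starttag(self, tag, attrs):
        if tag == "big":
            self._depth += 1
            if self._depth == 1:
                self.values.append("")

    def handle_endtag(self, tag):
        if tag == "big" and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth > 0:
            self.values[-1] += data


def extract_quote(html):
    """Return the text of the first <big> element, or None if there is none."""
    parser = BigTagExtractor()
    parser.feed(html)
    parser.close()
    return parser.values[0].strip() if parser.values else None


# Hypothetical page content, standing in for whatever the search turned up.
SAMPLE_PAGE = "<html><body><p>ACME Corp <big><b>42.17</b></big></p></body></html>"
print(extract_quote(SAMPLE_PAGE))
```

Note how fragile this is: the program depends on a presentational tag that the publisher can change at any time.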

This highlights a few problems with the stock quote example.  People who use this as the basis for a Web Service example end up advocating the use of SOAP to solve a not particularly difficult problem.  More importantly, this ignores the more difficult problem, which is the pesky terms of service that tend to accompany such services.

In any case, the important point here is that the web is a valuable source of data.  Particularly from a SOA perspective.

Web as a Data Source

Now, let’s look at this from the other end.  You are publishing data out on the web.  It might not even be stock quotes.  And for some inexplicable reason, somebody out there finds it useful.  And even better, they can readily make use of it without requiring you to do any additional work.

You might even want to help.  You may even package up your data into one of the popular feed formats, complete with metadata.

Now what?  Well, I suppose they want updates.  What do they do?  Since what they have been doing is the only thing they can do without involving effort on your part, they do the only thing they can do.

They poll.

They fetch your page, once an hour, every hour, whether you updated it or not.

Soon, others do likewise.  One quickly becomes a dozen which becomes a hundred, then multiple tens of thousands.

Pelting your server.

Which eventually melts into a pool of molten silicon.


Ultimately, the solution may very well be to work over other protocols, and there are people working on enabling exactly that.

But meanwhile, let’s see how much mileage we can get out of HTTP.

In many cases, a more than adequate solution is to simply not return any data at all if nothing has changed.  One way to achieve that is for the server to track exactly what has been distributed to each client, which is both a significant burden and very impractical.

Another solution is for clients to return some additional information on every subsequent request.  This turns out to be very easy, but it does require servers to pass along the additional information in the first place, and for clients to retain and regurgitate this information on repeated requests.
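The essay does not name the mechanism here, but in HTTP this is commonly done with a validator such as an ETag: the server hands one out with the response, and the client sends it back in an If-None-Match header.  The sketch below simulates both halves in memory so that it runs without a network.  The names make_etag, handle_get and PollingClient are illustrative assumptions, not part of any real library.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a validator from the content (one possible choice among many)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def handle_get(request_headers: dict, body: bytes):
    """Server side: answer 304 with no body when the client's copy is current."""
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body


class PollingClient:
    """Client side: retain the validator and send it back on the next poll."""

    def __init__(self):
        self.etag = None
        self.body = None

    def poll(self, fetch):
        headers = {"If-None-Match": self.etag} if self.etag else {}
        status, response_headers, body = fetch(headers)
        if status == 200:
            # New content: remember it together with its validator.
            self.etag = response_headers.get("ETag")
            self.body = body
        return status


page = b"<feed>version 1</feed>"
client = PollingClient()
print(client.poll(lambda h: handle_get(h, page)))  # first fetch, full body
print(client.poll(lambda h: handle_get(h, page)))  # unchanged, no body sent
```

The server keeps no per-client state; all it asks is that the client carry one header value from one request to the next.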

This gets to the heart of the question I posed at the top of this essay: what do you do if you have an existing API that does not set up the expectation that its users need to participate in this part of the process?

And it gets worse.  Sending the full page only when it has changed works well only if the page changes infrequently.  If there is a possibility of frequent small changes, then perhaps only sending the changes might be appropriate.  Up go the coordination costs across the API.  Note that these approaches are application or data format specific.
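As one possible illustration of sending only the changes, assume a feed whose entries carry increasing ids and a client that reports the last id it saw.  Both the data shape and the function name changes_since are assumptions made for this sketch; a real format would define its own rules.

```python
def changes_since(entries, last_seen_id):
    """Return only the entries the client has not seen yet.

    entries is a list of (entry_id, text) pairs with increasing ids.
    """
    return [entry for entry in entries if entry[0] > last_seen_id]


feed = [(1, "first post"), (2, "second post"), (3, "third post")]
print(changes_since(feed, 1))
```

Even in this tiny form, the client now has to remember something and report it correctly, which is exactly the coordination cost in question.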

Make it stop

At some point, enough is enough.  People are taking advantage of you, and you are within your rights to refuse service.  How do you go about doing exactly that?

One way to do that is with a temporary redirect.  But let’s go a step further.  You have a situation where you really want a given page to go away.  Luckily, the authors of the HTTP specification were prepared for this eventuality, and created a special status code for this (410 Gone).

That’s nice, but as I pointed out, that doesn’t do any good if the client silently ignores your request.  On the plus side, that post got noticed, and now, over a year later, I only get between 108 and 146 requests per day for these two feeds.
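For contrast, here is a rough sketch of what a client that does listen might do with those responses.  The subscription dictionary and the function react are hypothetical; the status codes are standard HTTP (302 and 307 for temporary redirects, 410 for a resource that is gone).

```python
def react(subscription, status, headers):
    """Return the subscription as it should look after one poll."""
    updated = dict(subscription, follow_now=None)
    if status == 410:
        # Gone: stop polling this URL for good.
        updated["active"] = False
    elif status in (302, 307):
        # Temporary redirect: follow it for this request only,
        # and keep the original URL for future polls.
        updated["follow_now"] = headers.get("Location")
    return updated


feed = {"url": "http://example.org/index.rss", "active": True}
print(react(feed, 410, {}))
print(react(feed, 302, {"Location": "http://example.org/busy.html"}))
```

None of this is difficult; it only helps if the layer sitting between the application and HTTP lets the status code through.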

And I’m just a weblogger.  What if I were an enterprise?

Bottom Line

None of these issues are hard.  None of these issues require much in terms of engineering to implement.  Instead, these are issues with how you architect an API (sometimes retroactively) in such a way as to allow HTTP-specific information to permeate through the barrier, and how you set up expectations in such a way that existing users of your API know what is expected of them.

At some point, you need to question the wisdom of having an API which abstracts away that which is important.