It’s just data

Shades of Grey

Mark Baker is another blogger I would read with greater frequency if he had an RSS feed.  Here he identifies himself as an extremist.  Mark, I have a question for you.

Perhaps the most cherished axiom of the REST architecture is that GET must not have side effects.  This is even codified in the standard.  Now a Google query is darn near the canonical example of an HTTP GET, right?  To do such a query via a POST would be, well, a Gaffe of the first order, no?  Of such a magnitude that the only appropriate course of action is to gather together a fellowship of like-minded Hobbits, Dwarves, Elves and men and go on a quest to educate the world.

Now it turns out that the Google API has a limit of 1,000 queries per day.  This means that the 1,001st query will not have the same result as the 1,000th query.  The query itself has a (GASP!) side effect.  It is most decidedly not idempotent.  So what do you do?  Call back the Hobbits and recode to use POST, or do you make the pragmatic decision to slightly bend the rules a bit?

Inquiring minds want to know.

P.S.  Thanks go out to Nelson Minar for bringing this question to my attention.


<blockquote>Now a Google query is darn near the canonical example of a HTTP GET, right?</blockquote>

well, if it has side effects (and here, clearly, it does) then it is <em>not</em> a good example of GET. On the other hand, when I use google through my browser, it is a safe operation with no side effects (afaik) and thus GET is the correct verb.

I'm guessing here, but I imagine that Google put the 1000 query limit on because of the added overhead of SOAP processing and because they don't have the same horsepower dedicated to the SOAP API as they do to the web API. Remember back when Google had the GET-able xml page? That was a proper REST service which used GET as intended.

Posted by matt at

I agree with Matt. If it has side effects then it isn't supposed to be a GET.

The definition of a GET is an idempotent operation, not an HTTP query with simple parameters in the URL, even though that is how it is used on lots of parts of the web.

Of course, this all assumes you take all the REST stuff seriously which I don't. :)

Posted by Dare Obasanjo at

Matt, try doing a Google query using "wget", "curl" or the like. You will see that it is forbidden. Google limits programmatic access to their data, independent of the access method.

Posted by Sam Ruby at

Well phooey on Google then although I suppose they have their reasons. Maybe too many bloggers sending batch queries for every spelling of their own names ;-) Of course spoofing the right User-Agent might help.

The important thing is, use of GET in this case is in violation of the principles of REST.

Posted by matt at

Your definition of side effect makes little sense. A side effect can be thought of as a change to a resource. The 1001st GET should simply return an HTTP 204 - No Content or something similar. The fact that the 1001st GET is illegal has nothing to do with potential side effects AFAIK.

Posted by itdp at

If you don't like the definition of a GET being an idempotent operation then take it up with the W3C. You can start with the editors of this document ( http://www.w3.org/2001/tag/doc/get7 )

If after 1000 HTTP requests a User Agent (UA) can no longer retrieve the same resource then the requests are not idempotent. End of story.

Posted by Dare Obasanjo at

I didn't interpret it as the UA can no longer retrieve the resource after 1000 requests. That's why I suggested the use of HTTP 204 No Content which can be read as:

NoContent indicates that the request has been successfully processed and that the response is intentionally blank.

So the GET is --successful-- and the resource is still there, it's just nothing. An empty document. Logically, this might indicate to the client that they've overextended their welcome.

If you consider the response of the 1001st request an error, that is, a request that might fail due to an authorization exception, I still don't see how the server refusing to respond to a GET constitutes a side effect.

Posted by itdp at

itdp, I see what you are saying but note what the TimBL axioms document that Sam linked to says, "It is wrong to represent the user doing a GET as . . . doing any operation which effects the state of the Web or the state of the users relationship with the information provider or the server." Certainly doing the 1001st GET has an effect on the user's relationship with Google.

Hey Sam, can you make the <textarea> for comments bigger?

Posted by matt at

Upon reflection and rereading the W3C TAG document at http://www.w3.org/2001/tag/doc/get7#use-get I tend to agree with your conclusion.

A 204 seems like a good RESTful response.

Posted by Dare Obasanjo at

To Sam and Matt,
I'd pick 3-month-old W3C Technical Architecture Group findings over TimBL's 6-year-old personal opinions with regard to which should be used as the guidelines for how Web Services should operate.

Posted by Dare Obasanjo at

I'm curious, Dare, how did you come to that conclusion w/r/t 204 and the TAG finding?

And I'm also curious how the actual words of the standard (referenced in the original append) rank in your hierarchy of authoritativeness relative to both the TAG and TB-L?

Posted by Sam Ruby at

Aight Sam,
Here's a description of the thought process that led to my agreeing with itdp.

The claims that HTTP GETs should be idempotent have never really held much water with me because at the very least there is usually one side effect to a web server access: an entry in the server logs. Upon realizing that claims of HTTP GET idempotence are derived primarily from personal opinions of W3C members (which in my book don't count for much until they become official policy) or from standards other than HTTP (e.g. HTML), the argument for HTTP GETs to be truly idempotent becomes much weakened.

The HTTP 1.1 text highlighted in the TAG document points out that it is impossible to ensure that HTTP GETs do not cause side effects on the server, and that there are times when this is even desirable. It also says that HTTP GET requests SHOULD not do anything beyond retrieval (notice that this isn't a MUST).

Further reading of the HTTP 1.1 RFC section on idempotent and safe methods at http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html#sec9.1.1 leads to the following entry

"However, it is possible that a sequence of several requests is non- idempotent, even if all of the methods executed in that sequence are idempotent."

Given the above points, I consider it acceptable for an HTTP GET to both retrieve a resource and increment a counter.

The question then became what should be returned once the counter went past 1000. None of the 4xx or 5xx error codes at http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4 or http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.5 seemed to satisfy this specific situation, since it isn't really a client or server error (at least not any of the kinds of client error described in the RFC).

The description of 204 at http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.5 seemed to best fit this scenario. The server has processed the correctly formed client request successfully but there isn't anything to send back because the client has exceeded its request limit. So a blank document and perhaps some extra headers are returned.

That's what I thought.
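The counter-plus-204 scheme described above could be sketched as a toy request handler. Everything here (the per-key counter, the limit constant, the handler name) is invented for illustration, not Google's actual implementation:

```python
from collections import defaultdict

DAILY_LIMIT = 1000                    # illustrative; matches the limit discussed above
_requests_today = defaultdict(int)    # per-key counter, notionally reset each day

def handle_get(key, run_query):
    """Toy handler for one API GET under the 204 reading.

    The counter increments as a side effect, but the request stays a
    retrieval: past the limit the server has still "successfully
    processed" the request and intentionally returns no entity body.
    """
    _requests_today[key] += 1
    if _requests_today[key] > DAILY_LIMIT:
        return 204, None              # No Content: blank but successful
    return 200, run_query()
```

So the first 1000 calls for a key return 200 with results, and the 1001st returns 204 with an empty body rather than a 4xx error.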

As for which ranks the highest on authoritativeness between the HTTP RFC, TAG findings and TimBL's opinions? I'd say

HTTP RFC > TAG Findings > TimBL opinions

Posted by Dare Obasanjo at

It seems we need a definition of side effect. I don't know of an authoritative one. Some relevant TAG discussion is at http://lists.w3.org/Archives/Public/www-tag/2002Apr/0091.html in which Fielding says GET should not be used for "any non-retrieval action"

I don't think calling what Google is doing a "counter" is fair... What if I framed it as: I have an account at Google with Google currency in it (queries). Every time I run a query, the account is debited. I hope my bank would never charge me like that for online banking!

Posted by matt at

I don't understand what all the fuss is about. How is you-are-over-your-quota any different from page-is-not-available for any other reason?

403s or other errors do not violate idempotence.

Regarding charging pennies per request, I'd be upset if I was charged in the event of an exception condition (over my limit), but that should be specified clearly in the charging model.

Posted by Jason May at

Simple definition of a side effect:

A side effect constitutes a change to a **user-visible** resource which is outside the scope of the method semantics.

This essentially means that a successful GET should do nothing more than return an appropriate representation of a resource, a successful PUT should do nothing more than guarantee that an object exists at the given URL and is potentially GET'able, and a successful DELETE should do nothing more than guarantee that no resource is located at the given URL and therefore, (potentially!), GET'ing the URL immediately after the DELETE should return a 4xx code.

I'm not sure why, but Sam et al. seem to think this means every GET should have an identical response to the previous GET, but that's not true. A server can return any type of response it wants except response codes which indicate a side effect, e.g. HTTP 201 Created.

What then happens if Google decides to expose the URL of the counter resource for each user? For example, users are allowed to GET /rest/remaining-requests?KeyID={KeyID}.

(I suspect this is the question you <em>really</em> wanted to ask.)

If this is the case, your original use of GET is an abuse of HTTP. If the user can detect their remaining requests then it must be explicitly clear to the user that performing searches constitutes more than just retrieving a resource. We've now stepped outside of the semantics of GET and we have an official side-effect because the user-visible counter resource is changing when a search happens.

The correct and REST-ful way to implement such functionality would be to have the user POST (or PUT, it depends) their search request, and then Google responds with an HTTP 201 Created, assigning a URL to the resulting search resource in the Location header. The user can then GET this URL repeatedly without consequence until the search resource expires. (Alternatively, Google could allow users to GET the resulting search resource only once and then, if they try to GET it again, return an HTTP 410 Gone (or HTTP 404 Not Found) to indicate that the search results are gone forever, but this would just be cruel since caching is cheap.)
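In outline, the POST-then-GET flow described above might look like this. The `/searches/` path, the in-memory store, and the function names are all hypothetical, a sketch of the pattern rather than any real API:

```python
import uuid

_searches = {}  # search-id -> cached result representation

def post_search(query, run_query):
    """POST a search request; respond 201 Created with a Location
    header naming the newly minted search resource."""
    sid = uuid.uuid4().hex
    _searches[sid] = run_query(query)
    return 201, {"Location": f"/searches/{sid}"}

def get_search(sid):
    """GET the search resource: safe and repeatable until it expires."""
    if sid in _searches:
        return 200, _searches[sid]
    return 410, None  # Gone: the cached results have expired
```

The POST carries the side effect (a new resource and, in Google's case, the quota debit); every subsequent GET of the Location URL is pure retrieval.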

Assigning a URL to the counter really does change everything.

Matt: Obviously debiting money from a bank account should not be done with a GET. I don't see what you're getting at.

Sam: As somebody who's building and deploying restful webservices, I don't subscribe to any sort of GET supremacy. I use all the HTTP verbs including OPTIONS and TRACE quite extensively. I'm not sure what your intent was in posing this question to Mark. If you wanted to demonstrate that designing RESTful web-services requires careful thinking and design then you succeeded. If you wanted to demonstrate the superiority of Google's SOAP interface over a RESTful one, you haven't convinced me.

Dare: Changes to the web server logs don't constitute a side effect since the user shouldn't see server logs. This goes back to the idea of user visibility. And you're essentially right, section 9.1.1 makes it clear that while one request may be idempotent, a sequence of requests will often not be idempotent, but I would stress that GET can only increment a counter if the user isn't aware of the counter (e.g. a page hit counter or a counter in the server logs).

- itdp

Posted by itdp at

itdp, Dare, and others. Idempotent does not mean side effect free. HTTP DELETE is idempotent.

See http://www.intertwingly.net/blog/?entry=787 for more details.

Posted by Sam Ruby at

Hi itdp,
You state

"I'm not sure why, but Sam et. al. seem to think this means every GET should have an identical response to the previous GET but it's not true."

as if Sam and I are in the wrong, which is weird because this is the VERY DEFINITION of the word idempotent.

The W3C TAG is right in stating that the HTML REC (and TimBL) have been using a made-up definition of the word idempotent, which, based on W3C TAG discussions and this thread, no one is able to properly define because of the "side effects" sticking point.

It's kinda hard to have a discussion when everyone is using different definitions of the same words and even the authors of the original specs cannot come to consensus.

Posted by Dare Obasanjo at

Idempotency is not important here, what we are really talking about is method _safety_. GET is supposed to be safe as defined by the HTTP 1.1 spec: "GET and HEAD methods should never have the significance of taking an action other than retrieval. These methods should be considered 'safe.'"

The spec goes on to say that software cannot guarantee no side-effects, so if there is a side-effect to a GET or HEAD, it is understood that "the user did not request the side-effects, so therefore cannot be held accountable for them." Dare - I believe this somewhat addresses your issue wrt logging.

Essentially there are side-effects that matter and side-effects that matter less. It is difficult (and perhaps inadvisable) to implement a 100% safe method in HTTP, so we can only insulate the user by making certain the side-effects don't affect them (e.g. logging). It can be a blurry line sometimes. I _think_ that's what Sam was getting at to begin with.

itdp - I didn't mean we should make up our own definitions ;-P . . . what if I made a GET-able servlet that returned a picture of a flower and behind the scenes signed the user up for an email list - no resource with a URI was changed, so by your definition this operation is side-effect free (safe)!? Also, how do you see debiting a Google query account as different from debiting a checking account? Both are just a number until they're empty and you can't use them anymore.

Posted by matt at

Many website providers have a bandwidth limit on pages. If your website uses up too much bandwidth per-month, the page is shut down. Obviously, each request brings the site closer to the bandwidth limit. Does this mean sites using such providers should only be accessible via POST?

Posted by Aaron Swartz at

Interesting discussion. I think (as with many discussions) a lot comes down to the definition and understanding of the key words being used. Here's my understanding.

Idempotent doesn't mean 'no side-effects', it means doing the operation again has no (further) side-effects.

A side-effect is (in this context) a change in state of a resource, or something similar. It is not the result of the operation, nor is it necessarily visible to the user (that's why it's called a side-effect, and not an effect).

On the subject of results of queries, there are a couple of observations to make. Performing a (GET) query will always return the same result. The result is "what Google found right now for your query". It isn't always going to be the same _data_, but the _result_ is the same. There's no rule that says the _data_ must always be the same.

Nor does having a non 200 status code (e.g. the 403 from the example in my weblog post on this) undermine the idempotency of the Google GET query. A non-200 status code isn't a side-effect, either. Doing the 1001st call does not affect the 'relationship' between the user and Google. The relationship in this circumstance is "you can do 1000 calls and no more". Seems to me that a 403 would _underline_, rather than undermine, that relationship.

One thing I most certainly do agree with is that "this is not a simple black or white issue" (which is what Sam said, and was his real goal in bringing up these questions). I'm glad that we have opportunities like this to talk about these sorts of things.

Posted by dj at

I just posted a response to Sam's question.

Posted by Mark Baker at

(Sorry for the delay; the first version of this article died in a spectacular system crash.)

Nelson Minar and Sam Ruby point out that Google is an interesting corner case in its adaptation to the Web architecture. Because it uses an "innovative" business model for its new "API", it presents an interesting engineering challenge. Basically the problem is that there are a variety of benefits to exposing Google's data as URI-addressable, GET-fetchable data. Many people have heretofore agreed that this is the right thing.

The problem is that in a very subtle way, the Google API is not idempotent. Multiple calls to the service do not return the same thing. This is because every user is allowed only 1000 calls per day so each of them increments an implicit counter.



Orthogonal to the issue of GET versus POST and Web architecture versus RPC, I'll point out that implicit service state is a bad idea. If Google is going to count to 1000 then that count should be exposed either in every message or as a resource that the user can query to know how close they are to the limit. Say I have ten different programs using the same user ID; it is extremely difficult for me to coordinate between them to count usages. It is much easier for Google to expose that information as it has to keep the information anyhow. "No implicit state" is a REST design principle but I think it makes just as much sense for RPC-based protocols.
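Exposing the counter amounts to giving it its own addressable representation. A hypothetical sketch of such a quota resource (the path idea, field names, and store are all invented for illustration):

```python
DAILY_LIMIT = 1000
_used = {"my-key": 42}   # hypothetical per-key usage, kept by the service

def get_quota(key):
    """GET on the quota resource: pure retrieval of the service's own
    counter, so ten programs sharing one key need no coordination of
    their own -- they just ask the service how much is left."""
    used = _used.get(key, 0)
    return {"limit": DAILY_LIMIT,
            "used": used,
            "remaining": DAILY_LIMIT - used}
```

Reading the quota is itself a safe GET; only actual searches would move the counter.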

On to the GET versus POST issue:

Obviously one solution would be for Google to lift the 1000 call limit and just allow each user to make any reasonable number of calls. I hope that the for-pay version of the Google API does or will work in this model anyhow. After all, I can hardly promote the Google API as a robust solution to my boss if at the end of the day it might stop working because we did a few extra queries. If the Google service is not literally unlimited, I would nevertheless have to buy enough "hits" to make it seem unlimited or else I couldn't build any important business services around it.

Or to put it another way: the Google service is hardly useful to me if I often run up against the usage limitation. Therefore if it is usually useful to most people then it must be the case that most people seldom run up against the limitation. If so, then the service usually presents the illusion of idempotency even if in practice it is not entirely idempotent. And in fact, this is the case for almost all services. As Aaron points out, many of us have bandwidth limitations from our ISPs. If we took an absolutist position on idempotency, nobody who is bandwidth limited could ever use GET.

Basically, as engineers, we need to decide whether the non-idempotency is theoretical or practical. i.e. do we expect to run into the limit often. If the answer is "yes" then Google needs to think carefully about whether it is providing as useful a service to its users as it could/should. If you are willing to pay for the service then it seems clear to me that Google should be able to sell you mission-criticality: i.e. the illusion of unlimited usage and idempotency.

If you aren't willing to give Google money, and they aren't willing to trust you to use the service reasonably, then it is just a question of which way the system is degraded: by providing a slightly broken GET or using POST which is theoretically better but has a bunch of practical problems. I think that using a slightly broken GET is better because it provides a better upgrade path to those who wish to build apps in free-loader mode and shift them to pay-mode later.

I'll present some other evidence that the service is treated as "unlimited" and idempotent by almost everybody. First, it took months for anyone to notice (at least publicly) that it would be a non-idempotent GET, despite hundreds of hours of discussion of GET versus POST when I first wrote Google's Gaffe. Second, the fact that Google does not expose the count in the API indicates to me that they think that the count is not relevant to most of their users most of the time.

It isn't worth getting too worked up over this example. Services that put high limits on their usage are just being reasonable. To reasonable clients they will present an effective illusion of idempotency. Services that put low limits on their usage are substantially less useful because you cannot rely on them to be available when you need them. This means that such services will be very rare (as they have always been rare on the Internet).

There are good reasons not to mix information fetches with side-effects so it seldom happens. In fact, if it happened more often, the utility of Google itself would plummet because just spidering a site would cost someone "tokens". Thankfully, we can use most of the Web without going around handing out our user ID and jealously guarding our tokens. This is true even of XML-based Web services like Meerkat, the Open Directory Project and thousands of RSS feeds.

That isn't to say Google is necessarily wrong. They can run their business how they like. The point is that in industries where there is competition (i.e. industries other than the search engine industry) customers will not put up with arbitrary counter-based limitations and anyhow, enforcing them is probably more effort than it is worth.

One thing that this issue highlights is that Google would be better off using HTTP authentication (as I proposed in the article) rather than embedding keys in URIs. HTTP authentication has the virtue that the authority to get the data is separated from the identifier and method of getting the data. That makes it easy to build applications and documents where the authority is supplied by a third party. This is especially important where the referring "application" is declarative: for example, an XML document using XInclude. I should be able to share such a document without sharing my key.
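The difference is easy to see by constructing the two kinds of request. With the key embedded in the URI, sharing the link shares the authority; with HTTP Basic authentication, the credentials travel in a header and the URI stays shareable. The endpoint and parameter names below are hypothetical:

```python
import base64
from urllib.parse import urlencode

def keyed_url(base, query, key):
    # Key embedded in the URI: anyone who copies the link spends your quota.
    return f"{base}?{urlencode({'q': query, 'key': key})}"

def basic_auth_request(base, query, user, key):
    # Credentials travel out-of-band in a header; the URI identifies
    # only the data, so the link can be shared or XIncluded safely.
    url = f"{base}?{urlencode({'q': query})}"
    token = base64.b64encode(f"{user}:{key}".encode()).decode()
    return url, {"Authorization": f"Basic {token}"}
```

With the second form, a third party can supply the credentials at fetch time while the document itself carries only the clean URI.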


Posted by Paul Prescod at


I hate to beat a dead horse and I doubt anyone is tracking this anymore, but for future civilizations discovering this blog, Elliotte Rusty Harold just brought up a very similar topic on w3c-tag (technical architecture group). Here is Harold's original post and the September archive sorted by thread for follow-ups.

Posted by matt at

Idempotency: It's not just for APIs (or, the web is an API)

Implementors should be aware that the software represents the user in their interactions over the Internet, and should be careful to allow the user to be aware of any actions they might take which may have an unexpected significance to......

Excerpt from 0xDECAFBAD Blog at

I'm sorry, I can't kiss it and make it better.

Yes, lots of people are doing it, like Google for example. See: [link]...

Excerpt from I'm sorry, I can't kiss it and make it better. at

this google pre-fetch storm-in-a-teacup is PISSING ME OFF

jerakeen: this google pre-fetch storm-in-a-teacup is PISSING ME OFF yes, ok, the spec says ‘GET shouldn’t change things’ In the ‘real world’, where I like to live, it does. google are therefore breaking things. So why are the vast majority of...

Excerpt from 2lmc spool at

GWA is the new net SUV

If the Google Web Accelerator breaks your web application, here are a few ways to protect them from this little sucker: From the GWA Webmaster FAQ: Can I specify which links Google Web Accelerator will prefetch on my pages? Yes, you can. For each...

Excerpt from padawan.info at

Planet PHP: Midgard, prefetching and the style engine (Henri Bergius)

Edi ran into some issues with link prefetching and the MidCOM administrative interface: The whole idea is to prefetch links. Well, what if we have some ajax stuff there, and forms etc. It of course tries to prefetch them also. When this is done, they...

Excerpt from Gregarius at

REST and Web Services links

I’ve been doing some research into REST and web services. Here’s some links I found helpful. Paul Prescod: Common REST Mistakes (Date unknown) Sam Ruby: Shades of Grey (Sept 2002) Sam Ruby: Vacant Space (Sep 2004) Mark Pilgrim: RESTagra (Sep 2004)...

Excerpt from Just Looking at

harryf on Django Queue Service - when you want to schedule longer running background tasks

It uses a RESTful API, so you can use it from all kind of projects. Sorry but it’s not RESTful - GET is idempotent - from [here]([link]); > [...] the HTTP GET operation should have no side effects. For example,

...

Excerpt from reddit.com: what's new online at

London Geek Night

Thank you to everyone who came along to London Geek Night last night. There were many good questions and comments throughout the evening. I didn’t respond to all of them satisfactorily at the time, so I thought I’d expand on a few of them here....

Excerpt from iansrobinson.com at

Ian Robinson: London Geek Night

Updated The video of Thursday’s London Geek Night is now online . Thanks to Ikenna Okpala and Skills Matter for recording the event. Thank you to everyone who came along to the London Geek Night last night. There were many good questions and...

Excerpt from Planet TW at

Sam Ruby: Shades of Grey

GET that increments internal data. After 1000 Google requests get 204, great comments. "And you’re essentially right, section 9.1.1 makes it clear that while one request may be idempotent a sequence of requests will often not be idempotent...

Excerpt from Delicious/colin.jack/REST at

Add your comment