It’s just data

The H stands for Hyper

Everybody seems to be linking to Pete Lacey’s The S stands for Simple.  And for good reason.  In addition to being quite funny, I can honestly say — having lived through it myself — that it is quite accurate.  In fact, if one of the four flavors of feeds that Pete provides were Atom 1.0, I would gladly add his feed to Planet Intertwingly.  Perhaps he will find one of these pointers helpful.

But as Paul Harvey is wont to say, it is time for the Rest of the Story (no pun intended).  I share this because I believe that the only thing that those who simply poke fun at the alternatives without realistically describing the pitfalls of REST achieve is to convert blissful WS-* developers into despairing REST developers.

So with that introduction, I want to share a contemporary example involving REST, and the excellent scripting language named Python.

The Rest of the Story

It started out with a simple feature request for Planet: Anyway i can download the images in to cache itself ?  Given the nature of HTTP proxies, this is a common requirement.

The first problem is that, unlike SOAP, image data when transported natively over HTTP is not self-contained.  Separate from the data itself are a number of HTTP headers, and often, but not always, this data is important too.  Because enough people miss this fact, there is a lot of content sniffing going on, and that causes problems too.  But let’s not go there; let’s try to do the right thing and capture the headers too.

Now httplib2 does that, and optimizes requests based on cache control headers, and even stores the data into flat files by default; files that can almost be served asis.  Some small tweaks are required, like changing 304 status codes to 200, and there are some headers like transfer-encoding that only apply to the transfer that already happened (retrieving the image in the first place), and not necessarily to the one that is going to happen (namely, serving the image from the cache).

So, to test this out, I issued the following magic incantation:

sudo a2enmod asis
sudo /etc/init.d/apache2 force-reload

And then I created a .htaccess file with a single line: SetHandler send-as-is, symlinked my httplib2 cache into my public_html directory, and manually edited a cache entry using vim.

And it didn’t work.

The problem turned out to be that httplib2 made use of a module named rfc822 to both read and write RFC 822 style headers.  And despite the fact that this portion of RFC 822 is relatively simple (don’t even get me started on the date format), the Python runtime library manages to get it wrong.  It gets it wrong in Python 2.2.  And in Python 2.3.  And in Python 2.4.  And in Python 2.5.

Instead of putting a CRLF between headers, it only puts a LF.  Even on Windows, and presumably even on MacOSX.  Of course, the same module is liberal on reading, so it has no problem consuming the invalid messages that it produces.  But not everything is quite so liberal in quite the same way, and somewhere between Apache and Firefox (I haven’t debugged it further), my first test didn’t work.

This turns out to be easy to fix, and here is my initial stab at the code:

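import rfc822                      # Python 2 only; removed in Python 3
from StringIO import StringIO

# data holds one raw httplib2 cache entry (headers, blank line, body)
divider=data.find('\r\n\r\n')
headers = rfc822.Message(StringIO(data[:divider+4]))
status = headers.get('status',None)
# a cached 304 has to be served as a 200
if status == '304': status='200'
# drop the pseudo-header and the headers that no longer apply to the cached copy
for header in ['status','content-encoding','transfer-encoding']:
  if headers.has_key(header): del headers[header]
# rfc822 writes bare LF; put the CRLFs back
headers = str(headers).strip().replace('\n','\r\n')
# the status "header" only works if it comes first
if status: headers = 'status: %s\r\n%s' % (status, headers)
data = headers + data[divider:]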

To be fair, embedded in those few lines is quite a bit of knowledge.  Not only of the workaround and the status code and header changes that I mentioned above, but also a few other things.  Status isn’t really an HTTP header, but many tools (most notably CGI) find it convenient to pretend that it is one, and others have picked up on this convention.  Of course, it only works if this “header” is first, something that isn’t mentioned in the documentation, not even in small print.  It’s just something that “everybody knows”.
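For readers on a current interpreter, here is a minimal sketch of the same rewrite in Python 3, working on bytes instead of going through rfc822.  The function name prepare_for_asis and the DROP tuple are made up for illustration and are not part of httplib2.  It assumes the cache entry separates headers from body with CRLF CRLF, as the snippet above does, and it does not try to handle folded (continuation) header lines.

```python
# Headers that no longer apply once the entry is served from the cache.
DROP = (b"content-encoding", b"transfer-encoding")

def prepare_for_asis(raw: bytes) -> bytes:
    """Hypothetical helper: rewrite one cached response so it can be sent as-is."""
    # Assumption: headers and body are separated by CRLF CRLF.
    head, _, body = raw.partition(b"\r\n\r\n")
    status = None
    kept = []
    for line in head.splitlines():      # accepts LF-only and CRLF line endings
        name, _, value = line.partition(b":")
        key = name.strip().lower()
        if key == b"status":
            status = value.strip()      # remember it, re-emit it first below
        elif key not in DROP:
            kept.append(line)
    if status == b"304":
        status = b"200"                 # a cached 304 is served as a 200
    if status:
        kept.insert(0, b"status: " + status)
    # Join with CRLF and leave exactly one blank line before the body.
    return b"\r\n".join(kept) + b"\r\n\r\n" + body

if __name__ == "__main__":
    sample = b"status: 304\ncontent-type: image/png\r\n\r\n\x89PNG"
    print(prepare_for_asis(sample))
```

Building the header block by hand keeps the line endings and the position of the status line under the caller's control, which is the whole point of the workaround.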

There’s also another subtle bug.  Not only does the rfc822 module get the line-endings wrong, it puts two blank lines between the headers and the body.  Effectively this means that the last blank line is considered a part of the body.  This is a problem for binary data.  It even is a problem with XML, if there is an XML prolog involved.
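To make the XML case concrete, here is a small sketch using Python 3's standard xml.etree.ElementTree.  The sample document is invented for illustration.  It shows that a single stray line ending in front of the XML declaration is enough to make a conforming parser reject the body.

```python
import xml.etree.ElementTree as ET

# Invented sample body: an XML declaration followed by one empty element.
document = "<?xml version='1.0'?><feed/>"

# Parses fine when the declaration is the very first thing in the body.
root = ET.fromstring(document)

# Prepend the stray line ending that ends up being counted as part of the body.
try:
    ET.fromstring("\n" + document)
    stray_line_ok = True
except ET.ParseError as error:
    stray_line_ok = False
    print("parse failed:", error)
```

Binary formats such as images have the same sensitivity, since their first bytes are a fixed signature.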

Let me repeat something for emphasis.  RFC822 is “simple”.  Simple enough that the Python runtime library gets it wrong.  And I haven’t even mentioned the various problems and deficiencies that urllib, urllib2, and httplib have that led Joe Gregorio to conclude that it was time to create httplib2.

And if any of you noticed that the rfc822 module is deprecated in favor of email, let me save you the trouble: the new email module has the same bug.
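The line-ending behaviour is easy to check for yourself.  The sketch below uses Python 3's email.message.Message, not the 2006-era interpreters discussed here, and the header values are invented.  As far as I know the default serialisation still separates headers with a bare LF, although newer versions of the email package let a policy object (for example email.policy.HTTP) choose a different line separator.

```python
from email.message import Message

# Build a tiny header-only message with the standard library.
msg = Message()
msg["Content-Type"] = "image/png"
msg["Cache-Control"] = "max-age=3600"

# Default serialisation of the headers.
default_text = msg.as_string()

# The same replace() workaround used in the snippet earlier in this post.
wire_text = default_text.replace("\n", "\r\n")

print(repr(default_text))
print(repr(wire_text))
```

Printing the repr makes the difference visible, since the two strings look identical when written to a terminal.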

Conclusion

If you got this far, congratulations.  But if you have come to the conclusion that REST and WS-* are both equally bad, and the primary difference is that WS-* has a more comprehensive approach to tooling, then I failed to adequately convey my key point, which I will now restate for emphasis: with REST, this turns out to be easy to fix.

In addition to all the architectural benefits of REST, as well as all the pragmatic experience the web has built up over time with caching and intermediaries (benefits and experience that WS-* forsakes), there is one other key difference.  HTTP wasn’t a home run all by itself; it was the pair of HTTP and HTML that was successful.  Key to this success is the fact that HTML is a file format that can be authored by a mere mortal in a text editor.  And yes, while I have seen the HTML files produced by contemporary versions of Microsoft Word (as well as the SVG files produced by Adobe Illustrator), none of this prevents me from doing something simple myself using only the tools that I have available.

By contrast, WSDL was clearly designed to be produced by tools and consumed by tools.

This difference is crucial.  In simple, pragmatic, operational terms, this difference enables me to always get my job done using only duct tape.

And, in this case, the difference is doubly important, as Joe has already started committing these changes/workarounds into httplib2.  Every indication is that with the next version of httplib2, those who try to serve the cache it produces and maintains asis will find that “it just works”.


I was saying to myself this morning, “if Sam would link to me, I’d have a REST hat trick (Mark Baker, Tim Bray, and Sam Ruby).”  But then I said to myself, “I betcha Sam would find a problem with my feed format, so maybe it’s best he doesn’t.”  Alas. :-)

For perspective, I’ve avoided making the details of RSS/Atom one of the things I care about, and just hoped that WordPress was doing the right thing.  Now, I have to spend tonight getting WordPress to generate a valid Atom feed, 'cause I’m not gonna lose the opportunity to be part of Planet Intertwingly.  My wife’s not going to be happy.

Pete

Posted by Pete Lacey at

Pete: Don’t spend too much time on it. Just about everything you need has already been done :-)

Posted by James Snell at

REST hat trick

Looking at my MeMemes, it appears that you have overachieved.  :-)

I betcha Sam would find a problem with my feed format

My reputation precedes me.

I have to spend tonight getting WordPress to generate a valid Atom feed

It isn’t that hard.  Take a look at those two links.  One is a simple plugin that produces a single feed that seems to work pretty much everywhere.  The other is a simple drop-in replacement that allows you to maintain multiple feed formats; the downside is that with every release of WordPress that you upgrade to, you will have to reapply this, at least until ticket #1526 is fixed.

Posted by Sam Ruby at

REST: it sucks the least. :)

Posted by Ryan Tomayko at

SOAP v. Rest

Because I can’t help but link to SOAP/REST discussions: The S stands for Simple The REST Dialogues A well reasoned...... [more]

Trackback from Jeremy Smith's blog

at

S for Simple

I feel guilty sometimes about the lull in my WS-Rants, because the forces of WS-Complexity and WS-Darkness are out there evangelizing tirelessly. But today I feel better, because there are powerful WS-dialogues out there speaking truth to...

Excerpt from ongoing at

[from sogrady] Sam Ruby: The H stands for Hyper

Sam’s extended take on some of the REST v WS-* differences - be sure to read this...

Excerpt from del.icio.us/network/annez at

The War On Error

Last March: REST wins, noone goes home. Well, it looks like we’re done. Which is worse, that everyone gets it now and we’ll have REST startups in Q207, or that it took half a decade? It’s tempting be scathing. But......

Excerpt from Bill de hÓra at

The S Stands For Slippery

SOAP is getting slippery, enjoy a good anti-SOAP laugh. Well, not only a good laugh, it will probably cause some deja vu if you have tried to go through the specifications. Duncan Cragg provides a fictional, but practical, dialogue in favor of...

Excerpt from iface thoughts at

REST is like SQL in my mind... or at least the verbs are.  If more services and data were exposed RESTfully, could we not create a RESTQL language to manipulate and work with it?

Posted by Alex James at

if one of the four flavors of feeds that Pete provides were Atom 1.0...

Feed Validator now green lights my sole Atom feed.  (Thanks James.)

Posted by Pete Lacey at

links for 2006-11-18

Pete Lacey’s Weblog :: The S stands for Simple everybody’s already linked to this, and there’s a good reason why - funny stuff (tags: humor SOAP WS-* WSDL web services) Amazon: Utility computing power broker - page 2 |...... [more]

Trackback from tecosystems

at

Feed Validator now green lights my sole Atom feed.

Subscribed!

Posted by Sam Ruby at

Argumentum ad amminiculum: the WS-fallacy

Sam Ruby finds the advantage of REST over WS-* in (among other things) the fact that HTML can be authored by a mere mortal in a text editor, whereas WSDL etc. were clearly designed to be produced by tools and consumed by tools. I think this is a...

Excerpt from (format nil "Edward ~@R O'Connor" 1450) at

Sam Ruby follows up on The S stands for Simple....

Excerpt from Talideon.com Linklog at

Links - 11.20.2006

Clustering - EJBs vs JMS vs POJOs J2EE hasn’t made clustering that much easier. The H stands for Hyper Err, not exactly. I think the question of data formats is a lot less important than the vast experience with intermediaries, caches, and the...

Excerpt from discipline and punish at

Latest links

David Van Couvering: Zimbra uses Derby for offline storage! “inspired by the demonstration Francois did of offline Derby at ApacheCon” David Van Couvering: Why Use Java DB For Web Client Storage? Why use Derby vs. Firefox WHATWG or...

Excerpt from Blogging Roller at

links for 2006-11-21

Sam Ruby: The H stands for Hyper “with REST, this turns out to be easy to fix”. I have to disagree that this is a quality of REST. One could easily build that capability into specs looking a whole lot like SOAP/WS-*, but you’d...

Excerpt from Web Things, by Mark Baker at

The day SOAP died

The S stands for Simple Why SOAP sucks The H stands for Hyper......

Excerpt from Znarf Infos at

Detecting Not Modified Reliably

Yesterday, I more fully integrated Joe’s threading work into Venus.  From an end user’s perspective, one benefit of this is that the first time you specify spider_threads, you will see immediate benefit as the Last-Mo... [more]

Trackback from Sam Ruby

at

Finally, a REST Book!

I was very excited to stumble across a link to the book REST Web Services, by Leonard Richardson and Sam Ruby, which is slated to be released in May 2007. It can’t come soon enough for me. I’ve been building...... [more]

Trackback from arc90 blog

at

No Silver Bullet Exists

Another bout of web services “religious war” has broken out again. We’ve been here before! This time it’s based on one funny and accurate diatribe about SOAP. The resulting frenzy in the blogosphere has yielded some quality comments, and even some...

Excerpt from Bryan's Blog at
