It’s just data

HTTP Get Rocks

Abstractions leak.  Technorati's API returns well-formed XML unless any excerpt contains an ampersand, in which case it doesn't.  If this was buried under layers of XML-RPC or SOAP or content:encoding, this would be hard to find and even harder to deal with.

As it is, I can just cut and paste the URL into either Mozilla or IE, and I am told immediately what is wrong with the response.  Given this data, I can switch from Mark Pilgrim's pyTechnorati to Phillip Pearson's and be back up and running in minutes...
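The check a browser performs on such a response can be approximated in a few lines of Python. This is only a sketch using the standard library: the `excerpt` element name and the sample text are made up for illustration and are not taken from Technorati's actual response format.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(document: str) -> bool:
    """Return True if the document parses as XML, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Hypothetical excerpt text containing a bare ampersand.
excerpt = "Q&A about feeds"

# Pasting the text straight into markup leaves the ampersand unescaped.
broken = "<excerpt>" + excerpt + "</excerpt>"

# Escaping first turns "&" into "&amp;" (and "<", ">" into entities).
fixed = "<excerpt>" + escape(excerpt) + "</excerpt>"

print(is_well_formed(broken))
print(is_well_formed(fixed))
```

The first document is rejected by the parser and the second is accepted, which is the same distinction the browser reports when the URL is pasted into its address bar.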

Don't let the experts fool you.  You won't find the real reason to use HTTP GET here.  The real reason why you want to use HTTP GET is found here.

This being said, when you blow past any reasonable length of a URL or want to do more than simple information retrieval, it makes sense to introduce back in XML and HTTP POST.  But my advice is to do so in this way instead of that way.  That way you are better prepared the next time abstractions leak.


"If this was buried under layers of XML RPC or SOAP or content:encoding, this would be hard to find and even harder to deal with"

FYI -- it wouldn't have happened with any of my implementations of XML-RPC or SOAP, because they encode ampersands for you.

BTW, there's generally a hyphen in the name of XML-RPC.

Posted by Dave Winer at

Hyphen has been fixed (thanks!).

I can show plenty of examples where abstractions have leaked with XML-RPC (Apple, Apache, Array, ...).  I could do the same with SOAP.  Heck, I can do the same with RSS 2.0.

In my experience, when you need structure, the closer you are to pure, clean, and simple XML, the better off you are.

View source is your friend.

Posted by Sam Ruby at

Wow, I hate to say it, but since the Technorati server is obviously not using a real XML library, we probably would have been better served by a simple non-XML-based plaintext data format.  It still would have required a custom data unmarshaller, but we wouldn't have had to deal with XML's crazy formatting rules.

Posted by Mark at

Re. plaintext: maybe, maybe not.  Even people who use RFC-822 header format (much less MIME header encoding) get hit by stray newlines if they don't encode them properly.

Posted by Ken MacLeod at

Sam your slide show is interesting, and yes of course I learned HTML by view source and please I don't want to talk about HTML today.

But the view source of XML-RPC is in your favorite scripting language. That's where you're supposed to look to make sense of it. So many people make the mistake of looking at what goes over the wire and miss the point completely. I know you didn't, but when they see it in their favorite language their eyes start shining, bright.

BTW, the same thing applies to OPML. People who look at the XML miss the point. Look at it in Omni or Radio or some other OPML-compatible outliner and it immediately makes sense. But most programmers aren't outliner people, so it's over their heads.

BTW, these are called opinions. Hopefully that's not a problem. I just noticed that the people here are the people who tend to have strong emotions about my opinions. Don Box admonished me for not posting here. So let's see if I can express an opinion or two without getting flamed. Thanks.

Posted by Dave Winer at

Dave, is XML-RPC just for scripting languages?  As you know, there is an Apache Java implementation.  If you simply view source of Java, you will see HashTables and will come to the conclusion that one can pass the full range of the Unicode character set, structs can have keys of any data type, etc., etc., etc.  And it will just work... Java to Java.  But it will break down when you try to interoperate with other implementations.

Having debugged my share of such interop issues (hash tables are a problem with SOAP too), I have come to the conclusion that what matters is the wire formats.  And RSS 2.0 is a prime example to me of how successful this can be.  Even though it too is rife with encoding issues (example: I have even seen Scripting News's RSS feed have encoding issues from time to time).  But life goes on, and the ultimate authority is the specs for XML and RSS and what goes across the wire.

That's my opinion of what works and what is most successful.

And Dave, your opinions on technical matters are welcome here.

Posted by Sam Ruby at

I've updated the REST wiki RestAndStructuredData  page with the flurry of recent tools that let one access XML using native syntax immediately following a GET (or preceding a POST, for that matter).

Posted by Ken MacLeod at

"This being said, when you blow past any reasonable length of a URL or want to do more than simple information retrieval, it makes sense to introduce back in XML and HTTP POST."

That may be true, but any application

Posted by Jeffrey at

"This being said, when you blow past any reasonable length of a URL or want to do more than simple information retrieval, it makes sense to introduce back in XML and HTTP POST."

That may be true, but for any application that required such a large block of data in a POST that would otherwise be performed using a GET, it would be a nice feature to return a 201 (Created) with the URL of a new resource in the Location header, a la tinyurl.com, instead of perhaps a 200 (OK).  This new resource would be more easily shareable, etc.

Posted by anonymous at
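The 201-plus-Location idea in the comment above might look roughly like the following. It is a sketch only: `handle_post`, `handle_get`, the in-memory `SAVED_QUERIES` store, and the `example.org` base URL are all hypothetical names chosen for illustration, and no real HTTP server is involved.

```python
import hashlib

# Hypothetical in-memory store mapping short keys to saved request bodies.
SAVED_QUERIES = {}


def handle_post(body: bytes, base_url: str = "http://example.org/queries/"):
    """Save a large query and answer 201 (Created) with its new URL."""
    # Derive a short, stable key from the body, tinyurl-style.
    key = hashlib.sha1(body).hexdigest()[:8]
    SAVED_QUERIES[key] = body
    return 201, {"Location": base_url + key}


def handle_get(key: str):
    """Fetch a previously saved query by key, so the URL can be shared."""
    body = SAVED_QUERIES.get(key)
    if body is None:
        return 404, b""
    return 200, body


status, headers = handle_post(b"<query>a very long request body</query>")
print(status, headers["Location"])
```

A client that receives the 201 can pass the Location URL around and anyone can then retrieve the same resource with a plain GET.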

Sorry Sam, I tend to say scripting when I mean programming. Of course Java is an interesting way of writing XML-RPC apps. By view-source I don't mean view-bytecode, I mean source.

Posted by Dave Winer at

Ken: sneaky way to introduce RDF into this conversation.  I'm not biting.  ;-)

Jeffrey/anonymous: if the "resource" includes a private key (such as the technorati API does), I'm not sure that's appropriate.

Dave: I too was talking about program source.  Strings in Java are Unicode.  Such strings arrive intact in an XML-RPC call when both sides are Java, but such applications may not interoperate with other XML-RPC implementations.  I'm not saying that's wrong or needs to be fixed, merely pointing out that that is an aspect of the XML-RPC protocol that "leaks" or "shows through" to the application.  I wrote an entire essay on this subject dealing with SOAP.

Posted by Sam Ruby at

Sam, believe it or not, I was referring to just the "Native XML APIs" updates, which goes to your comment (11:02) of what matters is the wire format.  Those APIs give access directly to the wire format in a way that only SOAP or XML-RPC encoding have done in the past, thus removing a layer of abstraction, or a very large part of it.

Posted by Ken MacLeod at

If only I had a pound for every time I had suffered this problem with XML - almost every system I have used that works with XML ends up suffering it at some point.

Does this just indicate that most languages don't have good enough tools for dealing with XML yet? That would explain why people insist on crafting it by hand and we continue to see broken XML.

Do the Python XML wrappers deal with this for you?

Posted by Simon Steele at
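On the question of whether Python's XML tools handle this for you: as far as the standard library's `xml.etree.ElementTree` goes, text set on an element is escaped when the tree is serialized. The sketch below assumes that module; the `item` and `excerpt` element names and the sample text are invented for the example.

```python
import xml.etree.ElementTree as ET

# Build the document as a tree instead of concatenating strings.
item = ET.Element("item")
excerpt = ET.SubElement(item, "excerpt")
excerpt.text = "Fish & chips <today>"  # raw text, no manual escaping

# Serialization escapes "&", "<" and ">" in element text.
xml_text = ET.tostring(item, encoding="unicode")
print(xml_text)

# Parsing the output again recovers the original text unchanged.
round_tripped = ET.fromstring(xml_text).find("excerpt").text
print(round_tripped)
```

The escaping only helps if the markup is produced through the library; hand-built strings bypass it entirely.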

Hmm.  I create the feed for my blog (http://www.cincomsmalltalk.com/rssBlog/rssBlogView.xml) by using a SAX Driver.  VisualWorks Smalltalk has had good XML support at this level for years.  Some people just like to work harder, not smarter :)

Posted by James Robertson at

REST is hard to understand and implement

REST is hard to understand and implement. If it was easy, people like Tim Bray wouldn't write multiple essays about it and smart people like Dave Winer and Sam Ruby wouldn't be endlessly debating it. I personally think REST is a better philosophy...

Excerpt from Roland Tanglao's Weblog at

to ReST or not to...

Steven is triggering me again... can't say I relate to the conclusions he's making: there _are_ libraries out there to handle both HTTP and XML binding but I do agree that ReST seems to remain a grass-root movement, which quite naturally fosters a...

Excerpt from Marc, himself, his blogs, and you reading them at

