Pulling on a string

Mark Nottingham: In a nutshell, XOP is an alternate serialisation of XML that just happens to look like a MIME multipart/related package, with an XML document as the root part. That root part is very similar to the XML serialisation of the document, except that base64-encoded data is replaced by a reference to one of the MIME parts, which isn’t base64 encoded.

I guess I am supposed to take comfort in the assertions that the serializations look like and are very similar to XML and MIME respectively.  Hopefully, I'll find more rigor in the working draft.

Looking at the examples in the working draft, it appears that the crux of the issue is the use of a Content-Transfer-Encoding: binary header.  Unfortunately this header isn't defined — or even mentioned! — in any of the normative text of this working draft.

Or in any of the nine documents referenced by this draft.

However, the ninth document references five more documents.  And in the fifth of those is the explanation:

 encoding := "Content-Transfer-Encoding" ":" mechanism
 
 mechanism := "7bit" / "8bit" / "binary" /
              "quoted-printable" / "base64" /
              ietf-token / x-token
 
 ietf-token := <An extension token defined by a
                standards-track RFC and registered
                with IANA.>
 
 x-token := <The two characters "X-" or "x-" followed, with
             no intervening white space, by any token>
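
As a rough illustration of how that grammar might be applied, the sketch below sorts a Content-Transfer-Encoding value into one of the five named mechanisms, an x-token, or anything else. The names is_token, classify_mechanism, STANDARD and TSPECIALS are my own and not from any library, and ietf-token recognition is deliberately left out because it would need a lookup against the IANA registry.

```python
# Characters that RFC 2045 excludes from a token ("tspecials").
TSPECIALS = set('()<>@,;:\\"/[]?=')

# The five mechanisms named directly in the grammar.
STANDARD = {"7bit", "8bit", "binary", "quoted-printable", "base64"}

def is_token(text):
    """True if text is a non-empty run of printable ASCII with no spaces or tspecials."""
    return bool(text) and all(33 <= ord(c) <= 126 and c not in TSPECIALS for c in text)

def classify_mechanism(value):
    """Return 'standard', 'x-token' or 'other' for a mechanism value (hypothetical helper)."""
    mechanism = value.strip().lower()  # mechanism names are not case sensitive
    if mechanism in STANDARD:
        return "standard"
    if mechanism.startswith("x-") and is_token(mechanism[2:]):
        return "x-token"
    return "other"  # could still be an ietf-token; that is not checked here

for sample in ("binary", "Base64", "x-uuencode", "gzip"):
    print(sample, "->", classify_mechanism(sample))
```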

Checking IANA, it seems that no more have been officially registered, though realistically, there may be some experimental ones in popular use that need to be supported.  Figuring that others may have been down this road before, I check the Python libraries (as they happen to be handy and open source), and I find that they support the following:

Can you spot the one that is missing?  You guessed it, binary.

Exhausting.  After all of this, I am left with a few questions.

The more time I spend with specs, the more appreciation I get for Test-Driven Development.


Sam, please dig a little further before passing judgment. Protocols aren't layered that way. Does HTTP tell you which TCP options to use? Does SOAP tell you how to use persistent HTTP connections?

XOP (not XOM) relies upon MIME multipart/related as a packaging mechanism. Although the primary use case for XOP in Web services is across HTTP transports (which, by necessity, would be encoded with a binary content-transfer-encoding), it's conceivable that a message will need to transit a hop (e.g., SMTP) that isn't binary-friendly, and therefore will need MIME content-transfer encoding of (for example) base64. Rather than force such gateways to transcode these messages, it's prudent to allow them to decide.
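
To make the gateway scenario concrete, here is a minimal sketch of transcoding a single part from binary to base64 with Python 3's email package. The to_base64 helper is hypothetical and the payload bytes are placeholders; a real gateway would also have to walk every part of the multipart/related package.

```python
import base64
from email.message import Message

def to_base64(part):
    """Re-encode a 7bit/8bit/binary part as base64 in place (hypothetical helper)."""
    raw = part.get_payload(decode=True)     # identity for 7bit/8bit/binary
    del part['Content-Transfer-Encoding']   # drop the old label, if any
    part['Content-Transfer-Encoding'] = 'base64'
    part.set_payload(base64.encodebytes(raw).decode('ascii'))
    return part

# Placeholder bytes standing in for a binary MIME part.
original = b'\x89PNG placeholder bytes'
part = Message()
part['Content-Type'] = 'image/png'
part['Content-Transfer-Encoding'] = 'binary'
part.set_payload(original)

to_base64(part)
print(part['Content-Transfer-Encoding'])
```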

Python does indeed support binary CTE, in the email library. The relevant code (in email.Message.Message.get_payload()) is:

  # Everything else, including encodings with 8bit or 7bit are returned
  # unchanged.
  return payload

Note that 'binary' is NOT an encoding as such, but an indication of the domain of the content; from RFC2045:

  The Content-Transfer-Encoding values "7bit", "8bit", and "binary" all
  mean that the identity (i.e. NO) encoding transformation has been
  performed.  As such, they serve simply as indicators of the domain of
  the body data, and provide useful information about the sort of
  encoding that might be needed for transmission in a given transport
  system.  The terms "7bit data", "8bit data", and "binary data" are
  all defined in Section 2.
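
One way to see that passage in action is with Python 3's email package, where the module names differ from the Python 2 ones quoted above. This is only a sketch with made-up payloads and a helper name of my own (decoded): get_payload(decode=True) hands back the same bytes for the three identity labels, and only transforms the payload for a real encoding such as base64.

```python
from email.message import Message

def decoded(cte, payload):
    """Build a one-part message with the given CTE label and return its decoded bytes."""
    msg = Message()
    msg['Content-Transfer-Encoding'] = cte
    msg.set_payload(payload)
    return msg.get_payload(decode=True)

# The identity labels leave the bytes alone.
for label in ('7bit', '8bit', 'binary'):
    assert decoded(label, b'hello') == b'hello'

# base64 is an actual transformation, so decoding changes the bytes.
print(decoded('base64', 'aGVsbG8='))
```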

It's important to understand that specifications that are too constraining, while improving short-term interoperability, limit long-term functionality too much, and therefore have little value as standards. Constraining is the place of a profile, whether it's explicit or in the market. For more on this, see my blog a while back [link]

Posted by Mark Nottingham at

And BTW, pretty much nobody does experimental CTEs; we briefly considered making XOP a CTE (or the corresponding mechanism in HTTP, content codings), but the bar for new CTEs is VERY high, and the mechanism is very specific to MIME.

Posted by Mark Nottingham at

Protocols aren't layered that way.

I think that gets to the root of my concern.  Presumptions of how things are layered seem to be endemic and problematic.  As an example, I don't see how XOP would fit into a SAX or Pull based parser model as XOP radically affects the order in which data becomes available.

I'm starting to make progress prototyping a XOP client in Python:

from email.MIMEImage import MIMEImage
from email.MIMEMultipart import MIMEMultipart
import urllib

def encode_binary(msg):
  msg['Content-Transfer-Encoding'] = 'binary'

resource='http://moinmoin.wikiwikiweb.de/wiki/classic/img/moin-info.png'
image=urllib.urlopen(resource).read()
msg=MIMEMultipart()
msg.attach(MIMEImage(image,'png',encode_binary))
print msg.as_string()
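
For anyone trying this on Python 3, where the module names and print have changed, a rough equivalent might look like the sketch below. It is not the code above verbatim: it uses placeholder bytes instead of fetching the image (an assumption, to avoid the network), and it asks for multipart/related, since that is the packaging XOP relies on.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart

def encode_binary(msg):
    # Only label the part; the payload is attached without any transformation.
    msg['Content-Transfer-Encoding'] = 'binary'

# Placeholder bytes standing in for the PNG (assumption: not a real image).
image = b'\x89PNG' + bytes(8)

msg = MIMEMultipart('related')
msg.attach(MIMEImage(image, 'png', encode_binary))
wire = msg.as_bytes()
print(wire.decode('latin-1'))
```

One caveat: as far as I can tell, the stock generator rewrites line endings inside payloads when flattening, so real binary data containing CR or LF bytes would probably need a different serialisation step.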

Hopefully I will shortly have example code showing what a multi-part Atom POST would look like if the protocol were XOP based.

P.S.  The XOM => XOP typos have been fixed.  Thanks!

Posted by Sam Ruby at

Re SAX/pull pipeline, in a scenario that doesn't involve threading, I'd expect the XOP head-end to have the whole MIME body available, then create a SAX/pull pipeline with the XML parser parsing the MIME root, passing events to an XOP filter that replaces XOP elements with MIME parts, which then passes events on to "my" SAX/pull module.  As far as "my" module is concerned, the order the data becomes available is already resolved.
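
That pipeline could be sketched with Python's xml.sax roughly as follows. Everything specific here is an assumption for illustration: the XopFilter and TextCollector classes, the parts dictionary mapping Content-IDs to bytes, the sample document, and the namespace URI (taken from the later XOP Recommendation rather than the working draft under discussion). The filter sits between the XML parser and "my" handler, and replaces each xop:Include with the base64 text of the part it points to.

```python
import base64
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces
from xml.sax.saxutils import XMLFilterBase

XOP_NS = 'http://www.w3.org/2004/08/xop/include'  # assumed namespace URI

class XopFilter(XMLFilterBase):
    """Swap each xop:Include element for the base64 text of the referenced part."""

    def __init__(self, parent, parts):
        super().__init__(parent)
        self.parts = parts  # hypothetical mapping: Content-ID -> bytes

    def startElementNS(self, name, qname, attrs):
        if name == (XOP_NS, 'Include'):
            # Simplified cid: handling; real Content-IDs need URL unescaping.
            cid = attrs[(None, 'href')][len('cid:'):]
            self.characters(base64.b64encode(self.parts[cid]).decode('ascii'))
        else:
            super().startElementNS(name, qname, attrs)

    def endElementNS(self, name, qname):
        if name != (XOP_NS, 'Include'):
            super().endElementNS(name, qname)

class TextCollector(ContentHandler):
    """Stand-in for "my" module: it only sees ordinary character data."""

    def __init__(self):
        super().__init__()
        self.text = []

    def characters(self, content):
        self.text.append(content)

root = (b'<doc xmlns:xop="http://www.w3.org/2004/08/xop/include">'
        b'<photo><xop:Include href="cid:img1"/></photo></doc>')
parts = {'img1': b'\x89PNG placeholder'}

parser = xml.sax.make_parser()
parser.setFeature(feature_namespaces, True)
collector = TextCollector()
xop = XopFilter(parser, parts)
xop.setContentHandler(collector)
xop.parse(io.BytesIO(root))
print(''.join(collector.text))
```

Note that this only works because the whole MIME body, and so every entry in parts, is available before parsing starts.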

Posted by Ken MacLeod at


Implementation in Python here: [link]

Posted by Mark Nottingham at
