PaceCompoundEntries


Abstract

Adds atomicity when editing compound entries (entries with attached resources) using a Multipart/Related MIME structure.
Anything that applies to the entry (e.g. "draft" state or ACLs) also applies to the attached resources.

Status

New

Rationale

In many cases, people want resources to be part of another resource to which they are related.

With Multipart/Related, you can embed images inside an email whose HTML content references them. You are not, however, forced to do so and can still reference remote resources, e.g. via HTTP.
With MHTML, Multipart/Related is extended to allow Web browsers to save an HTML page along with its related resources inside a single file. Internet Explorer calls these files "Web Archives" and uses an ".mht" filename suffix.
Multipart/Related is also used to attach files to SOAP messages.
Within MS-Office files (this is also true of Open Office), you can insert images or other resources either by linking to the files (something like a hyperlink in an HTML file) or by fully embedding their data, so that the MS-Office file becomes standalone.
Within other documents, such as VRML and SVG, you can embed images using the "data" URL scheme.
At a "byte" level, there is also DIME.

In Atom, we want entry content (expressed as HTML or XHTML, or any other markup language such as Wiki or Textile) to be able to reference files that could be embedded as part of the entry, but can't appear inside the Atom Entry Document.

The current draft (protocol-04) only allows editing of compound entries in multiple steps:

  1. upload related resources to a "generic" collection

  2. get back the Collection Document of the "generic" collection and find the "hrefreadonly" URI corresponding to each uploaded resource (the URI returned in the Content-Location header is the "editing URI", which might be different from the public URI, "hrefreadonly"; if "hrefreadonly" is empty, there is a problem!)

  3. use the retrieved URIs when composing the entry

  4. upload the entry to the "entries" collection

People want the "draft" flag/state and ACLs to apply to both the entry and its attached resources. However, it must still be possible to use the multi-step scenario above, e.g. when a resource is to be shared by multiple entries (note that in this case, it is no longer a compound entry, just an entry referencing another Web resource).

As the content itself (within Text Constructs and atom:content) will most likely not be parsed and interpreted by the APP server, we can't enforce using the "data" URL scheme. The "data" URL scheme may also not be compatible with the content format (e.g. within plain text using a filter).

I therefore propose the use of Multipart/Related, with some constraints aimed at making it easier to implement (e.g. mandating the use of "cid:" URLs instead of Content-Location, and adding a rule for choosing a Content-ID that allows text-level replacement of "cid:" URLs with the final resource URL).

Another solution would have been to define extension elements to the Atom Format to contain Base64-encoded resources. This means however reinventing part of Multipart/Related. It seems easier to me to use Multipart/Related, given that there are already implementations in many languages. For example, almost every Mail User Agent implements Multipart/Related.

Proposal

X.X Compound Entries

An entry may need to be transmitted together with attachments of various sorts, such as images or audio files. Such an entry is called a Compound Entry. Servers MAY support Compound Entries while clients MUST do so to ensure interoperability with servers.

Every resource in a Compound Entry inherits the traits that apply to the entry, such as being a "draft". When a client DELETEs a Compound Entry, the server SHOULD delete the aggregated resources as well.

A Compound Entry consists of an Atom Entry Document and one or more other resources packaged in a multipart/related MIME structure.

The rules for the construction of Compound Entries are as follows:

Implementations MUST resolve "cid:" URLs appearing in the Atom Entry Document. They also SHOULD resolve such URLs in text or XML body parts (i.e. parts whose Content-Type header value is an XML Media Type, begins with "text/" or ends with "+xml").
To resolve "cid:" URLs to the final public URIs of the aggregated resources, implementations MAY use text replacement inside the Atom Entry Document before parsing or processing it. Consequently, Content-ID values for subsidiary parts SHOULD be generated in such a way that any occurrence of the "cid:" URL for that subsidiary part ("cid:" followed by the Content-ID value, without the enclosing angle brackets) in the Atom Entry Document is a hypertext reference to the subsidiary part. [NOTE TO EDITORS: this is the same kind of rule that applies to the choice of the "MIME boundary", which must not be contained in any MIME body part.] As implementations, and particularly servers, MUST be able to regenerate the multipart MIME structure from the stored files, replacement URLs for the attached files shall be chosen in a similar way, so that the process is reversible.
Extensions (as defined per Section 6.4 of the Atom Syndication Format) MAY also contain references to subsidiary body parts. Consequently, if implementations don't discard them, they MUST resolve "cid:" URLs in their content and attributes.
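
The Content-ID rule above can be sketched as follows. This is only an illustration in Python, not part of the proposal: the function name make_content_id, the use of a random UUID and the default domain are all assumptions. The idea is to pick a Content-ID whose "cid:" URL does not already occur in the Atom Entry Document, so that plain text replacement stays unambiguous.

```python
import uuid


def make_content_id(entry_text, domain="example.org"):
    """Return a Content-ID value (without angle brackets) that is safe for
    text-level replacement.

    entry_text -- the Atom Entry Document as a string
    domain     -- hypothetical domain used for the right-hand side

    The candidate is rejected if its "cid:" URL already occurs in the entry,
    which mirrors the usual rule for choosing a MIME boundary.
    """
    while True:
        candidate = uuid.uuid4().hex + "@" + domain
        if ("cid:" + candidate) not in entry_text:
            return candidate
```

Because every identifier generated this way has the same length for a given domain, one "cid:" URL can never be a prefix of another, which keeps the replacement order irrelevant. A server could apply the same check to its replacement URLs so that the substitution can be reversed.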

This specification does not define the role of subsidiary parts using a "multipart/" (nested multipart messages) or an "application/atom+xml" or "application/soap+xml" (aggregated entries) content-type. Servers MAY reject them with an appropriate error [ISSUE: which is the appropriate HTTP error code?].

The following additional rules apply when sending a Compound Entry over HTTP:

Example Request.

GET /member HTTP/1.1
Host: example.org
User-Agent: Agent/1.0
Accept: application/atom+xml, multipart/related;type="application/atom+xml"

Example Response, with an extension element and an attached FOAF document referencing other parts.

HTTP/1.1 200 OK
Date: Tue, 07 Jun 2005 08:07:35 GMT
Last-Modified: Mon, 04 Oct 2004 18:31:45 GMT
ETag: "2b3f6-a4-5b572640"
Content-Length: nnnn
Content-Type: multipart/related; type="application/atom+xml";
        start="<070605.080735.entry@example.org>";
        boundary=MIME_boundary

--MIME_boundary
Content-Type: application/atom+xml; charset=UTF-8
Content-Transfer-Encoding: 8bit
Content-ID: <070605.080735.entry@example.org>

<?xml version="1.0" ?>
<atom:entry xmlns:atom="http://purl.org/atom/ns#">
...
  <ext:person xmlns:ext="http://example.net/atom/person#">
    <ext:name>My cat</ext:name>
    <ext:photo href="cid:070605.080735.photo0006.jpg@example.org" />
    <ext:foaf src="cid:070605.080735.cat.foaf@example.org" />
  </ext:person>
...
  <atom:content type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml">
      About my cat...<br/>
      <img src="cid:070605.080735.photo0006.jpg@example.org"
           align="right" width="256" height="256" />
      You never feed me.<br/>
      Perhaps I'll sleep on your face.<br/>
      That will show you.<br/>
    </div>
  </atom:content>
</atom:entry>

--MIME_boundary
Content-Type: image/jpeg
Content-Transfer-Encoding: binary
Content-ID: <070605.080735.photo0006.jpg@example.org>
Content-Disposition: inline; filename="photo0006.jpg"

...Raw JPEG image...
--MIME_boundary
Content-Type: application/rdf+xml; charset=UTF-8
Content-Transfer-Encoding: 8bit
Content-ID: <070605.080735.cat.foaf@example.org>
Content-Disposition: attachment; filename="cat.foaf"

<?xml version="1.0" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<foaf:Person xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:name>Kitty</foaf:name>
  <foaf:depiction
    rdf:resource="cid:070605.080735.photo0006.jpg@example.org" />
  <foaf:homepage
    rdf:resource="cid:070605.080735.entry@example.org" />
</foaf:Person>
</rdf:RDF>
--MIME_boundary--
(Note that the "Content-Type" header line has been continued across three lines so the example prints easily. Compound Entry senders should send headers on a single long line.)
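
As a rough illustration of how a sender might assemble an entity like the one above, here is a minimal Python sketch. The function name build_compound_entry, its argument layout and the fallback boundary format are assumptions made for this example; the proposal does not prescribe any of them. The sketch emits the Content-Type value on a single line and picks a boundary that does not occur in any body part.

```python
import uuid


def build_compound_entry(entry_xml, entry_cid, attachments):
    """Return (content_type, body) for a multipart/related Compound Entry.

    entry_xml   -- the Atom Entry Document, as UTF-8 bytes
    entry_cid   -- Content-ID of the root part, without angle brackets
    attachments -- iterable of (content_id, media_type, data) tuples,
                   where data is bytes
    """
    parts = [(entry_cid, "application/atom+xml; charset=UTF-8", "8bit", entry_xml)]
    parts += [(cid, mtype, "binary", data) for cid, mtype, data in attachments]

    # The boundary must not be contained in any body part.
    boundary = "MIME_boundary"
    while any(boundary.encode("ascii") in data for _, _, _, data in parts):
        boundary = "MIME_boundary_" + uuid.uuid4().hex

    chunks = []
    for cid, mtype, encoding, data in parts:
        head = (
            f"--{boundary}\r\n"
            f"Content-Type: {mtype}\r\n"
            f"Content-Transfer-Encoding: {encoding}\r\n"
            f"Content-ID: <{cid}>\r\n"
            "\r\n"
        )
        chunks.append(head.encode("ascii") + data + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("ascii"))

    # Header value kept on one long line, with the root part named by "start".
    content_type = (
        'multipart/related; type="application/atom+xml"; '
        f'start="<{entry_cid}>"; boundary="{boundary}"'
    )
    return content_type, b"".join(chunks)
```

The returned content_type string would be sent as the HTTP Content-Type header and the bytes as the entity body.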

Processing of Compound Entries (informative)

This is a simplified process; implementations may use another one (e.g. apply "cid:" search/replace only in some elements and attributes of the Atom Entry Document while processing it).

When receiving a Multipart/Related entity:

  1. first split the HTTP entity body into MIME parts using the boundary string provided in the "boundary" parameter of the "multipart/related" content-type;

  2. then find the root part, containing the Atom Entry Document;

  3. for each subsidiary body part:

    1. get its Content-ID and use it to generate the "cid:" URL

    2. determine its public URL (equivalent to "hrefreadonly" for "standalone" resources)

    3. replace every occurrence of the "cid:" URL with the public URL in the root body part; this way, even unknown foreign markup can use and benefit from attached resources.

  4. parse and process the Atom Entry Document residing in the root body part of the multipart MIME structure
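
The steps above could look roughly like this in Python, using the standard library "email" package to split the MIME parts. The function name resolve_cids and the public_url_for callback are hypothetical; how a server maps a Content-ID to a public URL is left open by this proposal. Parsing the resulting document (step 4) is left to the caller. Note that the "email" package is designed for mail, so an implementation handling large binary parts may prefer a dedicated parser.

```python
from email import message_from_bytes


def _content_id(part):
    """Return the part's Content-ID without the enclosing angle brackets."""
    return (part.get("Content-ID") or "").strip().strip("<>")


def resolve_cids(content_type, body, public_url_for):
    """Return the root Atom Entry Document with "cid:" URLs replaced.

    content_type   -- value of the HTTP Content-Type header (one line)
    body           -- raw HTTP entity body, as bytes
    public_url_for -- hypothetical callback mapping a Content-ID to a URL
    """
    # Step 1: split the body into MIME parts using the boundary parameter.
    header = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n"
    message = message_from_bytes(header + body)
    if not message.is_multipart():
        raise ValueError("not a multipart entity")
    parts = message.get_payload()

    # Step 2: the root is the part named by "start", else the first part.
    root = parts[0]
    start = message.get_param("start")
    if start is not None:
        wanted = start.strip().strip("<>")
        for part in parts:
            if _content_id(part) == wanted:
                root = part
                break
    charset = root.get_content_charset() or "utf-8"
    entry = root.get_payload(decode=True).decode(charset)

    # Step 3: replace each subsidiary part's "cid:" URL with its public URL.
    for part in parts:
        if part is root:
            continue
        cid = _content_id(part)
        if cid:
            entry = entry.replace("cid:" + cid, public_url_for(cid))
    return entry
```

The replacement is purely textual, so it also reaches "cid:" URLs inside extension elements the server does not understand.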

Impacts

Although servers are not required to accept Multipart/Related, clients are; otherwise they won't be able to communicate with a server that sends them a Multipart/Related compound entry.

Notes

The use of Multipart/Related could come as an extension to the Atom Publishing Protocol, but we would then need to provide compatibility with "strict APP": how would a client that does not support Multipart/Related edit a compound entry? It could be allowed to edit only the Atom Entry, but what would happen to resources that are no longer referenced by an entry? It seems easier to integrate Multipart/Related into the protocol itself.

Text about "text or XML content type" must be reworded to account for media type parameters (maybe using the terms "type" and "subtype"?).
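
One way to read the "text or XML" test so that media type parameters do not interfere is sketched below in Python. The function name is_text_or_xml is an assumption, and the explicit list of application/ XML media types is taken from RFC 3023; the final wording of the rule may well differ.

```python
# XML media types from RFC 3023 that neither begin with "text/" nor end
# with "+xml".
XML_MEDIA_TYPES = {
    "application/xml",
    "application/xml-external-parsed-entity",
    "application/xml-dtd",
}


def is_text_or_xml(content_type):
    """Return True if a Content-Type value denotes a text or XML body part.

    Parameters such as "charset" are dropped first, and the comparison is
    made on the lower-cased type and subtype only.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    type_, _, subtype = media_type.partition("/")
    return (
        type_ == "text"
        or subtype.endswith("+xml")
        or media_type in XML_MEDIA_TYPES
    )
```

For instance, "application/rdf+xml; charset=UTF-8" is treated as XML because only the subtype "rdf+xml" is examined, not the trailing parameter.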


CategoryProposals