PaceSliceAndDice2


Abstract

Large Atom collections need to be accessible in small chunks. This proposal gives servers a way to slice the collection into subcollections, and it gives clients a way to dice those slices for their own needs. This method efficiently supports clients with or without persistent state.

Status

Draft

Rationale

See "Notes" at the bottom.

Proposal

Add the following sections as children of "The Atom Publishing Protocol Model" and "Functional Specification":

2.1 Atom Collections

An Atom collection is a set of items all of the same type ("members" of the collection), where the "type" may be, for example: Atom entry, category, template, "simple resource", or any other classification of web resource. A collection MAY also contain "subcollections."

Each collection has a URI which is given in the introspection file. A GET on the collection URI MUST produce a collection document as defined in "3.X.1 Collection Document." That document describes PART OF the state of the collection.

If a collection collects items that have an "updated" property, the collection is considered to be ordered by this property.

3.X Collections

3.X.1 Collection Document

A collection document is rooted by a <collection> element. A collection element may have any number of <member> elements as children; each such element identifies a member of the collection. In some situations, a collection document might not contain every member of the collection.

With respect to a given request, a collection document can be either "complete" or "partial." A complete document includes every item that falls within the request; a partial document does not. A server MAY return either type of collection document in response to ANY collection request. Rules for what must be included in a partial collection document are given in section 3.X.3. Note that a collection document is complete iff it contains all the items that match all the request parameters (e.g. "Atom-Time-Range" and "Depth"). To be complete, the collection document need not contain all the items in the entire collection.

Whether complete or partial, the members in a collection document MUST constitute a consecutive sequence of the collection's members, ordered by their "updated" properties. Collection documents returned for a collection that has no "updated" property MUST be "complete."

3.X.2. Elements in a Collection Document

A collection document MAY contain zero or more <member> elements and zero or more <sub> elements. Since a collection document may be complete or partial, the <collection> element MAY include a "completeness" attribute, whose value is either "complete" or "partial". If the "completeness" attribute is not present, its value is assumed to be "complete".

A <sub> element in a <collection> document identifies a subcollection of the given collection. The <sub> element MUST have an "href" attribute, whose value is a URL where the subcollection can be retrieved.

A subcollection identified by the href attribute of a <sub> element is precisely a collection, as defined in sections 2.1 and 3.X.

Each <member> element MUST include an "href" attribute identifying a URL of the member resource. The "href" URL of a member resource is an "EditURI" under the terms of section 2, and MUST respond to the same HTTP methods as such an EditURI.

Each <member> element MAY include an "hrefreadonly" attribute. This optional attribute identifies a URL which, on a GET request, responds equivalently to the way the "href" URL would respond to the same request. Clients SHOULD NOT apply to this URL any HTTP methods that would be expected to modify the state of the resource (e.g. PUT, POST or DELETE). A PUT or POST request to this URL might not affect the underlying resource. If the "hrefreadonly" attribute is not given, its value defaults to the "href" value. If the "hrefreadonly" attribute is present and its value is an empty string, then no read-only URL is available for the member.

Clients SHOULD use the "href" value to manipulate the resource within the context of the APP itself. Clients SHOULD prefer the "hrefreadonly" value in any other context. For example, if the resource is an image, a client may replace the image data using a PUT on the "href" value, and may even display a preview of the image by fetching the "href" URI. But when creating a public, read-only reference to the same image resource, the client should use the "hrefreadonly" value. If the "hrefreadonly" value is an empty string, the client SHOULD NOT make public reference to the "href" value.

Each <member> element SHOULD include a "title" attribute, whose value is a human-readable name or description for the item. "title" values are not required to be unique.
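The attribute defaults described in this section can be illustrated with a small client-side parser. The following is only a sketch: the sample document, its URLs and titles, and the function name parse_collection are invented for illustration, and the sketch assumes the elements carry no XML namespace, which this proposal does not specify.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection document. Element and attribute names follow the
# text above; the URLs and titles are invented for illustration.
SAMPLE = """\
<collection completeness="partial">
  <sub href="http://example.org/coll/photos"/>
  <member href="http://example.org/edit/1" title="First post"/>
  <member href="http://example.org/edit/2"
          hrefreadonly="http://example.org/view/2" title="Second post"/>
  <member href="http://example.org/edit/3" hrefreadonly="" title="Draft"/>
</collection>
"""


def parse_collection(xml_text):
    """Parse a collection document into a plain dict, applying defaults."""
    root = ET.fromstring(xml_text)
    if root.tag != "collection":
        raise ValueError("not a collection document")

    members = []
    for m in root.findall("member"):
        href = m.get("href")
        if href is None:
            raise ValueError("<member> requires an href attribute")
        # A missing hrefreadonly defaults to href; an empty string means
        # no read-only URL is available, represented here as None.
        readonly = m.get("hrefreadonly", href) or None
        members.append(
            {"href": href, "readonly": readonly, "title": m.get("title")}
        )

    return {
        # An absent completeness attribute is treated as "complete".
        "complete": root.get("completeness", "complete") == "complete",
        "members": members,
        "subs": [s.get("href") for s in root.findall("sub")],
    }


doc = parse_collection(SAMPLE)
```

A client built along these lines would use the "href" entry for edits and the "readonly" entry, when it is not None, for public references.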

3.X.3. Collection Requests

Any GET request on a collection implicitly or explicitly identifies a time interval, within which any members in the response MUST fall. Furthermore, a collection GET refers either strictly to the members of the collection itself, or to the members of the collection along with the members of any descendant subcollections. These mechanisms, together with the choice of Request-URI, allow the client to refine the result set.

Since the response document may be either complete or partial, the client should examine the "completeness" attribute of the document to determine whether all of the requested members are included in the result. If not, the client may wish to make further requests to get more of the desired members.

3.X.3.1. Atom-Time-Range: Header

If an Atom-Time-Range: header is present in the request, its value explicitly identifies the interval. If no Atom-Time-Range: HTTP header is given in the request, the interval is taken to be all of time.

The value of the Atom-Time-Range: header should be a pair of ISO 8601 dates separated by a slash character, prefixed with "updated=" as given in the grammar below. Either date may be omitted, in which case the range is understood as stretching to infinity on that end.

    atom-ranges-specifier = updated-ranges-specifier
    updated-ranges-specifier = updated-unit "=" updated-range
    updated-unit = "updated"
    updated-range = [iso-date] "/" [iso-date]
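As an illustration of the grammar above, a server might parse the header value roughly as follows. This is a sketch under stated assumptions: the function name parse_atom_time_range is hypothetical, the "updated=" prefix is treated as optional so that a bare pair of dates is also accepted, and only the subset of ISO 8601 understood by Python's datetime.fromisoformat is handled.

```python
from datetime import datetime


def parse_atom_time_range(value):
    """Return (start, end) as datetimes; None marks an open-ended side."""
    value = value.strip()

    # Assumption: accept both "updated=<range>" and a bare "<range>".
    unit, equals, rest = value.partition("=")
    if equals:
        if unit.strip() != "updated":
            raise ValueError("unsupported range unit: %r" % unit)
        value = rest.strip()

    start_text, slash, end_text = value.partition("/")
    if not slash:
        raise ValueError("range must contain a '/' separator")

    def to_datetime(text):
        text = text.strip()
        # An omitted date means the range is unbounded on that end.
        return datetime.fromisoformat(text) if text else None

    return to_datetime(start_text), to_datetime(end_text)


since_march = parse_atom_time_range("updated=2005-03-01T00:00:00/")
before_april = parse_atom_time_range("/2005-04-01")
```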

The response to a collection request MUST be a collection document, all of whose <member> elements fall within the requested range. The resulting document MAY be either complete or partial.

If the response is a partial collection document, it MUST obey the "initial subsequence" restriction: every member in the requested range whose "updated" value is less than the "updated" value of some member in the response document MUST itself be included in the document. In other words, a partial document omits only members at least as recent as the most recent member it contains.

If any members of the collection fall within the requested range, the server MUST return a collection document containing at least one <member> element. If no members fall in this range, the server MUST respond with a collection document containing no <member> elements.
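One way a server could satisfy these rules is to sort the matching members by "updated" and return the earliest ones. The sketch below is illustrative only: select_members and its limit parameter are hypothetical, members are modelled as (updated, href) tuples with same-format ISO date strings, and the range bounds are treated as inclusive, which this proposal does not state.

```python
def select_members(members, start=None, end=None, limit=None):
    """Pick the members to return for a time-range request.

    Returns (chosen, complete), where chosen is ordered by "updated".
    """
    in_range = sorted(
        m for m in members
        if (start is None or m[0] >= start) and (end is None or m[0] <= end)
    )
    if limit is None:
        return in_range, True

    # Taking the earliest members keeps the "initial subsequence" property:
    # nothing omitted is older than anything included. At least one member
    # is returned whenever any member matches.
    chosen = in_range[:max(limit, 1)]
    return chosen, len(chosen) == len(in_range)


members = [
    ("2005-03-03", "/m/c"),
    ("2005-03-01", "/m/a"),
    ("2005-03-02", "/m/b"),
    ("2005-02-01", "/m/z"),
]
chosen, complete = select_members(members, start="2005-03-01", limit=2)
```

Here the member dated 2005-03-03 is left for a later request, and the response would carry completeness="partial".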

The response to a time-range request MUST include a <sub> element for each subcollection of the requested collection that contains items falling within the requested range. The response MUST NOT include a <sub> element for any subcollection that does not contain members falling within the requested range.

3.X.3.2. Depth: Header

If the request includes a "Depth:" HTTP header, its value determines whether the response should strictly include members of the requested collection, or if it should also include members of every descendant subcollection. If the value of this header is "1", only members of the requested collection itself should be returned. If the value of this header is "infinity", then any matching members of any descendant subcollection should be included. Even if a given member is reachable from the requested collection by more than one path, it MUST be included only once in the returned document.

Only two values for "Depth:" are defined in this specification.

    depth-specifier = "1" | "infinity"

If a Depth header is not included, the response SHOULD be as if its value were "1".
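The two Depth values, and the rule that a member reachable by several paths appears only once, might be implemented along these lines. This is a sketch, not a prescribed algorithm: collect_members is a hypothetical name, collections are modelled as dicts with "members" (a list of hrefs) and "subs" (a list of such dicts), and the ordering by "updated" is left out for brevity.

```python
def collect_members(collection, depth="1"):
    """Return member hrefs for the given Depth value, without duplicates."""
    if depth not in ("1", "infinity"):
        raise ValueError("unsupported Depth value: %r" % depth)

    seen_members = set()
    visited = set()  # ids of collections already walked; guards against cycles
    result = []
    stack = [collection]
    while stack:
        node = stack.pop()
        if id(node) in visited:
            continue
        visited.add(id(node))
        for href in node["members"]:
            # Include each member once, however many paths reach it.
            if href not in seen_members:
                seen_members.add(href)
                result.append(href)
        if depth == "infinity":
            stack.extend(node.get("subs", []))
    return result


photos = {"members": ["/m/2", "/m/3"], "subs": []}
drafts = {"members": ["/m/3", "/m/4"], "subs": []}
root = {"members": ["/m/1"], "subs": [photos, drafts]}

shallow = collect_members(root)            # behaves like "Depth: 1"
deep = collect_members(root, "infinity")   # walks every descendant
```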

Notes

This proposal is designed to satisfy two common use cases: synchronization and just-in-time fetching.

In the synchronization case, a client is holding a persistent record of the server's state. Through a series of transactions, the client wants to bring its state in sync with the server's state. It is therefore interested in all objects modified since the last time it performed a synchronization. By using the "Depth: infinity" header along with an appropriate time interval, it can request just those items. Note that the server need not return all such items in one response; but the "initial subsequence" requirement guarantees that a series of requests, with increasing minimums, will allow the client to fetch all the updated items without fetching any others or missing anything.
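The series of requests with increasing minimums can be sketched as a loop. Everything named here is an assumption made for illustration: sync, fake_fetch and STORE are hypothetical, fetch stands in for a "Depth: infinity" GET with an Atom-Time-Range header, and range bounds are taken to be inclusive, so the client re-requests from the last "updated" value it saw and discards the members it already holds.

```python
def sync(fetch, since=None):
    """Fetch every member updated at or after `since`.

    fetch(start) must return (members, complete), where members is a list
    of (updated, href) tuples ordered by "updated".
    """
    seen = set()
    result = []
    start = since
    while True:
        members, complete = fetch(start)
        fresh = [m for m in members if m not in seen]
        seen.update(fresh)
        result.extend(fresh)
        if complete or not members:
            return result
        new_start = max(updated for updated, _ in members)
        if new_start == start and not fresh:
            # Happens if more members share one "updated" value than the
            # server returns per response; this sketch does not handle it.
            raise RuntimeError("no progress; cannot advance the minimum")
        start = new_start


# Stand-in for a server that returns at most two members per response.
STORE = [
    ("2005-03-01", "/m/1"),
    ("2005-03-02", "/m/2"),
    ("2005-03-03", "/m/3"),
    ("2005-03-04", "/m/4"),
]


def fake_fetch(start, limit=2):
    matching = [m for m in STORE if start is None or m[0] >= start]
    page = matching[:limit]
    return page, len(page) == len(matching)


updated_items = sync(fake_fetch, since="2005-03-02")
```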

In the just-in-time case, a client has no persistent state, and it wants to present the user with a navigable, server-determined hierarchy, much as a mail client might present IMAP folders. The user will "open up" some of the (sub)collections, and before drawing the display, the client will fetch just the items in the open collections. In this case, the client doesn't want to transfer every item every time it updates the display; the use of collections allows it to fetch just one meaningful subset of items each time. This client would ignore the "Atom-Time-Range" accommodation and make a series of "Depth: 1" requests to create a display of hierarchical "folders"; as the user navigates into each folder, the client can efficiently fetch the items to display within that single folder.

Even though these two distinct use cases may accommodate most implementations, the two features compose nicely. A client that wants to sync just the items within a given folder can use an "Atom-Time-Range" header along with a "Depth: 1" header on that subcollection's URL.
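As a sketch of how the two headers combine, a client might build its request headers as shown below. The helper name collection_headers is hypothetical, and the "updated=" prefix follows the grammar in section 3.X.3.1.

```python
def collection_headers(since=None, until=None, depth="1"):
    """Build the request headers for a collection GET."""
    if depth not in ("1", "infinity"):
        raise ValueError("unsupported Depth value: %r" % depth)
    headers = {"Depth": depth}
    # Omit Atom-Time-Range entirely when no bound is given; the interval
    # is then taken to be all of time.
    if since or until:
        headers["Atom-Time-Range"] = "updated=%s/%s" % (since or "", until or "")
    return headers


# Sync one folder only: a time range combined with "Depth: 1".
folder_sync = collection_headers(since="2005-03-01T00:00:00")

# Full synchronization across all descendant subcollections.
full_sync = collection_headers(since="2005-03-01T00:00:00", depth="infinity")
```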

One more advantage is that the containment model is not incompatible with the model of WebDAV, allowing special implementations to make extensions in that plane.


CategoryProposals