PaceBatch


Abstract

Provide a way to batch multiple operations on a feed and its entries into one HTTP transaction. Individual operations are not necessarily a part of one ACID transaction - the results are equivalent to issuing individual operations one-by-one.

Author: Kyle Marvin / Mark Lucovsky / Arthur Zwiegincew

Status

Draft

Rationale

In some situations, applications need to perform a series of operations. Requiring them to submit separate HTTP requests for each operation slows everyone down.

For example, a web-based blog application may allow the user to mark a number of posts for deletion, and then submit the operation in one request. Another example would be an off-line photo management application that needs to perform a number of POSTs, PUTs, and DELETEs when it comes online.

Consider a relatively extreme case. If you're trying to update 1000 entries, it's almost certainly going to be better in virtually all ways to send a single request that updates all the entries (or a small number of batch requests), rather than 1000 separate HTTP requests, regardless of whether you use an HTTP 1.1 persistent connection, or pipelining, etc.

When batching, there will be fewer bytes sent over the network, less server overhead for processing one request than processing 1000 separate HTTP requests, etc. For many use cases, Atom entries are likely to be relatively small, so the per-request overhead may be more significant (as compared to other content types).

In addition, many clients probably have limits on the number of requests that they are willing to pipeline at one time or the length of time a persistent connection can be held, potentially limiting the effectiveness of built-in HTTP 1.1 mechanisms.

Relying upon HTTP 1.1 mechanisms to optimize large numbers of small requests also assumes that HTTP will be the only wire protocol used to support APP. While certainly the primary use case, it may not be the only one.

Sample Use Cases

Google doesn't comment publicly about unreleased products, so the following should be interpreted as REPRESENTATIVE use cases Google thinks APP should be able to support (and that PaceBatch could help with), not SPECIFIC use cases that Google intends for APP:

a) APP should be able to support client tools to manipulate and manage APP-published data. When building such tools, a single logical operation that a user wants to perform may often require many physical APP operations. One example is creating, renaming, or deleting a category, which would require a PUT on every single item that is currently (or to be) labeled with the category. In the domain of GMail, if I were using APP in the client and creating a label for the "PaceBatch and pipelining" thread, I might be PUTing categories on 60+ messages. Another example might be in moving Entry content between collections as a function of organizing it. Let's say I had three online photo galleries of family pictures (200 photos each) and decide to merge them into a single gallery. This would end up requiring 1200 operations (600 POSTs + 3 * 200 DELETEs) at the APP level to merge the Entries referring to the picture content.

b) APP should be able to support offline tools and manipulation of APP-published data locally while disconnected. This can compound the actions described in a), because now I might want to perform multiple logical operations (each involving large numbers of physical operations) at the time I decide to synchronize my offline view of the content with the published online version.

c) APP should be capable of publishing data for syndication that might arrive at much higher rates than typical blog data. As an example, let's say APP was being used to publish snapshots of local weather updates from monitoring stations around the world. Local weather conditions might be published on some regular interval (say once a second). Data is constantly being collected, but it may not necessarily be updated at the rate of collection (due to connectivity issues or intentional design). If a monitoring station lost connectivity for 3 hours, it would have a backlog of 10800 POSTs to publish samples collected while offline.

It's also important to note that any given APP service might have large numbers of these use cases executing in parallel. There may be many on-line users editing content simultaneously (a), and/or many users synchronizing offline/online state (b), or many different content sources all submitting feed data simultaneously (c).

Proposal

Make the following changes to the protocol-04 draft:

Modify the diagram in 4.4.1 to:

Client                      Server
|                                |
|  1.) POST Member               |
|      to Collection URI         |
|------------------------------->|
|                                |
|  2.) 201, Created @ Location   |
|<-------------------------------|
|                                |

Add the following as (new) section 4.4.5:

4.4.5 Batch Update

Client                      Server
|                                |
|  1.) POST Batch Request Doc    |
|      to Entry Collection URI   |
|------------------------------->|
|                                |
|  2.) 200 OK, Batch Response Doc|
|<-------------------------------|
|                                |

1. The client sends a POST request to the Request URI containing a BatchRequest document to apply to the Entry Collection.
2. The server responds with a BatchResponse document describing the results of each operation.

Insert the following (new) section, immediately after Section 6.0 (Entry Collection):

6.1 Batch Documents

Batch operations can be performed on an Entry Collection. The Batch Request and Batch Response documents provide an envelope for a series of operations to be performed upon the Collection resource and for accessing the corresponding results.

6.1.1 Batch Request Document

A Batch Request document describes a set of operations that are to be applied to a Collection resource.

6.1.1.1 The 'app:batch' element

The app:batch element represents a collection of batch operations to be applied to a Collection resource.

appBatchRequest =
   element app:batch {
      attribute batchid { text }?,
      (appBatchOp*)
   }

appBatchOp = appPostBatchOp
 | appPutBatchOp
 | appDeleteBatchOp

6.1.1.2 Batch common elements and attributes

There is a common set of attributes and elements that all batch operation elements may contain:

appBatchOpCommonAttributes =
   attribute opid { text }

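The 'opid' attribute identifies an operation so that its response can be correlated with the request. Its value MUST BE unique within any single Batch Request document.

There is a common set of attributes for batch operations that target a specific resource: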

appMemberBatchOpCommonAttributes =
   appBatchOpCommonAttributes,
   attribute uri { atomUri }

The 'app:header' element is used to provide information for an operation that would normally be provided by HTTP headers. For example, it could be used to set the 'Name:' header on an 'app:post' operation.

appHeader =
   element app:header {
      attribute name { text },
      attribute value { text }
   }

6.1.1.3 The 'app:post' element

The 'app:post' element describes the creation of a single member within the collection. This is equivalent to performing a POST operation on the Collection URI.

appPostBatchOp =
   element app:post {
      appBatchOpCommonAttributes,
      (appHeader* &
       atomEntry)
   }

6.1.1.4 The 'app:put' element

The 'app:put' element describes the update of a single member within the collection. This is equivalent to performing a PUT operation on the resource identified by the uri attribute of this element.

appPutBatchOp =
   element app:put {
      appMemberBatchOpCommonAttributes,
      (appHeader* &
       atomEntry)
   }

6.1.1.5 The 'app:delete' element

The 'app:delete' element describes the deletion of a single member within the collection. This is equivalent to performing a DELETE operation on the resource identified by the uri attribute of this element.

appDeleteBatchOp =
   element app:delete {
      appMemberBatchOpCommonAttributes,
      empty
   }

6.1.2 Batch Response Document

A Batch Response document describes the results of executing a BatchRequest against a Collection resource.

6.1.2.1 The 'app:batchResponse' element

The app:batchResponse element represents a collection of batch operation results that have been applied to a Collection resource.

appBatchResponse =
   element app:batchResponse {
      attribute batchid { text }?,
      (appBatchOpResponse*)
   }

appBatchOpResponse = appPostBatchOpResponse
 | appPutBatchOpResponse
 | appDeleteBatchOpResponse

6.1.2.2 Response common elements and attributes

There is a common set of attributes that all batch operation responses may provide:

appBatchOpResponseCommonAttributes =
   appBatchOpCommonAttributes,
   attribute status { text },
   attribute message { text }?

The 'status' attribute contains the HTTP response code that would be returned if the operation was executed via a single request using POST, PUT, or DELETE.

The 'app:header' element is used to provide information for an operation that would normally be provided by HTTP headers in a response document.

6.1.2.3 The 'app:postResponse' element

The 'app:postResponse' element describes the result of an 'app:post' batch operation upon the collection.

appPostBatchOpResponse =
   element app:postResponse {
      appBatchOpResponseCommonAttributes,
      (appHeader* &
       appMember)
   }

6.1.2.4 The 'app:putResponse' element

The 'app:putResponse' element describes the result of an 'app:put' batch operation upon the collection.

appPutBatchOpResponse =
   element app:putResponse {
      appBatchOpResponseCommonAttributes,
      (appHeader*)
   }

6.1.2.5 The 'app:deleteResponse' element

The 'app:deleteResponse' element describes the result of an 'app:delete' batch operation upon the collection.

appDeleteBatchOpResponse =
   element app:deleteResponse {
      appBatchOpResponseCommonAttributes,
      (appHeader*)
   }

Add new section 6.3 after current section 6.2 (Role of Atom Entry ...)

6.3 Batch Support

An Entry Collection Resource also accepts POST requests with a request body containing a BatchRequest document. The batch request document consists of any number of batch operations, which can be <app:post>, <app:put>, or <app:delete> elements. The 'app:post' operations create new entries in the Collection. The 'app:put' and 'app:delete' operations specify the collection member URI to which they apply using the 'uri' attribute. The member MUST belong to the collection corresponding to the batch request URI. Each operation also contains an opid attribute. It is used to correlate operation responses to requests.
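As a rough illustration of the paragraph above, the following Python sketch assembles a Batch Request document with the standard library. The namespace URIs are copied from the examples in this proposal (the protocol one is a placeholder), and the helper names make_entry and build_batch are hypothetical, not part of the proposal.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as used in this proposal's examples (protocol URI is a placeholder).
ATOM_NS = "http://purl.org/atom/ns#draft-ietf-atompub-format-09"
APP_NS = "http://purl.org/atom/ns#draft-ietf-atompub-protocol-xx"
ET.register_namespace("atom", ATOM_NS)
ET.register_namespace("app", APP_NS)


def make_entry(title, content):
    """Hypothetical helper: build a minimal atom:entry with text title and content."""
    entry = ET.Element("{%s}entry" % ATOM_NS)
    ET.SubElement(entry, "{%s}title" % ATOM_NS, {"type": "text"}).text = title
    ET.SubElement(entry, "{%s}content" % ATOM_NS, {"type": "text"}).text = content
    return entry


def build_batch(batch_id, operations):
    """Build an app:batch element.

    operations is a list of (method, uri, entry) tuples, where method is
    "post", "put", or "delete". opid values are assigned sequentially, so
    they are unique within the document.
    """
    batch = ET.Element("{%s}batch" % APP_NS, {"batchid": batch_id})
    for opid, (method, uri, entry) in enumerate(operations, start=1):
        if method not in ("post", "put", "delete"):
            raise ValueError("unsupported batch operation: %r" % method)
        attrs = {"opid": str(opid)}
        if method in ("put", "delete"):
            if uri is None:
                raise ValueError("%s requires a member uri" % method)
            attrs["uri"] = uri  # put and delete target an existing member
        op = ET.SubElement(batch, "{%s}%s" % (APP_NS, method), attrs)
        if method in ("post", "put"):
            if entry is None:
                raise ValueError("%s requires an entry" % method)
            op.append(entry)  # post and put carry an atom:entry payload
    return batch
```

A client could serialize the result with ET.tostring(batch, encoding="unicode") and POST it to the Entry Collection URI.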

A batch response returns 200 OK to indicate that the batch request has been successfully processed, or an appropriate HTTP error code (e.g. 403 Forbidden) if the request cannot be processed. A successful HTTP status code DOES NOT indicate that all requests within the batch were successful. The result of specific batch operations will be returned within the response body. The client can examine the elements of the returned BatchResponse document to determine the success/failure of any individual operation. The server will make a best effort to execute as many of the requested operations as possible (i.e. it will not abort after any individual operation fails). Batch operations can be performed in any order, and the server is free to reorder responses.
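Because the HTTP status of the batch says nothing about individual operations, a client has to inspect each result. The sketch below shows one way this might be done in Python. The sample document, the 404 result in it, and the function names are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Placeholder protocol namespace, as used in this proposal's examples.
APP_NS = "http://purl.org/atom/ns#draft-ietf-atompub-protocol-xx"

# Illustrative response; the failing put (404) is an assumption for this sketch.
SAMPLE_RESPONSE = (
    "<app:batchResponse batchid='1' xmlns:app='%s'>"
    "<app:deleteResponse opid='3' status='204' message='No Content'/>"
    "<app:putResponse opid='1' status='404' message='Not Found'/>"
    "<app:postResponse opid='2' status='201' message='Created'/>"
    "</app:batchResponse>" % APP_NS
)


def summarize_batch_response(xml_text):
    """Return {opid: (status, message)} for each operation result.

    Results are keyed by opid because the server is free to reorder them.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "{%s}batchResponse" % APP_NS:
        raise ValueError("not a batch response document")
    prefix = "{%s}" % APP_NS
    results = {}
    for child in root:
        # Only app:postResponse, app:putResponse and app:deleteResponse count.
        if child.tag.startswith(prefix) and child.tag.endswith("Response"):
            results[child.get("opid")] = (int(child.get("status")), child.get("message"))
    return results


def failed_ops(results):
    """Return the sorted opids whose status is outside the 2xx range."""
    return sorted(opid for opid, (status, _) in results.items()
                  if not 200 <= status < 300)
```

A client would typically retry or report only the operations returned by failed_ops, leaving the successful ones alone.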

A single batch request MUST NOT contain more than one operation ('app:post', 'app:put', or 'app:delete') that targets any single member resource. This simplifies error handling and retry processing in the event that a response is not received due to a network failure or other conditions.
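A client or server might check these constraints before processing a batch. The following Python sketch assumes operations have already been reduced to (opid, method, uri) tuples, with uri set to None for 'app:post'. That representation and the function name find_conflicts are hypothetical.

```python
from collections import Counter


def find_conflicts(operations):
    """Return (duplicate_opids, duplicate_uris) for a list of operations.

    operations is a list of (opid, method, uri) tuples. uri is None for
    post operations, which do not target an existing member.
    """
    opid_counts = Counter(opid for opid, _method, _uri in operations)
    uri_counts = Counter(uri for _opid, _method, uri in operations
                         if uri is not None)
    duplicate_opids = sorted(o for o, n in opid_counts.items() if n > 1)
    duplicate_uris = sorted(u for u, n in uri_counts.items() if n > 1)
    return duplicate_opids, duplicate_uris
```

An empty pair of lists means the batch satisfies both the opid uniqueness rule and the one-operation-per-member rule.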

Examples

The following is a sample series of HTTP transactions that illustrate this proposal, applied to an Entry Collection:

GET /myFeed

200 OK

<?xml version='1.0'?>
<feed xmlns='http://purl.org/atom/ns#draft-ietf-atompub-format-09'>
  <title>Fubar</title>
  <updated>2005-05-23T16:25:00-08:00</updated>
  <author><name>Arthur Z.</name></author>
  <entry>
    <id>1</id>
    <link rel='edit' href='http://fubar.com/myFeed/1/'/>
    <updated>2005-05-23T16:25:00-08:00</updated>
    <title type='text'>Entry 1</title>
    <content type='text'>1.0</content>
  </entry>
  <entry>
    <id>2</id>
    <link rel='edit' href='http://fubar.com/myFeed/2/'/>
    <updated>2005-05-23T16:25:00-08:00</updated>
    <title type='text'>Entry 2</title>
    <content type='text'>2.0</content>
  </entry>
</feed>

----

POST /myFeed

<?xml version='1.0'?>
<app:batch batchid="1"
       xmlns='http://purl.org/atom/ns#draft-ietf-atompub-format-09'
       xmlns:app='http://purl.org/atom/ns#draft-ietf-atompub-protocol-xx'>
  <app:put opid='1' uri='http://fubar.com/myFeed/1/'>
    <entry>
      <id>1</id>
      <link rel='edit' href='http://fubar.com/myFeed/1/'/>
      <updated>2005-05-23T16:25:00-08:00</updated>
      <title type='text'>Entry 1</title>
      <content type='text'>1.1</content>
    </entry>
  </app:put>
  <app:post opid='2'>
    <app:header name="Name" value="3" />
    <entry>
      <title type='text'>Entry 3</title>
      <content type='text'>3.0</content>
    </entry>
  </app:post>
  <app:delete opid='3' uri='http://fubar.com/myFeed/2/'/>
</app:batch>


200 OK

<?xml version='1.0'?>
<app:batchResponse batchid="1"
     xmlns='http://purl.org/atom/ns#draft-ietf-atompub-format-09'
     xmlns:app='http://purl.org/atom/ns#draft-ietf-atompub-protocol-xx'>
  <app:deleteResponse opid='3' status='204' message='No Content'/>
  <app:putResponse opid='1' status='200' message='OK' />
  <app:postResponse opid='2' status='201' message='Created'>
    <app:header name="Location" value="http://fubar.com/myFeed/3/" />
    <member
      href="http://fubar.com/myFeed/3/"
      hrefreadonly="http://fubar.com/3/readonly"
      updated="2005-05-23T16:26:00-08:00"
      title="Entry 3" />
  </app:postResponse>
</app:batchResponse>

----

GET /myFeed


200 OK

<?xml version='1.0'?>
<feed xmlns='http://purl.org/atom/ns#draft-ietf-atompub-format-09'>
  <title>Fubar</title>
  <updated>2005-05-23T16:25:00-08:00</updated>
  <author><name>Arthur Z.</name></author>
  <entry>
    <id>1</id>
    <link rel='edit' href='http://fubar.com/myFeed/1/'/>
    <updated>2005-05-23T16:26:00-08:00</updated>
    <title type='text'>Entry 1</title>
    <content type='text'>1.1</content>
  </entry>
  <entry>
    <id>3</id>
    <link rel='edit' href='http://fubar.com/myFeed/3/'/>
    <updated>2005-05-23T16:26:00-08:00</updated>
    <title type='text'>Entry 3</title>
    <content type='text'>3.0</content>
  </entry>
</feed>

Impacts

Notes

The original thinking was to have the idempotency requirement and error behavior for BatchRequest operations be similar to those of RFC 2616 pipelining (but pipelined within a single HTTP request). See [1] and [2]. The fact that POST operations cannot be considered idempotent limits the usefulness of HTTP 1.1 pipelining for APP, as it means that member creation requests cannot be pipelined.

It might be possible to support more than one failure mode for batch requests. In addition to "best effort" (the failure mode described above), a server could also support "fast fail", meaning it would abort the batch processing upon the first failure. If doing this, the loose ordering of operations clause described in 6.3 would need to be removed.
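The difference between the two failure modes can be sketched in a few lines of Python. Here execute is a hypothetical callable that applies one operation and returns an HTTP-style status code. The fast_fail flag is likewise an assumption for illustration and is not part of the proposal.

```python
def run_batch(operations, execute, fast_fail=False):
    """Apply operations in document order and collect (opid, status) pairs.

    operations is a list of (opid, op) tuples. With fast_fail=False every
    operation is attempted (best effort). With fast_fail=True processing
    stops after the first non-2xx status, which only makes sense if the
    server processes operations in a fixed order.
    """
    results = []
    for opid, op in operations:
        status = execute(op)
        results.append((opid, status))
        if fast_fail and not 200 <= status < 300:
            break  # remaining operations are never attempted
    return results
```

In fast-fail mode the client can tell which operations were skipped because their opids are absent from the results.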

An alternative form might be to provide batching of operations of a common type, via the appropriate HTTP method on a collection. In this model, a POST to a collection could be a batch request containing multiple entries to create, a PUT would contain multiple entries to modify, and a DELETE could identify multiple entries to delete. This might result in up to 3 requests being used where one would work with the proposed model, but would still provide a significant benefit over sending them as discrete requests.
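To make the "up to 3 requests" figure concrete, the following Python sketch partitions a mixed list of operations by method, as a client would have to do under this alternative. The (method, target) tuple shape and the function name group_by_method are assumptions for illustration.

```python
def group_by_method(operations):
    """Partition (method, target) pairs into one list per HTTP method.

    Returns a dict with at most three keys (POST, PUT, DELETE). Each key
    corresponds to one per-method batch request in the alternative model.
    """
    groups = {"POST": [], "PUT": [], "DELETE": []}
    for method, target in operations:
        key = method.upper()
        if key not in groups:
            raise ValueError("unsupported method: %r" % method)
        groups[key].append(target)
    # Drop methods with no operations so the key count equals the request count.
    return {method: targets for method, targets in groups.items() if targets}
```

The number of keys in the returned dict is the number of HTTP requests the client would send.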

Limitations

This PACE only targets batch updates to Entry Collections. The basic technique could be generalized to other collection types, but this would require a more complex request structure (for example, using MIME multipart documents to POST other member content types).

References

[1] - http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html#sec9.1.2

[2] - http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html#sec8.1.2.2

