PaceCaching - Atom Wiki

Abstract

Explicit caching information for Atom feeds, compatible with HTTP 1.1 caching.

Status

Rationale

HTTP caching has proven effective for high-traffic sites. Caches are much more effective when they have explicit information about the expected lifetime of the content they are caching. Without explicit lifetimes, caches will choose conservative (fairly short) lifetimes to avoid serving seriously stale data.

High-traffic sites using Atom will benefit from explicit lifetime information in Atom feeds, especially when that information is transferred to HTTP headers. Information in Atom entries may also be useful for per-entry caching in the client.

Proposal

4.XX atom:expires

The "atom:expires" element is a Date Construct giving the date/time after which the response is considered stale. A stale cache entry may not normally be returned by a cache (either an HTTP cache or a user agent cache) unless it is first validated with the origin server (or with an intermediate cache that has a fresh copy of the entity). See section 13.2 of the HTTP 1.1 standard [RFC2616] for further discussion of the expiration model.

To mark a response as "never expires," use an atom:expires date approximately one year from the date/time the feed was last modified. Atom feeds SHOULD NOT use atom:expires dates more than one year in the future.
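A minimal sketch of the one-year rule, assuming timezone-aware `datetime` values and approximating "one year" as 365 days. `never_expires` and `clamp_expires` are hypothetical helper names, not part of any Atom library.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "one year" is approximated as 365 days.
ONE_YEAR = timedelta(days=365)

def never_expires(last_modified: datetime) -> datetime:
    """Expires date meaning "never expires": about one year after the
    feed was last modified."""
    return last_modified + ONE_YEAR

def clamp_expires(expires: datetime, now: datetime) -> datetime:
    """Cap an atom:expires value so it is never more than one year
    after the current time (the SHOULD NOT rule above)."""
    return min(expires, now + ONE_YEAR)
```

For a feed last modified on 2004-05-01 UTC, `never_expires` yields 2005-05-01, and a far-future value such as 2010-01-01 sent on 2004-06-01 would be cut back to 2005-06-01.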

The presence of an atom:expires element with a date value in the future, on a response that would otherwise be non-cacheable by default, indicates that the response is cacheable, unless a Cache-Control header field indicates otherwise (Section 14.9 of RFC2616).
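The rule can be sketched as a small predicate for a shared cache. This is an illustration under an assumption: only the `no-store` and `private` Cache-Control directives are treated as overriding the expires date, which is a simplification of Section 14.9 of RFC2616. The function name is hypothetical.

```python
from datetime import datetime, timezone

def expires_makes_cacheable(expires: datetime, now: datetime,
                            cache_control: str = "") -> bool:
    """True if a future atom:expires date marks an otherwise
    non-cacheable response as cacheable by a shared cache.

    Assumption: only "no-store" and "private" are checked here as
    Cache-Control directives that override the expires date."""
    # Directive names only, e.g. "private, max-age=60" -> {"private", "max-age"}
    directives = {d.strip().split("=", 1)[0].lower()
                  for d in cache_control.split(",") if d.strip()}
    if directives & {"no-store", "private"}:
        return False
    return expires > now
```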

The date value in atom:expires SHOULD be used in the HTTP 1.1 Expires header field for responses [Section 14.21 of RFC2616].
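Copying the value into the Expires header requires a format change: Atom Date Constructs use an RFC 3339 style timestamp, while the HTTP Expires header uses the RFC 1123 date form in GMT. A minimal conversion sketch, assuming the atom:expires text carries an explicit time zone offset or a trailing "Z":

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def atom_date_to_http_date(atom_date: str) -> str:
    """Convert an RFC 3339 timestamp (as in an Atom Date Construct)
    to the RFC 1123 date form used by the HTTP Expires header."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    if atom_date.endswith(("Z", "z")):
        atom_date = atom_date[:-1] + "+00:00"
    dt = datetime.fromisoformat(atom_date)
    if dt.tzinfo is None:
        raise ValueError("Atom dates must carry a time zone offset")
    # HTTP dates are always expressed in GMT.
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)
```

For example, `atom_date_to_http_date("2004-06-01T12:30:00-04:00")` gives `"Tue, 01 Jun 2004 16:30:00 GMT"`.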

An HTTP server for Atom feeds SHOULD use a mechanism, such as NTP [RFC1305], to synchronize its clock with a reliable external time standard, as specified in HTTP 1.1 [RFC2616].

4.YY atom:max-age

The atom:max-age element is a positive integer giving the maximum age in seconds for a cached Atom feed. A cached Atom feed is stale if its current age is greater than the age value given (in seconds) at the time of a new request for that resource.
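The "current age" of a cached feed is not simply the time since it was stored. The sketch below follows the age calculation in Section 13.2.3 of RFC2616, which accounts for the origin's Date header, any Age header from upstream caches, and request latency. All times are assumed to be POSIX timestamps in seconds; the function names are illustrative.

```python
def current_age(date_value: float, age_value: float, request_time: float,
                response_time: float, now: float) -> float:
    """Current age of a cached response (RFC 2616, section 13.2.3).

    date_value: the response's Date header; age_value: its Age header
    (0 if absent); request_time/response_time: when the cache sent the
    request and received the response; now: the current time."""
    apparent_age = max(0.0, response_time - date_value)
    corrected_received_age = max(apparent_age, age_value)
    response_delay = response_time - request_time
    corrected_initial_age = corrected_received_age + response_delay
    resident_time = now - response_time
    return corrected_initial_age + resident_time

def is_stale(age: float, max_age: int) -> bool:
    """A cached feed is stale once its current age exceeds atom:max-age."""
    return age > max_age
```

A feed fetched with 2 seconds of latency and held for 10 minutes has an age of about 603 seconds, so it is stale under an atom:max-age of 600 but fresh under 3600.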

5.1 Cache Algorithms

Caching in Atom is based on HTTP caching. Atom implementations SHOULD support caching. Caching reduces HTTP traffic and server load.

Atom implementations with caching MUST use the Atom values in the appropriate HTTP header fields. Implementations MAY use HTTP entity tags for cache validation.
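One way a server might copy the Atom values into HTTP headers is sketched below, mapping atom:expires to Expires and atom:max-age to `Cache-Control: max-age`. The namespace URI is an assumption: this proposal does not state which namespace the new elements would live in. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime

# Assumption: namespace under which atom:expires and atom:max-age would appear.
ATOM_NS = "http://www.w3.org/2005/Atom"

def caching_headers(feed_xml: str) -> dict:
    """Build HTTP/1.1 caching headers from a feed's atom:expires and
    atom:max-age child elements."""
    feed = ET.fromstring(feed_xml)
    headers = {}
    expires = (feed.findtext(f"{{{ATOM_NS}}}expires") or "").strip()
    if expires:
        # Normalize a trailing "Z" for older versions of fromisoformat().
        if expires[-1] in "Zz":
            expires = expires[:-1] + "+00:00"
        dt = datetime.fromisoformat(expires)
        if dt.tzinfo is None:
            raise ValueError("atom:expires must carry a time zone offset")
        headers["Expires"] = format_datetime(dt.astimezone(timezone.utc), usegmt=True)
    max_age = (feed.findtext(f"{{{ATOM_NS}}}max-age") or "").strip()
    if max_age:
        headers["Cache-Control"] = f"max-age={int(max_age)}"
    return headers
```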

Atom user agents with caches MUST use the HTTP user agent cache expiration model.

For details of the required caching and expiration model, see the HTTP 1.1 specification [RFC2616], especially sections 13, 14.9.3, and 14.21.

Impacts

Notes

HTTP 1.1 supports two kinds of cache lifetime information: max-age and expires. A max-age value (also called "time to live") is the maximum amount of time that a cache may serve its copy without checking for updates. An expires value is a specific date and time after which the cache should check for updates. Each has strengths and appropriate uses.
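When a response carries both kinds of information, HTTP 1.1 gives max-age precedence (Section 13.2.4 of RFC2616). A small sketch of that freshness-lifetime rule, with times as POSIX timestamps in seconds and an illustrative function name:

```python
from typing import Optional

def freshness_lifetime(max_age: Optional[int], expires: Optional[float],
                       date: float) -> Optional[float]:
    """Freshness lifetime in seconds (RFC 2616, section 13.2.4).

    max-age, when present, overrides Expires; otherwise the lifetime is
    Expires minus the server's Date header. None means neither was
    given, so the cache falls back on its own heuristic lifetime."""
    if max_age is not None:
        return float(max_age)
    if expires is not None:
        return max(0.0, expires - date)
    return None
```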

When to use an expires date

Publishers with defined publication schedules usually know when the next feed update will happen. For example, a press release will be published immediately after the stock markets close. Implementing this with max-age is tricky: as the update time approaches, max-age must be decreased. One hour before the release, max-age is 60 minutes; half an hour before, 30 minutes; and so on down to 1 minute. Alternatively, max-age can be set to zero an hour before the release, losing the benefit of caching for that period. Either approach requires changing the HTTP header and the Atom document even though the rest of the content has not changed. The last-modified date also needs special handling, because there has been no significant change in the content.
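The shrinking max-age described above amounts to sending the smaller of the normal lifetime and the time remaining until the next release. A hypothetical helper, assuming `release` is always the next scheduled release and all datetimes are timezone-aware:

```python
from datetime import datetime, timezone

def max_age_before_release(now: datetime, release: datetime,
                           normal_max_age: int) -> int:
    """max-age (seconds) that keeps caches from holding the feed past
    the next scheduled release: the normal value, cut down to the time
    left until the release, never below zero."""
    remaining = int((release - now).total_seconds())
    return max(0, min(normal_max_age, remaining))
```

With a release at 20:00 UTC and a normal max-age of one day, this returns 3600 at 19:00 and 1800 at 19:30, so the header changes on every request. An expires value, by contrast, stays fixed at the release time.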

Expires handles defined publication schedules easily. Feeds may be static files with a fixed expires value.

When to use max-age

max-age is effective for feeds without defined schedules. It can be set once for the feed, to the typical period between updates. For such feeds, choosing an expires value is a guessing game. If "one hour from now" is chosen as the expires value at the time of the last update, then once that hour has passed, caches will treat the feed as stale and revalidate it on every request, even if the blog is abandoned and never changes again. With a max-age value, the feed remains cacheable.
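One possible way to pick "the typical period between updates" is the median gap between past update times. This is a hypothetical heuristic, not something the proposal prescribes; the median is used so that a single long pause does not dominate the result.

```python
from datetime import datetime
from statistics import median
from typing import Sequence

def typical_update_interval(updates: Sequence[datetime]) -> int:
    """Median gap, in whole seconds, between successive feed updates.
    A hypothetical way to choose one max-age for an unscheduled feed."""
    times = sorted(updates)
    if len(times) < 2:
        raise ValueError("need at least two updates")
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return int(median(gaps))
```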

Compared to rate controls

Some syndication formats have rate controls, giving a suggested refresh or update rate. A rate control may be converted to a max-age period with no loss of information. For use in HTTP headers, that conversion would need to be made in every implementation. Directly specifying max-age is simpler and less error-prone.
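As an illustration of the conversion, consider two rate controls from other formats: RSS 2.0's `<ttl>` (in minutes) and the RSS 1.0 syndication module's `sy:updatePeriod` and `sy:updateFrequency` (updates per period). The month and year lengths below are assumptions, since the syndication module does not define them in seconds.

```python
# Seconds per sy:updatePeriod value.
# Assumption: a month is taken as 30 days and a year as 365 days.
PERIOD_SECONDS = {
    "hourly": 3600,
    "daily": 86400,
    "weekly": 7 * 86400,
    "monthly": 30 * 86400,
    "yearly": 365 * 86400,
}

def ttl_to_max_age(ttl_minutes: int) -> int:
    """RSS 2.0 <ttl> is in minutes; max-age is in seconds."""
    return ttl_minutes * 60

def sy_to_max_age(update_period: str = "daily", update_frequency: int = 1) -> int:
    """Seconds between updates for update_frequency updates per update_period."""
    if update_frequency < 1:
        raise ValueError("updateFrequency must be a positive integer")
    return PERIOD_SECONDS[update_period] // update_frequency
```

Every consumer that wants to drive an HTTP cache from such a rate control would need this kind of table, which is the duplication the paragraph above argues against.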

Compared to calendar specs

Some syndication formats have a publishing calendar, specifying the days of the week, hours of the day, and so on when the feed is published. These are complex to implement and still inadequate for many publication schedules. Instead of representing schedules in the feed format, publishers should define them in the system that generates the feeds. That system can then compute an expires value from the schedule. This allows arbitrarily complex schedules while keeping the feed format simple.
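A feed generator might compute atom:expires from its own schedule roughly as follows. The schedule representation (a set of weekdays plus one UTC time of day) and the function name are hypothetical; a real generator could use any schedule model it likes, which is the point of keeping it out of the feed format.

```python
from datetime import datetime, time, timedelta, timezone
from typing import Collection

def next_publication(now: datetime, weekdays: Collection[int], at: time) -> datetime:
    """Next scheduled publication strictly after `now`, for use as the
    expires value. weekdays uses Python numbering (Monday=0 .. Sunday=6);
    `at` is a UTC time of day."""
    now = now.astimezone(timezone.utc)
    # Eight days covers a weekly schedule whose only slot today has passed.
    for days_ahead in range(8):
        day = (now + timedelta(days=days_ahead)).date()
        candidate = datetime.combine(day, at, tzinfo=timezone.utc)
        if candidate.weekday() in weekdays and candidate > now:
            return candidate
    raise ValueError("schedule has no publication days")
```

For a weekday 20:00 UTC schedule, a request on Friday evening after 20:00 gets an expires value of the following Monday at 20:00.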

Effectiveness of HTTP caching

Without caching, a web server must respond to every GET request. With caching, it sees roughly one request per cache per freshness lifetime, made when the cached copy passes its expires date or exceeds its max-age. Cache reload traffic therefore grows with the number of caches, not with client traffic.
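A back-of-the-envelope estimate makes the difference concrete. The sketch below is a simplified model that ignores evictions, cache hierarchies, and conditional requests; the function name is illustrative.

```python
import math

def origin_requests(client_requests: int, caches: int,
                    window_seconds: float, lifetime_seconds: float) -> int:
    """Rough estimate of requests reaching the origin over a time window.

    Each cache refetches about once per freshness lifetime, so the load
    depends on the number of caches rather than on client traffic. It can
    never exceed the uncached load of one request per client request."""
    if lifetime_seconds <= 0:
        return client_requests  # nothing can be served from cache
    refreshes_per_cache = math.ceil(window_seconds / lifetime_seconds)
    return min(client_requests, caches * refreshes_per_cache)
```

With a one-hour lifetime, 50 caches, and a million client requests per day, the origin would see about 50 × 24 = 1,200 requests instead of 1,000,000.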

For general web traffic, caching is about 50% effective, that is, roughly 50% of requests are satisfied from caches. Nearly all of that benefit comes from high-traffic pages. HTTP cache infrastructures already exist at nearly all ISPs. Atom servers may lease ISP-provided caching services at reasonable rates.
