Abstract
This proposal is identical to PaceDateSamRuby except that it adds a second date, indicating when an entry was changed. That date serves two purposes: selecting "the most recent" version or instance of the same entry, and, when the date does not change between two reads of a feed, signalling that the entry need not be "hashed" to check for changes that might warrant an indication to the user.
Status
Open.
Rationale
Use cases for atom:e:
-
1) When compared to a value of atom:e stored from a previous read, atom:e indicates (by being "later") whether any additional processing needs to be performed (such as overwriting the stored representation of the entry, or further processing to determine whether the entry should be passed on or displayed).
2) If the same entry (same atom:id) is read from multiple sources, the copies may differ; the value of atom:e indicates which is the latest.
3) If the same entry (same atom:id) appears more than once in the same feed, the value of atom:e is one indicator that can be used to determine which instance to process (store, pass on, or display). Another strong indicator would be xml:lang.
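Use cases 1 and 2 both reduce to a per-entry comparison on the consumer side. The following is a minimal Python sketch, assuming a hypothetical in-memory store keyed by atom:id (`EntryStore` and `offer` are illustrative names, not part of any Atom API). It reprocesses an entry only when its atom:e is strictly later than the stored value, whether the new copy comes from the same feed or from another source.

```python
from datetime import datetime


def parse_date(text):
    # Simplification: before Python 3.11, datetime.fromisoformat() rejects a
    # trailing "Z", so it is rewritten as an explicit UTC offset first.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


class EntryStore:
    """Hypothetical consumer-side store keyed by atom:id (illustration only)."""

    def __init__(self):
        self._entries = {}  # atom:id -> (parsed atom:e, stored entry payload)

    def offer(self, atom_id, atom_e, payload):
        """Store an entry read from any source if its atom:e is later than ours.

        Returns True when further processing (diffing, notifying the user,
        passing the entry on) is warranted, False when it can be skipped.
        """
        modified = parse_date(atom_e)
        stored = self._entries.get(atom_id)
        if stored is not None and modified <= stored[0]:
            return False  # same or older atom:e: no need to hash the entry
        self._entries[atom_id] = (modified, payload)
        return True

    def get(self, atom_id):
        return self._entries[atom_id][1]
```

Comparing parsed, timezone-aware values rather than raw strings matters: "2004-08-03T12:00:00-04:00" is later than "2004-08-03T15:00:00Z" even though it sorts earlier as text.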
atom:e serves the same purpose as the HTTP Last-Modified header. As a best practice, publishing systems should ensure that the Last-Modified value served for the entry (in any representation) names the same instant as the value of atom:e.
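The correspondence between atom:e and Last-Modified can be checked mechanically. This Python sketch uses only the standard `email.utils` helpers; the function names are hypothetical. HTTP dates are always expressed in GMT with one-second resolution, so the atom:e value is normalized to that form before formatting or comparing.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def parse_date(text):
    # Simplification: map a trailing "Z" to "+00:00" for older Pythons.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def to_http_instant(atom_e):
    # HTTP dates are GMT with one-second resolution, so normalize to that.
    return parse_date(atom_e).astimezone(timezone.utc).replace(microsecond=0)


def last_modified_header(atom_e):
    """Render an atom:e value as an HTTP Last-Modified header value."""
    return format_datetime(to_http_instant(atom_e), usegmt=True)


def matches_last_modified(atom_e, header_value):
    """True if a Last-Modified header names the same instant as atom:e."""
    return parsedate_to_datetime(header_value) == to_http_instant(atom_e)
```

For the final state of the example later in this proposal, `last_modified_header("2004-08-11T17:42:25-04:00")` yields `Wed, 11 Aug 2004 21:42:25 GMT`.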
What happens in each case if atom:e is not adopted:
-
1) Assuming the source is authoritative (the original source), each new read of an entry can be considered the "latest" and can overwrite the stored copy. Impact: minimal; some unnecessary processing may occur.
2) This behavior is undefined. Impact: if an entry differs between two or more sources (depending on when each source retrieved the entry), there is a 50% or greater chance that silent data loss will occur.
3) This behavior is undefined; in particular, it is not defined as invalid in Atom format-01. Impact: unless this practice is explicitly disallowed, a producer that emits an entry more than once creates a 50% or greater chance of silent data loss. Empirically, the chance of silent data loss approaches certainty: most publishers put "new" entries toward the beginning of feeds, and consumers process entries in feed order, so instances appearing later in the feed (typically older versions) overwrite those appearing earlier.
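The failure mode described in 3) is easy to reproduce. In this Python sketch the feed is a hypothetical list of dicts whose "id", "e" and "title" keys stand in for atom:id, atom:e and atom:title. Processing in feed order with last-one-wins keeps the older instance, while comparing atom:e keeps the newer one regardless of position.

```python
from datetime import datetime


def parse_date(text):
    # Simplification: map a trailing "Z" to "+00:00" for older Pythons.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def naive_last_wins(entries):
    """Keep whichever instance of each atom:id appears last in feed order."""
    kept = {}
    for entry in entries:
        kept[entry["id"]] = entry  # later instances silently overwrite earlier ones
    return kept


def latest_by_atom_e(entries):
    """Keep the instance of each atom:id with the latest atom:e, in any order."""
    kept = {}
    for entry in entries:
        current = kept.get(entry["id"])
        if current is None or parse_date(entry["e"]) > parse_date(current["e"]):
            kept[entry["id"]] = entry
    return kept


# Hypothetical feed: the newer version comes first, as publishers typically order it.
feed = [
    {"id": "tag:example.org,2004:1", "e": "2004-08-11T15:27:35-04:00", "title": "Solved"},
    {"id": "tag:example.org,2004:1", "e": "2004-08-03T12:00:00-04:00", "title": "Help!"},
]
```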
Original Rationale for PaceDateSamRuby: The following were primary factors considered in the production of this proposal:
-
Tim Bray:
-
What the Echo-that-was project should be about is picking the stuff that's already been proven to work and be interoperable, and writing it down in a clean, clear way,
-
As an aggregator author, the screwed up nature of identifiers is one of my biggest problems with RSS. In fact, that and the optionality of a date field.
-
<pubDate> is an optional sub-element of <item>. Its value is a date, indicating when the item will become available.
-
<pubDate> is an optional sub-element of <item>. Its value is a date, indicating when the item was published. If it's a date in the future, aggregators may choose to not display the item until that date.
-
Identifier: Date
Definition: A date associated with an event in the life cycle of the resource.
Comment: Typically, Date will be associated with the creation or availability of the resource.
Proposal
Sections 5.6 "atom:modified" Element, 5.7 "atom:issued" Element, and 5.8 "atom:created" Element would be replaced with the following sections:
-
5.XX "atom:d" Element
-
The "atom:d" element's content conveys a date associated with an event in the life cycle of the entry. Typically, atom:d will be associated with the creation or availability of the resource.
atom:entry elements MUST contain exactly one atom:d element. The content of this element MUST conform to the Date and Time format defined in RFC 3339.
Publishers MAY change the value of this element over time. Consumers MAY choose to sort based on this value. Consumers MAY choose not to display an entry until the date specified in its atom:d element.
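Both optional consumer behaviours for atom:d can be sketched in a few lines of Python. The entry representation (dicts with a hypothetical "d" key holding the atom:d text) is an assumption made for illustration; passing `now` explicitly keeps the function deterministic.

```python
from datetime import datetime, timezone


def parse_date(text):
    # Simplification: map a trailing "Z" to "+00:00" for older Pythons.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def visible_entries(entries, now=None):
    """Return entries whose atom:d is not in the future, newest atom:d first.

    Both behaviours are optional ("MAY") for consumers; this sketch applies both.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    shown = [entry for entry in entries if parse_date(entry["d"]) <= now]
    return sorted(shown, key=lambda entry: parse_date(entry["d"]), reverse=True)
```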
-
5.XX "atom:e" Element
-
The "atom:e" element's content conveys the date on which the entry was last changed.
atom:entry elements MUST contain exactly one atom:e element. The content of this element MUST conform to the Date and Time format defined in RFC 3339.
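The cardinality and format requirements on atom:d and atom:e lend themselves to a mechanical check. This Python sketch uses the standard `xml.etree.ElementTree` module. The namespace URI is a placeholder (this proposal does not fix one), and the regular expression is only a syntactic approximation of the RFC 3339 `date-time` production: it accepts lower-case "t" and "z" and a leap second of 60, as RFC 3339 does, but does not reject impossible dates such as February 30.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: placeholder namespace URI; this proposal does not define one.
ATOM_NS = "http://example.org/hypothetical-atom-ns"

# Approximation of RFC 3339 date-time: full-date "T" full-time, offset required.
RFC3339 = re.compile(
    r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])[Tt]"
    r"([01]\d|2[0-3]):[0-5]\d:([0-5]\d|60)(\.\d+)?"
    r"([Zz]|[+-]([01]\d|2[0-3]):[0-5]\d)"
)


def check_entry_dates(entry_xml, ns=ATOM_NS):
    """Return a list of problems with the atom:d and atom:e elements of one entry."""
    entry = ET.fromstring(entry_xml)
    problems = []
    for name in ("d", "e"):
        found = entry.findall(f"{{{ns}}}{name}")
        if len(found) != 1:
            problems.append(f"expected exactly one atom:{name}, found {len(found)}")
            continue
        text = found[0].text or ""
        if not RFC3339.fullmatch(text):
            problems.append(f"atom:{name} is not an RFC 3339 date-time: {text!r}")
    return problems
```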
Publishers MUST change the value of this element when a change occurs in the entry; however, the value does not imply or indicate the significance of the change. Publishers may change the value of this element for any reason, even if no corresponding change is represented elsewhere in the atom:entry. Publishers may provide more than one instance or version of the same atom:entry within the same atom:feed (i.e., with equivalent atom:id values); consumers MAY choose to record, process, or present only the atom:entry with the latest atom:e value.
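On the publishing side, the rule that atom:e changes whenever the entry is saved, even without visible changes, can be sketched as a save hook. `save_entry` and the dict representation are hypothetical. The sketch also pushes atom:e forward by one second when two saves fall within the same second; that is an assumption added here so that consumers comparing values always see a strictly later date.

```python
from datetime import datetime, timedelta, timezone


def parse_date(text):
    # Simplification: map a trailing "Z" to "+00:00" for older Pythons.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def save_entry(entry, now=None):
    """Hypothetical save hook: atom:d is left to the author, atom:e always moves."""
    if now is None:
        now = datetime.now(timezone.utc)
    now = now.replace(microsecond=0)
    previous = parse_date(entry["e"])
    if now <= previous:
        # Two saves in the same second (or a clock step back): still move forward.
        now = previous + timedelta(seconds=1)
    entry["e"] = now.isoformat()
    return entry
```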
Example:
-
A weblog author creates and makes available a post describing a particularly difficult problem that he is trying to solve. The post is made at noon on a Tuesday:
<atom:d>2004-08-03T12:00:00-04:00</atom:d> <atom:e>2004-08-03T12:00:00-04:00</atom:e>
On Wednesday of the following week, after receiving a number of suggestions, the author chooses to update the weblog post, both to reflect the ultimate solution and to stop the flow of suggestions. The atom:d element is updated to the new date in the hope that consumers will sort this entry back to the top, so that people who may have read the original entry notice the update. Finally, the dcterms extension module is used to capture the original availability date:
<atom:d>2004-08-11T15:27:35-04:00</atom:d> <atom:e>2004-08-11T15:27:35-04:00</atom:e> <dcterms:available>2004-08-03T12:00:00-04:00</dcterms:available>
Later that afternoon, the author considers updating the entry again but decides not to. However, he "saves" the unchanged entry in order to return to the main menu. The publishing system updates the stored entry and changes the value of atom:e:
<atom:d>2004-08-11T15:27:35-04:00</atom:d> <atom:e>2004-08-11T17:42:25-04:00</atom:e> <dcterms:available>2004-08-03T12:00:00-04:00</dcterms:available>
Note: dcterms defines created, valid, available, issued, modified, accepted, copyrighted, and submitted, but defines each in terms of W3CDTF. Both W3CDTF and RFC 3339 are profiles (subsets) of ISO 8601, and in general RFC 3339 is a subset of W3CDTF. Therefore, producers that choose to follow the example above are not affected, but consumers need to be aware that the dates they find in dcterms MAY be of lesser precision (e.g., "2004" is a valid W3CDTF date). When elements from the dcterms extension module are present in an Atom feed, they SHOULD be expressed using the narrower RFC 3339 profile.
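Consumers reading dcterms dates need to know how precise each value is. This Python sketch classifies a value by the W3CDTF granularities (year; year and month; full date; date with hours and minutes; with seconds; with fractional seconds) and reports whether it also fits RFC 3339, which needs at least whole seconds and an offset. The function names are hypothetical.

```python
import re

# W3CDTF: each finer component is optional, but a time always needs a TZD.
W3CDTF = re.compile(
    r"(?P<year>\d{4})"
    r"(?:-(?P<month>\d{2})"
    r"(?:-(?P<day>\d{2})"
    r"(?:T(?P<hour>\d{2}):(?P<minute>\d{2})"
    r"(?::(?P<second>\d{2})(?P<fraction>\.\d+)?)?"
    r"(?P<tzd>Z|[+-]\d{2}:\d{2})"
    r")?"  # closes the time group
    r")?"  # closes the day group
    r")?"  # closes the month group
)


def w3cdtf_precision(text):
    """Return the finest component present, e.g. 'year' for '2004'."""
    m = W3CDTF.fullmatch(text)
    if m is None:
        raise ValueError(f"not a W3CDTF date: {text!r}")
    for name in ("fraction", "second", "minute", "day", "month"):
        if m.group(name) is not None:
            return name
    return "year"


def is_rfc3339_compatible(text):
    """A W3CDTF value is also an RFC 3339 date-time only if it carries seconds."""
    return w3cdtf_precision(text) in ("second", "fraction")
```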
Impacts
Publishers will need to select one primary date (atom:d) for every entry, plus a change date (atom:e), instead of the three dates currently defined. Publishers who wish to continue including the remaining dates in the entry may use the dcterms module.
Notes
The dates defined in this proposal require a timezone. That may need to be revisited.
RFC 3339 section 5.6 contains the following text, which we should consider limiting for the purposes of interoperability:
-
This date/time format may be used in some environments or contexts that distinguish between the upper- and lower-case letters 'A'-'Z' and 'a'-'z' (e.g. XML). Specifications that use this format in such environments MAY further limit the date/time syntax so that the letters 'T' and 'Z' used in the date/time syntax must always be upper case. Applications that generate this format SHOULD use upper case letters.
-
NOTE: ISO 8601 defines date and time separated by "T". Applications using this syntax may choose, for the sake of readability, to specify a full-date and full-time separated by (say) a space character.
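If a stricter profile along these lines were adopted (upper-case "T" and "Z" only, no space separator, offset required), producer and consumer code could look like this Python sketch. The profile is a hypothetical reading of the quoted text, not something this proposal has decided, and `format_strict` assumes whole-minute offsets.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical stricter profile: upper-case "T" and "Z", no space separator,
# and an explicit offset (this proposal already requires a timezone).
STRICT = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})")


def format_strict(moment):
    """Emit an aware datetime as an RFC 3339 date-time in the stricter profile."""
    if moment.tzinfo is None or moment.utcoffset() is None:
        raise ValueError("a timezone offset is required")
    text = moment.isoformat(timespec="seconds")  # isoformat always uses "T"
    return text[:-6] + "Z" if text.endswith("+00:00") else text


def is_strict(text):
    return STRICT.fullmatch(text) is not None
```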