Abstract
Require atom:id values to be compared character by character, based primarily on the language defined in Namespaces in XML 1.1, with additional guidance inspired by RFC 2396bis.
Status
Open
Rationale
Ensure predictable comparisons of ids.
Proposal
URI references identifying entries and feeds are compared when determining whether an entry or feed is the same as one seen before. [Definition: The two URIs are treated as strings, and they are identical if and only if the strings are identical, that is, if they are the same sequence of characters.] The comparison is case-sensitive, and no %-escaping is done or undone.
A consequence of this is that URI references which are not identical in this sense may resolve to the same resource. Examples include URI references which differ only in case or %-escaping. Note that relative URIs are not allowed as ids. Replacement of XML character and entity references must be done before any comparison.
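The comparison rule above can be sketched in a few lines of Python. This is an illustration only, not part of the proposal: the function names `same_id` and `new_entries` and the example.org URIs in the usage note are hypothetical, and the ids are assumed to have already been read out of parsed XML.

```python
def same_id(a: str, b: str) -> bool:
    """Return True only if both ids are the same sequence of characters."""
    # Deliberately no case folding, no %-unescaping, no Unicode normalization.
    return a == b


def new_entries(seen_ids: set, incoming_ids: list) -> list:
    """Return the incoming ids not seen before, recording them in seen_ids."""
    fresh = []
    for entry_id in incoming_ids:
        # Set membership on str is exact string equality, which is the rule.
        if entry_id not in seen_ids:
            seen_ids.add(entry_id)
            fresh.append(entry_id)
    return fresh
```

Under this sketch, `same_id("http://example.org/~a", "http://example.org/%7Ea")` and `same_id("http://example.org/a", "http://Example.org/a")` are both false, even though each pair may well resolve to the same resource.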
Examples:
The URI references below are all different for the purposes of identifying entries, since they differ in case:
The URI references below are also all different for the purposes of identifying entries:
As are these:
If the entity eacute has been defined to be é, the atom:id elements below all contain the same URI reference, http://example.org/rosé.
- <atom:id>http://example.org/rosé</atom:id>
- <atom:id>http://example.org/ros&#xe9;</atom:id>
- <atom:id>http://example.org/ros&#xE9;</atom:id>
- <atom:id>http://example.org/ros&#233;</atom:id>
- <atom:id>http://example.org/ros&eacute;</atom:id>
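The requirement that character references be replaced before comparison can be checked with any conforming XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the namespace URI bound to the `atom` prefix and the helper name `read_id` are assumptions made for the example. The named entity form is left out because it would need a DTD declaring `eacute`.

```python
import xml.etree.ElementTree as ET

# Namespace URI assumed for illustration; any binding for "atom" would do here.
NS = 'xmlns:atom="http://www.w3.org/2005/Atom"'


def read_id(xml_text: str) -> str:
    """Parse a single atom:id element and return its character content.

    The parser replaces character references, so the returned string is
    what must be used for comparison.
    """
    return ET.fromstring(xml_text).text


DOCS = [
    f"<atom:id {NS}>http://example.org/ros\u00e9</atom:id>",  # literal character
    f"<atom:id {NS}>http://example.org/ros&#xe9;</atom:id>",  # hex, lowercase
    f"<atom:id {NS}>http://example.org/ros&#xE9;</atom:id>",  # hex, uppercase
    f"<atom:id {NS}>http://example.org/ros&#233;</atom:id>",  # decimal
]

ids = [read_id(doc) for doc in DOCS]
```

All four strings in `ids` come out identical, so a plain `==` on the parsed values treats them as the same id.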
Because of the risk of confusion between URIs that would be equivalent if dereferenced, the following normalization rules are strongly encouraged when generating new ids:
- Always provide the URI scheme in lowercase characters.
- Always provide the host, if any, in lowercase characters.
- Only perform percent-encoding where it is essential.
- Always use uppercase A-through-F characters when percent-encoding.
- Prevent dot-segments from appearing in non-relative URI paths.
- For schemes that define a default authority, use an empty authority if the default is desired.
- For schemes that define an empty path to be equivalent to a path of "/", use "/".
- For schemes that define a port, use an empty port if the default is desired.
- Empty fragment identifiers must be preserved.
- All portions of the URI must be UTF-8 encodings of NFC-normalized Unicode strings.
Impacts
Existing ids would not be affected, but some feed-consuming software may need to be modified to ensure that canonicalization logic is NOT performed.
