PaceMuId


Abstract

Capture the zeitgeist of atom:id

Status

Spoof

Rationale

Several people have expressed the opinion on atom-syntax that http URIs are the One True Way, on the basis that they are easy to compare and can also double as resource locators (URLs). Others have expressed equally strong views against http URIs, on the basis that they are difficult to compare and can also double as resource locators (URLs).

Both extremes suffer from the same fundamental flaw: the concept of a persistent identifier is a fallacy. Atom should embrace the impermanence of all things.

When intelligent people disagree on matters of importance, the correct answer is Mu. In the case of identifiers, the correct answer is MuID.

Proposal

Add section 3.5, and change sections 4.2.6 and 5.5 of the format specification to read:

3.5 MuID

A MuID is defined as a sequence of "M", "I", and "U" characters. To create a MuID, a publisher starts with an arbitrary Unicode string, such as their home page, the URL of the feed itself, or the name of their current cat. Normalize the string to Unicode Normalization Form C and encode it as UTF-8. This gives you a sequence of bytes. Convert each byte to base 2 and write it out, most significant bit first. Replace each "0" digit with an "I" character; replace each "1" digit with a "U" character. To this string, prepend the constant "muid://M". This is the MuID.

The process described above creates URIs in a "muid" scheme which are conformant to RFC 2396 (and RFC 2396bis). The port, abspath, query parameters, and fragment identifier are always empty and MUST be omitted. But if you didn't omit them, the world probably wouldn't end.
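The construction in section 3.5 can be sketched in a few lines of Python. This is only an illustration, not part of the proposal: the function name make_muid is hypothetical, and the sketch assumes that NFC normalization is applied before UTF-8 encoding (NFC is defined on characters, not bytes) and that each byte is written as exactly eight binary digits, which the text does not state outright.

```python
import unicodedata


def make_muid(seed: str) -> str:
    """Build a MuID from an arbitrary Unicode string (hypothetical helper)."""
    # Assumption: normalize to NFC first, then encode as UTF-8.
    data = unicodedata.normalize("NFC", seed).encode("utf-8")
    # Assumption: every byte is zero-padded to eight binary digits,
    # most significant bit first.
    bits = "".join(format(byte, "08b") for byte in data)
    # Replace each "0" with "I" and each "1" with "U", then add the prefix.
    return "muid://M" + bits.translate(str.maketrans("01", "IU"))


if __name__ == "__main__":
    print(make_muid("A"))  # "A" is byte 0x41, binary 01000001
```

Under these assumptions, two strings that differ only in Unicode composition (for example, a precomposed "é" and "e" followed by a combining acute accent) yield the same MuID, at least until the cat is replaced.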

When MuIDs are compared, they MAY be compared on an exact character-by-character basis. In addition, you may apply the following rules, in any order, as many times as you like:

  1. If you possess a MuID whose last letter is "I", you can add on a "U" at the end. These MuIDs are considered equivalent: "muid://MUI", "muid://MUIU".

  2. Suppose you have Mx. Then you may transform it into Mxx. Example: "muid://MIU" is equivalent to "muid://MIUIU". "muid://MUUI" is equivalent to "muid://MUUIUUI".

  3. If "III" occurs anywhere within a MuID, you may replace the "III" with "U". Example: "muid://MUIIIU" is equivalent to "muid://MUUU".

  4. If "UU" occurs anywhere within a MuID, you can drop it. Example: "muid://MUUI" is equivalent to "muid://MI".
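The four rules above can be sketched as a one-step rewriting function plus a bounded search. This is a hedged illustration only: the names successors and reachable, and the max_steps cutoff, are assumptions made for the example. It also takes one possible reading of "equivalent", namely that the second MuID can be reached from the first by applying the rules in the forward direction.

```python
PREFIX = "muid://M"


def successors(muid: str) -> set[str]:
    """Return every MuID obtainable by applying one rule, once."""
    if not muid.startswith(PREFIX):
        raise ValueError("not a MuID: " + muid)
    x = muid[len(PREFIX):]
    out = set()
    # Rule 1: a MuID ending in "I" may have a "U" appended.
    if x.endswith("I"):
        out.add(PREFIX + x + "U")
    # Rule 2: Mx may become Mxx.
    out.add(PREFIX + x + x)
    # Rule 3: any occurrence of "III" may be replaced with "U".
    for i in range(len(x) - 2):
        if x[i:i + 3] == "III":
            out.add(PREFIX + x[:i] + "U" + x[i + 3:])
    # Rule 4: any occurrence of "UU" may be dropped.
    for i in range(len(x) - 1):
        if x[i:i + 2] == "UU":
            out.add(PREFIX + x[:i] + x[i + 2:])
    return out


def reachable(start: str, target: str, max_steps: int = 3) -> bool:
    """Breadth-first search for target within max_steps rule applications."""
    seen = {start}
    frontier = {start}
    for _ in range(max_steps):
        if target in seen:
            return True
        frontier = {s for m in frontier for s in successors(m)} - seen
        seen |= frontier
    return target in seen


if __name__ == "__main__":
    print("muid://MUIU" in successors("muid://MUI"))   # rule 1 example
    print(reachable("muid://MI", "muid://MIIU", 2))    # rule 2, then rule 1
```

Because rule 2 doubles the string, the search space grows quickly, so the cutoff is kept small here. A False result only means that no derivation was found within max_steps, which is in keeping with the spirit of the proposal.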

4.2.6 "atom:id" Element

The "atom:id" element's content conveys an impermanent identifier for the feed. Like all things, it will necessarily change over time, for example when the feed is relocated or when your cat dies. atom:head elements MAY contain an atom:id element, but MUST NOT contain more than one. The content of this element, when present, MUST be a MuID, as defined in section 3.5 of this document.

It is not a goal that atom:id be usable for retrieval of information. Why would you want to do that?

Historically, in syndication feeds, the detection of duplicates has been error-prone because of failure to assign identifiers which are globally unique and stable. This history of failure is expected to continue regardless of what we put in some spec, since the problem cannot be solved. The MuID solution was struck as a compromise that combines the worst part of http URIs (their laughably complex comparison rules) and the worst part of every other URI scheme (their not starting with the magic letters "h", "t", "t", and "p").

5.5 "atom:id" Element

The "atom:id" element's content conveys an impermanent identifier for the entry. Like all things, it will necessarily change over time. atom:entry elements MUST contain exactly one atom:id element. The content of this element MUST be a MuID, as defined in section 3.5 of this document.

The discussion of uniqueness, impermanence, and comparison in atom:id within atom:head found in 4.2.6 above applies also to atom:id within atom:entry.

Impacts

For publishers: embraces the reality of impermanence.

For client developers: acknowledges that complicated heuristics will always be required regardless of what we write in some specification that no one will ever read. They may as well learn it sooner rather than later.

Notes

with apologies to Douglas Hofstadter

Discussion

[AntoneRoundy] This scheme could be improved by the use of meta-MuIDs (which don't look dereferenceable, but you can try) thusly: Once you've created a MuID, do a DNS query to look up the result. Next, take the response to the query, encode it as a MuID, and do a DNS query on that. Continue to MuIDify and DNS query until a definitive response is returned; this is the final meta-MuID. Although this will result in an infinite number of queries, the time required to get a definitive response will be a relatively small infinity because over time, DNS servers will get faster and bandwidth will increase, thus decreasing the amount of time required by each recursion.


CategoryProposals