It’s just data

Atom Schemata

Tim Bray: Here is draft schemaware for snapshot 0.2 of PEAW

In the process, Tim generated a lot of feedback.

Meanwhile, a few days ago Sean Palmer initiated an ExtensibilityFramework that just so happens to address a number of Tim's issues.  In the process, a RelaxNG grammar was produced which covers not only the core elements, but also all extensions.

The purpose of this extensibility framework is to allow a uniform mechanism for handling text (with and without markup) and detecting URIs.  The latter would also help with things like xml:base processing.

Sean produced a Python/SAX parser that converts this into RDF N3.  Sjoerd Visscher produced an XSLT transform that converts the same input into RDF/XML.


Instead of reinventing XLink, why not just use XLink? I don't see much point in using the 'ref' convention when the xlink:href convention already exists and is a standard. Plus, the ability to automatically pull out all links in an Atom document would probably be very useful. In vocabularies as 'linky' as Atom, I'd think a domain-independent linking scheme would be essential.

Posted by Bo at

Bo, this mostly goes to show that naming is hard.

'ref' is used in a context where possibly any of 'href', 'id', or 'uri' could have been used.

'uri' and 'id' are probably closer than href/ref, as it's being used more like "this identifies this resource" and "this is the id of the resource I'm talking about."

'ref' is more like XIOR's xior:xoid and xml:id, although XIOR and xml:id potentially make an unnecessary distinction between an id and a reference to that id.  In both of those cases, though, they make it explicit that an id can be used only once, on the element where all the information about that id resides, whereas many references can be made to the same id.

'ref', on the other hand, can appear more than once, and information from many elements with a 'ref' form a union of that information.  Because it can appear more than once and because a distinction between resource id and reference is not a necessity, a single attribute was picked.

The discussion between 'ref' and 'uri' is in the IRC log.  It appears that the code got changed to 'ref' at one point and no one brought it back up.

Posted by Ken MacLeod at

Bo, one reason for not using XLink is to not introduce too many namespaces in the Atom core. Though I see the usage for XLinks, I think we should keep the core as clean and simple as possible, and rather implement an Atomified version of XLink (within the Atom namespace) than implementing XLink directly.

Having more than one namespace in the core isn't very user-friendly or beautiful. Also, having different linking methods in the core (atom:link) and in the extension (xlink) is inappropriate.

Posted by Asbjørn Ulsberg at

In ExtensibilityFramework, Sean appears to be creating a "lite" RDF/XML (which we certainly need).

However, somewhere along the way the distinction between URLs and URIs got completely lost, which I think is a mistake. See http://radio.weblogs.com/0106548/2003/08/09.html#a119 for my reasons.

Sean, please bring that distinction back. If you want to keep ExtensibilityFramework simple (and I guess you do), make the URI @id rather than [id]. But don't use @ref for both.

Posted by Ziv Caspi at

Ziv, the distinction between URLs and URIs is not getting "lost"; it was discovered that they are not so distinct after all.  The IETF and W3C have been working a long time on clearing up what it means to be a URI and/or a retrievable resource, and for the most part I think they're getting it right.

The biggest difference between a URI and what used to be called a URL is based on context.  In Atom, it doesn't matter how the URI is represented (element content, a 'ref' attribute, or an 'id' attribute), it matters where it's used.  <link> is meant to be retrieved.  <id> is not.  In the Atom 0.2 snapshot, there's nothing else that indicates that the element content is a URI or URL.  'ref' is the same way.

Posted by Ken MacLeod at

Ken, in light of this, please explain how a consumer of an ExtensibilityFramework resource can determine which refs are retrievable and which are not.

Posted by Ziv Caspi at

Ziv: the first part of a URI is a scheme.  This is the portion of the URI that precedes the first colon.  Schemes like 'http' are retrievable.  Schemes like 'mailto' are not.

Which schemes your application will support retrieval on is up to you.  'ftp' might be a good idea.  'irc' may or may not.  'urn' definitely not.
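
As a rough illustration of the scheme check described above, here is a minimal Python sketch. The names `RETRIEVABLE_SCHEMES`, `scheme_of`, and `looks_retrievable` are hypothetical, and the contents of the set (including 'https') are an assumed application policy rather than anything the Atom or ExtensibilityFramework drafts prescribe.

```python
from urllib.parse import urlsplit

# Assumption: which schemes count as retrievable is an application decision.
# This particular set is illustrative only.
RETRIEVABLE_SCHEMES = {"http", "https", "ftp"}


def scheme_of(uri: str) -> str:
    """Return the scheme, i.e. the part of the URI before the first colon."""
    return urlsplit(uri).scheme.lower()


def looks_retrievable(uri: str) -> bool:
    """Guess retrievability from the scheme alone (a heuristic, not a guarantee)."""
    return scheme_of(uri) in RETRIEVABLE_SCHEMES


if __name__ == "__main__":
    for uri in ("http://example.org/blog/4321.html",
                "mailto:someone@example.org",
                "urn:isbn:0451450523"):
        print(uri, scheme_of(uri), looks_retrievable(uri))
```

Adding 'irc' or removing 'ftp' only means editing the set, which is the sense in which the choice is left to the application.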

Posted by Sam Ruby at

RE: Atom Schemata


Schemes like 'http' are retrievable. Schemes like 'mailto' are not.


This is untrue. URIs are just identifiers. Notions like whether something is "retrievable" based on its URI scheme are quaint notions from the days of URLs.

Message from Dare Obasanjo at

Tim Berners-Lee: The Web works because, given an HTTP URI, one can in a large number of cases, get a representation of the document.

Note also that the definition of the href attribute of the A element in HTML 4.01 is in terms of URIs.

Dare, clearly this is an area of intense theoretical debate.  I merely would assert that Ziv and others are pretty safe to assume that an HTTP URI is likely to be retrievable.

I would like to amend my previous statement: interpreting a mailto URI by launching your preferred mail client with the To: field pre-filled (and perhaps the subject) is certainly reasonable.

Posted by Sam Ruby at

Sam, I can't tell if you are stating or suggesting that URI schemes that can be used to retrieve a resource are the sole indicator of whether or not a URI used in some context should be retrievable. (I'm in the camp that says the URI scheme is not the indicator.)

Ziv, this markup in Atom:

  <link>http://example.org/blog/4321.html</link>
  <id>http://example.org/blog/4321</id>

and this markup in the EF style:

  <link ref="http://example.org/blog/4321.html"/>
  <id ref="http://example.org/blog/4321"/>

are logically equivalent.  In both cases, it is the documentation (and/or schema) that says that <link> should be retrievable and <id> only used as an identifier.  Going further, the value of the URIs could be the same (http://example.org/blog/4321.html), still serve both purposes (identifier and retrievable resource), and still be dependent on the context to indicate how it should be used.
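
To make the equivalence concrete, here is a small Python sketch that reads the URI from either form and then applies vocabulary knowledge to decide which URIs are meant to be fetched. The `<entry>` wrapper, the function names, and the `RETRIEVABLE_ELEMENTS` table are assumptions made for illustration; the table stands in for "the documentation (and/or schema)".

```python
import xml.etree.ElementTree as ET

# Assumption: a wrapper <entry> element so each sample is a well-formed document.
ATOM_STYLE = (
    "<entry>"
    "<link>http://example.org/blog/4321.html</link>"
    "<id>http://example.org/blog/4321</id>"
    "</entry>"
)
EF_STYLE = (
    "<entry>"
    '<link ref="http://example.org/blog/4321.html"/>'
    '<id ref="http://example.org/blog/4321"/>'
    "</entry>"
)

# Knowledge that comes from the vocabulary's documentation, not from the markup.
RETRIEVABLE_ELEMENTS = {"link"}


def uris_by_element(xml_text):
    """Map each child element name to its URI, whether it is given as
    element content (Atom 0.2 style) or as a 'ref' attribute (EF style)."""
    result = {}
    for child in ET.fromstring(xml_text):
        uri = child.get("ref")
        if uri is None:
            uri = (child.text or "").strip()
        result[child.tag] = uri
    return result


def retrievable_uris(xml_text):
    """Return the URIs whose element the vocabulary says should be fetched."""
    return [uri for tag, uri in uris_by_element(xml_text).items()
            if tag in RETRIEVABLE_ELEMENTS]
```

Both samples yield the same mapping, and in both cases it is the element name, not the URI's syntax, that selects the link for retrieval.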

In EF and RDF, the contexts are also clearly specified.  It still is the vocabulary that tells you whether a given URI is intended to be retrievable, but EF and RDF also use URIs as both the subject identifier of resource records and the value of properties of resource records (references to other resource records).  RDF uses rdf:about when talking about the subject URI and rdf:resource when talking about the object value's URI, but they could easily be one attribute because it is always clear by the syntax when one is identifying the subject or using a URI as an object value.

EF defines a more compact XML model than RDF/XML (which is one of the reasons people are looking at it).  In doing so, it doesn't have the luxury of using two different attributes depending on whether it is being used as an identifier or a reference.  Consider:

  <feed ref="http://example.com/feed">
    <generator ref="http://example.org/genwell">
      <name>GenWell</name>
    </generator>
  </feed>

Here we have a feed ('.../feed') that has a generator ('.../genwell') and a generator ('.../genwell') that has a name ("GenWell").  The 'genwell' URI is used as both the property value URI in the feed and the subject URI of the generator resource record.  Here is roughly equivalent RDF:

  <Feed rdf:about="http://example.com/feed">
    <generator rdf:resource="http://example.org/genwell"/>
  </Feed>
  <Generator rdf:about="http://example.org/genwell">
    <name>GenWell</name>
  </Generator>
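
The way one 'ref' attribute serves as both subject and object can be sketched in Python by flattening the EF example into (subject, property, value) triples. This is a simplified illustration under stated assumptions, not Sean's Python/SAX parser: the `triples` function is hypothetical, it ignores namespaces and mixed content, and it assumes the generator element is an open/close pair wrapping its name.

```python
import xml.etree.ElementTree as ET

EF_DOC = """
<feed ref="http://example.com/feed">
  <generator ref="http://example.org/genwell">
    <name>GenWell</name>
  </generator>
</feed>
"""


def triples(element, out=None):
    """Treat `element` as a resource record whose subject is its 'ref'.

    A child that carries its own 'ref' contributes that URI as a property
    value of the parent, and is then processed as a record in its own right.
    A child without 'ref' contributes its text as a literal value.
    """
    if out is None:
        out = []
    subject = element.get("ref")
    for child in element:
        child_ref = child.get("ref")
        if child_ref is not None:
            out.append((subject, child.tag, child_ref))  # ref used as object
            triples(child, out)                          # same ref used as subject
        else:
            out.append((subject, child.tag, (child.text or "").strip()))
    return out


if __name__ == "__main__":
    for triple in triples(ET.fromstring(EF_DOC)):
        print(triple)
```

The genwell URI shows up once as the value of the feed's generator property and once as the subject of the name property, which matches the two RDF descriptions above.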

Posted by Ken MacLeod at

comments on URIs and URLs

"[...]somewhere along the way the distinction between URLs and URIs got completely lost, which I think is a mistake." -- Ziv Caspi # "the distinction between URLs and URIs is not getting "lost", it was discovered that they are...

Excerpt from Ken MacLeod at

Sam, using the scheme to determine whether you have a URI or a URL not only breaks static typing (and, I assume, XSD), it also doesn't work. For example, people who post twice a day might want both posts to have the same URL, but they certainly shouldn't both have the same ID.

Ken, if I understand you correctly, you're saying that to learn whether the attribute in X/@ref is a URI or a URL I need to have some knowledge of X itself which is not part of the document itself (I'm careful not to say infoset, see? :-). This could be some external schema, or have the meaning of all Atom elements "burned" into the processing application itself.

As far as I can tell from the EF proto-spec, however, this doesn't agree with the purpose of EF; namely, to be able to make such distinctions in an extensible manner, without external "help".

Posted by Ziv Caspi at

Ziv, correct: you must have knowledge of X, which is no different than in the Atom 0.2 snapshot (no relation to EF or RDF), nor in RDF, nor in several other specs.

Are you suggesting there should be a flag or attribute in formats indicating whether a URI is retrievable or not?  If I read URI != URL correctly, you are suggesting that, so I'll need to follow up there to say why I don't see that as either necessary or applicable here.

Re. EF, I don't see anything in the purpose of EF that needs to know whether a URI is retrievable or not.  xml:base, for example, can be used with both identifiers and retrievable resources without knowing which is which.
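
A short Python sketch of that last point: relative references are resolved against xml:base in exactly the same way whether the result will be fetched or only compared as an identifier. The sample document and the `resolved_refs` function are assumptions for illustration, and the sketch only honors an xml:base on the root element rather than implementing full XML Base scoping.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# ElementTree reports the predefined xml: prefix using its namespace URI.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

DOC = """
<entry xml:base="http://example.org/blog/">
  <link ref="4321.html"/>
  <id ref="4321"/>
</entry>
"""


def resolved_refs(xml_text):
    """Resolve every child 'ref' against the root's xml:base.

    Nothing here asks whether a ref is retrievable; resolution is purely
    syntactic and identical for links and identifiers.
    """
    root = ET.fromstring(xml_text)
    base = root.get(XML_BASE, "")
    return {
        child.tag: urljoin(base, child.get("ref"))
        for child in root
        if child.get("ref") is not None
    }
```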

Posted by Ken MacLeod at

From an xsd perspective, the data type is anyURI.

From a typing perspective URL is a subtype of URI.

Posted by Sam Ruby at

Considering the discussion we've had on this thread, I think 'uri' is now the better name for the attribute in ExtensibilityFramework.

Posted by Ken MacLeod at

Ken, I'm suggesting that we have clear differentiation between URIs (such as IDs) and URLs (such as links to various resources). This distinction is important for processors.

Sam, the fact that URL is a subtype of URI means that if you mark all your URLs as URIs, you'll lose functionality. Conceptually, it means that a processor cannot tell if something is retrievable or not (and if it tries to retrieve the resource and fails, whether it is a temporary problem or not).

I'm aware that HTML (4.01, I believe) says that A/@href is a URI. In my opinion this is in error. Indeed, browsers ignore this and mark with a hyperlink everything in the A element, even if the system would not be able to oblige when the user actually clicks the link.
The Web works, but we shouldn't take that to mean it's perfect.

Posted by Ziv Caspi at

Relevant notes include URIs, URLs, and URNs: Clarifications and Recommendations 1.0 (aka RFC3305), RFC2396: Uniform Resource Identifiers (URI): Generic Syntax, and RFC2616: Hypertext Transfer Protocol -- HTTP/1.1 (noting that the last two predate the findings of the joint W3C/IETF URI Planning Interest Group in the first document).

Posted by Ken MacLeod at

Ah, I missed this one that is a draft replacement for RFC2396, Uniform Resource Identifier (URI): Generic Syntax, linked from the UniformResourceIdentifier wiki page.

Of particular note for this discussion are section 1.2.2 and section 1.1.3.

Posted by Ken MacLeod at
