ExtensibilityFramework

The ExtensibilityFramework, EF, is a proposal for a syntax model that is freely extensible but has a well-defined extension model, and that does not use RDF/XML but can be transformed to other formats. The EF has a work-in-progress formal grammar, plus XSLT and Python/SAX implementations.

Sam Ruby: The purpose of this extensibility framework is to allow a uniform mechanism for handling text (with and without markup) and detecting URIs. The latter would also help with things like xml:base processing.

What would Atom documents look like using the EF?

The changes from the 0.2 snapshot are minimal (and listed below). Here's an example of what the syntax would look like:-

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://purl.org/atom/ns#">
  <title>John's Blog</title>
  <link ref="http://www.example.com/blog" />
  <modified>2003-08-17T18:30:00Z</modified>
  <author>
    <name>John Doe</name>
  </author>
  <generator ref="http://www.example.com/blogamatic.cgi">
    <name>John's Blogamatic</name>
  </generator>
  <entry ref="tag:www.example.com,2003:3.2397">
    <title>Atom Updates &amp; Ideas</title>
    <link ref="http://www.example.com/blog/2003/08/16/atom" />
    <issued>2003-08-17T08:30:31-05:00</issued>
    <modified>2003-08-17T18:30:00Z</modified>
    <content type="application/xhtml+xml" mode="xml">
      <div xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-us" id="blargh">
         <h1>Atom Updates &amp; Ideas</h1>
         <p>Whilst <a href="http://bob.blog/atom02">atom 0.2</a> is the 
           latest <em>snapshot</em>, we're still improving the syntax.</p>
      </div>
    </content>
  </entry>
</feed>
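
Because every URI in the example above sits in a ref attribute, a consumer can detect URIs without knowing any element names. The following is a minimal sketch of that idea, not part of the EF implementations mentioned on this page; the function name find_uris is hypothetical, and it uses Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

def find_uris(root):
    """Return every @ref value in document order (hypothetical helper)."""
    # The ref attribute is un-namespaced, so the lookup works whatever
    # namespace the elements themselves are in.
    return [elem.get("ref") for elem in root.iter() if elem.get("ref") is not None]

sample = (
    '<feed xmlns="http://purl.org/atom/ns#">'
    '<link ref="http://www.example.com/blog"/>'
    '<entry ref="tag:www.example.com,2003:3.2397"><title>T</title></entry>'
    '</feed>'
)
for uri in find_uris(ET.fromstring(sample)):
    print(uri)
```

Note that this only looks at @ref; an href inside mode="xml" content is deliberately left alone, since that content is treated as a string.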

What changes were made?

Changes made to the Atom 0.2 snapshot syntax to use this ExtensibilityFramework:-

Move //id onto its parent element as a ref attribute. //id/. => ../*[@ref]
Change all elements with URI content into elements with @ref attributes.
Move @xml:lang into XML content whenever possible.
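
The first two changes above could be applied mechanically. What follows is only a rough sketch under stated assumptions: it works on a namespace-free tree, the name to_ef is hypothetical, and the set of elements treated as having URI content (URI_ELEMENTS) is an illustrative guess rather than anything defined by the framework. The third change (moving @xml:lang) is not attempted.

```python
import xml.etree.ElementTree as ET

# Assumption: which elements carry URI content is not listed on this page,
# so this set is purely illustrative.
URI_ELEMENTS = {"link"}

def to_ef(elem):
    """Rewrite a namespace-free 0.2-style tree in place into the @ref style."""
    for child in list(elem):
        if child.tag == "id":
            # Move <id> onto its parent element as a ref attribute.
            elem.set("ref", (child.text or "").strip())
            elem.remove(child)
        elif child.tag in URI_ELEMENTS and len(child) == 0:
            # Element with URI content becomes an element with @ref.
            child.set("ref", (child.text or "").strip())
            child.text = None
        else:
            to_ef(child)
    return elem

old = (
    "<feed><link>http://www.example.com/blog</link>"
    "<entry><id>tag:www.example.com,2003:3.2397</id><title>T</title></entry>"
    "</feed>"
)
print(ET.tostring(to_ef(ET.fromstring(old)), encoding="unicode"))
```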

Why were these changes made? What value do they have?

They enable us to define a framework upon which extensions can be made. In other words, it will let module designers follow a set of rules to make consistent and validatable extensions. The framework is potentially reusable. It will also enable transformation to RDF etc. not just for the core language, but for extensions too--an impossibility with non-generic XSLT. SAX filters can also be provided to convert the language into other forms.

The ExtensibilityFramework has re-opened the content debate, with a new model being proposed for string content: an optional mode (akin to MIME Content-Encoding), and an optional type (a MIME type).

Are there any implementations yet?

There's a Python/SAX parser that uses a recursive descent method, and Sjoerd Visscher's excellent transformation to RDF with example output. Both follow the ExtensibilityFrameworkGrammar.

Historically, there's also an out-of-date transform.

Extensibility Framework Tutorial (work in progress; rough)

<feed>
   <title>STRING</title>
   <link ref="URI"/>
   <modified>STRING</modified>
</feed>

This Atom document consists of a single feed. A feed is an instance of what Syntax and other documents on this wiki call an entity, and what RFC 2396 and RDF call a resource. Each of the child elements, //feed/*, is a property of the feed entity/resource. Their values are given in capitals, and in order to provide a datatype (i.e. distinguish STRINGs from URIs) we say that element #PCDATA content is always a STRING, and @ref values are always URIs.

For those of you inclined to think in object oriented terms, this is how the above would be represented:-

feedentity.title    = "STRING"
feedentity.link     = <URI>
feedentity.modified = "STRING"

Now let's add an entry to the feed. An entry is also an instance of entity/resource. You can tell that from the document because, as you'll see, it has XML element children. Thus, the test for whether an element represents an entity/resource or not is whether or not it has any child elements.

<feed>
   <title>STRING</title>
   <link ref="URI"/>
   <modified>STRING</modified>
   <entry>
     <title>STRING</title>
     <link ref="URI"/>
     <issued>STRING</issued>
   </entry>
</feed>

In fact, the <entry> element relates the feed entity to the entry entity. So we say that the feedentity has an entry property whose value is some entryentity. In object oriented terms:-

feedentity.title    = "STRING"
feedentity.link     = <URI>
feedentity.modified = "STRING"
feedentity.entry    = entryentity
entryentity.title   = "STRING"
entryentity.link    = <URI>
entryentity.issued  = "STRING"
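
The rules so far (child elements mean an entity, @ref means a URI, anything else is a STRING) can be sketched as a short recursive walk. This is an illustration only, not the Python/SAX parser mentioned above; the function name properties and the convention of naming an entity after its tag plus "entity" are assumptions made to reproduce the listing.

```python
import xml.etree.ElementTree as ET

def properties(elem, name):
    """Yield (subject, property, value) triples for an entity element."""
    for child in elem:
        if len(child) > 0:
            # Child elements present: the value is itself an entity.
            child_name = child.tag + "entity"
            yield (name, child.tag, child_name)
            yield from properties(child, child_name)
        elif "ref" in child.attrib:
            # @ref values are always URIs.
            yield (name, child.tag, "<" + child.get("ref") + ">")
        else:
            # #PCDATA content is always a STRING.
            yield (name, child.tag, '"' + (child.text or "") + '"')

doc = """<feed>
   <title>STRING</title>
   <link ref="URI"/>
   <modified>STRING</modified>
   <entry>
     <title>STRING</title>
     <link ref="URI"/>
     <issued>STRING</issued>
   </entry>
</feed>"""

for subject, prop, value in properties(ET.fromstring(doc), "feedentity"):
    print(f"{subject}.{prop} = {value}")
```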

There are only two attributes that do anything special. One, @ref, has already been introduced: it says that its value is a URI. The second one is @mode. @mode can be used to say that some XML content is actually a string.

Consider, for example, someone defining an extension such as this:-

<feed>
   <title>STRING</title>
   <link ref="URI"/>
   <modified>STRING</modified>
   <extension>
     <p>STRING</p>
     <q>STRING</q>
   </extension>
</feed>

In OOP terms, we would get the following:-

feedentity.title     = "STRING"
feedentity.link      = <URI>
feedentity.modified  = "STRING"
feedentity.extension = extension
extension.p          = "STRING"
extension.q          = "STRING"

But what if instead of interpreting the <p> and <q> elements as properties, we want all of the content of <extension> to be treated as one big literal? All we do is add an @mode attribute to <extension> and set it to "xml".

<feed>
   <title>STRING</title>
   <link ref="URI"/>
   <modified>STRING</modified>
   <extension mode="xml">
     <p>STRING</p>
     <q>STRING</q>
   </extension>
</feed>

That gives us the following:-

feedentity.title     = "STRING"
feedentity.link      = <URI>
feedentity.modified  = "STRING"
feedentity.extension = "<p>STRING</p><q>STRING</q>"
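
One way to produce such a literal is to serialize the children of the mode="xml" element back into a string. The sketch below assumes that whitespace-only text between the child elements (the indentation) is dropped, which is what the listing above suggests but is not stated as a rule anywhere on this page; the name xml_literal is hypothetical.

```python
import xml.etree.ElementTree as ET

def xml_literal(elem):
    """Serialize the content of a mode="xml" element as one string."""
    def keep(text):
        # Assumption: drop whitespace-only runs (indentation), keep real text.
        return text if text and text.strip() else ""

    parts = [keep(elem.text)]
    for child in elem:
        # Serialize the child without its tail, then handle the tail separately.
        tail, child.tail = child.tail, None
        parts.append(ET.tostring(child, encoding="unicode"))
        child.tail = tail
        parts.append(keep(tail))
    return "".join(parts)

extension = ET.fromstring(
    '<extension mode="xml">\n     <p>STRING</p>\n     <q>STRING</q>\n   </extension>'
)
if extension.get("mode") == "xml":
    print(xml_literal(extension))
```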

This is useful in Atom, for example on the <content> element, where we might have XML content that we don't want to interpret as properties.

Discussion

[AsbjornUlsberg] I just have to say; I love it. We really need a set of rules to define how an atom resource can be described, and these rules will allow for extensibility. As long as you understand the simple rules, you will be able to parse not only today's Atom feeds, but also tomorrow's. I think this is a great proposal.

[Bo] Honestly, I think it's pretty silly. You have absolutely no control over what extensions/modules people will shove into Atom feeds so there's no point making with the pretense. I doubt aggregator authors will really be able to use this to consistently handle modules within Atom and since so many modules already exist which don't follow these conventions... Really, if you want a formal data model then use RDF. If you want to invent a whole new XML application with namespaced ef:mode, ef:ref and ef:id attributes then do so and reference it from the Atom specification. Either way, it's needless complexity.

semantics

[DannyAyers] I think it's great. Bo - it isn't about control. If you specify a system for defining extensions, then feed producers and consumers can use this rather than making everything up anew every time. No Atom modules already exist, btw. Personally I still think it would be easier all round for Atom to use RDF/XML, but that doesn't look like it's going to be an option. This approach stays close to the XML syntax while leveraging the RDF model, and so as compromises go, it's pretty darn cool.

[Bo] If this is just a mechanism for slipping RDF in under the radar then, as mechanisms go, fine--if that's what people want that's what they'll get. I would actually rather Atom use formal RDF than establish a convention which is kinda-like-but-not-quite RDF. That just seems like an invitation for all sorts of confusion. That's why I consider 'almost RDF' to be worse than RDF itself. I don't even think you can enforce rules like '[change] all elements with URI content into elements with @ref attributes' at the schema level so really, you're back to square one. And, since no namespace is defined, how is an aggregator even supposed to determine that the EF is in use? Should it just look for the presence of 'ref' and 'id' elements and make the assumptions? This seems like a total parse and pray operation and again, in my experience, standards based on 'conventions' are almost always useless.

[DannyAyers] Bo - I believe the idea is for this to be how Atom itself is represented, rather than any transformation (I could be wrong). A few minor changes to the 0.2 snapshot syntax and Atom complies with the framework. (Even without EF I think this style is probably a good idea, it would make it easier to distinguish between URIs and data e.g. when using XSLT). This would also mean that no other namespace would be needed - this *is* Atom. It's clear that you have a general downer on the idea, but what do you suggest as an alternative? Do nothing?

[Bo] Danny - Actually, I think the point is that any Atom feed, regardless of the modules (man, I hate that word, that word is really half the problem!) present in the feed, can be mapped to RDF. If this framework is just a not-so-formal data model for Atom itself which has the benefit of being easily mapped to RDF then go for it. I don't think this EF is really that bad (though I still question introducing elements into an Atom feed that cannot be enforced by a schema). Really, my only concern here is that Atom remain as simple as possible. Other people might value extensibility and some people want expressiveness but for me the most important aspect of Atom is that a developer can skim a feed and churn out an aggregator and generator in an hour's time (if there are a few rough spots here and there then let her consult a schema). So I question whether adding yet another layer (in this case a whole data model) to Atom and its modules will really be worth it in the end. The (non-1.0) RSS formats have gotten plenty far without such extra layers. Also, my comments might come off as a bit harsher than they really are. I don't consider the EF a show-stopper. If it's what people want then fine, let's do it and move on. What I'd really like to see most of all is an Atom 1.0.

[AsbjornUlsberg] Bo, you say you want Atom to be simple. I think simplicity comes through unity. If we define a set of rules on how the Atom model is constructed, these rules will also define how extensions should be. Having unity in both the Atom core and all extensions will make Atom (with its extensions) easier to understand, and not to mention; write. If you know the rules, you can write a feed in no time, and even so; you can write an extension that conforms to the ExtensibilityFramework in a jiffy.

[HenryStory] I think it is wrong to think of the id as being an identity construct. I have written a synopsis of why I think this is so in Re: "Role of RSS in Science Publishing": Atom is in RDF format on the atom mailing list.

The framework is being formalized at ExtensibilityFrameworkGrammar.