MultipleSyntaxDiscussion

Have we reached consensus on this issue or does this remain an OpenPoll?

Use straight XML: TimBray, GeorgBauer, DannyAyers, TimothyAppnel, AsbjornUlsberg, TomasJogin, NormanWalsh, MattMower, SimonFell, ChrisWilper, DiegoDoval, RichardTallent, DareObasanjo, DaveWarnock, PhilLeif, DylanMoreland, MarkDrago, BillHumphries, LachlanCannon, MarcusCampbell, SimonJessey, ChristianRomney, HenriSivonen
Define a data model of which XML is one of many serializations: JamesSnell, AredridelStewart
Define a data model of which XML is the only serialization: MichaelBernstein
Prefer a syntax-independent model, but if almost everyone wants straight XML, then fine: DannyAyers, MarkCidade, MortenFrederiksen, JeremyGray, RichHall

The creation of a syntax will involve the definition of a base schema and a number of profiles that describe how PriorArt will be leveraged.
Limited multiple syntaxes should be supported. E.g. a simple text syntax and an XML syntax. Basing both on the same InfoSet model will allow us to easily define transforms.

[TimBray RefactorOk] I disagree fairly comprehensively. The serialization syntax's only officially-recognized form should be XML as defined in XML 1.0, i.e. a stream of Unicode characters with embedded angle brackets. XML is kind of ugly and kind of clumsy and kind of hard to generate, and the reason we do it anyhow is that it buys interoperability, but only at the syntax level. It is not OK for you to send me a binary glob and say "I'm playing by the rules because if you use my proprietary API you'll discover an Infoset in there."

[GeorgBauer RefactorOk] I second Tim's statement about XML as base format. If the new format is about guaranteed interoperability, a multi-representation format wouldn't serve that goal but would hinder it. Actually I think it should be a fully XML-based format with as much formal specification as possible, to prevent misunderstandings about what validates and what does not. People who would like to use different representations can always use XSLT to transform the data.

[DannyAyers RefactorOk] I'd go further in James' direction on this. I think the data model should initially be constructed in an entirely neutral form, from which serializations can be derived - something like UML or (better still considering we're talking web stuff) OWL. What's the point in deriving a generally-applicable model and then locking it to a single syntax? The reasons Tim gives actually add to this argument (XML is kind of ugly...it buys interoperability, but only at the syntax level). We can have interop at a higher level than that for no extra effort, and still have a standard XML representation with DTD, XML Schema and Relax NG schema etc. Misunderstandings will be easier to prevent (almost by definition) by working at a semantic rather than syntax level. As far as the benefits go, it doesn't have to be either-or. A whole load of possible work duplication could be avoided - for example, a standard mapping to SQL database tables could be derived more easily from an abstract model than from a specific XML syntax. GeorgBauer's point that people who want different representations can use XSLT is valid, but I think it would make for a considerably tighter standard going from the top down together, rather than going straight down one narrow track and having people go off side-to-side from there.

Note also that in the Related section talk is of typed links, with the typing being defined through namespaces. How can compatibility with 3rd party systems be achieved (even existing ones such as RSS 2.0, TrackBack and the Blogger API) if everything has to be done at the syntax level? There is no guarantee that the typing is expressed in a syntax-compatible fashion.

I certainly agree that the primary syntax should be simple, vanilla XML. I also think that whatever happens it will be essential for Semantic Web compatibility to also provide a standard RDF/OWL mapping (to the RDF model, not RDF/XML).

[KenMacLeod] See also RdfAndEcho.

[TimBray RefactorOk] I'm sorry, but this is my single hottest button and I'm going to be persistent and ornery. Danny Ayers says that "Misunderstandings will be easier to prevent by working at a semantic rather than syntax level." I couldn't disagree more. If we agree on a syntax, with a formal decision procedure (aka validator), then we have a crystal-clear basis for interoperation. I have seen decades of attempts to build interoperable frameworks on the level of data models and APIs and so on, and if you want interoperation, you must specify the bits on the wire. There's just no other way. Look at the worldwide telephone system, the Internet, email, the Web, in fact any scaleable networked system where you can plug in any hardware or software, and in every case interoperability is defined at the syntax level. The data model you need for your aggregator may be (almost certainly will be) different from the optimal data model for my authoring system and different from that for someone else's query engine. But if we can agree on syntax, we can still interoperate. And if we can't, we can't.
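As a rough illustration of the "formal decision procedure" idea above, here is a minimal sketch that gives a yes/no answer for any candidate document. It only checks XML 1.0 well-formedness using Python's standard library, not validity against a schema for the new format (no such schema exists yet). The function name and the sample documents are made up for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# Hypothetical sample documents; the element names are illustrative only.
good = "<entry><title>Hello</title></entry>"
bad = "<entry><title>Hello</entry>"  # <title> is never closed

print(is_well_formed(good), is_well_formed(bad))
```

Two independent implementations can run the same kind of check on the same bytes and must reach the same verdict, which is the property being argued for here.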

[TimothyAppnel RefactorOk][AsbjornUlsberg] I completely agree with Tim on this.

[TomasJogin RefactorOk] I agree with Tim Bray. Is this New Format supposed to be simple or what? Or will "tools save us"? A single XML-based syntax -- in my opinion -- is the only feasible solution if we want this to gain widespread acceptance and use. RSS is simple, that's why it's popular. This New Format not only has to be as good and simple as RSS, it has to be even simpler and even better. Learn from HTML and XML.

[DannyAyers] (Refactor *not* ok for now as someone just interpreted that as 'deleteme') This New Format needs to be as simple as possible whilst still covering all that is needed. Syntax alone isn't enough for interoperation - your 'title' may mean something different from my 'title' (conversely your 'title' might be the same as my 'titulo'). Tim's examples are red herrings - they all have shared semantics too. But all I'm suggesting here is that the syntax is derived from a well-defined model, rather than the syntax being everything in itself. Definitely specify the bits on the wire, but back it up with a formal definition. I've a feeling whoever deleted my last comment did so because I mentioned RSS 1.0 - but what I'm saying here is that if the language is properly worked out, there will be no need for an RDF/XML representation. Tim's point re. different data models internal to applications - yep, sure, but it would be far easier to derive these from a common shared model than to have to reverse-engineer the syntax.

Let me see if I get this straight; you're not advocating several different syntaxes but rather a semantic specification alongside the syntactic one? [TomasJogin]

In effect yes - the consensus seems to be that there should be a single, core syntax, and I'll bow to that. But it occurs to me that for this system to be interoperable with other (particularly non-XML) systems, the data model needs to be represented in a reasonably formal fashion prior to, or at the same time as, the serialization format. Figuring out the model and then creating the syntax without including a connection between them would strike me as missing something potentially very useful. Rather like surveying the area between London and Paris, working out a route between them but *not* making a map.

[NormanWalsh RefactorOk] I think data models and syntactic neutrality have their place, but this isn't one of them. We have a bunch of variant syntaxes now and one could probably argue that they describe the same (or some) underlying data model. That's the problem, not the solution. To echo what TimBray and TimothyAppnel have said, it should have as its core a simple XML vocabulary. That vocabulary must be extensible and that pretty much means namespaces, I think. While it would be possible to have the core vocabulary in no namespace and put extensions in namespaces, I think that's probably a bad idea.

[MattMower RefactorOk] I agree with TimBray that the syntax should be XML and the spec should require it for interoperability. I also agree with SeanMcGrath that we should avoid namespaces. I confess that there was much I didn't understand about the xml-dev discussion of namespaces but I did not miss the problems. As Sean points out there are simpler ways to ensure disambiguation that don't require the overhead of namespaces. What else do we need namespaces for?

[DannyAyers RefactorOk] I'm won over on the single-syntax point, though I still reserve the right to say that having a data model is a good idea. Sean's approach is using namespaces, just not standard ones. If anything the overhead would be greater: you'd need to microparse the element names and you'd have to redefine the notion of validity to allow practically arbitrary element names. XML namespaces is a convenient approach - it's following standards (good for interop) and capable parsers are available in every major language.
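To make the namespace argument concrete, here is a small sketch of two "title" elements with different meanings living in one entry, told apart by a stock namespace-aware parser with no microparsing of element names. Both namespace URIs and all element names are hypothetical and chosen only for the example; nothing here is a proposed vocabulary.

```python
import xml.etree.ElementTree as ET

# Both namespace URIs are hypothetical placeholders.
CORE_NS = "http://example.org/core"
MUSIC_NS = "http://example.org/music"

doc = f"""<entry xmlns="{CORE_NS}" xmlns:m="{MUSIC_NS}">
  <title>Weekend notes</title>
  <m:title>Some Song</m:title>
</entry>"""

root = ET.fromstring(doc)

# ElementTree addresses namespaced elements as "{uri}localname".
core_title = root.find(f"{{{CORE_NS}}}title").text
music_title = root.find(f"{{{MUSIC_NS}}}title").text

print(core_title, "|", music_title)
```

A consumer that knows nothing about the music extension can simply ignore every element outside the core namespace.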

[SimonFell RefactorOk] I'm with Tim: stick to a single serialization format, XML 1.0. Multiple serialization formats are just going to make interop harder.

[ChrisWilper RefactorOk] Defining a data model doesn't mean multiple syntaxes. I am all for one syntax, but I think a data model should come first. Issues like "Should we use attributes" and "What should we name this element?" should come after a data model is agreed upon and terms are defined. If you do syntax at the same time, semantic arguments will come up during what should be a purely structural discussion.
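A minimal sketch of the "data model first, one syntax" ordering described above: the terms are fixed in a small model, and the single XML serialization is derived from it afterwards. The Entry class, its three fields, and the element names are assumptions made up for illustration, not an agreed model.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class Entry:
    """Hypothetical data model: the fields are illustrative only."""
    title: str
    author: str
    content: str


def to_xml(entry: Entry) -> str:
    """Serialize an Entry to the one XML syntax, one child element per field."""
    root = ET.Element("entry")
    for name in ("title", "author", "content"):
        ET.SubElement(root, name).text = getattr(entry, name)
    return ET.tostring(root, encoding="unicode")


xml_text = to_xml(Entry("Hello", "A. Writer", "First post & more"))
print(xml_text)
```

Questions such as "attribute or child element?" only affect to_xml here; the model itself does not change.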

[DavidJanes, RefactorOk] (I moved this to MultipleSyntaxDiscussion from DontUseXml, since I think we should use XML, then edited for brevity!) While I agree with most of the points Tim Bray is making -- XML, despite all of its flaws, is the way to go -- there are instances where other formats may be necessary. Define everything in terms of XML, then if it's necessary, allow that spec to be morphed into other syntaxes.

[AsbjornUlsberg] +1

The question you are asking is: when might it be necessary to use a different format? The answer: in places where XML cannot be used. For example, there are blogging tools that cannot generate correct XML (*cough* blogger, up till a few days ago *cough*) or cannot generate an "alternate index" (i.e. an index.rdf file). The specific instance I've been working on for the last six+ months is the case where I need to recognize a blog's content, and the only thing I can alter is the blog's template. It goes without saying that most templating languages cannot generate anything approximating correct XML.

You can look at my blog's content at http://blog.davidjanes.com to see examples of what I'm talking about, plus there's an out-of-date spec here http://www.blogmatrix.com/docs/tr/WMD plus a related spec for embedding structured metadata in HTML 4 http://www.blogmatrix.com/docs/tr/QSM. The WMD spec (ignore the GEO stuff, it should be in a separate spec) should look very familiar -- it's in the same ballpark as what you're discussing here. Recent conversations with Bill Kearney have me wondering if I should be using a more XMLish tag format, but that's for further discussion.

This format has been influenced by RSS 3.0, promoted a few comments back by Aaron Swartz.

[DiegoDoval RefactorOk] I second Tim's comments. Use only XML, it will simplify interop and avoid having to re-do a lot of the work that has already been done for XML to ensure that documents are cross-platform, support i18n, etc.

[RichardTallent RefactorOk] Another vote for Tim: real XML may get ugly when human-read, but it is easy to consume. I disagree with DavidJanes: blogging tools will have to be rewritten to support Echo, or their contents will have to be transformed via XSLT or other means. This spec won't take off in popularity if tool developers have to handle "special cases" for which they can't use their favorite XML library to parse the entries. I'm also in the boat for namespaces; it's the only reasonable way to handle future extensions without creating an incredibly complex standard. For example, recently various blog posting applications have added the ability to tag an entry with the music being played by the author when it is posted. Unfortunately, there is no decent place in RSS (or Echo?) to place such metadata except the entry body, yet full XML tags taken from the music file (MP3, Ogg Vorbis, etc.) are presumably more useful than, say, plain text or an HTML link to Amazon. This makes it hard for tools to either filter out such entries from being "related" to the artist or create neat features such as finding users with similar musical tastes, or for your favorite media player to automatically start up the tune when you read the entry (if you have it in your own collection). Meanwhile, GeoLocation is given a place at the table--arguably a more important chunk of metadata for some use cases, but certainly not for all--even though multiple standards (GML, NAC, etc.) exist and compete.

[SimonJessey RefactorOk] I agree with what Tim Bray has said. Using only XML makes interoperability much easier. I assume that allowances can be made for the slight differences in XML 1.1, when that moves to Recommendation status.

CategoryArchitecture, CategorySyntax