It’s just data

XML 2.0?

Anne van Kesteren: The time has probably arrived to define graceful error handling for XML, put some IETF, W3C or WHATWG sticker on it, label it XML 2.0 and ship it. Perhaps we can drop this internal subset thing in the process.

I’ve been slowly but steadily prototyping this in the html5lib svn repository.  Since this post, I’ve added both W3C DOM support and can produce SAX2 events (with or without namespaces) from that DOM.  A SAX w/namespaces interface will make it easier for me to replace sgmllib as the fallback in the event of XML parsing errors in the Universal Feed Parser.
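
To make the DOM-to-SAX2 idea concrete, here is a minimal sketch in Python. It is not the html5lib code: it uses the standard library's minidom as a stand-in DOM, the function name dom_to_sax is invented for illustration, and prefix-mapping events (startPrefixMapping/endPrefixMapping) are left out for brevity.

```python
from xml.dom import minidom, Node
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesNSImpl


def dom_to_sax(node, handler):
    """Walk a DOM subtree and replay it as namespace-aware SAX2 events.

    Hypothetical helper: a sketch of the idea, not html5lib's implementation.
    """
    if node.nodeType == Node.DOCUMENT_NODE:
        handler.startDocument()
        for child in node.childNodes:
            dom_to_sax(child, handler)
        handler.endDocument()
    elif node.nodeType == Node.ELEMENT_NODE:
        attrs, qnames = {}, {}
        for i in range(node.attributes.length):
            attr = node.attributes.item(i)
            # Namespace declarations are not reported as ordinary attributes.
            if attr.name == "xmlns" or attr.name.startswith("xmlns:"):
                continue
            key = (attr.namespaceURI, attr.localName)
            attrs[key] = attr.value
            qnames[key] = attr.name
        name = (node.namespaceURI, node.localName)
        handler.startElementNS(name, node.tagName, AttributesNSImpl(attrs, qnames))
        for child in node.childNodes:
            dom_to_sax(child, handler)
        handler.endElementNS(name, node.tagName)
    elif node.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
        handler.characters(node.data)
    # Comments and processing instructions are ignored in this sketch.
```

Any ContentHandler subclass can then consume the events, for example dom_to_sax(minidom.parseString(text), my_handler), where my_handler is whatever handler the downstream code already uses.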


XML 1.7 would be fine, thanks.

no dtds
no default namespaces, less wordfudging around xmlns
no qnames in content (as ws-* is expiring, I assume no-one will need this in 2010)
a processing model for inherited/assumed attributes
a processing model for empty v not present (good luck with that)
mI defaulting for foreign markup
hixie as an invited expert

Most crappy XML results from string concat/interpolation (I’m ignoring encoding). Why not allow publishers to mark up elements as 'nofail'?

Posted by Bill de hOra at
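
The string concat/interpolation failure mode mentioned above is easy to reproduce. The following is a small sketch using only the Python standard library; the sample title and the helper name is_well_formed are made up for illustration.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError
from xml.sax.saxutils import escape

title = "Q&A: is a < b?"  # sample text containing XML-significant characters

# Naive interpolation: the raw "&" and "<" make the result ill-formed.
naive = "<title>%s</title>" % title

# Escaping the text before interpolating keeps the document well-formed.
escaped = "<title>%s</title>" % escape(title)


def is_well_formed(xml_text):
    """Return True if a strict (draconian) XML parser accepts the text."""
    try:
        minidom.parseString(xml_text)
        return True
    except ExpatError:
        return False
```

Here is_well_formed(naive) is False while is_well_formed(escaped) is True. Building the document through a DOM or other serializer avoids this class of error, though not every error.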

no default namespaces

WTF?

Posted by Sam Ruby at

No DTDs, no namespaces and no URIs used for anything syntactic, no processing models, no working group, no standards organization, no mailing list, no wiki, no blog, no trackbacks, no +1s (do people know how stupid that is?), no conferences, no meetings, no telcons, no charters. Do allow decentralized extensions.

Posted by Robert Sayre at

links for 2007-01-27

Googlebombs Defused? interesting mostly for the list of attempted googlebombs (tags: google) XML 2.0: XML with graceful error handling? - Anne’s Weblog (tags: xml standards dracos) Sam Ruby: XML 2.0? (tags: xml standards dracos) Results of mobile...

Excerpt from dive into mark at

“WTF?”

Here’s an exercise: take the Namespaces Rec and remove the default namespaces feature. See whether you think the document is technically better or worse, more or less ambiguous.

“no +1s (do people know how stupid that is?)”

Hi Rob. Nice to see you haven’t lost any of your charm.

Posted by Bill de hOra at

no working group, no standards organization, no mailing list

Just because you’ve been kicked out of every working group you’ve ever disrupted — to the point that the IETF felt compelled to write an RFC explaining how to deal with difficult people who disrupt working groups — doesn’t mean the format is broken.

Posted by Mark at

“WTF?”

Dammit Sam Ruby, I was about to start on my weblog code, and you stopped me cold with an ugly fact. My feed is now sane. I didn’t use atom: as a prefix (I’ll be curious to see if I get any error reports from downstream).

Back to work.

“Nice to see you haven’t lost any of your charm.”

No need for that I suppose. Sorry Rob.

Posted by Bill de hOra at

My feed is now sane. I didn’t use atom: as a prefix (I’ll be curious to see if I get any error reports from downstream).

You can obtain a partial list of the clients that are now incapable of reading your feed here.

Posted by James Holderness at

Bill,
I have to agree that default namespaces are probably one of the worst features of the XML namespaces spec [the worst being namespace names being URIs instead of URLs]. However, I don’t think it would have been politically feasible to do anything else, due to requirements from XHTML [if memory serves me correctly].

James,
Interesting test cases, but they don’t really show which readers have problems with namespaces in XML documents; they show a problem with passing around XML fragments [along with their correct XML namespace context]. Most of the readers, including RSS Bandit, are just sending the contents of atom:content to the browser without fixing up the namespace context, which means you end up with some HTML and an island of XML with funky tag names (h:div, h:li) in your reader. Funnily enough, this is an example of where default namespaces make things work “as expected” most of the time.

Posted by Dare Obasanjo at
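
A rough sketch of what "fixing up the namespace context" can look like, using Python's ElementTree. The entry markup is a made-up example, and this does not describe how RSS Bandit or any other reader works; it only shows a fragment being serialized with its namespace declaration attached.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
XHTML = "http://www.w3.org/1999/xhtml"

# Made-up entry in which both Atom and XHTML are bound to prefixes.
entry = ET.fromstring(
    '<a:entry xmlns:a="%s" xmlns:h="%s">'
    '<a:content type="xhtml"><h:div><h:li>item</h:li></h:div></a:content>'
    '</a:entry>' % (ATOM, XHTML)
)

div = entry.find("{%s}content/{%s}div" % (ATOM, XHTML))

# Ask the serializer to use XHTML as the default namespace on output, so the
# fragment carries its own declaration and needs no prefixes.
# Note: register_namespace changes process-wide serializer state.
ET.register_namespace("", XHTML)
fragment = ET.tostring(div, encoding="unicode")
```

Copying the raw text between the atom:content tags would instead yield h:div and h:li with no declaration for the h prefix, which is the "island of XML with funky tag names" described above.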

Hi Mark,

I am not sure why that brief opinion on the necessity of a WG and trappings made you feel the need to attack me personally, but I am not surprised.

I have been kicked out of the Atompub WG before, and I don’t regret it. The rest of your comment is incorrect.

It seems to me that the idea is good enough to start implementing and using it, and then producing an RFC as a by-product. This is how JSON went, and it worked pretty well.

Posted by Robert Sayre at

What is it about angle brackets that makes people start insulting each other?

Posted by Jesper at

It’s the pointy corners, Jesper. I reckon we should just move everything to S-expressions, with their soothing round parentheses. Then we could be free of these bitter, divisive arguments about markup forever.

Posted by Adam Fitzpatrick at

Ah, yes, replaced by bitter, divisive arguments about LISP.

Posted by Jesper at

Interesting test cases, but they don’t really show which readers have problems with namespaces in XML documents

I’m sorry, I should have been clearer. I was referring specifically to test case 1 (Atom namespace mapped to a prefix). While RSS Bandit handles that quite happily, there are many aggregators that do not.

Posted by James Holderness at
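
For reference, a namespace-aware parser treats a feed that uses the default namespace and one that maps the Atom namespace to a prefix as the same thing. A small sketch with Python's ElementTree follows; the two feed strings and the helper name expanded_names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Same (made-up) feed written two ways: default namespace vs. atom: prefix.
default_ns = '<feed xmlns="http://www.w3.org/2005/Atom"><title>t</title></feed>'
prefixed = (
    '<atom:feed xmlns:atom="http://www.w3.org/2005/Atom">'
    '<atom:title>t</atom:title></atom:feed>'
)


def expanded_names(xml_text):
    """Return each element's name in ElementTree's {namespace}local form."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]
```

Both calls return the same list of expanded names, so a client that fails on the prefixed form is most likely matching on literal tag strings rather than on namespace and local name.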

No matter how XML’s error-handling rules are refined, there will be some XML documents where the author intends the document to be interpreted as A, the rules state the document should be interpreted as not-A, and clients that interpret the document as A will win market share over clients that follow the rules.

This will be true for as long as (a) there are widely-used tools that generate XML in error-prone ways, e.g., by string concatenation, and (b) there are widely-used clients for reading XML whose users expect to see nicely formatted text in a browser.

Posted by Seth Gordon at

Seth: the statements you make are true, even for well-formed documents produced by serializing a DOM.

However, the conclusion you draw doesn’t follow.

What the WHATWG is doing is looking at the problem from the other end: what are browsers doing?  Can we write that down so that others who wish to consume HTML can at least do it consistently with what the browsers with marketshare are doing? 

I also believe that what the browsers are doing captures years of experience of how best to deal with problems such as character encoding errors, unescaped ampersands, unmatched quotes and the like.  Many of these same problems apply to poorly produced XML.

Posted by Sam Ruby at

What the WHATWG is doing is looking at the problem from the other end: what are browsers doing?  Can we write that down so that others who wish to consume HTML can at least do it consistently with what the browsers with marketshare are doing?

To the extent that browsers have converged on a certain behaviour, it makes sense to standardize that, so that others do not have to reinvent the wheel.

Since HTML-producers are conditioned to expect the behaviour of the dominant browsers (and have correspondingly adjusted their error-laden content), there’s very little competitive advantage to be had in trying to “do better.”

But there clearly isn’t any such consensus in the XML-parsing world. Which makes the argument for adopting a particular error-handling behaviour in XML much less compelling.

Posted by Jacques Distler at

Hmmm. It seems the old OpenID server works. I guess I’ll have to find an OpenID client, for MovableType, that doesn’t suck...

Posted by Jacques Distler at

But there clearly isn’t any such consensus in the XML-parsing world. Which makes the argument for adopting a particular error-handling behaviour in XML much less compelling.

I will point out that both HTML and XML tend to be produced using similar processes and therefore tend to produce similar errors.  Enough so that the Universal Feed Parser, upon which Venus is based, falls back to an SGML parser if processing with a “real” XML parser is unsuccessful.  While that process consistently produces demonstrably good results, I’ve looked at what has been defined for html5, and it (with a few minor tweaks) would produce even better results.

And of course, there’s the distributed extensibility that only XML at this point can bring.  There is no way at this time to define new grammars in HTML, like MathML and SVG.  And the prospect of applying draconian error processing to the entire page as a prerequisite for embedding even the smallest amount of either has proven to be a non-starter.

Posted by Sam Ruby at
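
The strict-first, lenient-fallback pattern described above can be sketched as follows. This is not the Universal Feed Parser's code: sgmllib is no longer in the Python standard library, so html.parser stands in as the lenient parser, and the names TitleCollector and feed_titles are invented for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TitleCollector(HTMLParser):
    """Lenient tag-soup scan that collects the text of <title> elements."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.titles[-1] += data


def feed_titles(text):
    """Try a strict XML parse first; fall back to the lenient scan on error."""
    try:
        root = ET.fromstring(text)
        return [el.text or "" for el in root.iter("title")], "strict"
    except ET.ParseError:
        collector = TitleCollector()
        collector.feed(text)
        collector.close()
        return collector.titles, "lenient"
```

A well-formed feed takes the strict path; one with an unescaped ampersand or an unclosed element takes the lenient path and still yields its titles. The sketch ignores namespaces and encodings, both of which a real fallback has to handle.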

I just hope that if there’s an XML 2.0 that has WHATWG style error handling, that people implement it and use it widely enough that XML 1.x becomes like Netscape 4.x... no need to support it.

Posted by Devon at

Anne van Kesteren suggests an XML 2.0 mostly defined by less-Draconian error handling, provoking further discussion over chez Sam Ruby....

Excerpt from ongoing at

no default namespaces


I’m for keeping default namespaces, but changing the interpretation of un-prefixed attributes to put them in the default namespace (rather than no namespace). To enable omission of prefixes for attributes within elements that aren’t in the default namespace, we could make a colon (with no prefix before it) shorthand for “same namespace as the parent element”. For example:

<myprefix:foo :bar="asdf" />

could be used as shorthand for:

<myprefix:foo myprefix:bar="asdf" />

That would make for more straightforward procedures for interpreting attributes, and even more so, for adjusting prefixes when combining documents.

Posted by Antone Roundy at

“but changing the interpretation of un-prefixed attributes to put them in the default namespace”

I’ve seen that done in practice; it doesn’t work out. It means you can’t compose markup safely due to scoping issues.

“To enable omission of prefixes [...] could be used as shorthand for:”

I think inventing shorthands, conveniences and assumed values is one of the problems with XML in the field. It always causes problems because markup/declarative types don’t think through the consequences of acquisition and inheritance semantics and lexical scoping the way software types do.

Posted by Bill de hOra at

“but changing the interpretation of un-prefixed attributes to put them in the default namespace”

I’ve seen that done in practice; it doesn’t work out. It means you can’t compose markup safely due to scoping issues.

Could you give a concrete example of what you mean? I don’t see how it’d be any different from having a default namespace for element names. Even for element names, you always have to know which namespace name is associated with which prefix (or no prefix) in the scope in which the markup is being composed to avoid problems. And you always have to take care to carry the namespace declarations with you when copying markup from one context to another.

“To enable omission of prefixes [...] could be used as shorthand for:”

I think inventing shorthands, conveniences and assumed values is one of the problems with XML in the field. It always causes problems because markup/declarative types don’t think through the consequences of acquisition and inheritance semantics and lexical scoping the way software types do.

As a “software type”, it seems pretty straightforward to me:

if (AttributeNameDoesntContainColon())
  ApplyDefaultNamespace()
else if (AttributeNameDoesntHavePrefix())
  ApplyParentElementNamespace()
else
  ApplyNamespaceSpecifiedByPrefix()

...which seems at worst no worse than the current state of things:

if (AttributeNameHasPrefix())
  ApplyNamespaceSpecifiedByPrefix()
else
  AttrIsntInAnyNSInterpretUsingContextProvidedByParentElement()

...which I wouldn’t have so much problem with (a little, but not as much) except for two things:

1) Given that attributes are sometimes prefixed and thus are in a namespace, and given that default namespaces exist for elements, it’s not intuitively obvious to someone who hasn’t read and correctly understood the specs that unprefixed attribute names aren’t in the default namespace.

2) In the following example, the same namespace has to be referenced twice if it is to be used as the default namespace:

<foo xmlns="tag:a" xmlns:taga="tag:a" xmlns:tagb="tag:b">
  <tagb:bar taga:asdf="qwerty" />
</foo>

Posted by Antone Roundy at
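
The two attribute rules compared in the comment above can be written out as runnable Python. This is only a sketch: the function names, the (namespace, local name) return shape and the prefix_map argument are assumptions made for illustration, and the proposed rule is the commenter's suggestion, not anything in the Namespaces spec.

```python
def resolve_current(attr_name, prefix_map):
    """Namespaces in XML today: an unprefixed attribute is in no namespace."""
    if ":" in attr_name:
        prefix, local = attr_name.split(":", 1)
        return (prefix_map[prefix], local)
    return (None, attr_name)


def resolve_proposed(attr_name, default_ns, parent_ns, prefix_map):
    """The rule proposed above (hypothetical, not part of any spec)."""
    if ":" not in attr_name:
        # No colon at all: the attribute takes the default namespace.
        return (default_ns, attr_name)
    prefix, local = attr_name.split(":", 1)
    if prefix == "":
        # Leading colon with no prefix: same namespace as the parent element.
        return (parent_ns, local)
    return (prefix_map[prefix], local)


# Bindings taken from the <foo> example in the comment.
prefix_map = {"taga": "tag:a", "tagb": "tag:b"}
```

Under the current rule, taga:asdf on tagb:bar resolves to ("tag:a", "asdf") only because tag:a is bound a second time to the taga prefix. Under the proposed rule, a bare asdf resolves to the same pair with no second binding, and :bar resolves to the parent's tag:b.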
