It’s just data

The White Pebble

Ian Hickson: Regarding your original suggestion: based on the arguments presented by the various people taking part in this discussion, I’ve now updated the specification to allow “/” characters at the end of void elements.

This is big. PHP’s nl2br function is now HTML5 compliant. WordPress won’t have to completely convert to HTML4 before people who wish to author documents targeting HTML5 can do so using this software. Such efforts can now afford to proceed much more incrementally. This is a much more sensible and practical possibility.
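To make the nl2br point concrete, here is a minimal sketch in Python rather than PHP. The `nl2br` below is only a rough analogue of the PHP function (it places `<br />` before each newline), and `is_well_formed` is a hypothetical helper that wraps a fragment in a `<div>` and hands it to an XML parser. Neither is taken from PHP or from any specification.

```python
import re
import xml.etree.ElementTree as ET


def nl2br(text: str) -> str:
    """Rough analogue of PHP's nl2br: insert '<br />' before each newline."""
    return re.sub(r"(\r\n|\n\r|\n|\r)", r"<br />\1", text)


def is_well_formed(fragment: str) -> bool:
    """Hypothetical helper: is the fragment well-formed XML once wrapped in a <div>?"""
    try:
        ET.fromstring(f"<div>{fragment}</div>")
        return True
    except ET.ParseError:
        return False


converted = nl2br("first line\nsecond line")
print(repr(converted))
# The trailing-slash form is acceptable to an XML parser ...
print(is_well_formed(converted))
# ... while the slash-less form is not.
print(is_well_formed("first line<br>\nsecond line"))
```

With the trailing slash now allowed in HTML5, the first form is the one that can live in both worlds.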


To illustrate the larger context, consider the universe of document bodies which are simultaneously both valid HTML5 and valid XHTML5. Only a few days ago, such bodies could not include any images. Now they can.

Now the remaining differences amount to a few edge cases, and a restriction that exists in the current draft but is in every way as meaningless in real life as the prior prohibition against trailing slashes in void elements. And every bit as provisional: the statement “xmlns attributes on <html> elements are disallowed in HTML 4 and in the WHATWG draft for HTML 5 as it exists on 1 December, 2006” has precisely the same validity as “closing slashes are disallowed on <img> elements in HTML 4 and in the WHATWG draft for HTML 5 as it existed on 29 November, 2006”.

Modulo this one arbitrary (and frankly artificial) difference, the effective overlap between “pure” HTML and “pure” XHTML has been greatly increased. This means that people can incrementally evolve towards one or the other, if they should choose to do so.


So now that pebble has been cast, the landslide is sure to follow. The right questions are already being asked. And Ian’s weak joke concerning atheism is already backfiring.

The truth is that most HTML is authored by pagans. Ones who don’t understand arguments such as these which amount to stating that the meaning of your document can only be interpreted in the context of some knowledge that doesn’t exist in this universe at all, as it only exists in another plane of existence entirely. Only high priests with AllowOverride FileInfo credentials are permitted to speak to these gods. Which would be fine, if the only difference between Thor and Zeus were that one is forgiving and the other is vengeful.  And if these magic incantations could be trusted to work.

Unfortunately, in the real world, they often don’t. Furthermore, the fact is that these two gods will judge your documents differently. They will produce different DOM trees for documents such as the XML specification, depending on how they are served. And, ironically, the XML specification is served as text/html.

This is an exceedingly subtle point. One that unfortunately does not leap out at you in the existing WHATWG document.

I believe that for HTML5 to be more than an intellectual exercise, it needs to include the pagan view. One that, in the final analysis, is a much simpler one. Pagans are like that.

Pagans might understand the notion that there are two authoring formats if one were, say, based on S-expressions and the other were based on XML. But we are talking angle brackets vs. angle brackets here. Where neither the element names, nor even (generally) the case of those names change. To a pagan’s untrained eye, such documents are indistinguishable.


In the pagan world view, there are documents that are HTML, and there are documents that are XML, and the overlap is called XHTML. In this view, there is a preferred MIME type for “simply HTML”, and a preferred MIME type for “simply XML”, and a preferred MIME type when you feel the urge to affirmatively declare that your document is both.

In this world view, if you take a document which targets this overlap, a conformance checker for HTML5 would identify one set of errors. Another conformance checker for XML’s well-formedness constraint would identify a possibly different set of errors. What truly would be surprising to such a pagan is for a conformance checker which simultaneously targets both to identify fewer errors than the union of the two. If an empty anchor tag triggers a parse error in HTML5, then by &deity; it should trigger the same parse error in XHTML5, no?
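The “union of the two” idea can be sketched in a few lines of Python. Everything here is a toy under stated assumptions: `html_errors` flags only one thing (a trailing slash on a non-void element), `VOID_ELEMENTS` is my own approximation of the void element list rather than a quotation from any draft, and `overlap_errors` is a hypothetical name for a checker that targets both formats at once.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Assumption: an approximate list of void elements, not copied from any draft.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "param", "source", "track", "wbr",
}


class SelfClosingAudit(HTMLParser):
    """Toy HTML check: record '/>' used on elements that are not void."""

    def __init__(self):
        super().__init__()
        self.errors = []

    def handle_startendtag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.errors.append(f"html: trailing slash on non-void <{tag}>")


def html_errors(doc: str) -> list:
    audit = SelfClosingAudit()
    audit.feed(doc)
    audit.close()
    return audit.errors


def xml_errors(doc: str) -> list:
    """Well-formedness only: at most one error, since XML parsers stop at the first."""
    try:
        ET.fromstring(doc)
        return []
    except ET.ParseError as exc:
        return [f"xml: {exc}"]


def overlap_errors(doc: str) -> list:
    """A checker targeting both formats reports the union of both error sets."""
    return html_errors(doc) + xml_errors(doc)


for doc in ('<p><img src="a.png" alt=""/></p>',
            '<p><a name="top"/></p>',
            '<p><br></p>'):
    print(doc, overlap_errors(doc))
```

The first document passes both checks, the second fails only the HTML check, and the third fails only the XML check. A combined checker that reported nothing for the second document is the surprise described above.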


When all the religion was stripped away from the trailing slash in always-empty HTML elements discussion, only one question remained: I think basically the argument is “it would help people” and the counter argument is “it would confuse people”. This is an eminently sane way to approach discussions such as these.

I would argue that it would both help people and reduce confusion if a void <a/> element continued to be invalid HTML5 and, by implication, be invalid in XHTML5. By invalid, I simply mean that a parse error would be reported by a conformance checker whenever such constructs are found in a document. Non-draconian user agents can, of course, choose to recover from this error.

The HTML5/XHTML5 specification can detail the different recovery rules for this parse error based on whether the document is being parsed in HTML5 mode or XHTML5 mode. There are ample historical reasons for this divergence, and I’m certainly not suggesting that they be changed — merely that they be documented.

And, in a somewhat ironic twist, people will find that XML parsers won’t halt on this particular parse error. They will simply, and silently, produce the “wrong” DOM for this invalid document.
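The two recoveries can be compared side by side with a short Python sketch. The XML half uses the standard library parser as is. The HTML half is only a toy tree builder of my own (`TinyHtmlTree`, not an HTML5 parser) that assumes one recovery rule: a trailing slash on a non-void element is ignored, so <a/> behaves like an open <a>. The `VOID` set is likewise an approximation.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Assumption: an approximate list of void elements.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class TinyHtmlTree(HTMLParser):
    """Toy tree builder: '/>' on a non-void element is treated as a plain start tag."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        el = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID:
            self.stack.append(el)  # element stays open until a matching end tag

    def handle_startendtag(self, tag, attrs):
        self.handle_starttag(tag, attrs)  # the slash is ignored

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, and anything inside it.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):  # text after a child element becomes that child's tail
            parent[-1].tail = (parent[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def xml_view(source: str):
    a = ET.fromstring(source).find(".//a")
    return (a.text, a.tail)


def html_view(source: str):
    builder = TinyHtmlTree()
    builder.feed(source)
    builder.close()
    a = builder.root.find(".//a")
    return (a.text, a.tail)


source = '<p><a id="x"/>text</p>'
print("xml :", xml_view(source))   # text sits after the empty <a>
print("html:", html_view(source))  # text sits inside the still-open <a>
```

Neither parse halts, yet the word “text” ends up outside the anchor in one tree and inside it in the other. That is the difference I am asking to have documented.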

The only realistic alternative? Don’t document this difference in behavior. Leave it as an exercise for the student. The prevailing opinion on the WHATWG working group seems to be that the XML serialization is “free” in that somebody else has already done the work. I will counter that it is only free for spec designers. It certainly isn’t free to implementers who must implement two parsers with two test suites and deal with two sets of bugs. And it certainly isn’t free to authors who must deal with the uncanny valley and cognitive dissonance implications of this needless split.


To commemorate this occasion, I’ve gone and updated planet intertwingly to use the (X)HTML5 doctype.

I’ve also gone ahead and created a small SVG icon for WHATWG, one that I can use in place of the comparatively bloated PNG image.

It is my hope that someday a pagan will take a fancy to one of my icons, will view source, and proceed to copy and paste said icon into their CMS.  And when it doesn’t work as expected, they will proceed to file a bug report.

Meanwhile somebody who is entirely apolitical and working on a browser feature to replace the graphics substrate will decide to humor this pagan. The other browser vendors will then shortly follow suit.