Tim Bray: Any standard that tries to constrain the way in which data, once received, is processed, is broken.
All generalizations are false, including this one.
OK, that was too easy. So, let’s try again: HTTP is “broken” as it constrains what GET means vs, say, POST or PUT or DELETE. Without such constraints, web crawlers simply wouldn’t be possible. But perhaps Tim’s rule can be fixed by saying that there is an exception for headers vs bodies? Nah, let’s not go there. Let’s go find an example that deals with content.
HTML5 is clearly broken by Tim’s definition. And while it may go too far in places, I can say that there are definitely many areas where that definition is a good thing. I wouldn’t have agreed with that statement a few years ago, but I do now. Enthusiastically. But to explain why, I need to first back up.
Like most people, I learned HTML via view-source. I learned how to produce tables by looking at examples. Such as this one. Tables have rows, rows have data cells. One can use th elements instead of td elements when one wants headers. I’ve seen some people replace the tr elements with thead elements, but it didn’t seem to make much difference, so I didn’t always follow that practice.
People who have learned by viewing my source may have learned similar lessons, I guess.
It turns out that this is wrong, and wrong in a way that affects plugins like tablesorter. Tables have bodies, and possibly heads and foots, and captions and whatnot; but tables have no rows, at least not as immediate children. And if you don’t include tbody elements, the browser will insert the necessary elements for you (assuming you are using text/html like any sane person would). This means that things “just work” even if you learned the lesson I “learned” and tried to use this plugin.
Of course, this plugin didn’t work for me, at least not at first, as I didn’t initially include tbody elements and I do serve my content as application/xhtml+xml. No biggie, easily fixed.
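The difference between the two trees can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the markup, `body_rows`, and `insert_implied_tbody` are made up for this example, the XML parser stands in for application/xhtml+xml processing, and the wrapping function only roughly imitates what a text/html parser does with a bare tr.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup of the kind one picks up from view-source: no tbody.
MARKUP = "<table><tr><th>Name</th></tr><tr><td>Ada</td></tr></table>"


def body_rows(table):
    """Rows found by a script that only looks for tr elements inside tbody."""
    return table.findall("tbody/tr")


def insert_implied_tbody(table):
    """Wrap each run of direct tr children in a tbody element.

    A rough imitation of the text/html parser's behaviour, not the
    actual algorithm from the specification.
    """
    tbody = None
    for child in list(table):
        if child.tag == "tr":
            if tbody is None:
                # Start a new tbody where this run of rows begins.
                tbody = ET.Element("tbody")
                table.insert(list(table).index(child), tbody)
            table.remove(child)
            tbody.append(child)
        else:
            # Any other child (thead, caption, ...) ends the current run.
            tbody = None
    return table


xml_tree = ET.fromstring(MARKUP)
print(len(body_rows(xml_tree)))   # 0: the XML parser adds nothing
insert_implied_tbody(xml_tree)
print(len(body_rows(xml_tree)))   # 2: the rows now sit under a tbody
```

The same bytes give a script zero rows in one tree and two in the other, which is the kind of mismatch that trips up a plugin written against the text/html tree.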
Now let’s look at this from a browser vendor perspective. If you want things to “just work” you need to know this. And this is just one small example. There are many more, and they lead to an alternate postulate, namely:
If what you want is interoperability, a DOM, and JavaScript, then you need the mapping of stream-o-bytes to a tree-structure to be completely well defined.
This conclusion continues to be controversial, but as I indicated, I have been convinced. And furthermore, as my tbody example shows, this goes well beyond well-formedness. It has to do with validity too.
And, again, I won’t dispute the possibility that there are other areas where HTML5 has gone too far. But even if a long list of such areas is produced, you can’t prove a negative with any number of examples.
Therefore, the more interesting question is:
When are constraints useful for achieving interoperability?
P.S. HTML5 allows tr elements as direct children of table elements. It is quite instructive to look at the conditions under which it allows this.
I’ve seen some people replace the tr elements with thead elements, but it didn’t seem to make much difference, so I didn’t always follow that practice.
Actually TR goes into THEAD. And THEAD is really useful to have the column titles printed in every page when you print a table that does not fit in one page.
OK. pwnd.
BTW. I learned HTML5 by looking at your source. Then I will turn to specs when I feel I need more in depth knowledge. Just like I did with HTML4. Thus I value more view-source than strict compliance to specs.
the table example isn’t a great one
You haven’t said why you feel this way
Javascript makes assumptions on how the markup is processed
Javascript itself doesn’t, but when JS+DOM+HTML are combined, you either get poor interop or you get undocumented consensus or you get documented consensus.
pwnd
lol :-)
I value more view-source than strict compliance to specs
The people authoring HTML5 are viewing a lot of source, and what they see informs what is in the spec. This leads to a lot of questions for which there are no clear answers. Like, for example, should a tr element as an immediately nested child of table be considered conforming?
HTML5 allows tr elements as direct children of table elements.
This is just a continuation of the validity rules from XHTML 1.0, which says:
“some XHTML elements may or may not appear in the object tree because they are optional in the content model (e.g. the tbody element within table). This occurs because in HTML 4 some elements were permitted to be minimized such that their start and end tags are both omitted (an SGML feature). This is not possible in XML. Rather than require document authors to insert extraneous elements, XHTML has made the elements optional. User agents need to adapt to this accordingly.”
and its DTD says <!ELEMENT table (caption?, (col*|colgroup*), thead?, tfoot?, (tbody+|tr+))>. Looks like the differences are that HTML5 allows tfoot at the end of the table, and doesn’t allow col elements that aren’t in a colgroup. (Some other significant differences are that HTML5 defines exactly what rows and columns and cells actually are, and what headers are associated with each cell, in tables with theads and colgroups and overlapping cells and invalid DOM structures and so on, rather than leaving it for the reader of the specification to extrapolate from examples and from common sense. Those concepts are important even if you don’t have a DOM or CSS or JS at all.)
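To make the XHTML 1.0 content model quoted above concrete, here is a minimal sketch in Python that checks a list of child element names against it. Encoding the model as a regular expression is my own shortcut for illustration, and `TABLE_MODEL` and `conforms` are hypothetical names; a real DTD validator does a good deal more than this.

```python
import re

# XHTML 1.0: (caption?, (col*|colgroup*), thead?, tfoot?, (tbody+|tr+))
# Each child name is followed by one space so "col" cannot match "colgroup".
TABLE_MODEL = re.compile(
    r"(caption )?((col )*|(colgroup )*)(thead )?(tfoot )?((tbody )+|(tr )+)"
)


def conforms(child_tags):
    """Return True if the direct children of a table fit the content model."""
    sequence = "".join(tag + " " for tag in child_tags)
    return TABLE_MODEL.fullmatch(sequence) is not None


print(conforms(["tr", "tr"]))                  # True: bare rows are allowed
print(conforms(["thead", "tfoot", "tbody"]))   # True
print(conforms(["thead", "tbody", "tfoot"]))   # False: tfoot must precede tbody here
print(conforms(["tbody", "tr"]))               # False: tbody and tr cannot be mixed
```

The third case is the tfoot-at-the-end arrangement that, as noted, HTML5 permits and this DTD does not.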
You haven’t said why you feel this way
Javascript itself doesn’t...
It frustrates me to no end that the HTML WG doesn’t bother to standardise declarative markup that will make sorting tables in JavaScript obsolete, as well as disallowing extensions like hInclude that provide real improvements, thereby making my home page invalid (yet still perfectly functional, thank you).
/whinge
Of course they disallow things that aren’t spec’ed yet. :-)
And even things that are spec’ed, the validator may not have coded the support yet... though it is open source and contributions are welcome.
Also note that extensibility is an open issue, though to date that is generally of the type of “I want more”, not less, and it is worth noting that such requests to date have been met with strong resistance.
I agree. It makes no sense that different consumers would be allowed to derive different data models from the same syntax. (With the exception where the data model is a subset of the spec’d model.) This is because what we want to communicate is the data as a model, but we have no choice but to convert it to and from a syntax.
Once the data as a model is derived from the data as syntax though, the consumer should not be constrained in how it proceeds.
Mark, besides what Sam said, it seems like hInclude could be recast to use data-* attributes. In that form, it would be conforming without the need to spec it or update validators. Effectively, the <hx:include> element you propose just gives some instructions to a client-side processing library.
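As a rough sketch of what that recasting could look like from the consuming side, the Python below collects elements carrying a data-* attribute that names a fragment to include. The attribute name `data-include-src`, the class, and the sample URL are all hypothetical, chosen only for illustration; nothing here is hInclude's actual syntax or API.

```python
from html.parser import HTMLParser

# Hypothetical attribute name; hInclude itself defines no such attribute.
INCLUDE_ATTR = "data-include-src"


class IncludeCollector(HTMLParser):
    """Collect (tag, url) pairs for elements that carry INCLUDE_ATTR."""

    def __init__(self):
        super().__init__()
        self.includes = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == INCLUDE_ATTR:
                self.includes.append((tag, value))


def find_includes(markup):
    """Return every include instruction found in the markup, in document order."""
    collector = IncludeCollector()
    collector.feed(markup)
    collector.close()
    return collector.includes


page = '<div data-include-src="/fragments/news.html"></div><p>text</p>'
print(find_includes(page))   # [('div', '/fragments/news.html')]
```

A client-side library would do the equivalent walk over the DOM and then fetch each URL; the markup itself stays within what the spec already allows for data-* attributes.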
Sam is correct that we will also look at other ways to add extensibility.