Anne van Kesteren: One of my side projects is XML5. Earlier this year I suggested the idea as XML 2.0, but in line with recent “jokes” about HTTP5, SVG5, and CSS5, XML5 makes perfect sense. The idea of XML5 is to provide a revision of XML 1.0, XML 1.1, Namespaces in XML 1.0, Namespaces in XML 1.1, and RFC 3023, that is backwards compatible and introduces HTML-like, although much more sane, error recovery.
Question: should XHTML5 be based on XML5 or XML1?
Warning: brainstorming ahead. Don’t groan.
Suppose HTML5 defined not one, and not two, but three serializations. The first would be identified by the MIME type text/html, the second by application/xhtml+xml, and the third by text/html; subtype=xml.
General-purpose browsers like Mozilla could support all three. Special-purpose browsers may choose to support fewer parsers. In particular, applications that require streaming support may choose not to implement the first type, and truly micro browsers may choose not to implement the first two (accepting some loss of fidelity in rendering some web pages).
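As a rough sketch of how a client might pick a parser from the three MIME types above: the parser labels ("html", "xml1", "xml5") are hypothetical names chosen for illustration, and mapping the third type to an XML5-style parser is an assumption, since the post leaves that question open.

```python
from email.message import Message


def choose_parser(content_type: str) -> str:
    """Map a Content-Type header value to one of the three proposed serializations.

    The returned labels are illustrative only; they do not name real parser APIs.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()        # lower-cased "type/subtype"
    subtype_param = msg.get_param("subtype")   # None when the parameter is absent

    if media_type == "application/xhtml+xml":
        return "xml1"   # serialization 2: strict XML 1.0 parsing
    if media_type == "text/html":
        if isinstance(subtype_param, str) and subtype_param.lower() == "xml":
            return "xml5"   # serialization 3 (assumed): XML syntax with error recovery
        return "html"       # serialization 1: classic HTML parsing
    raise ValueError(f"not an HTML5 serialization: {media_type}")
```

A client that implements only a subset of the parsers would simply refuse, or fall back on, the labels it does not support.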
Those who wish to include SVG or MathML inside otherwise valid, but not well-formed, HTML pages could do so with either a minor change to the MIME type or the addition of a <meta> tag. IE would continue to ignore these elements, but at least authors wouldn’t have to perform heroic acts in their .htaccess files any more. And should IE ever wish to join the party, Microsoft would have the opt-in switch that they were looking for.
And would documents conformant to serializations 2,3 be identical (except for the MIME-type)?
That is, serializations 2,3 would differ only in their parsing model (2 being parsed by an XML 1.0-compliant parser, 3 being parsed by an XML5-compliant parser)?
That would be a big improvement on the mess that is Appendix C.
And would documents conformant to serializations 2,3 be identical (except for the MIME-type)?
The devil’s in the details. For example, &mdash; is legal in XHTML, but hasn’t proven to be very interoperable. I would hope that serialization 2 would be explicit about which predefined entity names are valid, perhaps even reducing them to the empty set as James suggests (I personally would allow the ones allowed by XML: &amp;, &lt;, &gt;, &quot;, and &apos;).
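To make the interoperability point concrete, here is a small check using Python's standard-library XML parser, which does not load an XHTML DTD. It is only a sketch: `parses_as_xml` is a helper name invented for this example, and `&mdash;` stands in for any named entity beyond the five that XML 1.0 predefines.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(document: str) -> bool:
    """Return True if a DTD-less XML 1.0 parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# The five entities predefined by XML itself are always safe.
print(parses_as_xml("<p>&amp; &lt; &gt; &quot; &apos;</p>"))

# A named entity that only the XHTML DTD declares is rejected as undefined.
print(parses_as_xml("<p>&mdash;</p>"))

# A numeric character reference for the same character needs no DTD.
print(parses_as_xml("<p>&#8212;</p>"))
```

Any consumer that skips the DTD behaves like the second case, which is why sticking to numeric references or the five XML entities is the conservative choice.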
My idea was to no longer have XML 1.0 / XML 1.1 basically.
You expect every consumer of XML (from Sam’s Venus to the XML libraries in my favourite programming language) to convert to XML5 parsing?
Wow! You do dream big.
Also, I would actually like to introduce a bunch of new predefined entities from HTML and MathML.
Predefined entities are a nightmare, unless you control both ends of the wire.
There are 2200 named entities in HTML+MathML (plus whatever “new” ones you wish to define). Are you actually going to require that clients which don’t do MathML actually support all those entities anyway? In light of distributed extensibility, why should ∮ (and its 2000 friends) be grandfathered in?
My idea was to no longer have XML 1.0 / XML 1.1 basically.
It is one thing to say that a specific product, say Opera, would choose to treat application/xhtml+xml as XML5; but it is quite another to say that XML 1.0 (and what little XML 1.1 there is) would no longer exist.
If we keep them there’s less of a win I think.
The position of the WHATWG, as I understand it, has basically been that it can’t prevent an XML 1.0 serialization of HTML5, so it might as well define it. That statement continues to be true.
I also believe that the current MIME type for application/xhtml+xml is a big impediment, second only to the well-formedness requirement. Being able to gracefully degrade to text/html for recalcitrant browsers is worth doing.
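One way authors approximate this kind of graceful degradation today is server-side content negotiation on the Accept request header. The function below is a minimal sketch under that assumption; `negotiate_type` is a hypothetical helper, and it deliberately ignores wildcard ranges such as */* on the assumption that a browser which does not list application/xhtml+xml explicitly should be treated as not handling it.

```python
def negotiate_type(accept_header: str) -> str:
    """Pick the MIME type to serve based on the request's Accept header.

    Serves application/xhtml+xml only when the client lists it explicitly
    with a non-zero quality value; otherwise falls back to text/html.
    """
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        quality = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q value as "not acceptable"
        if media_range == "application/xhtml+xml" and quality > 0:
            return "application/xhtml+xml"
    return "text/html"
```

The same page body can then be sent under either type, provided it is written to be acceptable to both parsers.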
Also, I would actually like to introduce a bunch of new predefined entities from HTML and MathML.
I share Jacques’s concern. Basically, at this moment, anybody who wishes to serve XHTML today and wants to work with Opera has already learned not to depend on any predefined entities beyond what is defined by XML. Even
is problematic. But as long as the results are well defined and interoperable, I’m OK.
IMHO HTML5 should stick to XML1.
HTML5 already has an error-proof XML-ish mode that’s just fine for all those authors who think they can generate well-formed XML with echo() without going insane.
If XML5 comes along, we’ll end up with yet another serialisation that looks like XML1 but can’t be relied on to be compatible with an XML1 parser (as real-world XHTML ended up).