application/xhtml

Over the course of a weekend, rusty on COM and unfamiliar with the Mozilla codebase, class libraries, build process, trace facilities, test suites, and debugging aids (and therefore armed only with make, vi, and fprintf), I came up with this modest patch against Minefield HEAD.

This patch is merely meant as a proof of concept; it certainly isn’t production-quality code.  The right thing to do in the longer term is to refactor the common code out of nsHTMLContentSink and nsXMLContentSink.

What does this patch do?  It enables this document (served as text/html and with the proper META tag) to be treated as application/xhtml (note the missing +xml!), and therefore enables the SVG diagram embedded therein to be displayed.

The limitations in this proof of concept are numerous.  xmlns attribute values are not inherited, and therefore must be repeated on each element.  XML’s empty-element syntax is not implemented, so separate start and end tags are required for each element.  Non-element nodes are not handled, so no embedded text or comment nodes will make it into the DOM.
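
To make the first two limitations concrete: namespace inheritance means an element without its own xmlns attribute takes the namespace of its nearest open ancestor, and empty-element syntax means a tag like <rect/> opens and closes in one step.  What follows is a minimal Python sketch of those two rules using the standard library’s html.parser; it is not the Mozilla patch, and the NamespaceTracker class is a hypothetical name.  It assumes the default namespace is XHTML, and it inherits html.parser’s habit of lowercasing tag names, so camelCase SVG names such as linearGradient would need separate fix-up.

```python
from html.parser import HTMLParser

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"


class NamespaceTracker(HTMLParser):
    """Hypothetical sketch: give each element the namespace in scope,
    inherited from its parent unless it carries its own xmlns attribute."""

    def __init__(self, default_ns=XHTML_NS):
        super().__init__()
        # Open elements as (tag, namespace); the bottom entry is the document default.
        self.stack = [(None, default_ns)]
        # Every element seen, as (tag, namespace), in document order.
        self.elements = []

    def _namespace_for(self, tag, attrs):
        # An explicit xmlns wins; otherwise inherit from the nearest open ancestor.
        ns = dict(attrs).get("xmlns") or self.stack[-1][1]
        self.elements.append((tag, ns))
        return ns

    def handle_starttag(self, tag, attrs):
        self.stack.append((tag, self._namespace_for(tag, attrs)))

    def handle_startendtag(self, tag, attrs):
        # XML empty-element syntax (<rect/>): record the element, but open nothing.
        self._namespace_for(tag, attrs)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element; stray end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


parser = NamespaceTracker()
parser.feed('<p><svg xmlns="http://www.w3.org/2000/svg"><rect/></svg></p>')
parser.close()
# parser.elements: p in XHTML_NS, then svg and rect both in SVG_NS
```

Only the svg element declares a namespace here; rect inherits it, and once </svg> is seen anything that follows falls back to XHTML.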

There even seems to be a slight timing/notification problem whereby, the first time you display this page, not all of the red areas show up properly.  If this happens to you, simply press F5.

In any case, the intent of this code is to show what can be done in the hopes of spurring the discussion of what should be done.

Background

At the moment, HTML and XHTML are two different authoring formats covering roughly the same domain of DOM trees.  People like T.V. Raman of Google have worked hard on standards like XForms which, at the moment, require XHTML.  Due to this dependency, such standards (which also include MathML, SVG, and XBL) are incompatible with other Google features, like AdSense.

In theory, this should all work out.  Eventually the ramifications of draconian error processing will work their way through the tool chain.  Either that, or eventually those who are working on HTML5 will recreate an entirely new toolchain and entirely new grammars.

The problem with the former is that it won’t work.  The problem with the latter is that it won’t scale.

There has to be a third option.  We just need to want to find it.

Let the dialog begin.


the proper META tag

Sam Ruby: “There has to be a third option. We just need to want to find it.” Apparently, Sam has......

Trackback from franklinmint.fm at

So what exactly has changed since you suggested something like this last time? It’s also still not clear to me what you actually want.

Posted by Anne van Kesteren at

I also don’t really believe that many people are actually using “tool chains” or anything. I believe the average web author simply does string concatenation just like WordPress does. There haven’t really been any (good) HTML tools in scripting languages, and yet the whole web is full of HTML. Lots of errors in it, sure, but that doesn’t really negate the point.

Posted by Anne van Kesteren at

QOTD

Sam Ruby: “At the moment, HTML and XHTML are two different authoring formats covering roughly the same domain of DOM trees.”......

Trackback from Bill de hÓra at

So what exactly has changed since you suggested something like this last time?

The MIME type.  And the META tag.

It’s also still not clear to me what you actually want.

Distributed extensibility.  Without implicitly requiring draconian error processing.

Posted by Sam Ruby at

I don’t think that inventing a new technology will solve the problem we already have. It will just create yet another code path we have to debug. I’d rather fix XML.

Posted by Anne van Kesteren at

Perhaps we should figure out how all non-draconian XML processors parse XML and spec that?

Posted by zcorpan at

I’d rather fix XML.

Do you have a concrete proposal?

Posted by Sam Ruby at

Anne, if there are two tokenizer and DOM builder code paths (HTML and XML), adding limited namespace support to one of them still keeps the total number of code paths at two. Also, I think your Web-colored view of XML is too narrow. Most uses of XML are outside the Web. For those uses, the interop of parsers is remarkable. Breaking XML for those uses would not be nice. If you want a non-Draconian XML, I suggest developing HTML5 in the direction of an alternative infoset serialization instead of shaking up what XML is.

Posted by Henri Sivonen at

Henri, sure, but that’s not what Sam is proposing. As I understand it you would go into an alternate HTML code path when you see the media type application/xhtml or some meta element with special values. This means you have three ways to get to a serialization, instead of just two.

The reason for changing XML, at least as used on the web, is that we already have to deal with broken XML, and that problem is not just on the feeds side. We get more and more issues with mobile content providers as well. This may be because other mobile vendors don’t really use an XML parser, but we have to deal with that anyway.

It seems better to define error correction for XML than to just keep hoping this problem will eventually solve itself. Or perhaps abandon it altogether and try to somehow solve the use cases in the text/html serialization.

Anyway, I’m more concerned with getting HTML (and CSS, DOM, CSSOM, ECMAScript) right than solving the XML issues.

Posted by Anne van Kesteren at

Henri, sure, but that’s not what Sam is proposing. As I understand it you would go into an alternate HTML code path

I don’t know what the code base for Opera looks like, but the code base for Mozilla has lots of “quirks” mode checks.  Calling a single if check in this context a completely alternate HTML code path, while technically correct, is quite misleading.

What I would like to see is one uncompromising and non-draconian infoset serialization.  One that can express the full range of valid DOM trees.  Coupled with a parser that can round trip this information.

Separate from that, there can be a quirks mode in which xmlns attributes are ignored (for the sake of argument, call it text/html), and a <insert-your-favorite-adjective-here> mode for people who want the full expressiveness of XML and are willing to accept the draconian implications that come with it (for the sake of argument, call it application/xhtml+xml).
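
The three-way split can be sketched as one dispatch on the media type.  This is a hypothetical Python illustration, not code from Mozilla or from the patch: the Mode names and the parse_mode function are invented, and it assumes that a media type declared inside the document (the META tag; its exact syntax isn’t spelled out here) is only consulted when the document is served as text/html, as with the demo document.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    # Hypothetical names for the three modes discussed above.
    QUIRKS = "xmlns attributes ignored"                  # "call it text/html"
    NAMESPACED = "xmlns honored, errors recovered"       # the application/xhtml experiment
    DRACONIAN = "xmlns honored, halt on first error"     # "call it application/xhtml+xml"


def _essence(media_type: str) -> str:
    """Drop parameters such as '; charset=utf-8' and normalize case."""
    return media_type.split(";", 1)[0].strip().lower()


def parse_mode(http_type: str, meta_type: Optional[str] = None) -> Mode:
    """Pick a parsing mode from the HTTP Content-Type and, optionally,
    a media type the document declares about itself (assumed META opt-in)."""
    base = _essence(http_type)
    if base == "application/xhtml+xml":
        return Mode.DRACONIAN
    if base == "application/xhtml":
        return Mode.NAMESPACED
    if base == "text/html" and meta_type is not None and _essence(meta_type) == "application/xhtml":
        # Served as text/html, but the document opts in to namespace handling.
        return Mode.NAMESPACED
    # Anything else falls back to today's text/html behavior.
    return Mode.QUIRKS


print(parse_mode("text/html; charset=utf-8", "application/xhtml"))
```

In this sketch the namespace-aware mode costs one extra branch choosing between parser behaviors, which is the kind of “single if check” Sam describes in his reply to Anne; whether that counts as a third code path is exactly what the thread goes on to argue about.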

Anyway, I’m more concerned with getting HTML (and CSS, DOM, CSSOM, ECMAScript) right than solving the XML issues.

That’s cool.  I don’t ask that you invest a significant amount of time into this; I am merely trying to show that a few if checks can remove bottlenecks, allowing different groups of people to work in parallel on different (and quite possibly niche) solutions.

All that is required is a syntax in which the parser can fully populate the DOM...

Posted by Sam Ruby at

Opera doesn’t have quirks mode parsing as far as I know. HTML5 will hopefully end up without the need for quirks mode parsing too. I should add, though, that dealing with quirks mode is a major pain. Everything related to rendering effectively needs to be tested twice. As I understand it, “browser developers” are slowly trying to move away from quirks mode, or to get it properly written down somewhere, because of that.

Also, with SVG and MathML, which are probably the only languages that would qualify for direct embedding inside HTML, there’s not really much local-name overlap, so other solutions can be developed for that (and have been, in the case of MathML).

Let me add to this that for your new serialization you need new tools for creation, because you want to indicate where this xmlns attribute will be placed. You also need new tools for consuming, as it may be “non-conforming”, which brings us back to creating an “HTML tool chain”, whatever that may be.

Posted by Anne van Kesteren at

Can I ask what the goal here is (as opposed to implementation details in a particular browser)?

As I understand it, what you want is extensibility in HTML. XHTML achieves this by virtue of

The latter is relevant to the scalability of the solution, as no special error recovery (or other special parsing rules) is needed for the foreign content.

My understanding is that if a particular browser supports a particular extension (SVG, MathML, XBL, XForms, ...) to XHTML, you’d like it to be capable of supporting that extension in HTML (with the foreign content properly placed in the correct namespace in the DOM).

So, I guess my first question is:

The next question is about how you see triggering the handling of foreign content.

Obviously (?) you don’t want to go about making up bogus MIME types for anything other than demo purposes.

But, if your design goal is for your daughter to be able to drag an SVG fragment into the <body> of her LiveJournal page, it’s not clear that requiring a special <meta> element in the <head> of her page is workable.

Third, what is supposed to happen if the browser doesn’t support this particular extension?

Your two sample implementations are each impressive in their own way, but they’re not particularly compatible in what they purport to consume. So it might be helpful to clarify what it is that we’d like to be able to consume. And then we can have a good knock-down, drag-out argument about whether that will “break the Web.”

Posted by Jacques Distler at

Let me add to this that for your new serialization you need new tools for creation

Whatever happened to “I believe the average web author simply does string concatenation just like WordPress does”?

In any case, that serialization was just meant as a starting point for discussion.

Posted by Sam Ruby at

So it might be helpful to clarify what it is that we’d like to be able to consume. And then we can have a good knock-down, drag-out argument about whether that will “break the Web.”

Patience.  There is quite evidently much resistance to the idea of distributed extensibility.  The only way I know to proceed is to knock the legs out from under the issues that are raised, one by one.  If you don’t, people will always be ready to say why “that won’t work”.  Over time, you collect up what is important (xmlns) and drop what is not (e.g., CDATA, except possibly in the case of inline scripts).

Because of the way the web evolved, the default is always going to need to be for backwards compatibility to deal with cases like this (search for xmlns); but what I would like to build the case for is a mode which gives you clear benefits with minimal cost to everybody involved.

Posted by Sam Ruby at


Sam Ruby: application/xhtml

Patience. There is quite evidently much resistance to the idea of distributed extensibility. The only way I know of how to proceed is to one by one knock the legs out of the issues that are raised. If you don’t, people will always be ready to say...

Excerpt from Public marks with search ruby at


Sam Ruby: application/xhtml

hmm......

Excerpt from del.icio.us/connolly/xhtml+quality at
