Ian Hickson: If we truly want to make authors have better tools for making their content more compliant, a start would be having the W3C invest more genuinely in its validators. The W3C HTML Validator is one of the user agents that ignores the Content-Type header when it comes to HTML vs XHTML; filed as bug 1500 about a year ago, still unfixed.
Saying that the validator is somebody else’s problem only works if you don’t believe that the problem is important.
If people feel that HTML 5 deserves a better validator, I’m willing to invest some time into coding. Are there others interested in contributing to the coding, the writing up of test cases, or the authoring of documentation?
Another thing that will be needed down the road is somebody to host it. I host the Feed Validator and that gets plenty of traffic as is; I can only imagine what kind of traffic an HTML validator would have.
I think the W3C Validator has a few other bugs that I continue to hit over the years. But I don’t think hosting should be that much of an issue. I’d be surprised if no one came forward. Perhaps the Web Standards Project would have some leads.
I’d be happy to assist with both test cases and documentation authoring. I still have test case files I built for IE5.5 way back when.
I’d love to write test cases as well as writing code for the validator itself, if it was open source and contributing was as easy as saying “pie”. I can’t help with the hosting, though, but I feel that the employers as well as W3C member companies of some of this blog’s readers (and even owner) perhaps could do something to help? :-)
Anyway, while speaking of content type and validators, why can’t we have one submission form (e.g. validation front page) for any kind of validation? If the validator can sniff what content type the resource at the end of a given URI has, it can then invoke the correct validator: application/xhtml+xml invokes the XHTML validator, application/atom+xml invokes the Atom validator, text/html invokes the HTML validator, text/css invokes the CSS validator, and so on.
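As a rough illustration of the dispatch idea above, here is a minimal Python sketch. The four `validate_*` functions and the `VALIDATORS` table are hypothetical placeholders, not the API of any existing validator; the only real logic shown is mapping a Content-Type header value to a handler.

```python
from typing import Callable, Dict


def parse_media_type(content_type: str) -> str:
    """Return the lowercased media type, dropping parameters such as charset."""
    return content_type.split(";", 1)[0].strip().lower()


# Hypothetical entry points; a real service would call the actual validators.
def validate_xhtml(body: bytes) -> str:
    return "checked as XHTML"


def validate_atom(body: bytes) -> str:
    return "checked as Atom"


def validate_html(body: bytes) -> str:
    return "checked as HTML"


def validate_css(body: bytes) -> str:
    return "checked as CSS"


# Assumed mapping, taken from the media types listed in the comment above.
VALIDATORS: Dict[str, Callable[[bytes], str]] = {
    "application/xhtml+xml": validate_xhtml,
    "application/atom+xml": validate_atom,
    "text/html": validate_html,
    "text/css": validate_css,
}


def choose_validator(content_type_header: str) -> Callable[[bytes], str]:
    """Pick a validator from the Content-Type header instead of guessing."""
    media_type = parse_media_type(content_type_header)
    try:
        return VALIDATORS[media_type]
    except KeyError:
        raise ValueError(f"no validator registered for {media_type!r}")
```

With this shape, a document served as `text/html; charset=utf-8` goes to the HTML checker and one served as `application/xhtml+xml` goes to the XHTML checker, which is exactly the distinction the Content-Type complaint is about.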
Unfortunately, validator.org is already taken by a domain shark, but we might perhaps think of a domain name suitable for this task that isn’t reserved just yet.
Sounds like a good idea. Henri Sivonen has already made a start at an HTML5 validator, though it may not be quite what you had in mind.
I’d be happy to help in any way I can, which would probably be helping write testcases and documentation, since I don’t know much about parsing HTML.
we might perhaps think of a domain name suitable for this task
validatr?
validatr?
we might perhaps think of a domain name suitable for this task
Why do we need vowels at all? vldtr is both short and incredibly cool.
Why do we need vowels at all?
Says the man with the non-ASCII vowel in his name... ;)
validatr?
In the spirit of How to Name a Web 2.0 Product or Company, and taking after existing names – Valindo? Valadoo? Valdoowy?
Valindo? Valadoo? Valdoowy?
TagCrunch
By all means, you guys continue to bikeshed. As for me, I’m waiting for an expression of interest by the HTML 5 community.
From my perspective:
All in all, if there is interest and participation, I believe that a reasonably useful validator could be built this fall, and could be pretty much fully functional, stable, and maintained by a self-sustaining community by year end.
I’m waiting for an expression of interest by the HTML 5 community.
You might try asking on their mailing list.
OK. I try not to bikeshed. :-)
My upcoming master’s thesis is tentatively titled “A Conformance Checking Service for Web Applications 1.0 Documents”. That is, the goal is to write a conformance checker for HTML5 to the extent the spec is ready at the time I need to wrap up and graduate. (Of course, it would be nice to update the service later when HTML5 is done.)
My HTML5 conformance checker is a special case of my Validation Service for RELAX NG. However, I fully realize (and have realized from the outset) that there is no schema language that can fully describe the conformance requirements of HTML5. The plan is to express in RELAX NG everything that is convenient to express in RELAX NG. Of what is left, the plan is to use Schematron for everything for which Schematron is convenient. For the rest, the plan is to use a Turing-complete language, in my case Java. When it makes sense to glue RELAX NG and the Turing-complete language together by implementing a datatype library, the plan is to do so.
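To make the layering concrete, here is a toy Python sketch of the general shape: a grammar layer, an assertion layer, and a procedural layer run in sequence over one parsed document. This is not the checker's actual code (which is described as RELAX NG, Schematron, and Java). The three rules are invented stand-ins chosen only to show which kind of rule might land in which layer, and none of them is meant as a statement of HTML5 conformance requirements.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

Layer = Callable[[ET.Element], List[str]]


def grammar_layer(root: ET.Element) -> List[str]:
    # Stand-in for schema (RELAX NG) validation: one toy structural rule.
    if root.tag == "html":
        return []
    return [f"root element is {root.tag!r}, expected 'html'"]


def assertion_layer(root: ET.Element) -> List[str]:
    # Stand-in for Schematron-style assertions: a toy cross-reference rule.
    ids = {el.get("id") for el in root.iter() if el.get("id") is not None}
    return [
        f"label refers to missing id {el.get('for')!r}"
        for el in root.iter("label")
        if el.get("for") is not None and el.get("for") not in ids
    ]


def procedural_layer(root: ET.Element) -> List[str]:
    # Stand-in for checks written in a general-purpose language:
    # a toy rule that needs state accumulated across the whole document.
    seen = set()
    messages = []
    for el in root.iter():
        value = el.get("id")
        if value is None:
            continue
        if value in seen:
            messages.append(f"duplicate id {value!r}")
        seen.add(value)
    return messages


# Layers run in order; each one only reports, none of them stops the others.
LAYERS: List[Layer] = [grammar_layer, assertion_layer, procedural_layer]


def check(document: str) -> List[str]:
    """Parse an XML document and collect the messages from every layer."""
    root = ET.fromstring(document)
    messages: List[str] = []
    for layer in LAYERS:
        messages.extend(layer(root))
    return messages
```

The point of the arrangement is that each rule is written in the least powerful notation that can express it, and only the leftovers fall through to general-purpose code.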
I didn’t write the schemas from scratch. The person in charge of the schema project is fantasai, who wrote the bulk of “HTML5 Core”. (Note that the modularization choices are not Hixie-endorsed.) I have contributed stuff outside the Core including Web Forms (both 1.0 and 2.0).
The status of the project hasn’t changed since early May, because I got a contract for working on Firefox. However, the contract runs out in a couple of weeks after which I intend to take the thesis work out of the freezer and continue it. It turns out that the WHAT WG work has focused on non-syntax matters over the summer, so the break happened to be well-scheduled.
The known bugs / unimplemented features as of May are documented.
I have used test cases by Anne van Kesteren and fantasai. I’d love to have more test cases.
The architecture of the software is well-suited for supporting HTML 4.01 and XHTML 1.x as well. I just haven’t gotten around to incorporating the HTML5 datatypes in those schemas or adding XHTML+MathML+SVG to the preset list (originally due to legal reasons but later due to being busy doing other things).
My HTML5 conformance checker is a special case of my Validation Service for RELAX NG.
I have been unable to find an automated regression test suite in your source code. Did I miss it?
The existence of such is an absolute prereq for my participation.
I have been unable to find an automated regression test suite in your source code. Did I miss it?
There’s no automated test suite for the front end. There are, however, automated tests for the back end. The code is in the schema CVS repository. (Test driver. Driver for setting up the driver.)
Note that a very large chunk of testing is based on Anne van Kesteren’s Web Forms 2.0 test suite as patched by me and those files aren’t in the CVS repo.
Why think of building a new HTML validator, when there are plenty in development already, and many of them open source? There’s Henri's, mentioned above, there’s the W3C’s, well in need of some love and help from the community it has served for all these years, there’s also relaxed, with some nice technical aspects as well, and quite a few others.
So, is there a need for a new HTML validator? I don’t think so. From my perspective, there is a need for a stronger belief in open source and working together, however rewarding the idea that “I can do better on my own” may be.
however rewarding the idea that “I can do better on my own” may be
Oliver, it generally isn’t a good idea to attribute motives, particularly when you can’t substantiate them.
I’d like there to be an HTML validator that doesn’t ignore content-type headers, and isn’t legacy.
If such can be accomplished without me lifting a finger, so much the better. Otherwise, I don’t plan to simply whinge about it.
Sam: I’m not attributing motives to you, just pointing out that unless you really want to start from scratch, there are plenty of tools to which you could contribute. Whether they fit your criteria, taste, and desires is up to you.
Thank you.
I am all for a new validator; using the current tools every day at work, you start to see the shortcomings.
I am going to start doing some design mock-ups, unless someone else is already covering the design?