Time for a new HTML Validator?
Ian Hickson: If we truly want to make authors have better tools for making their content more compliant, a start would be having the W3C invest more genuinely in its validators. The W3C HTML Validator is one of the user agents that ignores the Content-Type header when it comes to HTML vs XHTML; filed as bug 1500 about a year ago, still unfixed.
Saying that the validator is somebody else’s problem only works if you don’t believe that the problem is important.
If people feel that HTML 5 deserves a better validator, I’m willing to invest some time into coding. Are there others interested in contributing to the coding, the writing up of test cases, or the authoring of documentation?
Another thing that will be needed down the road is somebody to host it. I host the Feed Validator, and that gets plenty of traffic as is; I can only imagine what kind of traffic an HTML validator would have.
I believe Ian has written two parsers, so that might be a good start for the validator.
Posted by Robert Sayre at
I’d love to write test cases as well as writing code for the validator itself, if it was open source and contributing was as easy as saying “pie”. I can’t help with the hosting, though, but I feel that the employers as well as W3C member companies of some of this blog’s readers (and even owner) perhaps could do something to help? :-)
Anyway, while speaking of content type and validators, why can’t we have one submission form (e.g. validation front page) for any kind of validation? If the validator can sniff what content type the resource at the end of a given URI has, it can then invoke the correct validator. application/xhtml+xml invokes the XHTML validator, application/atom+xml invokes the Atom validator, text/html invokes the HTML validator, text/css invokes the CSS validator and so on.
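A minimal sketch of that dispatch idea, in Python. The `VALIDATORS` table and the validator names in it are hypothetical, as are the `media_type` and `pick_validator` helpers; only the media types themselves come from the paragraph above. The sketch assumes the front end already has the Content-Type header value in hand and only needs to decide which back end to invoke.

```python
from email.message import Message

# Hypothetical registry: media type -> name of the validator to invoke.
VALIDATORS = {
    "application/xhtml+xml": "xhtml",
    "application/atom+xml": "atom",
    "text/html": "html",
    "text/css": "css",
}


def media_type(content_type_header):
    """Return the lowercased type/subtype, dropping parameters such as charset."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    return msg.get_content_type()


def pick_validator(content_type_header):
    """Map a Content-Type header value to a validator name, or None if unsupported."""
    return VALIDATORS.get(media_type(content_type_header))
```

Calling `pick_validator("text/html; charset=utf-8")` selects the HTML validator, while an unregistered type such as `image/png` yields `None`, which the front page could report as "no validator available for this resource".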
Unfortunately, validator.org is already taken by a domain shark, but we might perhaps think of a domain name suitable for this task that isn’t reserved just yet.
Posted by Asbjørn Ulsberg at
Sounds like a good idea. Henri Sivonen has already made a start at an HTML5 validator, though it may not be quite what you had in mind.
I’d be happy to help in any way I can, which would probably be helping write testcases and documentation, since I don’t know much about parsing HTML.
Posted by Andrew Sidwell at
Improving The HTML Validator
Sam Ruby - Time for a new HTML Validator?: If people feel that HTML 5 deserves a better validator, I’m willing to invest some time into [...]
Excerpt from Symphonious at
validatr?
we might perhaps think of a domain name suitable for this task
Why do we need vowels at all? vldtr is both short and incredibly cool.
Posted by Asbjørn Ulsberg at
Why do we need vowels at all?
Says the man with the non-ASCII vowel in his name... ;)
Posted by Mark at
I can help with any design aspects and XHTML (I’d love to know how to make a validator, but for now I’m still writing the XHTML). I’d love to take part in a project like this.
Posted by Dustin Senos at
validatr?
In the spirit of How to Name a Web 2.0 Product or Company, and taking after existing names – Valindo? Valadoo? Valdoowy?
Posted by Aristotle Pagaltzis at
How about validator spelled backwards? Now, that’s way cool! rotadilav.com/org/net are available too.
Posted by Asbjørn Ulsberg at
By all means, you guys continue to bikeshed. As for me, I’m waiting for an expression of interest by the HTML 5 community.
From my perspective:
- A markup specification as lengthy as HTML 5 is effectively only a guide without a validator, as few will be able to grok it in its entirety. A validator can be an important part of a feedback loop which will cause users to report areas of the spec that they don’t fully understand or are prone to causing common usage errors. This feedback can be entirely automatic.
- The Feed Validator started out life as an RSS 2.0 validator that also happens to be helpful for RSS 0.91 and RSS 0.92 feeds. To this day, it will report RSS 0.91 feeds which do not contain required item title elements to be valid, and will report RSS 0.92 feeds which contain neither item descriptions nor item titles as invalid. As HTML 5 is destined to be neither a fully compliant SGML grammar nor a fully compliant XML grammar, a parser specifically designed for HTML 5 is in order. The Feed Validator will also do an additional RDF/XML validity check for RSS 1.0 feeds; an HTML validator could do similarly for XHTML.
- I haven’t looked closely at the existing validator, beyond determining that it appears to be legacy code. This coupled with the knowledge that few seem interested in maintaining it leads me to conclude that a new effort is warranted.
All in all, if there is interest and participation, I believe that a reasonably useful validator could be built this fall, and could be pretty much fully functional, stable, and maintained by a self-sustaining community by year end.
Posted by Sam Ruby at
I’m waiting for an expression of interest by the HTML 5 community.
You might try asking on their mailing list.
Posted by Mark at
I was a bit amazed that WebValidator.org was available, so I’ve now registered it if we ever come to the step of putting up a new validator. The domain is currently hosted at DreamHost on a shared host, so I don’t think it will handle the pressure of such a service, but it can be hosted there until anyone else has something better to offer.
Posted by Asbjørn Ulsberg at
Bikeshedding? Come on, we were just having fun. Were any of the proposals serious? Not mine, certainly.
Posted by Aristotle Pagaltzis at
OK. I try not to bikeshed. :-)
My upcoming master’s thesis is tentatively titled “A Conformance Checking Service for Web Applications 1.0 Documents”. That is, the goal is to write a conformance checker for HTML5 to the extent the spec is ready at the time I need to wrap up and graduate. (Of course, it would be nice to update the service later when HTML5 is done.)
My HTML5 conformance checker is a special case of my Validation Service for RELAX NG. However, I fully realize (and have realized from the outset) that there is no schema language that can fully describe the conformance requirements of HTML5. The plan is to express everything that is convenient to express in RELAX NG in RELAX NG. Of what is left the plan is to use Schematron for everything for which Schematron is convenient. For the rest, the plan is to use a Turing-complete language—in my case Java. When it makes sense to glue RELAX NG and the Turing-complete language by implementing a datatype library, the plan is to do so.
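To make the layering concrete, here is a rough Python sketch of a three-stage checker. It is not the actual service, which is written in Java and uses real RELAX NG and Schematron engines; each stage below is a plain function standing in for one layer, and the specific checks (required `head` and `body`, duplicate `id` values, an integer `start` attribute on `ol`) are illustrative assumptions rather than rules taken from the schemas. Namespaces are ignored to keep it short.

```python
import re
import xml.etree.ElementTree as ET


def check_structure(root):
    """Stage 1: stand-in for the grammar-based (RELAX NG) layer."""
    errors = []
    if root.tag != "html":
        errors.append("root element must be html")
    for name in ("head", "body"):
        if root.find(name) is None:
            errors.append(f"html must contain {name}")
    return errors


def check_rules(root):
    """Stage 2: stand-in for the rule-based (Schematron) layer."""
    ids = [el.get("id") for el in root.iter() if el.get("id") is not None]
    duplicates = sorted({i for i in ids if ids.count(i) > 1})
    return [f"duplicate id: {dup}" for dup in duplicates]


def check_with_code(root):
    """Stage 3: checks that are easiest to write in a general-purpose language."""
    errors = []
    for el in root.iter("ol"):
        start = el.get("start")
        if start is not None and re.fullmatch(r"-?[0-9]+", start) is None:
            errors.append(f"ol start is not an integer: {start!r}")
    return errors


# Hypothetical pipeline: every stage sees the same parsed tree.
STAGES = (check_structure, check_rules, check_with_code)


def check(document_text):
    """Parse the document as XML and collect messages from every stage in order."""
    root = ET.fromstring(document_text)
    return [message for stage in STAGES for message in stage(root)]
```

Running all stages and concatenating their messages, rather than stopping at the first failing layer, is one possible design choice; it lets an author see schema-level and code-level problems in a single pass.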
I didn’t write the schemas from scratch. The person in charge of the schema project is fantasai, who wrote the bulk of “HTML5 Core”. (Note that the modularization choices are not Hixie-endorsed.) I have contributed stuff outside the Core including Web Forms (both 1.0 and 2.0).
The status of the project hasn’t changed since early May, because I got a contract for working on Firefox. However, the contract runs out in a couple of weeks after which I intend to take the thesis work out of the freezer and continue it. It turns out that the WHAT WG work has focused on non-syntax matters over the summer, so the break happened to be well-scheduled.
The known bugs / unimplemented features as of May are documented.
I have used test cases by Anne van Kesteren and fantasai. I’d love to have more test cases.
The architecture of the software is well-suited for supporting HTML 4.01 and XHTML 1.x as well. I just haven’t gotten around to incorporating the HTML5 datatypes in those schemas or adding XHTML+MathML+SVG to the preset list (originally due to legal reasons but later due to being busy doing other things).
Posted by Henri Sivonen at
My HTML5 conformance checker is a special case of my Validation Service for RELAX NG.
I have been unable to find an automated regression test suite in your source code. Did I miss it?
The existence of such is an absolute prereq for my participation.
Posted by Sam Ruby at
I have been unable to find an automated regression test suite in your source code. Did I miss it?
There’s no automated test suite for the front end. There are, however, automated tests for the back end. The code is in the schema CVS repository. (Test driver. Driver for setting up the driver.)
Note that a very large chunk of testing is based on Anne van Kesteren’s Web Forms 2.0 test suite as patched by me and those files aren’t in the CVS repo.
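For readers wondering what an automated regression driver for a checker back end might look like, here is a small Python sketch. It is not the driver in the CVS repository. The `valid/` and `invalid/` directory layout, the `run_suite` function, and the `is_well_formed` stand-in checker are all assumptions made for illustration; a real suite would plug in the actual conformance checker.

```python
import pathlib
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Stand-in checker: True if the text parses as XML. A real checker goes here."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def run_suite(root_dir, checker=is_well_formed):
    """Run the checker over every test file; return paths with an unexpected verdict.

    Assumed layout: files under root_dir/valid must pass and files under
    root_dir/invalid must fail.
    """
    failures = []
    for expected, subdir in ((True, "valid"), (False, "invalid")):
        for path in sorted(pathlib.Path(root_dir, subdir).glob("*")):
            if checker(path.read_text(encoding="utf-8")) != expected:
                failures.append(str(path))
    return failures
```

An empty list from `run_suite` means no regressions; anything else names the test files whose verdict changed, which is what a build script would fail on.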
Posted by Henri Sivonen at
Why think of building a new HTML validator, when there are plenty in development already, and many of them open source? There’s Henri's, mentioned above, there’s the W3C’s, well in need of some love and help from the community it has served for all these years, there’s also relaxed, with some nice technical aspects as well, and quite a few others.
So, is there a need for a new HTML validator? I don’t think so. From my perspective, there is a need for a stronger belief in open source and working together, however rewarding the idea that “I can do better on my own” may be.
Posted by olivier at
however rewarding the idea that “I can do better on my own” may be
Olivier, it generally isn’t a good idea to attribute motives, particularly when you can’t substantiate them.
I’d like there to be an HTML validator that doesn’t ignore content-type headers, and isn’t legacy.
If such can be accomplished without me lifting a finger, so much the better. Otherwise, I don’t plan to simply whinge about it.
Posted by Sam Ruby at
Sam: I’m not attributing motives to you, just pointing out that unless you really want to start from scratch, there are plenty of tools in which you could participate. Whether they fit your criteria, taste, and desires is up to you.
Thank you.
Posted by olivier at
I am all for a new validator; using the current tools every day at work, you start to see the shortcomings.
I am going to start doing some design mock-ups, unless someone else is already covering the design?
Posted by Dustin Senos at
The W3C Validator may have some bugs, but it carries a lot of weight for validating markup, and you may see many websites with very poor rankings if they have W3C validation errors. It’s always recommended to fix your markup according to the errors displayed.
Posted by UK Host at
I think the W3C Validator has a few other bugs that I have continued to hit over the years. But I don’t think hosting should be that much of an issue. I’d be surprised if no one came forward. Perhaps the Web Standards Project would have some leads.
I’d be happy to assist with both test cases and documentation authoring. I still have test case files I built for IE5.5 way back when.
Posted by B.K. DeLong at