It’s just data

Using XPath to mine XHTML

Simon Willison: This morning, I finally decided to install libxml2 and see what all the fuss was about, in particular with respect to XPath. What followed is best described as an enlightening experience.
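For readers who want a feel for what mining XHTML with XPath looks like, here is a minimal sketch. It is not Simon's code and does not use libxml2: it assumes Python's standard-library `xml.etree.ElementTree`, which supports only a subset of XPath. The sample document and the `extract_links` helper are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample document, not taken from the post.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Sample page</title></head>
  <body>
    <p>See <a href="http://example.com/a">first</a> and
       <a href="http://example.com/b">second</a>.</p>
    <p><a name="anchor-only">no href here</a></p>
  </body>
</html>"""


def extract_links(xhtml_text):
    """Return (href, text) pairs for every XHTML <a> element that has an href."""
    root = ET.fromstring(xhtml_text)
    ns = {"x": XHTML_NS}
    # XHTML elements live in a namespace, so the path needs a prefix.
    # The [@href] predicate keeps only anchors that carry an href attribute.
    anchors = root.findall(".//x:a[@href]", ns)
    return [(a.get("href"), a.text) for a in anchors]


links = extract_links(SAMPLE)
for href, text in links:
    print(href, text)
```

The same path expression should carry over to a full XPath engine such as libxml2; the namespace prefix is the part that most often trips people up with XHTML input.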

Mining XHTML with XPath/XSLT works great. However, I have yet to find anything freely available that turns any HTML into XHTML. HTML Tidy chokes on all kinds of things.

Posted by Chris Sells at

One could take MSIE's HTML DOM, loop through the available elements with JavaScript, and add them (with the right attributes and content) to an XML DOM (with MSXML). It should be fairly easy to write a recursive function that does this.

The application would be web based and require MSIE, but besides that, it would be «free». Anything MSIE understands, this conversion application will understand too.

Posted by Asbjørn Ulsberg at
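The idea in the comment above (let a forgiving HTML parser read the page, then copy each element into an XML tree) can be sketched without MSIE. This is a swapped-in technique, not the MSIE/MSXML approach described: it assumes Python's standard-library `html.parser` in place of the browser's HTML DOM, `xml.etree.ElementTree` in place of MSXML, and an explicit stack instead of a recursive function. The names `TreeCopier`, `html_to_xml`, and the wrapper element `root` are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never take a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TreeCopier(HTMLParser):
    """Copy each element the HTML parser reports into an XML tree."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("root")  # hypothetical wrapper element
        self.stack = [self.root]        # currently open elements

    def handle_starttag(self, tag, attrs):
        # Bare attributes such as <input disabled> arrive with value None;
        # XML needs a value, so repeat the attribute name.
        attrib = {k: (v if v is not None else k) for k, v in attrs}
        node = ET.SubElement(self.stack[-1], tag, attrib)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open element, implicitly
        # closing anything left open inside it; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):
            # Text after a child element belongs to that child's tail.
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_xml(html_text):
    parser = TreeCopier()
    parser.feed(html_text)
    parser.close()
    return ET.tostring(parser.root, encoding="unicode")


# Unquoted attribute, unclosed <br> and <b>: typical tag soup.
xml_text = html_to_xml("<p class=intro>Hello<br>world <b>bold</p>")
print(xml_text)
```

The output is well-formed XML, so it can be fed to an XPath or XSLT processor. It is not necessarily valid XHTML: the sketch adds no namespace or doctype, and it does not repair misnested elements the way a browser would.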
