Ruby HTML5 Parser
I got enough of this running to demonstrate a proof of concept:
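require 'open-uri'
require 'html5lib/html5parser'

# Fetch the page and parse it into a REXML document.
# On Ruby 3.0 and later, use URI.open(uri); Kernel#open no longer accepts URLs.
uri = 'http://www.whatwg.org/'
doc = HTML5lib::HTMLParser.parse(open(uri))

# Print the emphasized title and the target of each "what-to-do" link.
doc.elements.each('//p[@class="what-to-do"]/a') do |link|
  link.elements.each('em') { |title| print title.children }
  puts ":\t#{link.attribute('href')}"
end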
REXML is used for the TreeBuilder, which is why the parsed document can be queried with REXML's XPath methods, as above.
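Since the tree is ordinary REXML, the query itself can be tried without the parser. Here is a minimal sketch, assuming a hand-written fragment stands in for the fetched page; the markup, `sample_html`, and `what_to_do_links` are made up for illustration and are not part of html5lib:

```ruby
require 'rexml/document'

# Hypothetical stand-in for the parsed page (not the real whatwg.org markup).
def sample_html
  <<~HTML
    <html><body>
      <p class="what-to-do"><a href="/specs/"><em>Read</em> the specs</a></p>
      <p class="what-to-do"><a href="/mailing-list"><em>Join</em> the list</a></p>
      <p class="other"><a href="/ignored"><em>Skip</em> me</a></p>
    </body></html>
  HTML
end

# Collect [title, href] pairs for each link inside a "what-to-do" paragraph.
def what_to_do_links(doc)
  links = []
  doc.elements.each('//p[@class="what-to-do"]/a') do |link|
    title = link.elements['em'].text      # text of the first <em> child
    links << [title, link.attributes['href']]
  end
  links
end

doc = REXML::Document.new(sample_html)
what_to_do_links(doc).each { |title, href| puts "#{title}:\t#{href}" }
```

Only the two links under `p class="what-to-do"` should come back; the third paragraph is there to show that the XPath predicate filters it out.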
I’m looking for help. Interested? Join the group.
Update: First patch