FeedValidator.rb?

2005-12-01T20:18:37Z

This started out as a Random Thought (RT).

background

The Feed Validator is organized as a recursive descent parser for various feed formats. It is implemented in an object oriented fashion, where each element ‘knows’ what the possible children are for that element.

This was all well and good when the vocabulary was relatively small and stable. But now we are getting some rather large new extensions being defined. Some even change the validation rules for existing elements.

The problem is that the current design requires each element to know all potential child elements that can occur, even those from the most obscure and rarely used namespaces.

What would be better is a more modular approach: one where the loading of additional definitions is triggered by the xmlns attribute itself.

Modifying existing classes is impossible in statically compiled languages, like Java. Modifying existing classes is possible in dynamic languages like Python, but difficult enough to be rarely used. Modifying existing classes is trivial and commonplace in Ruby.
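To illustrate the Ruby case, here is a minimal sketch of reopening a class. The Entry class and its children method are hypothetical names chosen for this example; they are not taken from the validator.

```ruby
# Hypothetical core class, as it might ship with the validator.
class Entry
  def children
    %w[title updated]
  end
end

# Later, an extension reopens the same class and adds to it.
class Entry
  alias_method :core_children, :children

  def children
    core_children + %w[dc:creator]
  end
end

# Existing and new instances now see both the core and extension names.
Entry.new.children
```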

listener

The design starts with a SAX2 listener. For prototyping purposes, I started with REXML, but the more I use it, the more I am convinced that it is not a suitable base for building a validator. My current nemesis: SAX character events receive the text data in a partially digested form. But that’s why I chose SAX2, as that permits me to plug in another parser with relative ease.

The Listener's job is pretty easy:

initialize name, stack, and parser
define a default log action of writing to STDERR
start_prefix_mapping looks up the xmlns, and does a require on that name in the module directory. Subsequent calls to require have no effect, which is exactly what we want.
for all other methods, method_missing simply forwards the message to the rules on the top of the stack
start_element calls method_missing and then pushes all child element rules on the stack, and directly executes all attribute rules.
end_element also calls method_missing and then pops the stack.
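The dispatch described above might look roughly like the following. This is a simplified sketch, not the actual listener: it leaves out the parser, logging, attribute rules, and module loading, and it keeps a single rule object per stack level. The Rule class and its child method are assumptions made for the example.

```ruby
# Hypothetical rule object: records the events forwarded to it and
# knows which rule applies to each child element.
class Rule
  attr_reader :seen

  def initialize(children = {})
    @children = children
    @seen = []
  end

  # Unknown children fall back to a fresh, permissive rule.
  def child(localname)
    @children.fetch(localname) { Rule.new }
  end

  def start_element(_uri, localname, _qname, _attributes)
    @seen << "start #{localname}"
  end

  def end_element(_uri, localname, _qname)
    @seen << "end #{localname}"
  end

  def characters(text)
    @seen << "text #{text}"
  end
end

class Listener
  def initialize(root)
    @stack = [root]
  end

  # Notify the current rule, then make the child's rule current.
  def start_element(uri, localname, qname, attributes)
    method_missing(:start_element, uri, localname, qname, attributes)
    @stack.push(@stack.last.child(localname))
  end

  # Notify the current rule, then return to its parent.
  def end_element(uri, localname, qname)
    method_missing(:end_element, uri, localname, qname)
    @stack.pop
  end

  # Every other event goes to whatever rule is on top of the stack.
  def method_missing(name, *args)
    target = @stack.last
    return super unless target.respond_to?(name)
    target.public_send(name, *args)
  end

  def respond_to_missing?(name, include_private = false)
    @stack.last.respond_to?(name) || super
  end
end

title = Rule.new
feed = Rule.new({ 'title' => title })
listener = Listener.new(feed)

listener.start_element(nil, 'title', 'title', {})
listener.characters('Example') # not defined on Listener, so it is forwarded
listener.end_element(nil, 'title', 'title')
```

In this sketch the feed rule sees the start of title, while the title rule sees the text and its own end tag.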

element

The Element's job is also straightforward:

initialize various stuff
log adds attribute/element/parent name information to the log message and delegates upwards to the parent element
Three “rule” methods do some minor housekeeping
Include the SAX2Listener mixin to define default (null) behavior for all SAX2 events

But the real work is in the Element metaclass, which defines methods for defining rules for attributes and elements, and methods for retrieving these rules.
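As a rough sketch of what such class-level methods could look like: the method names element, attribute, element_rules, and attribute_rules are guesses for illustration, as is the idea of storing rules in per-class hashes.

```ruby
class Element
  class << self
    # Register a rule for a child element (hypothetical DSL method).
    def element(name, rule = nil, &block)
      element_rules[name] = rule || block
    end

    # Register a rule for an attribute (hypothetical DSL method).
    def attribute(name, rule = nil, &block)
      attribute_rules[name] = rule || block
    end

    # Rules live in class-level instance variables, one set per class.
    def element_rules
      @element_rules ||= {}
    end

    def attribute_rules
      @attribute_rules ||= {}
    end
  end
end

class Entry < Element
  element 'title', ->(text) { !text.strip.empty? }
  attribute 'xml:lang', /\A[a-z]{2,3}(-[A-Za-z0-9]+)*\z/
end

# A module loaded later can reopen the class and register more rules.
class Entry
  element 'dc:creator', ->(text) { !text.empty? }
end

Entry.element_rules.keys
```

Because the rules are held on the class itself, a module that is loaded later only has to reopen the class and call the same methods.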

Several specialized subclasses are defined:

TextElement captures the character value for a given element, useful for elements like title
DataElement extends TextElement, but also throws an error if there is extra whitespace, useful for elements like updated.
Cardinality is stubbed out right now; ultimately it will be used to implement REQUIRED and MANY. The latter will allow multiples of elements like category
DiscriminatedUnion is a fancy name for elements whose definition depends on the value of an attribute. Useful for elements like summary, and amazingly easy to implement.
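A minimal sketch of three of these subclasses, under some assumptions: errors are collected in an array rather than raised or logged, HtmlText is a made-up placeholder class, and the resolve method name is invented for the example.

```ruby
# Accumulates character data for the element.
class TextElement
  attr_reader :value, :errors

  def initialize
    @value = String.new
    @errors = []
  end

  def characters(text)
    @value << text
  end

  def end_element(*_args)
    validate
  end

  # Plain text needs no further checks in this sketch.
  def validate; end
end

# Same as TextElement, but leading or trailing whitespace is an error.
class DataElement < TextElement
  def validate
    errors << 'unexpected whitespace' unless value == value.strip
  end
end

# Hypothetical placeholder for an HTML-specific text element.
class HtmlText < TextElement; end

# Picks the element class based on the value of one attribute.
class DiscriminatedUnion
  def initialize(attribute, choices, default)
    @attribute = attribute
    @choices = choices
    @default = default
  end

  def resolve(attributes)
    @choices.fetch(attributes[@attribute], @default).new
  end
end

SUMMARY = DiscriminatedUnion.new('type', { 'html' => HtmlText }, TextElement)

updated = DataElement.new
updated.characters(' 2005-12-01T20:18:37Z')
updated.end_element
updated.errors # one entry, because of the leading space

SUMMARY.resolve({ 'type' => 'html' }) # an HtmlText instance
```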

modules and rules

Modules effectively make use of a domain specific grammar for defining elements, attributes, and their associated validation rules. This is largely declarative, with the ability to seamlessly drop down into code in the instances where it is necessary.

Rules typically involve a regular expression or a table lookup.
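For example, both kinds of rule, plus the occasional drop into code, could be evaluated by one small helper. The check method and the constant names are hypothetical, and the date pattern only tests the general shape of an RFC 3339 timestamp, not whether the date is valid.

```ruby
# Regular expression rule: shape of an RFC 3339 date-time (simplified).
RFC3339 = /\A\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})\z/

# Table lookup rule: the Atom text construct types.
TEXT_TYPES = %w[text html xhtml].freeze

# Apply a rule to a value; anything else is assumed to be callable.
def check(rule, value)
  case rule
  when Regexp then !(rule =~ value).nil?
  when Array then rule.include?(value)
  else rule.call(value)
  end
end

check(RFC3339, '2005-12-01T20:18:37Z') # => true
check(TEXT_TYPES, 'markdown')          # => false
```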

While the split between elements and rules initially seemed to make sense, the distinction has become less and less evident as implementation has proceeded. Ultimately, it may need to be refactored away.

test

test overrides the logging and comment mechanisms of the listener to check if the test was successful. It also initializes an xml:base value.
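One way such an override could work is sketched below. The Listener here is only a stand-in with a default log action, the xml:base handling is omitted, and the "Expect:" comment format is an assumption; the real test files may carry their expectations differently.

```ruby
# Stand-in for the real listener: only the default log action is shown.
class Listener
  def log(message)
    $stderr.puts(message)
  end
end

# Hypothetical test harness: the expectation arrives in an XML comment,
# and every logged message is compared against it.
class TestListener < Listener
  class Failure < StandardError; end

  def initialize
    @expected = nil
    @matched = false
  end

  # Assumed comment format: "Expect: <message>".
  def comment(text)
    @expected = text[/Expect:\s*(.+)/, 1]&.strip
  end

  # Stop on the first message that was not expected.
  def log(message)
    raise Failure, "unexpected: #{message}" unless message == @expected
    @matched = true
  end

  # With no expectation, passing means nothing was logged at all.
  def passed?
    @expected.nil? || @matched
  end
end

run = TestListener.new
run.comment(' Expect: missing title ')
run.log('missing title')
run.passed? # => true
```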

Ultimately, this will be converted to use Test::Unit. For the moment, I want to stop on the first error.

overall

Overall, I’m impressed by how clean and simple a Ruby implementation could be. If I do proceed further with this (at the moment, there probably is only about 20% test coverage), I will definitely need to look into converting to libxml2.

At the moment, there is essentially no UI, but this could easily be provided by Rails. Rails would also make it trivial to add an HTTP Test Suite.