intertwingly

It’s just data

OpenSearch Description Validation


Yesterday, DeWitt Clinton IM’ed me that the feed validator is already proving helpful in working out an issue with an OpenSearch description document. That is cool, and knowing that this code is actually being used motivated me to sprint to complete my first pass. At the present time, there are 101 tests. The bulk of the code is here and consists of only 120 or so lines, as I was able to leverage much of the infrastructure already present in the validator.

Undoubtedly, in my haste, I’ve made mistakes and errors of omission. The code may flag non-errors. It may miss real errors. The messages are surely not as helpful as they could be, but it is a start. Feedback is welcome. Despite these caveats, given that this format involves XML escaping, URL escaping, and MixedCase identifiers, I expect that a validator will prove to be very useful.
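To illustrate why MixedCase identifiers are a good fit for automated checking, here is a minimal sketch of one such check. It is not the feed validator's code; the function name `check_template_case` is hypothetical, and it assumes the core template parameter names from the OpenSearch 1.1 draft. It scans a Url template for `{name}` or `{name?}` parameters and warns when a name matches a core parameter only case-insensitively.

```python
import re

# Core (unprefixed) template parameters from the OpenSearch 1.1 draft.
CORE_PARAMETERS = {"searchTerms", "count", "startIndex", "startPage",
                   "language", "inputEncoding", "outputEncoding"}

# Matches {name} and {name?}; group 1 is the (possibly prefixed) name.
TEMPLATE_PARAM = re.compile(r"\{([^{}?]*)(\?)?\}")


def check_template_case(template):
    """Return warnings for parameters that differ from a core name only by case."""
    by_lower = {name.lower(): name for name in CORE_PARAMETERS}
    warnings = []
    for match in TEMPLATE_PARAM.finditer(template):
        name = match.group(1)
        if ":" in name:
            # Prefixed (namespaced) parameters are not checked in this sketch.
            continue
        expected = by_lower.get(name.lower())
        if expected is not None and expected != name:
            warnings.append(f"{{{name}}} should probably be {{{expected}}}")
    return warnings
```

For example, `check_template_case("http://example.com/?q={searchterms}&p={StartPage?}")` would report both `{searchterms}` and `{StartPage}`, while correctly cased names pass silently.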

Feedback

I have a few additional pieces of feedback. The first, which I’ve already provided to DeWitt, is that the spec should make clear that none of the URIs can be relative. This applies not only to Images, but also to the Url templates themselves.
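A validator can flag relative URIs cheaply. The sketch below is one way to do it, assuming that "absolute" means "has a scheme" in the RFC 3986 sense; the function names and the `(where, uri)` pairs are hypothetical and not part of the spec or the feed validator.

```python
from urllib.parse import urlsplit


def is_absolute_uri(uri):
    """Treat a URI as absolute if it carries a scheme (e.g. http:)."""
    return bool(urlsplit(uri).scheme)


def relative_uris(uris):
    """Return the (where, uri) pairs whose URI is relative."""
    return [(where, uri) for where, uri in uris if not is_absolute_uri(uri)]
```

Applied to both Image URLs and Url templates, `relative_uris([("Url/@template", "http://example.com/?q={searchTerms}"), ("Image", "/favicon.ico")])` would report only the Image entry. The template braces do not interfere, because `urlsplit` does not validate characters.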

The remaining points are rather obscure and relate to internationalization. In the grammar for templates, parameter names and prefixes may contain non-reserved characters, and the grammar specifies percent encoding. Percent encoding applies to bytes, so the character encoding must be specified in this case. Neither the InputEncoding nor the OutputEncoding clearly applies, and in any case each may appear multiple times. This is an edge case that will rarely be seen, so I don’t see any value in allowing one to specify the encoding of parameter names; I’d suggest simply specifying that tprefix and tlname values be encoded in UTF-8 before percent encoding.
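The ambiguity is easy to demonstrate: the same character yields different percent-encoded bytes depending on the character encoding chosen. The sketch below applies the suggested rule (UTF-8, then percent-encode); `encode_template_name` is a hypothetical helper, and keeping only unreserved characters unescaped is an assumption made for simplicity.

```python
from urllib.parse import quote


def encode_template_name(name):
    """Encode a tprefix or tlname as UTF-8, then percent-encode it.

    With safe="", quote() leaves only unreserved characters
    (letters, digits, '-', '.', '_', '~') unescaped.
    """
    return quote(name, safe="", encoding="utf-8")


# The same name under a different encoding produces different bytes,
# which is why the spec needs to pick one.
utf8_form = encode_template_name("café")                  # 'caf%C3%A9'
latin1_form = quote("café", safe="", encoding="latin-1")  # 'caf%E9'
```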

Finally, a simple clarification: the searchTerms attribute of the Query element MUST be encoded in the same encoding as the enclosing document, not in one of the specified InputEncodings.
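One way to see why this follows: an XML parser decodes attribute values using the document's own encoding, so the InputEncoding only comes into play later, when the terms are substituted into a Url template. The sketch below assumes a hypothetical description document serialized as ISO-8859-1, a hypothetical template URL, and hypothetical helper names; it handles only the non-optional `{searchTerms}` parameter.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

OS_NS = "http://a9.com/-/spec/opensearch/1.1/"

# Hypothetical description document, serialized as ISO-8859-1.
DOC = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>\n'
    f'<OpenSearchDescription xmlns="{OS_NS}">\n'
    '  <ShortName>Example</ShortName>\n'
    '  <InputEncoding>UTF-8</InputEncoding>\n'
    '  <Query role="example" searchTerms="café"/>\n'
    '</OpenSearchDescription>\n'
).encode("iso-8859-1")


def example_query_terms(doc):
    """The parser decodes searchTerms using the document's encoding."""
    root = ET.fromstring(doc)
    return root.find(f"{{{OS_NS}}}Query").get("searchTerms")


def fill_search_terms(template, terms, input_encoding):
    """Only at substitution time is the InputEncoding applied."""
    encoded = quote(terms, safe="", encoding=input_encoding)
    return template.replace("{searchTerms}", encoded)
```

Here `example_query_terms(DOC)` returns the string `"café"` regardless of how the document was serialized, and only `fill_search_terms` turns it into bytes according to the declared InputEncoding.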

Outlook

I’ve obsessed over minutiae, but quite honestly these are mostly either edge cases or things that a validator can flag rapidly and accurately. In either case, the impact of these concerns is very containable. The spec is sound.

Having now taken a close look at this specification, I’m rather bullish on its future. In many ways it is RSS’s textInput done right. The purpose of <textInput> has always been something of a mystery. The core issue is that the workflow is all wrong: one needs this information to discover a feed, and after you have the feed it is a bit too late.

Ideally, everybody who supports feed autodiscovery in response to user input would also support OpenSearch autodiscovery, at the very least in HTML/XHTML.
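In HTML/XHTML, OpenSearch autodiscovery uses a `<link rel="search">` element whose type is `application/opensearchdescription+xml`, much as feed autodiscovery uses `rel="alternate"`. The following is a minimal sketch of a discovery routine built on Python's standard `html.parser`; the class and function names are hypothetical, and resolving the `href` against the page URL is an assumption about how a client would handle a relative link.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

OPENSEARCH_TYPE = "application/opensearchdescription+xml"


class OpenSearchLinkFinder(HTMLParser):
    """Collect (title, absolute href) for OpenSearch <link> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing
        # XHTML tags (<link ... />) are also routed here.
        if tag != "link":
            return
        a = {name: (value or "") for name, value in attrs}
        rels = a.get("rel", "").lower().split()
        if ("search" in rels
                and a.get("type", "").lower() == OPENSEARCH_TYPE
                and a.get("href")):
            self.found.append((a.get("title", ""),
                               urljoin(self.base_url, a["href"])))


def discover_opensearch(html, base_url):
    finder = OpenSearchLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.found
```

A client that already runs feed autodiscovery over a page could run this in the same pass, so supporting both costs little.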

This, coupled with standardized feed extensions and/or microformats, could also bootstrap an emerging genre of tools: mashup generators/IDEs.