It’s just data

Testing FeedTools Dynamically

In the spirit of this, and based on this from the Universal Feed Parser, I created this for FeedTools.

The tests generated by xmlfile_test.rb depend on a simple mock cache of files, served locally.  No matter how fast your network connection is, your local hard drive is faster.  FeedTools should be able to comfortably run a test suite the size of Feed Parser’s in less time than it currently takes to run its existing test suite.
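To make the idea of a mock cache concrete, here is a minimal sketch of one way to serve feed files from local disk. It is not the code in xmlfile_test.rb; the class name `MockFeedCache`, its methods, and the example URL are all hypothetical.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical mock cache: maps feed URLs onto files under a local directory
# so that tests never touch the network.
class MockFeedCache
  def initialize(root)
    @root = root
  end

  # Turn a URL into a path under the cache root (scheme and host are dropped).
  def path_for(url)
    relative = url.sub(%r{\A[a-z]+://[^/]+/}i, '')
    File.join(@root, relative)
  end

  # Return the cached body, or nil when the file has not been mirrored locally.
  def fetch(url)
    path = path_for(url)
    File.file?(path) ? File.read(path) : nil
  end
end

# Usage: mirror one file into a temporary directory and read it back.
Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, 'tests/rss'))
  File.write(File.join(dir, 'tests/rss/sample.xml'), '<rss version="2.0"/>')
  cache = MockFeedCache.new(dir)
  puts cache.fetch('http://localhost/tests/rss/sample.xml')
end
```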

With these two files in place, FeedTools can directly make use of the vast suite of Feed Parser tests.  In fact, these tests already pass.

My hope is twofold: to accelerate the completion of FeedTools by dramatically increasing the test case coverage, and secondarily to spark a discussion as to what a common API should look like, if for no other reason than to enable FeedTools to leverage the excellent documentation provided with Feed Parser.

Patch submitted.

Please note that Universal Feed Parser 4.0 (currently pre-alpha, available via CVS only) fully supports Atom 1.0 feeds.  The test cases are checked into CVS, but the corresponding changes to the parsed data structure are not documented anywhere yet.  Here are the changes so far, subject to my whims as I continue working on it:

I welcome feedback on my choice of element mappings, except dates, which I’m sick to death of discussing.

I also look forward to seeing a Ruby feed parser that supports EBCDIC.

Posted by Mark at

Sam Ruby: Testing FeedTools Dynamically


Excerpt from at


Mark Pilgrim: “I actually have a function somewhere called ‘itsAnHrefDamnIt’ which maps uri, url, and other attribute names to href.”......

Excerpt from at

Nice.  When I first started writing FeedTools, I was still a ruby nuby, and initially, I had wanted to do something like this.  But back then, I was still under the false impression that it would be an either/or proposition: either I used Mark’s tests or I had the flexibility of using my own.  And since the scope of what I was trying to do with FeedTools was a bit bigger than UFP, I went with my own, and copied in a lot of Mark’s tests.  In retrospect, it seems particularly silly that I didn’t even bother to investigate the concept.  But anyways.

That’s also the reason that Mark’s RSS tests tend to pass.  Those are the ones I had the time to copy over.

The other issue was with the bozo bit.  In several of the FeedTools versions, it was present, but I ended up removing it because my parser basically shoots itself in the foot in terms of being able to determine if a feed is valid or not.  And besides, I’m not really sure validation ought to be a concern of the parser.  It seems to me that dedicated validators do a better job, and most of the time, it’d be useless overhead for a liberal parser.  Obviously, the fact that it’s missing causes problems for Mark’s tests.

I saw the solution you came up with for dealing with differences between UFP and FeedTools, and for transforming from Python to Ruby, but I’m not 100% certain that I like it.  More of a stylistic thing than anything else, but... honestly, I can’t help but wonder if trying to share the exact same unit test xml files between UFP and FeedTools might be a mistake.  Perhaps it might be a better idea to just transform the comments in the xml files and automate that process?

Posted by Bob Aman at
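As a rough illustration of what automating over the comments in the xml files could involve, the sketch below pulls the test metadata out of a file’s leading comment. It assumes each test file starts with a comment holding `Description:` and `Expect:` lines; the helper name `read_test_header` and the sample document are hypothetical, and translating the Python expression in `Expect:` into Ruby is left out entirely.

```ruby
# Pull the Description/Expect pair out of the first XML comment in the file.
# Returns nil when the document has no comment at all.
def read_test_header(xml)
  comment = xml[/<!--(.*?)-->/m, 1] or return nil
  header = {}
  comment.each_line do |line|
    if line =~ /\A\s*(Description|Expect):\s*(.*?)\s*\z/
      header[$1.downcase.to_sym] = $2
    end
  end
  header
end

# Hypothetical test file, laid out the way this sketch assumes.
SAMPLE_TEST_FILE = <<~XML
  <!--
  Description: feed title
  Expect:      feed['title'] == u'Example feed'
  -->
  <rss version="2.0"><channel><title>Example feed</title></channel></rss>
XML

header = read_test_header(SAMPLE_TEST_FILE)
puts header[:description]
puts header[:expect]
```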

Subsetting or supersetting UFP’s requirements are both valid things to do.  But there is a lot of valuable experience behind each and every one of those test cases, and that should not be easily dismissed.

For that reason, I do believe that there is value in sharing a large subset of the test suite between the various tools.

Perhaps we can come up with a more declarative and language-independent grammar for expressing the bulk of the tests, possibly with a fallback syntax for the small subset of tests which require more specialized handling.

Posted by Sam Ruby at
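One possible shape for such a declarative grammar, sketched here purely as an assumption: each expectation is plain data (a path into the parsed result plus the value expected there), so the same list could be evaluated by a parser written in any language. The names `EXPECTATIONS`, `lookup`, and `failures` are hypothetical.

```ruby
# Hypothetical declarative form: a path into the parsed result plus an expected value.
EXPECTATIONS = [
  { path: ['feed', 'title'],       equals: 'Example feed' },
  { path: ['entries', 0, 'title'], equals: 'First entry' }
].freeze

# Walk nested hashes and arrays along the path; nil if any step is missing.
def lookup(data, path)
  path.reduce(data) do |node, key|
    case node
    when Hash  then node[key]
    when Array then key.is_a?(Integer) ? node[key] : nil
    else return nil
    end
  end
end

# Return the expectations that the parsed result does not satisfy.
def failures(parsed, expectations)
  expectations.reject { |e| lookup(parsed, e[:path]) == e[:equals] }
end

parsed = {
  'feed'    => { 'title' => 'Example feed' },
  'entries' => [{ 'title' => 'First entry' }]
}
puts failures(parsed, EXPECTATIONS).empty? ? 'all expectations met' : 'some expectations failed'
```

Tests that need more than an equality check would still need the fallback syntax mentioned above.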

Mozilla Thunderbird’s feed-parser would also like to share a test suite. I’ve got the parsing code running with, the unit test script used to test Mozilla’s JS engine. This means I can run large numbers of tests from the command line, using XPCShell.

I asked Mark about this way back when, but the thing was so buggy I got caught fixing bugs without writing tests (I know, I know... lesson learned).

Posted by Robert Sayre at


2930 tests, 2570 assertions, 1803 failures, 759 errors

How depressing.  Oh well, here goes.

Posted by Bob Aman at

How depressing.

Don’t get too discouraged.  As an example of one class of errors that I found (and ignored for the moment): in FeedTools, author is a hash containing email, url, name and the like.  In UFP, author is a string.  However, author_detail is provided which contains the more granular data.

Determining how to reconcile (or failing that, map) these two approaches is key to further progress.

Posted by Sam Ruby at
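A mapping between the two author representations could be sketched as below. This is an assumption-laden illustration, not either library’s API: the symbol keys on the FeedTools-style hash, the `href` key in the detail hash, the "name (email)" string format, and the helper name `ufp_style_author` are all chosen for the example.

```ruby
# Hypothetical adapter: turn a FeedTools-style author hash into a UFP-style
# pair of a flat author string plus a more granular author_detail hash.
def ufp_style_author(author)
  detail = {
    'name'  => author[:name],
    'email' => author[:email],
    'href'  => author[:url]
  }
  # Drop fields the feed did not supply.
  detail.reject! { |_, value| value.nil? || value.empty? }

  name  = detail['name']
  email = detail['email']
  flat  = name && email ? "#{name} (#{email})" : (name || email)

  { 'author' => flat, 'author_detail' => detail }
end

result = ufp_style_author({ name: 'Jane Doe', email: 'jane@example.org', url: 'http://example.org/' })
puts result['author']
puts result['author_detail']['href']
```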

And besides, I’m not really sure validation ought to be a concern of the parser.

I assume by “validation” you mean “well-formedness” — UFP does no DTD or schema validation of any format.

On the subject of well-formedness, I was eventually convinced by interested parties (hi Tim!) that well-formedness is a concern of the parser.  Some applications built on UFP may wish to reject feeds that are not well-formed.  To support such masochistic party-poopers, the bozo bit was born and has been meticulously maintained ever since.

However, in the process of adding support for the bozo bit in UFP 3.0, I inadvertently stumbled into a rat’s nest of character encoding issues centering around RFC 3023.  Briefly: in order to maximize the irony inherent in a feed parser that is simultaneously the world’s most ultraliberal and the world’s most draconian, UFP supports RFC 3023, which specifies the precedence rules for determining the character encoding of an XML document served over HTTP (which would be, like, all of them, at least as far as syndicated feeds are concerned).  Some people (hi Tim!) feel that an XML document is a self-contained bag of bits that specifies its own character encoding, regardless of the enclosing transport.  To support such delusional thinking — which, by the way, has always been completely and utterly unsupported by the XML specification that said people co-authored, and is now in fact explicitly contradicted by the latest version of said specification — I have classified three of the exceptions captured in the bozo_exception field — namely CharacterEncodingOverride, CharacterEncodingUnknown, and NonXMLContentType — as subclasses of an abstract exception class named, appropriately enough, ThingsNobodyCaresAboutButMe.

These were my design goals.  Yours may be different.

Posted by Mark at
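The precedence rules mentioned above can be sketched roughly as follows. This is a simplified reading of RFC 3023, not UFP’s implementation: the function name is hypothetical, the `xml_encoding` argument stands for whatever was sniffed from the byte order mark or XML declaration, and the final branch (non-XML media type or no header) is an assumption made for the example.

```ruby
# Simplified sketch of RFC 3023 precedence; returns the encoding name to trust.
# content_type: raw HTTP Content-Type header value (may be nil)
# xml_encoding: encoding sniffed from the BOM or XML declaration (may be nil)
def rfc3023_encoding(content_type, xml_encoding)
  media_type, *params = content_type.to_s.downcase.split(';').map(&:strip)
  charset = params.map { |param| param[/\Acharset\s*=\s*"?([^"]+)"?\z/, 1] }.compact.first

  case media_type
  when 'application/xml', %r{\Aapplication/.+\+xml\z}
    # HTTP charset wins; otherwise trust the document, then default to UTF-8.
    charset || xml_encoding || 'utf-8'
  when 'text/xml', %r{\Atext/.+\+xml\z}
    # HTTP charset wins; the XML declaration is ignored for text/* types.
    charset || 'us-ascii'
  else
    # Assumption for this sketch: fall back to the document itself.
    xml_encoding || 'utf-8'
  end
end

puts rfc3023_encoding('application/atom+xml; charset=iso-8859-1', 'utf-8')
puts rfc3023_encoding('text/xml', 'utf-8')
```

The second call is the surprising case: a feed that declares UTF-8 in its XML declaration but is served as text/xml with no charset parameter is treated as us-ascii.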

Some applications built on UFP may wish to reject feeds that are not well-formed.

Others may simply want to display a visual indicator, like iCab does.

Posted by Sam Ruby at
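Both uses of the bozo bit, rejecting or merely flagging, rest on the same pattern: try a strict parse, and on failure record why before falling back to a liberal one. The sketch below shows only that pattern. `ParseResult`, `parse_with_bozo`, and the two toy lambdas are hypothetical stand-ins, not real parsers and not UFP’s or FeedTools’ API.

```ruby
# Hypothetical result object; `bozo` mirrors the flag discussed above.
ParseResult = Struct.new(:feed, :bozo, :bozo_exception)

# Try the strict parser first; on failure, record why and fall back to the liberal one.
def parse_with_bozo(source, strict:, liberal:)
  ParseResult.new(strict.call(source), false, nil)
rescue StandardError => e
  ParseResult.new(liberal.call(source), true, e)
end

# Toy stand-ins: the strict one insists on a closed <title> element.
STRICT = lambda do |source|
  raise ArgumentError, 'not well-formed' unless source.include?('</title>')
  { 'title' => source[%r{<title>(.*?)</title>}m, 1] }
end
LIBERAL = ->(source) { { 'title' => source[/<title>([^<]*)/, 1] } }

result = parse_with_bozo('<title>Example', strict: STRICT, liberal: LIBERAL)
# An application may reject the feed outright or just mark it.
puts(result.bozo ? "[!] #{result.feed['title']}" : result.feed['title'])
```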

I assume by “validation” you mean “well-formedness” — UFP does no DTD or schema validation of any format.

Yeah, that’s what I meant.

Others may simply want to display a visual indicator, like iCab does.

True, though in general I’d contend that nothing user-facing should be displaying stuff like that, unless the intended audience is developers.  And if you need that functionality, I’d suggest using a validator designed specifically for the purpose.

(We’ll pretend that there actually is a good Ruby feed validator.)

Posted by Bob Aman at

I also look forward to seeing a Ruby feed parser that supports EBCDIC.

It definitely looks possible.

require 'open-uri'
require 'iconv'

feed = ''
converter = Iconv.new('utf-8', 'ebcdic-cp-be')

puts converter.iconv(open(feed).read)
Posted by Sam Ruby at

Feed Parser: Universal Feed Parser Tests

Inspired by Sam Ruby’s work on applying the Universal Feed Parser tests to the Ruby FeedTools, I’ve spent a little time this afternoon working on testing XML_Feed_Parser with that same test suite. There’s a lot of work to do!...

Excerpt from a work on process at

It definitely looks possible.

Yay for Iconv.  Now that FeedTools 0.2.17 is finished and out there, I should have more time to start working on the encoding issues.

Posted by Bob Aman at

While you were out

I managed to recover my laptop battery within two weeks of good charging practices and despite its old age. It’s an ASUS, just in case you were interested, and it’s been serving me well since the last quarter of 2001 when I bought it. As...

Excerpt from The Long Dark Tea-time of the Blog at

Ruby Is Acceptable

Eric Kidd makes a convincing case for Ruby as a Lisp/Dylan substitute. One quote bothered me, though: "If you need...... [more]

Trackback from


Atom and Wiki Driven Testing

It’s been a long-standing todo to port Mark’s FeedParser tests to work against Magpie, possibly with an intermediate representation to allow cross-language testing. (Has any work been done on capturing unit tests/acceptance tests in XML?) Sam’s...

Excerpt from Laughing Meme at

Anyways, I’m finally implementing some of the encoding stuffs, but ebcdic seems to be problematic:

converter = Iconv.new('ebcdic-cp-be', 'utf-8')
Errno::EINVAL: Invalid argument - iconv("ebcdic-cp-be", "utf-8")
        from (irb):15:in `initialize'
        from (irb):15:in `new'
        from (irb):15
        from :0
iconv -l | grep "EBCDIC"
=> shows nothing

This is on OS X... any idea why ebcdic wouldn’t show up and/or how I might rectify that situation?

Posted by Bob Aman at
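One way to cope with a platform that lacks a given charset is to probe for a converter before relying on it. The sketch below swaps in Ruby’s built-in `Encoding::Converter` (Ruby 1.9 and later) in place of the Iconv library used in the comment above, so it does not explain or fix the missing iconv charset; the helper name `converter_available?` is hypothetical, and which names are recognized depends on the Ruby build.

```ruby
# Return true when this Ruby can transcode from `from` to `to`.
def converter_available?(from, to)
  Encoding::Converter.new(from, to)
  true
rescue Encoding::ConverterNotFoundError, ArgumentError
  # No converter for this pair (or an unrecognized name): report it, don't crash.
  false
end

%w[ebcdic-cp-be IBM037 ISO-8859-1].each do |name|
  puts "#{name}: #{converter_available?(name, 'UTF-8')}"
end
```

A parser could use such a check to decide whether to attempt a conversion or to record the feed's encoding as unknown.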

Pirate Testing (Because Only Ninjas Write Unit Tests)

I’ve got a new favorite development technique, “pirate testing”. I’ve used it on 3 recent projects, and it rocks. And while Sam might have meant it literally, I’ve found it perfectly describes the practice of shanghaiing another tool’s test suite...

Excerpt from Laughing Meme at

Magpie Unit tests

Since people were waving tasks around several months ago and I finally got tired of atom feeds showing up incorrectly in Gregarius, I decided to port some of Feedparser tests to Magpie. I created a rudimentary ajax-ified unit testing harness for...

Excerpt from Test Blog at

On rFeedParser

This post is huge but I have not the time to make it smaller. I’m so very tired. A Quick Introduction rFeedParser is a RSS/Atom feed parser. It is a translation of Mark Pilgrim’s feedparser from Python to Ruby. It behaves almost exactly...

Excerpt from Something Similar at
