It’s just data

Aggregator utf-16 tests

I've converted yesterday's utf-8 tests to utf-16 (little-endian, with the appropriate BOM).  For those who want to play along with RSS, there are also RSS 1.0, RSS 2.0, and RSS 2.0 + Atom versions.
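
For anyone who wants to reproduce this kind of conversion, here is a minimal Python sketch. It is not the script used to build these tests, and the function name and sample string are hypothetical. A real feed would also need the encoding named in its XML declaration updated, which this sketch leaves out.

```python
import codecs


def utf8_to_utf16le_with_bom(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as little-endian UTF-16, prefixed with a BOM."""
    text = data.decode("utf-8")
    # "utf-16-le" never writes a BOM on its own, so prepend it explicitly.
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


# Hypothetical sample; not one of the actual test entries.
sample = "<title>Iñtërnâtiônàlizætiøn</title>".encode("utf-8")
converted = utf8_to_utf16le_with_bom(sample)

print(converted[:2])  # the BOM bytes: FF FE
```

Prepending the BOM by hand keeps the byte order explicit; Python's plain "utf-16" codec would instead pick the platform's native byte order.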

Sam: Newzcrawler again passed all tests for Atom and RSS. (Except RSS+Atom, which NC doesn't yet support.)

Posted by Roger Benningfield at

Universal Feed Parser CVS passes these tests.  (3.0fc2 had a bug that failed to recognize that the feed was correctly specified as utf-16.)

Thanks!

Posted by Mark at

NetNewsWire does not recognize any of these feeds.  They all show up as "Untitled source" with no items.

Posted by Mark at

I tested with NetNewsWire 1.0.9b1, the "Atom-enabled" beta.  [link]

Posted by Mark at

FeedDemon 1.10 passes all of these except RSS+Atom, which it doesn't support.

Posted by Nick Bradbury at

PulpFiction 1.0 passes most of these tests but fails #6 and #7.  Same results for all formats.

Posted by Mark at

Shrook 2.0.5:

Atom: fails tests 5 and 9
RSS 1.0: fails tests 5, 7, 8, 9, 11, and 12
RSS 2.0: fails tests 5, 7, 8, 9, 11, and 12
RSS+Atom: does not recognize titles (uses stripped content as title instead)

Posted by Mark at

Radio refuses to subscribe (I only tested the RSS 2 feed).  It displays this error: "Can't subscribe to the channel. The most likely cure is to check the URL in a web browser and see if you can get it to read the feed. The following message probably won't help you figure out what went wrong, but we include it here because it might. Poorly formed XML text, string constant is improperly formatted. (At character #28.)"

Posted by Mark at

Bloglines refuses to subscribe to any of these feeds.  It displays this error: "No feeds were found. Please verify that the website publishes an RSS feed."

Posted by Mark at

There was a bug in the Bloglines encoding conversion routine. Actually, it was an omission: the routine didn't check for the UTF-16 BOM. That's been fixed, and this feed now parses. It still suffers from the same two escaping issues that the UTF-8 feed has, however.

Posted by Mark Fletcher at
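
The kind of BOM check described in the comment above might look like the following. This is only a sketch of the general technique, not Bloglines' actual code; the function name `sniff_bom` and its return shape are assumptions made for illustration.

```python
import codecs

# Longer BOMs come first: the UTF-32 LE BOM (FF FE 00 00) begins with the
# UTF-16 LE BOM (FF FE), so checking UTF-16 first would misidentify it.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def sniff_bom(data: bytes):
    """Return (encoding, payload without BOM), or (None, data) if no BOM."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, data[len(bom):]
    return None, data


encoding, payload = sniff_bom(b"\xff\xfe<\x00?\x00")
print(encoding)  # utf-16-le
```

When no BOM is found, a parser would normally fall back on the HTTP headers or the XML declaration; that step is outside this sketch.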

All the feeds work fine in RSS Bandit except the RSS 2.0 + Atom one since we don't support RSS 2.0 + Atom.

Posted by Dare Obasanjo at

SharpReader 0.9.4.1 scores 100% on all four feeds.

Posted by Robert Lowe at

NewsGator 3.0 passes the first 3, but doesn't seem to know what to put for the title, so it uses the description.

Posted by Gordon Weakliem at

RSS Reader 1.7 for Mozilla Firefox displays “Iñtërnâtiônàlizætiøn” in all cases. However, I don’t think this constitutes “passing the test” for RSS 1.0 and 2.0. As far as the specs go, there’s no “entity-encoded HTML is allowed” remark about titles in those formats.

Anyway, those characters from the Latin-1 range are rather tame. :-) I suggest testing astral characters next to see which programs really can deal with all XML characters. Let’s try an ideograph that is supported by the fonts that ship with OS X and see what happens with the comment feeds: 𥄢

Posted by Henri Sivonen at

Ah, missed that.  Henri is correct, entity-encoded HTML is not allowed in RSS 1.0 titles or descriptions (anywhere in the core spec).  The content module's content:encoded is the only place that supports entity-encoded HTML.  (The content module supports inline XHTML too, but no one uses that part.)

Posted by Ken MacLeod at

Issue raised on RSS-DEV.

Posted by Ken MacLeod at

The ideograph came through intact to NNW Lite (Atom beta) as an NCR (RSS 2.0 feed).

Posted by Henri Sivonen at

wouldn't it be a good idea to test some of the multi-byte characters as well?

Posted by Scott Reynen at

Dave Winer: The spec is silent on whether this is allowed, so it must be allowed

These characters are multi-byte in utf-8.  And, yes, this test is, by design, rather tame.

Posted by Sam Ruby at

"These characters are multi-byte in utf-8."

i was wrong.  what i should have asked is: wouldn't it be a good idea to test some of the three-byte characters as well?  you have one- and two-byte characters here, but not three-byte.  and each byte length is parsed differently, so an aggregator may parse and display one- and two-byte characters properly, but fail on three-byte characters.  your tests wouldn't catch such an error.

Posted by scott reynen at
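
To make the byte-length distinction in this thread concrete, the following Python sketch reports how many bytes a character takes in utf-8 and how many 16-bit code units it takes in utf-16. The helper name `encoded_sizes` is hypothetical, and "€" is a sample picked here for illustration, not a character from the tests; the ideograph is the one from Henri's comment.

```python
def encoded_sizes(ch: str):
    """Return (UTF-8 byte count, UTF-16 code unit count) for one character."""
    utf8_bytes = len(ch.encode("utf-8"))
    # Each UTF-16 code unit is two bytes; astral characters need two units
    # (a surrogate pair).
    utf16_units = len(ch.encode("utf-16-le")) // 2
    return utf8_bytes, utf16_units


for ch in ["A", "ñ", "€", "𥄢"]:
    print(repr(ch), encoded_sizes(ch))

# 'A' (1, 1)
# 'ñ' (2, 1)
# '€' (3, 1)
# '𥄢' (4, 2)
```

The Latin-1 range characters in the tests stop at two bytes in utf-8, so a parser that mishandles three-byte sequences, four-byte sequences, or utf-16 surrogate pairs could still pass them.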

Universal Feed Parser 3.0

Universal Feed Parser 3.0 is out. It comes with over 2000 unit tests and over 100 pages of documentation. (211 words)...

Excerpt from dive into mark at

Sam Ruby: Aggregator utf-16 tests

[link]...

Excerpt from del.icio.us/hellsten/utf-16 at
