It’s just data

Pop Quiz #2

What does your aggregator do with this?  Is it valid?

Update: The above was updated to point to a snapshot of this feed as it existed at the time this blog entry was originally posted.

P.S.  Hint.


NewsMonster says

Error in iBLOGthere4iM


javax.xml.transform.TransformerException: org.xml.sax.SAXParseException: Character reference "&#x13" is an invalid XML character.



Posted by Nick Chalko at

Once again, NewsGator 1.3 seems to subscribe and pull down posts (24 of them) without a problem.  What's supposed to be tripping it up?

Posted by Mark Gardner at

No error in Radio Userland. The entity shows up as a little square.

Posted by Sjoerd Visscher at

Radio 8.0.8 on Windows XP accepts the feed with no complaint. In the Radio aggregator's display, both MSIE and Mozilla Firebird render the &#x13; as the Windows "unknown glyph" box.

When handed the feed itself, Mozilla Firebird 0.7 on Windows XP reports:

XML Parsing Error: reference to invalid character number
Location: http://www.intertwingly.net/stories/2004/01/14/iblogthere4im.rss
Line Number 200, Column 126:

When handed the feed itself, MSIE 6.all-the-patches-to-date does not register an error and displays the character with the "unknown glyph" box.

  MSM

Posted by Michael S. Manley at

BottomFeeder reads that just fine.  What's supposed to trip it up?

Message from James Robertson

at

feedvalidator.org complains about "reference to invalid character number": [link]

Posted by Mark at

Shrook doesn't spot the error, because the Apple XML parser doesn't convert entity references itself, and Shrook doesn't check the validity of same when it does the conversion.

Posted by Graham at

Mozilla says:
XML Parsing Error: reference to invalid character number

My ultra orthodox feed parser says:

Illegal XML character 

Both fail to further parse the document.

Posted by Jay Fienberg at

It's hilarious that Userland's Validator says the feed is valid, whereas yours (correctly, I think) says that it's invalid.

Posted by Jacques Distler at

RE: Pop Quiz #2

Sam, RSS Bandit can subscribe to it fine. This is because by default the XML parser in the .NET Framework doesn't do character range checking for performance reasons.

Message from Dare Obasanjo

at

There are no exceptions to Postel's Law.

Julian, re: MSXML conformance.  Here's just one example of its failure to conform to the XML specification: [link] Do not, under any circumstances, "just load it up in IE" to check for XML...

Excerpt from There are no exceptions to Postel's Law. at

Note that those who didn't see IE6 showing a warning in the status line probably need to update to MSXML3 SP4. I can't remember whether it was distributed with IE6 SP1, though.

Julian

Posted by Julian Reschke at

Dare,

if .NET's XML parser by default acts in a non-compliant manner, I'd consider that a serious bug. I think I'd even consider it a bug if it's not the default, unless the API comes with a big, big warning that you're not really using an XML parser when you select that behaviour.

Julian

Posted by Julian Reschke at

Julian,
  Yes, we realize this is incorrect behavior and if we had a time machine we'd go back and revert the decision. Rest assured the new XML parsers provided in Whidbey will be conformant by default.

Posted by Dare Obasanjo at

Dare,

it's funny that this is coming up right now. What's the point in having a (relatively) new XML API that accepts broken documents that will be rejected by MSXML3 and MSXML4 (and any other conformant parser on the planet)?

Maybe a discussion for xml-dev :-)

Posted by Julian Reschke at

Feed on feeds fails: MagpieRSS: Failed to parse RSS file. (reference to invalid character number at line 200, column 125)
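
The wording of that message matches Expat's error string for a bad character reference. As a minimal sketch (not what MagpieRSS itself runs), the same failure can be reproduced with Python's standard `xml.parsers.expat` binding. The one-element documents below are made-up stand-ins for the feed, and `check` is a hypothetical helper name.

```python
import xml.parsers.expat
from xml.parsers.expat import ExpatError


def check(document: bytes):
    """Parse a document with Expat.

    Returns None if it is well-formed, otherwise a tuple of
    (Expat error code, Expat error message).
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(document, True)  # True marks the final chunk
    except ExpatError as exc:
        return exc.code, xml.parsers.expat.ErrorString(exc.code)
    return None


# Stand-in documents: one uses the legal reference, one the illegal one.
print(check(b"<a>&#13;</a>"))   # carriage return, a legal character
print(check(b"<a>&#x13;</a>"))  # code point 0x13, not a legal character
```

Whether an aggregator built on Expat surfaces this error or swallows it is up to the aggregator; the sketch only shows what the parser itself reports.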

Posted by Claude at

From [link]:

"""Unable to transform data at above URL - perhaps it is not well-formed XML?

Sablotron XSLT transformation error on line 200: XML parser error 14: reference to invalid character number.

You might want to make sure there is actually a syndication feed at the URL, perhaps by checking out the Syndic8.com listing for the feed. It might turn out that there are currently problems with the feed (if it is marked as 'Awaiting Repair'), and as such it should be getting fixed anytime soon."""

PS: Sam, you should teach your speller about XSLT (and perhaps Sablotron as well).

Posted by Morten Frederiksen at

I've added XSLT and Sablotron to the dictionary.

Posted by Sam Ruby at

Awasu subscribes OK. We use Expat.

Posted by Taka at

I'm not involved in development of any XML tools, but I have read a good bit in the XML specs.

&#x13; is a reference to the ISO 10646 hex character 13, right?

If the character reference begins with "&#x", the digits and letters up to the terminating ; provide a hexadecimal representation of the character's code point in ISO/IEC 10646. If it begins just with "&#", the digits up to the terminating ; provide a decimal representation of the character's code point.

XML 1.0

And (just math here) hex 13 is the same value as decimal 13, right?

So how come the feed validates with "&#13;" but doesn't validate with "&#x13;"?

Posted by Jeremy Dunck at

Jeremy, from 4.1 Character and Entity References, click on [WFC: Legal Character], and from there click on Char.

Decimal 13 = Hex D
Hex 13 = Decimal 19

#xD is listed as a legal character.  #x13 is not.
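
The same check can be sketched in a few lines of Python, assuming only the Char production as written in XML 1.0. The function names `is_xml10_char` and `char_ref_code_point` are made up for illustration and belong to no particular parser.

```python
import re


def is_xml10_char(cp: int) -> bool:
    """True if the code point matches the Char production of XML 1.0."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# "&#x" + hex digits + ";"  or  "&#" + decimal digits + ";"
CHAR_REF = re.compile(r"&#(x[0-9a-fA-F]+|[0-9]+);")


def char_ref_code_point(ref: str) -> int:
    """Return the code point named by a character reference."""
    match = CHAR_REF.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a character reference: {ref!r}")
    body = match.group(1)
    return int(body[1:], 16) if body.startswith("x") else int(body, 10)


for ref in ("&#13;", "&#xD;", "&#x13;", "&#19;"):
    cp = char_ref_code_point(ref)
    verdict = "legal" if is_xml10_char(cp) else "illegal"
    print(f"{ref:7} decimal {cp:2}  hex {cp:X}  {verdict}")
```

The first two references name the same code point (carriage return) and pass; the last two both name code point 19 and fail.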

Posted by Sam Ruby at

Julian,
  I'm not sure what your point is, but you're right that it is off-topic for this thread and is better discussed in an appropriate forum.

Posted by Dare Obasanjo at

MyHeadlines likes it: [link]

Posted by Mike Agar at

SharpReader read it just fine.  A real XML parser finds (at least one) illegal character reference, so the feed is not well-formed XML.

Posted by Tom Passin at

Heh, I wanted to look at the source so I clicked on the link in Safari and got this 96-point ugly typeface saying XML Parsing Error.  OK but a diagnostic or two would be nice.

Posted by Tim Bray at

Dare Obasanjo aka Carnage4Life - Reading and Writing Well-Formed XML in the .NET Framework

A recent spate of discussions about well-formed XML in the context of the ATOM syndication format kicked off by the There are no exceptions to Postel's Law post has reminded me that besides using an implementation of the W3C DOM most ...

Pingback from Dare Obasanjo aka Carnage4Life - Reading and Writing Well-Formed XML in the .NET Framework

at

NNTP//RSS eats just about everything, including iBLOGthere4iM.

Posted by Asbjorn Ulsberg at

Coldfusion MX 6.1 throws an error:

"Illegal XML character "

Which means that JournURL's integrated aggregator will barf on it, too.

Posted by Roger Benningfield at

Well.  That was embarrassing.  And for anyone reading, 2+2 != 5.  I'm pretty sure about that one.

Thanks, Sam. :)

Posted by Jeremy Dunck at

NetNewsWire Lite 1.0.8b1 accepts this feed and displays it without errors.

Posted by Mark at

2 + 2 = 5, for large values of 2

Posted by Luke Hutteman at

FeedDemon 1.0 accepts this feed and displays it without errors.

Posted by Mark at

Bloglines accepts this feed and displays it without errors.

Posted by Mark at

Is my weblog well formed?

Can I ever be sure?... [more]

Trackback from Sam Ruby

at

Personality test

Universal Feed Parser 3.0 beta 19 is out. (149 words)...

Excerpt from dive into mark at
