I’ve started implementing Feed Validator support for the Google Base Bulk Upload formats. Initial support is already online, more is committed and should go online overnight, and work will continue into next week, including better error messages.
I sent an initial set of questions on Thursday night using the Contact Us form, but I’ve not heard back. So I made a number of assumptions, and have included some more questions below.
None of the complex types are valid RDF/XML, and therefore they can’t be used in RSS 1.0; also, personals and news are incomplete.
None of the guids in the RSS 2.0 feeds are valid permalinks. Several of the Atom feeds, in addition to being in the Atom 0.3 format despite the prominent warning, contain an illegal issued element at the feed level, and the documentation refers to item elements in Atom feeds.
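Two of the problems above can be checked mechanically. What follows is a minimal Python sketch, not the Feed Validator's actual code. It relies on the RSS 2.0 rule that a guid's isPermaLink attribute defaults to true, treats only absolute http/https URLs as permalinks (a simplifying assumption), and flags issued or item appearing directly under an Atom 0.3 feed element. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Atom 0.3 namespace URI
ATOM03 = "http://purl.org/atom/ns#"


def guid_problems(rss_xml: str) -> list[str]:
    """Flag RSS 2.0 guids that claim to be permalinks but are not absolute URLs."""
    problems = []
    root = ET.fromstring(rss_xml)
    for guid in root.iter("guid"):
        # In RSS 2.0, isPermaLink is optional and defaults to "true".
        if guid.get("isPermaLink", "true").lower() != "true":
            continue
        value = (guid.text or "").strip()
        parsed = urlparse(value)
        # Simplifying assumption: a permalink must be an absolute http(s) URL.
        if not (parsed.scheme in ("http", "https") and parsed.netloc):
            problems.append(f"guid {value!r} is not a valid permalink")
    return problems


def atom03_problems(atom_xml: str) -> list[str]:
    """Flag feed-level issued elements and item elements in an Atom 0.3 feed."""
    problems = []
    root = ET.fromstring(atom_xml)
    # find() only looks at direct children of the feed element.
    if root.find(f"{{{ATOM03}}}issued") is not None:
        problems.append("issued is not allowed at the feed level")
    if root.find(f"{{{ATOM03}}}item") is not None:
        problems.append("item is not an Atom element; Atom uses entry")
    return problems
```

Note that issued inside an entry is left alone; only a feed-level occurrence is reported.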
Recommendations
People who propose extensions should try to validate them first.
Validating the Google Base RSS extension
Sam Ruby: People who propose extensions should try to validate them first. [link] Randy: Wow, it’s amazing how many errors Sam found. Had they tossed that document at Sam, Danny or...
And the ultrasound says: it’s going to be a girl! Software Phalanger - PHP language compiler for .NET. (via Gadgetopia) WinFX Nov CTP for use with VS2005 and .NET Framework 2.0 RTM - Brad Abrams has the pointers and installation instructions....
Sam, I pinged them about the invalid RSS 1.0 issue, they did respond (asking for a ref to the relevant spec - I was tempted to suggest they Google...).
The documentation for the g:label element/attribute is incorrect, at least for Atom 0.3 bulk uploads to Reference Articles. The sample doc shows multiple g:label elements; however, you must use a single g:label element with comma-separated tags.
Also, the docs say HTML isn’t allowed in the description attribute value. It is allowed but isn’t rendered, so I’ve stripped the HTML from Blogger Atom.xml files before uploading.
The maximum length of a description attribute value is 10,000 (not 1,000) characters. Overlength content elements won’t post.
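The workarounds in the comment above can be sketched as a small preprocessing step. This Python sketch encodes the behaviour the commenter reports, not anything from Google's documentation: labels are joined into one comma-separated value (the ", " separator is an assumption), HTML is stripped from the description with the standard library's html.parser, and the reported 10,000-character limit is enforced. prepare_entry and MAX_DESCRIPTION are hypothetical names.

```python
from html.parser import HTMLParser

# Limit as reported by the commenter; not taken from an official spec.
MAX_DESCRIPTION = 10_000


class _TextExtractor(HTMLParser):
    """Collects text content only; tags are dropped, entities are decoded."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; -> &
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(markup: str) -> str:
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


def prepare_entry(labels, description_html):
    """Return (label_text, description) for a hypothetical bulk-upload entry."""
    # One g:label value with comma-separated tags, skipping empty labels.
    label_text = ", ".join(label.strip() for label in labels if label.strip())
    description = strip_html(description_html).strip()
    if len(description) > MAX_DESCRIPTION:
        raise ValueError(
            f"description is {len(description)} characters; "
            f"limit is {MAX_DESCRIPTION}"
        )
    return label_text, description
```

For example, `prepare_entry(["python", " rss "], "<p>Hello <b>world</b></p>")` yields the label text `python, rss` and the plain-text description `Hello world`.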
Thanks for the detailed feedback, Sam,
it will be great to have the Feed Validator for Googlebase feeds!
I’ll make sure we answer the points you’re making in a reasonable time.
Robert Kebertnet Cooper from ROME also had many similar comments when trying to implement a ROME module for GoogleBase.
Sam Ruby is doing a thorough review of the Googlebase data formats and he isn’t happy about their feeds: None of the complex types are valid RDF/XML, and therefore can’t be used in RSS 1.0 --also personals and news are incomplete. None of the guids...
it will be great to have the Feed Validator for Googlebase feeds!
Everything is pretty much online by this point. Take a look at these testcases (mostly from the spec itself, some slightly modified to be well-formed, etc).
I’ll make sure we answer the points you’re making in a reasonable time.
More importantly, feel free to critique and/or contribute to the test suite, documentation, or code. The goal is to catch as many common errors as possible, and provide as helpful guidance as possible when such errors occur.
Having previously ganked Matt’s Asides for MT, I’ve now just dropped in the original, but I haven’t figured out how to have the front page show ten posts or sections of Shorts, rather than just ten Shorts when I drop in a bunch,...
feel free to critique and/or contribute to the test suite, documentation, or code. The goal is to catch as many common errors as possible, and provide as helpful guidance as possible when such errors occur.
Excellent, thanks for the testcases! I will review/contribute. I’ll send Robert your way so that he uses them as well for our ROME Googlebase module unit tests :-)
I have to say, there is a lot of stuff in the spec that made me want to scream. Sam also doesn’t mention Google violating their own five-token, comma-delimited rule for “location” in their example, but judging from the Google Base production data, their software doesn’t seem to enforce it either.
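For illustration, a check for the location rule might look like the sketch below. It assumes the “5 token” rule means at most five comma-delimited tokens, which is a reading of the comment rather than of the spec; check_location and MAX_LOCATION_TOKENS are hypothetical names.

```python
# Assumption: the "5 token" rule is read as an upper bound on
# comma-delimited tokens in a location value.
MAX_LOCATION_TOKENS = 5


def check_location(location: str) -> list[str]:
    """Return a list of problems with a location value (empty if none)."""
    tokens = [token.strip() for token in location.split(",")]
    problems = []
    if len(tokens) > MAX_LOCATION_TOKENS:
        problems.append(
            f"location has {len(tokens)} comma-delimited tokens; "
            f"at most {MAX_LOCATION_TOKENS} expected"
        )
    return problems
```

A value such as `1600 Amphitheatre Pkwy, Mountain View, CA, 94043, USA` has exactly five tokens and passes; adding a sixth would be reported.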
All in all, I have to agree with “The format doesn’t seem evil, just surprisingly sloppily defined....” There are scads of inconsistencies between the tag summary page, the tag-to-tag doc and the schema file, which made implementation a screaming nightmare. There are some good things there, but it really seems like they rushed the documentation out the door.
Actually, while I am whining, has anyone got a reason why “price” everywhere is “floatUnit” and yet there is a specific “currency” enumeration at the item level?
I pointed them at the RSS 1.0 and RDF specs and suggested one way they could fix things (I think the main bit was @rdf:datatype rather than @type), and just heard back:
...
We will work to implement your suggestions in to our RSS 1.0 specs.
...
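To make the rdf:datatype suggestion concrete, here is a Python sketch that emits a typed literal with ElementTree. RDF/XML uses the rdf:datatype attribute on a property element to give its literal value a datatype. The g namespace URI, the quantity element and the choice of xsd:integer are assumptions for illustration only, not Google's actual markup or Danny's exact proposal.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"
# Assumption: namespace URI used here only for illustration.
G = "http://base.google.com/ns/1.0"

ET.register_namespace("rdf", RDF)
ET.register_namespace("g", G)


def typed_property(parent, name, value, xsd_type):
    """Emit <g:name rdf:datatype="...#xsd_type">value</g:name> under parent."""
    el = ET.SubElement(parent, f"{{{G}}}{name}")
    el.set(f"{{{RDF}}}datatype", XSD + xsd_type)
    el.text = value
    return el


# Usage: a hypothetical RSS 1.0 item carrying one typed property.
item = ET.Element("{http://purl.org/rss/1.0/}item")
typed_property(item, "quantity", "3", "integer")
print(ET.tostring(item, encoding="unicode"))
```

The design point is that the type information lives in an attribute RDF parsers already understand, so the value stays a simple literal.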
The hoopla around the Xbox 360 continues and a new Sober worm is circling around the net. Busy times call for short lists, short but sweet. Information first and foremost, a New Sober Worm Spoofs FBI, CIA Spreads Fast. Nine principles of security...
Hi Sam,
I’m looking at ways to follow the RSS 1.0 spec for Googlebase without making a full RDF schema for our datatypes (as I understand Danny’s suggestion). I thought using rdf:parseType="Literal" would solve the problem (telling the RDF parser: this is xml, if you’re a pure RDF parser forget it).
I did not see anything in the RSS 1.0 spec that would make this non-compliant. Moreover, the content module seems to allow using rdf:parseType="Literal".
Why does [link] not validate?
It is true that parseType is not defined in the RDF Schema, but it is defined in the RDF spec, and this doc validates at the W3C RDF validator (which seems to imply to me that [link] should be amended; your parser seems to rely on it).
Also how is validator.org related to validator.w3c.org?
Are they just running an older version of your software, or is it something different and less well maintained? [link]
telling the RDF parser: this is xml, if you’re a pure RDF parser forget it
Putting this data in as a (relatively) opaque blob probably reduces the usefulness of this to RDF consumers, but you are right, nothing in the spec disallows this, so I’ve committed a change to the feed validator to allow it.
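A checker that honours rdf:parseType="Literal" essentially stops interpreting a subtree as RDF at that point. The Python sketch below illustrates the idea by walking a document in order and skipping the children of any element marked as an XML literal. It is a hypothetical illustration, not the change committed to the feed validator; rdf_checked_elements is an invented name.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PARSE_TYPE = f"{{{RDF}}}parseType"


def rdf_checked_elements(root):
    """Yield, in document order, the elements an RDF checker should interpret,
    skipping the contents of any rdf:parseType="Literal" element."""
    stack = [root]
    while stack:
        el = stack.pop()
        yield el
        if el.get(PARSE_TYPE) == "Literal":
            continue  # children form an opaque XML literal; do not descend
        # Push children reversed so they are popped in document order.
        stack.extend(reversed(list(el)))
```

The literal-marked element itself is still visited, so its own attributes can be checked; only its content is treated as a blob.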
Also how is validator.org related to validator.w3c.org?
Are they just running an older version of your software, or is it something different and less well maintained?
At the moment, it is just a (slightly) older version of FeedValidator, but there is a commitment to maintain it.
This started out as a Random Thought (RT). background The Feed Validator is organized as a recursive descent parser for various feed formats. It is implemented in an object oriented fashion, where each element ‘knows’ what the possible chi...
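The recursive-descent organization described in that excerpt, where each element class knows which children it accepts, can be sketched in a few lines of Python. This is a toy over a tiny, deliberately incomplete subset of RSS 2.0; the class names and the validate function are hypothetical and are not the Feed Validator's actual classes.

```python
import xml.etree.ElementTree as ET


class Element:
    """Base handler: each subclass declares the child elements it accepts."""

    children: dict = {}

    def validate(self, node, path, errors):
        for child in node:
            handler = self.children.get(child.tag)
            if handler is None:
                errors.append(f"{path}/{child.tag}: unexpected element")
                continue
            # Recursive descent: the child's own class checks its subtree.
            handler().validate(child, f"{path}/{child.tag}", errors)


class Title(Element):
    children = {}


class Guid(Element):
    children = {}


class Item(Element):
    children = {"title": Title, "guid": Guid}


class Channel(Element):
    children = {"title": Title, "item": Item}


class Rss(Element):
    children = {"channel": Channel}


def validate(xml_text):
    """Return a list of error messages for a (toy) RSS document."""
    root = ET.fromstring(xml_text)
    if root.tag != "rss":
        return [f"unexpected root element {root.tag}"]
    errors = []
    Rss().validate(root, "/rss", errors)
    return errors
```

Adding support for a new format or extension then means adding classes and entries in the children tables rather than touching a central parser loop.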