It’s just data

Google Base Format Review

I’ve started implementing Feed Validator support for the Google Base Bulk Upload formats. Initial support is already online, more is committed and should go online overnight, and work will continue into next week, including better error messages.

I sent an initial set of questions on Thursday night using the Contact Us form, but I’ve not heard back.  So I made a number of assumptions, and have included some more questions below.

Feedback on the Feed Validator can be sent to the usual places: bugs, patches, and questions/comments.

Feedback

The documentation contains typos like “numerice”, and punctuation issues like

Accepted values are “starting” or “negotiable;” The default is “starting at.”

Listing_type has lowercase values defined, but the examples not only have a case mismatch, but the name of the element itself has changed. 

For location, “Anytown, CA, 12345, USA” is listed as not acceptable, but many of the examples have even less: “Anytown, CA, USA”.
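To make the location mismatch concrete, here is a minimal sketch that just counts comma-separated tokens. It assumes the rule is about the number of tokens (one commenter below reads it as five comma-delimited tokens); the function names and the default of five are my own illustration, not anything taken from the Google Base documentation.

```python
def location_tokens(location: str) -> list[str]:
    """Split a location string on commas and trim whitespace around each token."""
    return [token.strip() for token in location.split(",")]


def has_expected_token_count(location: str, expected: int = 5) -> bool:
    # 'expected' defaults to 5, an assumption based on the "5 token" reading of the rule.
    tokens = location_tokens(location)
    return len(tokens) == expected and all(tokens)


for example in ("Anytown, CA, 12345, USA", "Anytown, CA, USA"):
    print(example, "->", len(location_tokens(example)), "tokens")
```

Under that reading, both the "not acceptable" value and the shorter values used in the examples fall short of five tokens.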

Is it event_date_range, event_dateTime or even eventdateTime?

Dates in expiration_date and expiration_date_time are not formatted correctly.

In products-atom.xml, you will find

<g:service>Standard</g:service>
<g:service>Overnight</g:service>

... but the definition for service states:

Acceptable values are ‘FedEx’, ‘UPS’, ‘DHL’, ‘Mail’, and ‘Other’
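A check for this kind of mismatch can be sketched in a few lines of Python. This is not the Feed Validator's code, only an illustration: the namespace URI bound to g: is an assumption, and the sample entry is a cut-down stand-in for products-atom.xml with one conforming value added for contrast.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the g: prefix; check it against the feed being validated.
G_NS = "http://base.google.com/ns/1.0"

# Values quoted from the documented definition of service.
ACCEPTABLE_SERVICES = {"FedEx", "UPS", "DHL", "Mail", "Other"}

# Cut-down stand-in for the example feed, plus one conforming value (UPS).
SAMPLE = f"""<entry xmlns:g="{G_NS}">
  <g:service>Standard</g:service>
  <g:service>Overnight</g:service>
  <g:service>UPS</g:service>
</entry>"""


def unexpected_services(xml_text: str) -> list[str]:
    """Return the text of every g:service element that is not in the documented list."""
    root = ET.fromstring(xml_text)
    found = []
    for element in root.iter(f"{{{G_NS}}}service"):
        value = (element.text or "").strip()
        if value not in ACCEPTABLE_SERVICES:
            found.append(value)
    return found


print(unexpected_services(SAMPLE))
```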

None of the feed examples contain a g:id element.  Will the rss guid, rdf:about, and atom:id attributes be used if this element is not present?

The gender and image_link examples don't contain a namespace prefix.

Well-formedness errors can be found in age, delivery_radius, event_date_range, name_of_item_being_reviewed, and shipping.
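Well-formedness is the cheapest thing to check before publishing an example: any XML parser will do. A minimal sketch using Python's standard library follows; the two fragments are hypothetical and are not copied from the Google Base documentation.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def wellformedness_error(xml_text: str) -> Optional[str]:
    """Return the parser's error message, or None if the text is well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        return str(error)
    return None


# Hypothetical fragments for illustration; the g: namespace URI is an assumption.
GOOD = '<item xmlns:g="http://base.google.com/ns/1.0"><g:age>18</g:age></item>'
BAD = '<item xmlns:g="http://base.google.com/ns/1.0"><g:age>18</g:ages></item>'

print(wellformedness_error(GOOD))
print(wellformedness_error(BAD))
```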

None of the complex types are valid RDF/XML, and therefore can’t be used in RSS 1.0; personals and news are also incomplete.  None of the guids in the RSS 2.0 feeds are valid permalinks.  Several of the Atom feeds, in addition to being in the Atom 0.3 format despite the prominent warning, contain an illegal issued element at the feed level, and the documentation refers to item elements in Atom feeds.
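On the guid point: in RSS 2.0 a guid is treated as a permalink unless it carries isPermaLink="false", so its content should be a URL. The following is a rough approximation of that check, not the Feed Validator's actual logic; "looks like a URL" is reduced here to having a scheme and a host, and the sample feed is made up.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Made-up feed with one bad permalink, one good permalink, and one non-permalink guid.
SAMPLE = """<rss version="2.0"><channel>
  <item><guid>abc123</guid></item>
  <item><guid>http://example.com/items/1</guid></item>
  <item><guid isPermaLink="false">abc456</guid></item>
</channel></rss>"""


def invalid_permalink_guids(rss_text: str) -> list[str]:
    """Return guid values that claim to be permalinks but do not look like URLs."""
    root = ET.fromstring(rss_text)
    bad = []
    for guid in root.iter("guid"):
        # isPermaLink defaults to true when the attribute is absent.
        if guid.get("isPermaLink", "true") != "true":
            continue
        value = (guid.text or "").strip()
        parsed = urlparse(value)
        if not (parsed.scheme and parsed.netloc):
            bad.append(value)
    return bad


print(invalid_permalink_guids(SAMPLE))
```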

Recommendations

People who propose extensions should try to validate them first.


Validating the Google Base RSS extension

Sam Ruby: People who propose extensions should try to validate them first. [link] Randy: Wow, it’s amazing how many errors Sam found. Had they tossed that document at Sam, Danny or...

Excerpt from The RSS Blog at

The format doesn’t seem evil, just surprisingly sloppily defined....

Excerpt from del.icio.us/tag/google at

The Daily Grind 758

And the ultrasound says: it’s going to be a girl! Software Phalanger - PHP language compiler for .NET. (via Gadgetopia) WinFX Nov CTP for use with VS2005 and .NET Framework 2.0 RTM - Brad Abrams has the pointers and installation instructions....

Excerpt from Larkware News at

Sam, I pinged them about the invalid RSS 1.0 issue, they did respond (asking for a ref to the relevant spec - I was tempted to suggest they Google...).

Posted by Danny at

[Comment] Sam Ruby

Step 3.5: validate. As of this weekend, the feedvalidator understands and will validate elements in the Google base namespace....

Excerpt from Niall Kennedy's Weblog: Google Base blog import instructions at

Sam,

The documentation for the g:label element/attribute is incorrect, at least for Atom 0.3 bulk uploads to Reference Articles. The sample doc shows multiple g:label elements; however, you must use a single g:label element with comma-separated tags.

Also, the docs say HTML isn’t allowed in the description attribute value. It’s allowed but isn’t rendered. Thus I’ve stripped the HTML from Blogger Atom.xml files for uploading.

The maximum length of a description attribute value is 10,000 (not 1,000) characters. Overlength content elements won’t post.

More details are at [link]

Posted by Roger Jennings at

Thanks for the detailed feedback Sam,
it will be great to have the Feed Validator for Googlebase feeds!
I’ll make sure we answer the points you’re making in a reasonable time.

Robert kebernet Cooper from ROME also had many similar comments when trying to implement a ROME module for GoogleBase.

P@

Posted by Patrick Chanezon at

Googlebase Criticisms

Sam Ruby is doing a thorough review of the Googlebase data formats and he isn’t happy about their feeds: None of the complex types are valid RDF/XML, and therefore can’t be used in RSS 1.0 --also personals and news are incomplete. None of the guids...

Excerpt from Darwinian Web at

it will be great to have the Feed Validator for Googlebase feeds!

Everything is pretty much online by this point.  Take a look at these testcases (mostly from the spec itself, some slightly modified to be well-formed, etc).

I’ll make sure we answer the points you’re making in a reasonable time.

More importantly, feel free to critique and/or contribute to the test suite, documentation, or code.  The goal is to catch as many common errors as possible, and provide as helpful guidance as possible when such errors occur.

Posted by Sam Ruby at

Danny: Send them here.

Posted by Mark at

Linkstipation

Having previously ganked Matt’s Asides for MT, I’ve now just dropped in the original, but I haven’t figured out how to have the front page show ten posts or sections of Shorts, rather than just ten Shorts when I drop in a bunch,...

Excerpt from phil ringnalda at

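GoogleBase BulkUploads and Atom

Google Base’s Bulk Uploads feature is a batch process for adding the items a user publishes on Gbase. You define the content of multiple items in an XML format (RSS1, RSS2, or Atom) and upload it directly, which saves the trouble of editing by hand on the web page over and over. Google has defined a Google Base extension for Atom (it is a little unfortunate that it uses the outdated Atom 0.3 format, since Atom 1.0 is already out). There are currently two ways to upload: uploading an Atom file, or uploading by FTP (suited to large files). It looks like it might be possible to build a desktop publishing and synchronization application; has anyone started on one?...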

Excerpt from public virtual stream Yining.write() at

feel free to critique and/or contribute to the test suite, documentation, or code.  The goal is to catch as many common errors as possible, and provide as helpful guidance as possible when such errors occur.

Excellent, thanks for the testcases! I will review/contribute. I’ll send Robert your way so that he uses them as well for our ROME Googlebase module unit tests:-)

Posted by Patrick Chanezon at

Mark - lol!

Posted by Danny at

I went through last week and did a Google Base plug-in for ROME. Here if you are interested.

I have to say, there is a lot of stuff in the spec that made me want to scream. Sam also doesn’t mention Google violating their own 5 token comma delimited rules for “location” in their example, but looking at the Google Base production data, it doesn’t seem they enforce that with their software either.

All in all, I have to agree with, “The format doesn’t seem evil, just surprisingly sloppily defined....” There are scads of inconsistencies between the tag summary page, the tag to tag doc and the schema file, which made implementing a screaming nightmare. There are some good things there, but it really seems like they rushed the documentation out the door.

Posted by Robert kebernet Cooper at

Sam also doesn’t mention Google violating their own 5 token comma delimited rules for “location” in their example

I thought I did.  Did I miss something?

Posted by Sam Ruby at

DOH! Sorry. Must have missed it.

Actually, here is a question about the feed validator I don’t see in your test cases:

The schema file lists several tags that aren’t covered in the doc or the examples from Google. You can see the specifics here.

Are you checking those in your validation?

Posted by Robert kebernet Cooper at

Actually, while I am whining, has anyone got a reason why “price” everywhere is “floatUnit” and yet there is a specific “currency” enumeration at the item level?

Posted by Robert kebernet Cooper at

I pointed them at the RSS 1.0 & RDF specs and suggested one way they could fix things (I think the main bit was @rdf:datatype rather than @type), just heard back:


...
We will work to implement your suggestions in to our RSS 1.0 specs.
...



Posted by Danny at

Daily Friction #36

The hoopla around the Xbox 360 continues and a new Sober worm is circling around the net. Busy times call for short lists, short but sweet. Information first and foremost, a New Sober Worm Spoofs FBI, CIA Spreads Fast. Nine principles of security...

Excerpt from TechMount at

Hi Sam,
I’m looking at ways to follow the RSS 1.0 spec for Googlebase without making a full RDF schema for our datatypes (as I understand Danny’s suggestion). I thought using  rdf:parseType="Literal" would solve the problem (telling the RDF parser: this is xml, if you’re a pure RDF parser forget it).

I did not see anything in the RSS 1.0 spec that would make this non compliant. Moreover the content module seems to allow using rdf:parseType="Literal"

Why does [link] not validate?
It is true that parseType is not defined in the RDF Schema, but it is defined in the RDF spec and this doc validates at the W3C RDF validator, which seems to imply to me that [link] should be amended (your parser seems to rely on it).

Also how is validator.org related to validator.w3c.org?
Are they just running an older version of your soft, or is it something different and less well maintained?
[link]

Posted by Patrick Chanezon at

telling the RDF parser: this is xml, if you’re a pure RDF parser forget it

Putting this data in as a (relatively) opaque blob probably reduces the usefulness of this to RDF consumers, but you are right, nothing in the spec disallows this, so I’ve committed a change to the feed validator to allow it.
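To illustrate what “opaque blob” means for a consumer, here is a small sketch, not the validator change itself. It finds properties marked rdf:parseType="Literal" and keeps their nested markup as a string instead of interpreting it. The item, the g: namespace URI, and the contents of g:shipping are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PARSE_TYPE = f"{{{RDF_NS}}}parseType"

# Hypothetical item; the g: namespace URI and the children of g:shipping are illustrative only.
SAMPLE = f"""<item xmlns:rdf="{RDF_NS}" xmlns:g="http://base.google.com/ns/1.0"
      rdf:about="http://example.com/item/1">
  <g:shipping rdf:parseType="Literal"><g:country>US</g:country><g:price>5.00</g:price></g:shipping>
</item>"""


def literal_properties(xml_text: str) -> dict[str, str]:
    """Map each child tag marked rdf:parseType="Literal" to its serialized child elements."""
    root = ET.fromstring(xml_text)
    result = {}
    for child in root:
        if child.get(PARSE_TYPE) == "Literal":
            # Keep the nested markup as an opaque string rather than reading it as RDF.
            result[child.tag] = "".join(
                ET.tostring(node, encoding="unicode") for node in child
            )
    return result


for tag, blob in literal_properties(SAMPLE).items():
    print(tag)
    print(blob)
```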

Posted by Sam Ruby at

I’ve seen this arrangement before.

Posted by Robert Sayre at

Also how is validator.org related to validator.w3c.org? Are they just running an older version of your soft, or is it something different and less well maintained?

At the moment, it is just a (slightly) older version of FeedValidator, but there is a commitment to maintain it.

Posted by Sam Ruby at

FeedValidator.rb?

This started out as a Random Thought (RT). Background: The Feed Validator is organized as a recursive descent parser for various feed formats.  It is implemented in an object oriented fashion, where each element ‘knows’ what the possible chi...

Trackback from Sam Ruby

at
