It’s just data

Sifting for Metadata

Elias Torres: Atom undoubtedly will be the format and API of choice for all these content types, but its design was to be the minimal amount of metadata to communicate information and not a rich semantic framework to express it all.

Along the way, Elias notes the irony that the output of a SPARQL query is not RDF/XML.

I agree that Atom isn’t intended to be a rich semantic framework, much in the same way that HTTP was never intended to be a highly advanced distributed object system.  I’ll also note in passing that RDF is multi-faceted.

Enough theory, let’s take something a bit more concrete.  Look at this feed.  If this feed were expressed in RDF/XML, how would you express the following as a SPARQL query?

//atom:entry[.//xhtml:a[@href='http://feedvalidator.org/']]

In English, this amounts to finding all entries which contain a link to the Feed Validator.
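For readers who want to try this without a full XPath engine, here is a rough sketch using Python's standard library. The two-entry feed and its ids are made up for illustration, and since ElementTree only supports a subset of XPath, the link test is evaluated relative to each entry rather than as one expression.

```python
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "xhtml": "http://www.w3.org/1999/xhtml",
}

# Hypothetical two-entry feed, invented for this example.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>tag:example.org,2005:1</id>
    <title>Validated</title>
    <content type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml">
        <p>Checked with the <a href="http://feedvalidator.org/">Feed Validator</a>.</p>
      </div>
    </content>
  </entry>
  <entry>
    <id>tag:example.org,2005:2</id>
    <title>Unrelated</title>
    <content type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml">
        <p>See <a href="http://example.org/">something else</a>.</p>
      </div>
    </content>
  </entry>
</feed>"""


def entries_linking_to(feed_xml, href):
    """Return the atom:id of each entry whose content links to href."""
    root = ET.fromstring(feed_xml)
    # Relative path: only anchors nested inside the entry being tested.
    # Assumes href contains no single quote.
    link_path = f".//xhtml:a[@href='{href}']"
    return [
        entry.findtext("atom:id", namespaces=NS)
        for entry in root.findall("atom:entry", NS)
        if entry.find(link_path, NS) is not None
    ]


matching_ids = entries_linking_to(FEED, "http://feedvalidator.org/")
```

Only the first entry links to the Feed Validator, so only its id ends up in matching_ids.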

My theory is that most of the interesting metadata is in the content.  That’s the essence behind impetus towards microformats.  It’s what makes Google work.  It’s what makes Technorati, Feedster, Bloglines and any number of other search engines work.


Sam wanted to know how we can execute that same query in SPARQL. Courtesy of Dan C.:

SELECT ?entry
WHERE {
     ?entry a atom:entry.
     ?entry atom:content ?html.
     FILTER xpath:match(?html, "//xhtml:a[@href='http://feedvalidator.org/']")
}

Note: The server must have the xpath:match extension installed.

I was wrong about the XML format for results: there is a (non-normative) specification for SPARQL results in RDF over at [link]

Also, I remembered that we have the CONSTRUCT keyword which returns graph/RDF results only. This can be used to create custom graphs out of the result bindings.

The main reason for an XML format was a use case requirement of the working group.

Posted by Elias at

Sam Ruby said: “My theory is that most of the interesting
metadata is in the content.  That’s the essence behind impetus
towards microformats.  It’s what makes Google work.  It’s what
makes Technorati, Feedster, Bloglines and any number of other
search engines work.”

A bit of circular reasoning ;))) no? Taking Piggy Bank as an example, I could say the metadata are in the content and the expression of their relationship is in the RDF. It’s what makes Piggy Bank work. But that would be the same circular reasoning ;)
[link]

In fact, I don’t think the solutions are opposed but rather complementary. It’s good to have implicit metadata inside your content. The emergence of microformats demonstrates the need for explicit metadata in the content (for different purposes, including marketing profiling of users, commercial use of the weblog content, etc.).

GRDDL shows that you can extract this explicit metadata as RDF/XML. It’s just another serialization of the same information.
[link]

One of the problems arises when you deal with extensibility of the model, for example with media enclosures these days, or when different microformat vocabularies clash because their semantics differ (the way I name my classes is not necessarily the same as someone else’s).

I don’t think it’s about XML against RDF but more about using the technology that will solve your issues.

Posted by karl at

GRDDL definitely seems like something to watch.

Posted by Sam Ruby at

Ocean boiling in the age of microformats

Have I mentioned that syndicating microformats is hot. That it’s important? That it’s possibly one of the most important things in syndication? Ever? Well, I have now. Avoiding plain XML and presentational markup from Tantek Çelik is a...

Excerpt from BitWorking at

Sam Ruby: Sifting for Metadata

"Most of the interesting metadata is in the content. That is the driving force behind microformats." Just what you'd expect...

Excerpt from del.icio.us/tag/metadata at

SELECT ?links WHERE {
  <http://sgp.me.uk/sam/atom> dc:related ?links .
}

The scenario was “If this feed were expressed in RDF/XML...”. There’s no reason the RDF/XML expression should map to the feed node-to-node. If I remember correctly, the FOAF Output plugin for WordPress adds dc:related statements for each link found in the content.
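To make that mapping concrete, here is a rough Python sketch of the idea, not the plugin's actual code: it pulls every link out of the feed content, records one dc:related statement per link, and then answers the query above against those statements. The feed content and the plain-tuple triple store are invented for the illustration.

```python
import xml.etree.ElementTree as ET

XHTML_A = "{http://www.w3.org/1999/xhtml}a"
FEED_URI = "http://sgp.me.uk/sam/atom"

# Hypothetical feed content, invented for this example.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <content type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml">
        <a href="http://feedvalidator.org/">Feed Validator</a>
        <a href="http://example.org/">Example</a>
      </div>
    </content>
  </entry>
</feed>"""


def related_triples(feed_uri, feed_xml):
    """Emit one (subject, predicate, object) triple per link in the feed."""
    root = ET.fromstring(feed_xml)
    return [
        (feed_uri, "dc:related", a.get("href"))
        for a in root.iter(XHTML_A)  # document order
        if a.get("href")
    ]


def select_links(triples, subject):
    """Rough equivalent of: SELECT ?links WHERE { <subject> dc:related ?links . }"""
    return [o for s, p, o in triples if s == subject and p == "dc:related"]


triples = related_triples(FEED_URI, FEED)
links = select_links(triples, FEED_URI)
```

The point of the sketch is that the links become first-class statements once, at mapping time, so the query itself no longer needs to reach into the markup.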

“My theory is that most of the interesting metadata is in the content.” - for most HTML and feed data, that’s probably true. But it doesn’t have to be true, and there’s first class data to take into consideration.

The SPARQL results format works very well too; it’s remarkably easy to work with, e.g. as input to XSLT. Oh yeah, and GRDDL is cool.
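As a small illustration of how little code the results format needs, here is a sketch in Python rather than XSLT. The result document is made up, and the namespace is the one from the SPARQL Query Results XML Format recommendation, which may differ from what a given draft or server emits.

```python
import xml.etree.ElementTree as ET

SR = {"sr": "http://www.w3.org/2005/sparql-results#"}

# Hypothetical result document for a query binding ?entry.
RESULTS = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="entry"/>
  </head>
  <results>
    <result>
      <binding name="entry">
        <uri>tag:example.org,2005:1</uri>
      </binding>
    </result>
  </results>
</sparql>"""


def bindings(results_xml):
    """Return one dict per result row, mapping variable name to its value."""
    root = ET.fromstring(results_xml)
    rows = []
    for result in root.findall("sr:results/sr:result", SR):
        row = {}
        for binding in result.findall("sr:binding", SR):
            value = binding[0]  # the <uri>, <literal> or <bnode> child
            row[binding.get("name")] = (value.text or "").strip()
        rows.append(row)
    return rows


rows = bindings(RESULTS)
```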

Posted by Danny at

Sam Ruby: Sifting for Metadata

Sam Ruby: Sifting for Metadata. Not that he was bashing, but I love RDF bashing. Oh, and one of the coolest things to come out of XML development is XPath....

Excerpt from Keith's Weblog at

Sam Ruby: Sifting for Metadata

A collection of Sam Ruby’s posts about metadata and their comments...

Excerpt from del.icio.us/tag/samruby at

Metadata in Context

Sam Ruby writes My theory is that most of the interesting metadata is in the content. Interestingly, he put <b> tags around the word “in,” apparently to emphasize it. There are, as Sam no doubt knows, <em> tags for that...

Excerpt from typewriting tag: semiotics at
