It’s just data

Blind Spots

Ken MacLeod: The RDF model along with the logic and equivalency languages, like OWL (nee DAML+OIL), altogether referred to as "the Semantic Web", is the current W3C effort to address that problem.

Count the number of occurrences of the word "Ontology" in this.  For that matter, count the number of references to RDF, and look carefully at the context in which each is made.  Then search Shirky's latest opus for the phrase "Hail Mary" and read the surrounding paragraphs.  (Note to Clay: please make better use of id and/or name attributes in future essays).

I gather that Rothenberg's vision of a semantic web is not much different from Clay's and Kimbro's.  In fact, take a specific look at this quote from Rothenberg:

weblogs contain not just hyperlinks, but hyperlinks that are typically surrounded by an amount of referential data about the destination of the link.  Thus, they are not just hyperlinking to a resource; they are making assertions about that resource.

In XML terms, this is mixed content.  Links are unquestionably the greatest source for semantic data within weblogs.  But what does RSS 1.0 (an application of RDF) do with that data?  The answer is that this data, if present at all in the feed, is passed through the equivalent of folding, spindling, and mutilating, and the results are placed in the aptly named content:encoded element.

And in the process, that data is made completely invisible to logic and equivalency languages, like OWL (nee DAML+OIL).
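To make the point concrete, here is a minimal sketch in Python using only the standard library. The two feed fragments are hypothetical (the element names and URLs are assumptions, not taken from any real feed); they carry the same sentence, once entity-escaped inside content:encoded and once as inline XHTML. The same element query is run against both.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Hypothetical fragment: the markup is entity-escaped, so the parser sees only text.
escaped = """<item xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <content:encoded>&lt;p&gt;See &lt;a href="http://example.org/post"&gt;this&lt;/a&gt;.&lt;/p&gt;</content:encoded>
</item>"""

# Hypothetical fragment: the same sentence as real (mixed content) XHTML elements.
inline = """<item>
  <body xmlns="http://www.w3.org/1999/xhtml">
    <p>See <a href="http://example.org/post">this</a>.</p>
  </body>
</item>"""


def link_targets(xml_text):
    """Return the href of every XHTML <a> element an XML query can reach."""
    root = ET.fromstring(xml_text)
    return [a.get("href") for a in root.iter(XHTML + "a")]


escaped_links = link_targets(escaped)  # the escaped markup is just a string
inline_links = link_targets(inline)    # the anchor is a real element here
print(escaped_links, inline_links)
```

The escaped version yields no links at all, because from the parser's point of view there are no anchor elements in it, only character data.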


content:encoded was a late addition to RSS 1.0's mod_content, requested by publishers so they didn't have to do input checking on user input (<sigh>, yes, you heard right, they just slap whatever the user types into the syndication output).  Those same publishers that use content:encoded in RSS will likely also use Atom's mode="escaped" and thus be invisible to XML queries, too.

Something about rocks and glass houses?  :-)

Posted by Ken MacLeod at

Those same publishers that use content:encoded in RSS will likely also use Atom's mode="escaped" and thus be invisible to XML queries, too.

Be careful, that's a testable assertion.

Posted by Sam Ruby at

Nice try Ken, but the rest of mod_content also treats the links (and the rest of the XHTML content) just as opaquely.  The key phrase here is rdf:parseType="Literal".  That means all that beautiful XHTML content is just a big ol' blob of angle brackets, no structure, no transparency.

Posted by Mark at

More to the point, assuming developers interchanged content properly (in literal, unencoded form), is there not a scaling problem as each application has to deal with the unique names and structures of each new XML format they choose to work with?

Did we not, in fact, choose in Atom to "pick a new name" for Dublin Core's "title", so that it was "in our namespace"?  Application developers, let's take Spring and Chandler as examples, each have to write the "import" function for those XML names and structures into their applications so that they can display them consistently with information from other sources.

Doesn't it make sense for there to be an effort to build a common layer for that interchange, so that writing information clients that can deal with multiple sources would be that much easier?  Aren't the clients in our space just beginning to grow in that direction?

Posted by Ken MacLeod at

Question: What namespace is the title element in RSS 1.0?

Titles are cool, and enable such things as month at a glance views, but I would much more often be interested in queries to see what other blog entries reference a page I wrote than in a query to see other blog entries with the same title.

Posted by Sam Ruby at

Mark, What is Opaque?  I'm nowhere near as concerned about well-formed markup in a column in a database or a property value in RDF as I am about the lack of transparent consistency between multiple XML schemas.

You can't find @cite in my RDF.  I can't create a list of titles to display from your XML.

You can loop over all the literals in my RDF to find the @cite attributes.  How am I going to find each of the things that are titles in your XML?

Posted by Ken MacLeod at

RE: Blind Spots

Ken,
I've already blogged about this but here it is again. All you're proposing is that instead of the developers of Chandler or whatever coming up with an XSLT to transform ATOM to their internal format they come up with the RDF/DAML+OIL/OWL equivalencies instead. To a practical developer this is just exchanging one set of complex technologies for another.

At least XSLT has been around for a while and Michael Kay has a good book about it.

Message from Dare Obasanjo at


Sam, you're heading in two different directions with your statements. One, you're saying the greatest source of semantic information on the net occurs through links, by the referential material surrounding the link. But then you just labeled your links in this document with 'this', 'latest opus', 'Kimbro's', and so on. Even with the material surrounding the links included, there is not a lot of meaning. Material surrounding links is accidental metadata, not deliberate. Rarely do I see webloggers use the text surrounding a link to good effect.

If so, then most links from warbloggers would semantically be equivalent to "Liberal terrorist left-wing coward".

Your second statement has to do with content encoding the descriptions for RSS 1.0. Yes? What does that have to do with the link? Or with semantics, for that matter?

Ken's response leads me to believe this is an old argument with a lot of material not present here -- a semantic problem -- so I may be talking to stuff that I'm reading but which is encoded based on previous conversations. Apologies if I'm off topic.

Posted by Shelley at

Dare, I've been trying to get to your article but am having connection problems.

XSLT would be fine.  Can you point me to the project collecting them?  what common model are they using?  how are they solving the structure differences between source model and common model?  what techniques are they using to equate names and structure elements?  does the source of the XML provide that, and hopefully recognize the importance of making it clear and direct, or is it all third-party?  are they having an easier time of doing the mapping than the RDF folks have been having?

Posted by Ken MacLeod at

RE: Blind Spots

Yaaay Shelley,
  I was wondering about this very same thing but decided to hold my response. Even if all the links in a post are extracted, most of the time the link text doesn't stand on its own without context. The best way to reduce this is to have a large database of links vs. link text and hope the cream floats to the top. At this point you would have reinvented Google.

Message from Dare Obasanjo at

In response to my point about Atom choosing a wholly unique name for a "title" in Atom, Sam writes:

Question: What namespace is the title element in RSS 1.0?

The difference is that RSS 1.0 assumes a framework that equates names that mean the same thing.  Atom does not.
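As a rough illustration of what "a framework that equates names" buys a client, here is a toy sketch in Python. It is not an OWL reasoner; the `EQUIVALENT_TO` table is only a stand-in for the kind of equivalence a schema could declare, and the Atom title URI is a placeholder I made up for the example.

```python
# Property names as full URIs. The first two use the Dublin Core and RSS 1.0
# namespaces; the third is a placeholder, not the real Atom namespace.
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
RSS_TITLE = "http://purl.org/rss/1.0/title"
ATOM_TITLE = "http://example.org/atom-ns#title"

# Toy stand-in for declared equivalences: each name maps to one canonical name.
EQUIVALENT_TO = {RSS_TITLE: DC_TITLE, ATOM_TITLE: DC_TITLE}


def canonical(name):
    """Map a property name to its canonical equivalent, if one is declared."""
    return EQUIVALENT_TO.get(name, name)


def titles(records):
    """Collect every value whose property is equivalent to dc:title."""
    return [
        value
        for record in records
        for name, value in record
        if canonical(name) == DC_TITLE
    ]


# Three records from three hypothetical sources, each using its own name for "title".
records = [
    [(DC_TITLE, "XML vs. RDF")],
    [(RSS_TITLE, "Blind Spots")],
    [(ATOM_TITLE, "A Semantic Conversation"),
     ("http://example.org/other#note", "ignored")],
]
all_titles = titles(records)
print(all_titles)
```

The client asks for titles once. Supporting a new format means adding one entry to the table rather than writing another import function.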

Posted by Ken MacLeod at

Shelley, queries such as this one, are what I had in mind.  Which, amusingly enough, should now include this blog entry.

And I didn't even have to reinvent Google to make it work.

Posted by Sam Ruby at

Sam,
If all you are interested in is local search, then it seems you and the RDF folks are talking at cross purposes. The interesting thing is how to perform searches like the one you described across multiple data sources (i.e. weblogs) not how your home brew implementation works. Your implementation could as easily have been written using SQL and a relational database, and apart from a few power users nobody would have reduced functionality. There aren't many of those: I offered to add XPath-based search to RSS Bandit, and none of my co-workers who use RSS Bandit thought it was interesting, and I work on the XML team at MSFT.

Posted by Dare Obasanjo at

Sam, how would you extend that query to get the title of each of the "items" that contain that XHTML anchor with an href of "burningbird"?  would that only work on "your XML"?
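For what it's worth, the extended query might look something like the sketch below, again in plain Python. The feed layout (`feed`, `entry`, `title`, `content`) and the URLs are assumptions made up for the example, not Sam's actual format.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Hypothetical feed: un-namespaced entry and title elements, inline XHTML content.
feed = """<feed>
  <entry>
    <title>Blind Spots</title>
    <content xmlns="http://www.w3.org/1999/xhtml">
      <p>Shelley <a href="http://burningbird.example/post">replies</a>.</p>
    </content>
  </entry>
  <entry>
    <title>Syllogism</title>
    <content xmlns="http://www.w3.org/1999/xhtml">
      <p>See <a href="http://example.org/essay">the essay</a>.</p>
    </content>
  </entry>
</feed>"""


def titles_linking_to(xml_text, needle):
    """Titles of entries that contain an XHTML anchor whose href includes needle."""
    root = ET.fromstring(xml_text)
    found = []
    for entry in root.iter("entry"):
        hrefs = [a.get("href", "") for a in entry.iter(XHTML + "a")]
        if any(needle in href for href in hrefs):
            found.append(entry.findtext("title"))
    return found


matches = titles_linking_to(feed, "burningbird")
print(matches)
```

Note that `entry` and `title` are hard-coded, so the function works only on feeds that use those names. That is exactly the limitation the question is probing.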

Posted by Ken MacLeod at

shirky touches off a storm of semantic web posts

Clay’s latest essay, The Semantic Web, Syllogism, and Worldview, has touched off quite the flurry of interesting responses. Mark Pilgrim has a number of these responses collected in his “B-Links” sidebar, but I’m going to put them here, as well, so that I can find them more easily in the future: Shelley Powers’ Deconstructing the Syllogistic Shirky; Ken MacLeod’s XML vs. RDF :: N × M vs. N + M (Or, Questioning Why People Can Only See the Semantic Web AI Strawman); Dare Obasanjo’s The Semantic Web and Perpetual Motion Machines; Sam Ruby’s Syllogism (particularly the comments thread) and Blind...... [more]

Trackback from mamamusings

at

You mean a matching query of like words as found in the description or some part of the RSS file? Well, you wouldn't have this anyway with any RSS feed if the person only provides excerpts, or no description at all. How does one content encode no text? Easy, it looks like " ". Fast queries.

RSS is syndication, nothing more, nothing less. It allows us to subscribe to sources of updated material, and get titles, dates when updated, and it's nice when an excerpt is provided.

Now if you want to be able to find links from within that source, à la trackback, you'd need new fields, annotate these fields (as 'things I linked to in my writing' or 'related material'), and ask people to provide this type of information. Then you'd have consistent and semantically accurate understandings of those links as external data relevant to conversation in posting.

If you want categories, such as this one which could be 'semantic web stuff', again, you want to add these as defined fields.

You don't have to use RDF/XML to record these. You can do so in RSS 2.0. Or Atom.

There is a big difference between deliberate metadata and accidental metadata, and if the semantic web relied purely on accidental metadata, then we have it -- it is called Google.

Posted by Shelley at

Actually, following through on Dare's mention of relational databases, putting huge chunks of embedded intelligence into a single all over BLOB in a relational database is a big no-no.

You sit down, talk it over with domain experts, model the data, build the database, and then bask in the praise and rewards of designing and building a well designed relational database, which functions cleanly out of the box because the system is based on a carefully thought out relational data model.

Kind of like...uh, oh, no! Not going to say it...

Posted by Shelley at

XML vs. RDF :: N × M vs. N + M

Or, Questioning Why People Can Only See the Semantic Web AI Strawman Clay Shirky criticizes the Semantic Web in his article, The Semantic Web, Syllogism, and Worldview, to which Sam Ruby accurately assesses, "Two parts brilliance, one part...

Excerpt from Ken MacLeod at

Mama Musings - Collection of rebuttals to Clay Shirky's Anti-Semantic Web piece

(SOURCE: mamamusings: shirky touches off a storm of semantic web posts )- Nice collection of the discussion on Clay Shirky's Anti-Semantic Web post.... [more]

Trackback from Roland Tanglao's Weblog

at

So help me God, the next time I hear someone compare RDBMS-is-relational and RDF-is-relational, I'm gonna puke on my monitor.  Sure, RDF is a great replacement for your SQL database, as long as you don't mind storing all your data in a single denormalized three-column pivot table and replacing all your queries with recursive left outer self joins.
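For readers who haven't seen the pattern being mocked, here is a small sketch using Python's built-in sqlite3 module. The table layout, the identifiers (`post:1`, `links_to`) and the data are all made up for illustration. It shows the three-column table and the self joins, one per property hop, but not a recursive query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Everything lives in one three-column table of (subject, predicate, object).
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("post:1", "title", "Blind Spots"),
        ("post:1", "links_to", "post:2"),
        ("post:2", "title", "A Semantic Conversation"),
        ("post:3", "title", "Syllogism"),
        ("post:3", "links_to", "post:9"),  # target has no title recorded
    ],
)

# "Title of each linking post, plus the title of what it links to":
# the table is joined to itself once per property that the query touches.
rows = conn.execute(
    """
    SELECT src_title.object, dst_title.object
    FROM triples AS link
    JOIN triples AS src_title
      ON src_title.subject = link.subject AND src_title.predicate = 'title'
    LEFT OUTER JOIN triples AS dst_title
      ON dst_title.subject = link.object AND dst_title.predicate = 'title'
    WHERE link.predicate = 'links_to'
    ORDER BY src_title.object
    """
).fetchall()
print(rows)
```

The left outer join keeps the second link even though its target has no title, returning NULL (`None` in Python) in that column.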

Posted by Mark at

Mark, I gather that you're addressing this to me. I don't believe I compared relational to RDF, or suggested that RDF is a replacement for relational. I was talking about the benefits of an agreed on theory, model, and implementation when it comes to managing and sharing data.

I just wanted to clarify this before we get sidetracked on a misunderstanding. My apologies if I wasn't that clear.

Posted by Shelley at

Mark, the triples view of RDF is one of two common views.  The other is the node view of RDF, where all the properties of the same subject form a node or record.

The comparison of RDF to the relational model is not because one can store the triples in a relational database and perform recursive left outer self joins, it is because the RDF nodes (records) can be related to other nodes via the values in their properties.
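A minimal sketch of that node view in Python, assuming triples are plain (subject, property, value) tuples with made-up identifiers: the triples are grouped by subject into records, and one record is then related to another through a property value.

```python
from collections import defaultdict

# Hypothetical triples: (subject, property, value).
triples = [
    ("post:1", "title", "Blind Spots"),
    ("post:1", "links_to", "post:2"),
    ("post:2", "title", "A Semantic Conversation"),
]


def nodes_from(triples):
    """Group triples by subject so each subject becomes one record (node)."""
    nodes = defaultdict(dict)
    for subject, prop, value in triples:
        nodes[subject].setdefault(prop, []).append(value)
    return dict(nodes)


nodes = nodes_from(triples)

# Follow the links_to values of one node to the nodes they name.
linked_titles = [
    nodes[target]["title"][0]
    for target in nodes["post:1"].get("links_to", [])
    if target in nodes
]
print(linked_titles)
```

The value of `links_to` acts much like a foreign key: it names another record, which is the sense of "relational" intended here.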

Posted by Ken MacLeod at

A Semantic Conversation

When Clay Shirky's paper on Semantic Weblogging first came out and I saw the people referencing it, I thought, "Oh boy! Fun conversation!" But that was before I saw that many of the links to Clay's paper were from what are called 'b-links' I believe -- links in side columns that basically have little or no annotation. I guess what a b-link says is that the person found the subject material interesting, but we don't know if they agree or disagree, and it's hard to have a conversation when the only statement a person makes is, "I'm here." What led to this is Sam Ruby continued his discussion about Clay's paper, saying "Links are unquestionably the greatest source for semantic data within weblogs." What we see is that even with something that we all know and understand such as the simple link, you can't pull semantics out when none was put in in the first place. Still, not all links were b-links. Tim Bray talks about Semantic Web from the big picture, and references big corporations with big XBRL (Extensible Business Reporting Language) files and all that juicy corporate data found at data.ibm.com and data.microsoft.com. To him, the Semantic Web...... [more]

Trackback from Burningbird

at

Sorry to get sidetracked; the original topic was more interesting anyway.  Yeah, syndication originally was focused on metadata -- headlines, external links, excerpts, and later dates.  But with weblogs it's morphed into something else (at least for some people) -- titles, permalinks, rich inline XHTML content.  RDF is fine for encoding the metadata, but what about the content itself?  RDF treats it as one big blob.  You can argue that all the interesting stuff is in aggregating and querying the metadata, but that's simply not true.  Aggregating and querying the data (content) is also useful, and RDF is an impediment there.

Or perhaps there's simply some sour grapes going on in the RDF camp from advocates who are used to saying "whatever you can do in XML, we can do better; once you gain enlightenment, untold riches await you" now that Sam has shown that there are some things pure XML does better.

Posted by Mark at

I <heart> mixed content.

Posted by Ken MacLeod at

Curious, what HTML content (<body>) elements are inherently interesting?  (As opposed to just being visible in a free-text search, such as one that might look for emphasized text.)

These come to mind: @cite, <a>, @id, @class

Posted by Ken MacLeod at

Mama Musings - Collection of rebuttals to Clay Shirky's Anti-Semantic Web piece

(SOURCE:mamamusings: shirky touches off a storm of semantic web posts)- Nice collection of the discussion on Clay Shirky's Anti-Semantic Web post. QUOTE Mark Pilgrim has a number of these responses collected in his “B-Links” sidebar, but...

Excerpt from Roland Tanglao: HowToDevelopSoftware at

I pull out <cite> and @cite to do posts by citation and posts by quotation, respectively.  But <a> is certainly the big one because everyone uses it.
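A sketch of that kind of extraction in Python, assuming the post body is well-formed XHTML. The sample markup and the `index_post` helper are made up for illustration and are not Mark's actual code.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Hypothetical post body containing a <cite> element, a cite attribute and a link.
post = """<div xmlns="http://www.w3.org/1999/xhtml">
  <p><cite>Clay Shirky</cite> wrote:</p>
  <blockquote cite="http://example.org/essay"><p>Quoted text.</p></blockquote>
  <p>See also <a href="http://example.org/reply">a reply</a>.</p>
</div>"""


def index_post(xhtml_text):
    """Pull out the markup a 'posts by citation/quotation/link' index would need."""
    root = ET.fromstring(xhtml_text)
    return {
        # <cite> elements name who or what is being cited.
        "citations": [el.text for el in root.iter(XHTML + "cite")],
        # cite attributes give the source URL of a quotation.
        "quotations": [el.get("cite") for el in root.iter() if el.get("cite")],
        # <a href> is the one everybody uses.
        "links": [a.get("href") for a in root.iter(XHTML + "a") if a.get("href")],
    }


index = index_post(post)
print(index)
```

None of this works if the same markup arrives escaped, since there would be no elements or attributes to iterate over.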

Posted by Mark at

<img> also comes to mind.

When SQL was new, SELECT was exciting as it permitted ad hoc queries.

Example: updated entries

Just because something isn't universal doesn't mean that it isn't useful.

Posted by Sam Ruby at

More on RDF, The Semantic Web and Perpetual Motion Machines

My post from yesterday garnered a couple of responses from the RDF crowd who questioned the viability of the approaches I described. Below I take a look at some of their arguments and relate them to practical examples of ...

Pingback from Dare Obasanjo aka Carnage4Life - More on RDF, The Semantic Web and Perpetual Motion Machines

at

Semantic Web aphorisms

from the much-discussed Semantic Web essay In order for the Semantic web to work, you would need "a world where... [more]

Trackback from BookBlog

at

Re: updated entries.  At one point I was considering marking up updated posts with <ins>.  But yeah, basically whatever markup you can think of, you might someday want to query on.  Any system that makes this easier is a form of manufactured serendipity, and is a good thing.  Any system that makes this harder is just putting up barriers to future innovation, and is a bad thing.  For all that RDF advocates scream about untold riches and unspecified future benefits, I would think they would understand this point.

Posted by Mark at

First off, there's nothing stopping an XPath processor going inside RSS 1.0 data. Yes, in RDF terms literal content is a blob, but as has been noted this happens in RDBMS's as well.
content:encoded is a bit of a historical aberration for the reasons already given.
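As a sketch of the first point: a literal marked rdf:parseType="Literal" is still ordinary XML to an XML parser, even though an RDF processor treats it as one value. The fragment below is a simplified, hypothetical one (not a complete mod_content structure), read with Python's standard library.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
XHTML = "{http://www.w3.org/1999/xhtml}"

# Simplified, hypothetical fragment: XHTML carried as an RDF XML literal.
fragment = """<item xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:value rdf:parseType="Literal">
    <p xmlns="http://www.w3.org/1999/xhtml">Read <a href="http://example.org/a">this</a>.</p>
  </rdf:value>
</item>"""

root = ET.fromstring(fragment)

literal_hrefs = []
for literal in root.iter(RDF + "value"):
    # Only look inside values that are declared to be XML literals.
    if literal.get(RDF + "parseType") == "Literal":
        literal_hrefs.extend(a.get("href") for a in literal.iter(XHTML + "a"))
print(literal_hrefs)
```

The anchor is reachable at the XML level. What the RDF model lacks is a statement about that link, which is the gap discussed in the paragraphs that follow.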

I'm not sure how it can be implied that RSS 1.0 is somehow lacking because of the opacity of literals in RDF - in non-RDF feeds, the whole of the feed is opaque in RDF.

But your point is a good one - links are seriously important bits of information. There definitely does need to be work done on addressing the case where RDF contains literal data that contains links. It's worth remembering that this is something of a special case - there are plenty of XML formats that don't have any linking.

If I link to a document from here, then the data on that remote page isn't immediately processable by anything looking at this page. The same normally applies for resources described in RDF.

But I believe the syndication use case for RDF does call for further work in getting inside the literals. 

Mark, you better get a big sick bag because RDF is a relational model, just not the same one as Codd's. Very rarely is it necessary to look at triples in the form you describe - like SQL DBs, the tools are there to help.

I think Dare (probably unintentionally) hit the nail on the head:
"The interesting thing is how to perform searches like the one you described across multiple data sources (i.e. weblogs)..."
This is where the RDF approach begins to shine, especially if you add data from multiple namespaces to the equation.

Posted by Danny at

Sam's semantic web hootenanny

Sam Ruby's Blind Spots post and the subsequent comments (and trackbacks) are like a point / counter-point semantic web sing-along crab cannon.... [more]

Trackback from the iCite net development blog

at

all my little words

the semweb ate my homework: a monday afternoon's reading. how google beat amazon and ebay to the semantic web - wow the semantic web, syllogism, and worldview - ooh xml vs. rdf - huh the semantic web - neat hyperdictionary: ontology - oh the...

Excerpt from a jeweled platypus at

Questions and debate on the semantic web

The semantic web is surely one of the most promising ideas of the moment, but also certainly the most controversial. And along with it, the technologies created for it by the W3C are being criticized: RDF and OWL in particular. Clay Shirky, a...

Excerpt from Znarf Infos - le carnet web at

A Semantic Conversation

Where Shelley talks about the semantic web equivalent of the BLINK tag, and the world gets down and kisses her feet.... [more]

Trackback from Burningbird

at
