It’s just data

RDF questions

Danny Ayers: Of course if Atom was expressed in an RDF-friendly syntax* then FOAF and every other RDF language could be used directly in Atom without having to jump through hoops.

Danny, a few questions.  For the basis of my questions, presume a comparison against RSS 1.0, which unquestionably is an RDF friendly vocabulary.

To me, this gets at the heart of the value proposition for RDF.  What I hear is that it essentially provides extensibility for free.  I'd just like to see this promise expressed in tangible terms.


But isn't one of the things about RDF metadata automated queries? In which case, an RDF client that understands FOAF already knows what a foaf:name is, while it won't understand atom:name. (Although yes, you can probably tell it that they're equivalent. But if they're equivalent, why not use the existing ones in the first place?)

Posted by James Aylett at

James, don't assume a level of RDF sophistication on my part.  I am not aware of automated queries.  Got a pointer?

What I have seen with my own two eyes is tools that can infer an XML schema from instance data, and from that be able to assist you in producing queries.  Of course, when all is said and done, all one has when this is done is the name of the attribute, and the namespace from whence it came, but essentially it seems to me that this is all one has in the RDF case too.

But I'm sure I am missing something.

Posted by Sam Ruby at

Asking about property names is the wrong question.  In the simplest case, yes, one would have to rewrite their queries if the names change.  There's some higher-order support for equating names in RDF, and could be for XML, but that can be a separate question.

Danny makes an innate assumption about what "embed" means in RDF vs. XML.  To an RDFer, the following are equivalent:

  <dc:creator rdf:resource="uri:of-creator" />

and

  <dc:creator rdf:about="uri:of-creator">
  ... info about uri:of-creator
  </dc:creator>

XPath is about "instance" queries, queries within one XML instance, or replicated across many instances.  To XPath, those two XML fragments are wholly unique.  RQL is about graph queries, queries among all the RDF loaded to date.  To RQL, the only question about the above fragments is, "have I loaded that information yet?"  Loading many instances of XML to run XPath queries over doesn't "connect the dots" the way running an RDF query does.
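A rough sketch of the "connect the dots" point, assuming two hypothetical sources and made-up data (the `foaf:name` value and the helper name `creator_names` are illustrative, not from any real feed). One source only points at the creator, the other describes it, and loading both is simply a set union:

```python
# Illustrative only: URIs, values and helper names here are assumptions.
Triple = tuple[str, str, str]

# Source 1 only points at the creator (like rdf:resource="uri:of-creator").
feed_triples: set[Triple] = {
    ("uri:of-entry", "dc:creator", "uri:of-creator"),
}

# Source 2 describes the creator (like rdf:about="uri:of-creator").
profile_triples: set[Triple] = {
    ("uri:of-creator", "foaf:name", "Example Author"),
}

# Loading both sources into one graph is a set union; repeated triples collapse.
graph = feed_triples | profile_triples


def creator_names(triples: set[Triple], entry: str) -> list[str]:
    """Follow entry -> dc:creator -> foaf:name across whatever has been loaded."""
    creators = {o for s, p, o in triples if s == entry and p == "dc:creator"}
    return sorted(o for s, p, o in triples if s in creators and p == "foaf:name")


print(creator_names(graph, "uri:of-entry"))
```

Run against `feed_triples` alone, the same function finds no names, which is the "have I loaded that information yet?" question in miniature.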

Note: running queries over large data sets from multiple sources (XML or RDF) has a problem: duplicate and possibly conflicting information.  The RDF folks are working on solutions, but it hasn't cropped up so visibly in XML applications.  yet.

Posted by Ken MacLeod at

Ken, how do either of those relate to what appears to be the much more common form that I tend to see in RSS 1.0 instances:

  <dc:creator>name</dc:creator>

Posted by Sam Ruby at

Hi guys, sorry I'm late...

In RDF, that's equivalent to the statement:

[something] dc:creator "name"

The [something] would be an instance of an RDF class - probably an rss:channel here, and the name is a literal.

Queries on this are a bit limited, but you could create queries (in effect) to list the creators of [something], or what has been created by "name".
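As a minimal sketch of those two queries, assuming the triples are held as plain Python tuples (the `example.org` subjects and the function names are hypothetical):

```python
# Hypothetical subjects; "name" is the literal from the statement above.
triples = [
    ("http://example.org/feed-a", "dc:creator", "name"),
    ("http://example.org/feed-b", "dc:creator", "name"),
    ("http://example.org/feed-c", "dc:creator", "someone else"),
]


def creators_of(subject: str) -> list[str]:
    """List the creators of [something]."""
    return [o for s, p, o in triples if s == subject and p == "dc:creator"]


def created_by(name: str) -> list[str]:
    """List what has been created by the literal "name"."""
    return [s for s, p, o in triples if p == "dc:creator" and o == name]


print(creators_of("http://example.org/feed-a"))
print(created_by("name"))
```

Because the object is only a string, the second query matches on the exact spelling of the literal, which is why these queries are limited.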

Let me just see if this escapes ok:

<rss:channel rdf:about='http://www.intertwingly.net/blog/'>
  <dc:creator>rubys@intertwingly.net</dc:creator>
  <foaf:maker>
    <foaf:Person>
      <foaf:name>Sam Ruby</foaf:name>
      <foaf:mbox>rubys@intertwingly.net</foaf:mbox>
      <foaf:currentProject>
        <foaf:Project>
          <dc:title>Atom Project</dc:title>
          <foaf:homepage rdf:resource='http://www.intertwingly.net/wiki/pie/FrontPage'/>
        </foaf:Project>
      </foaf:currentProject>
    </foaf:Person>
  </foaf:maker>
</rss:channel>

Posted by Danny at

That is an Issue.  The DC and RDF folks are working on potential solutions.  What happens to older applications, like RSS 1.0, would fall out of that.

The Issue is a matter of data types.  RSS's solution is that the DC module explicitly calls for only literal values for dc:creator.  DC is silent on the issue, leaving it open to interpretation and application.  FOAF supports dc:creator as a literal and creates foaf:maker/foaf:made as taking a resource as a data type.

A PS to my previous entry, one way to see the difference between XML instances and RDF graphs is to visualize taking all the XML instances and combining them into one uber-instance, where an XML query can then combine results from several XML fragments.

Posted by Ken MacLeod at

Ok, bit better. Side note - foaf:maker is used because dc:creator is usually specified as a literal (string) rather than a resource (URI).

From this an RDF processor can directly obtain a list of subject, predicate, object triples:

1 http://www.intertwingly.net/blog/
  http://www.w3.org/1999/02/22-rdf-syntax-ns#type
  http://purl.org/rss/1.0/channel

2 http://www.intertwingly.net/blog/
  http://purl.org/dc/elements/1.1/creator
  "rubys@intertwingly.net" 

3 genid:ARP10653
  http://www.w3.org/1999/02/22-rdf-syntax-ns#type
  http://xmlns.com/foaf/0.1/Person

4 genid:ARP10653
  http://xmlns.com/foaf/0.1/name
  "Sam Ruby" 

etc.

The genids are IDs used internally in the processor, to represent blank nodes, i.e. resources with URIs.

The triples can be seen as a simple relational database table (and queried as such).
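To make the relational-table view concrete, here is a small sketch using Python's built-in `sqlite3` module with the four triples listed above. The table and column names are assumptions chosen for illustration, not anything an RDF processor prescribes:

```python
import sqlite3

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
BLOG = "http://www.intertwingly.net/blog/"

conn = sqlite3.connect(":memory:")
# One three-column table holds every statement (table name is hypothetical).
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        (BLOG, RDF_TYPE, "http://purl.org/rss/1.0/channel"),
        (BLOG, "http://purl.org/dc/elements/1.1/creator", "rubys@intertwingly.net"),
        ("genid:ARP10653", RDF_TYPE, FOAF + "Person"),
        ("genid:ARP10653", FOAF + "name", "Sam Ruby"),
    ],
)

# "Which node has the foaf:name 'Sam Ruby'?" becomes an ordinary SELECT.
rows = conn.execute(
    "SELECT subject FROM triples WHERE predicate = ? AND object = ?",
    (FOAF + "name", "Sam Ruby"),
).fetchall()
print(rows)
```

Adding a new vocabulary means inserting more rows, not altering the table.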

Where you have the XML structure giving the structure with XPath, here the relational structure is given by the RDF graph you get by connecting the triples together (there are several other ways of expressing the same data in XML).

What you don't see in this syntax is what is added by the RDF model and RDF Schemas. Lots can be inferred. For example, looking at the FOAF schema, it states that foaf:made is an inverse of the foaf:maker property.

So given the triple -

http://www.intertwingly.net/blog/
http://xmlns.com/foaf/0.1/maker
genid:ARP10653

and the RDF schema, the processor can infer

genid:ARP10653
http://xmlns.com/foaf/0.1/made
http://www.intertwingly.net/blog/

So not only have you a database populated with those statements you've made explicitly, you can also add all the statements that follow logically from those statements in combination with everything else you know from the RDF schema (which also happens to be expressed in RDF), and anywhere else. It can be intertwingled ;-)
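A minimal sketch of that inference step, assuming the schema's inverse statement has already been read into a small lookup table (the `INVERSE_OF` dictionary and `infer_inverses` function are hypothetical stand-ins, not a real reasoner):

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Stand-in for the schema statement that foaf:made is the inverse of foaf:maker.
INVERSE_OF = {
    FOAF + "maker": FOAF + "made",
    FOAF + "made": FOAF + "maker",
}


def infer_inverses(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Return the explicit triples plus every statement implied by an inverse."""
    inferred = set(triples)
    for s, p, o in triples:
        if p in INVERSE_OF:
            inferred.add((o, INVERSE_OF[p], s))
    return inferred


explicit = {("http://www.intertwingly.net/blog/", FOAF + "maker", "genid:ARP10653")}
for triple in sorted(infer_inverses(explicit)):
    print(triple)
```

The inferred statement was never written down in the feed; it follows from the one explicit triple plus the schema.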

Posted by Danny at

(sorry Ken - the side note re. dc:creator was due to typing latency ;-)

Posted by Danny at

I've not played much with the (SQL-like) RDF query languages, there are two or three.

Working with RDF programmatically it's not that far from DOM, except the structure is a graph instead of a tree (and the syntax is a red herring). So in Jena you've got methods like

NodeIterator listObjectsOfProperty(Property p)

applied to a Model. If a model contained the stuff above, and you specified property dc:title you'd get an iterator back containing just one literal: "Atom Project".

Posted by Danny at

Back to your initial question: "If I were to switch from atom:name to foaf:name, people would have to update their XPath queries in order to access the data in its new place.  How would RQL handle this?  Wouldn't they also have to update their queries?"

If there was a relationship defined between atom:name and foaf:name, then the data would still be reachable in essentially the same way. I forgot to mention subclasses/subproperties - big source of inference/query capability.

If (say) atom:title was a subproperty of dc:title, and foaf:title was a subproperty of dc:title too, then any queries that would be valid for dc:title (gimme all the titles!) would also apply to the others.

ah, found the RDQL examples. Gimme all the titles would be something like:

SELECT ?a, ?c
WHERE (?a, <dc:title>, ?c)

http://jena.sourceforge.net/RDQL/RDQL_Examples.html
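For comparison, a rough Python sketch of the same "gimme all the titles" query with the hypothetical subproperty relationships from above taken into account. The `SUBPROPERTY_OF` table, the sample subjects and the titles are all assumptions for illustration:

```python
# Hypothetical schema: atom:title and foaf:title both refine dc:title.
SUBPROPERTY_OF = {"atom:title": "dc:title", "foaf:title": "dc:title"}


def is_subproperty(prop: str, target: str) -> bool:
    """True if prop is target or reaches it by following subproperty links."""
    seen = set()
    current = prop
    while current is not None and current not in seen:
        if current == target:
            return True
        seen.add(current)
        current = SUBPROPERTY_OF.get(current)
    return False


triples = [
    ("ex:entry1", "atom:title", "RDF questions"),
    ("ex:project", "dc:title", "Atom Project"),
    ("ex:entry1", "dc:creator", "name"),
]


def select(prop: str) -> list[tuple[str, str]]:
    """Rough analogue of SELECT ?a, ?c WHERE (?a, <prop>, ?c)."""
    return [(s, o) for s, p, o in triples if is_subproperty(p, prop)]


print(select("dc:title"))
```

The query is written once against dc:title; the atom:title statement is picked up because of the schema, not because the query names it.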

Posted by Danny at

Oops!

The genids are IDs used internally in the processor, to represent blank nodes, i.e. resources with URIs.

That should read "resources without URIs."

Hey, nice place you got here. What time's dinner?

Posted by Danny at

Beyond XPath

Sam Ruby has some RDF questions. Typically I'm too knackered to give a proper answer right now. But if any RDF...... [more]

Trackback from Raw

at

http://c2.com/cgi/wiki?ForFree

Posted by Mark at

Yes, very good Mark. But it was Sam that used the phrase, rather than anyone advocating RDF.

The extensibility RDF offers doesn't come for free, there will be a small up-front cost to XML language developers for syntax adjustments and of course to get the benefits you'll need to use a different set of tools (assuming of course you're not already using RDF tools). But note that this isn't at the exclusion of your XML kit, you can still use all that. (See:
http://www.markbaker.ca/2002/09/Blog/2003/10/10#2003-10-rdf-and-xml )

But if you're wanting to use a variety of languages in a consistent fashion then RDF is almost certainly the best option right now. It's very good value.

Let's say you want to add something new (e.g. calendar info?) to your syndication feed. You probably namespace it to keep it separate. With a vanilla XML format you'd have to define how the new elements related to your existing elements. RDF's basic model of resources (using URIs) and the relationships between them means that this part is already done for you.

Note also that when you come to adding yet another extension (e.g. product reviews?) then with plain XML you have to define how the elements relate not only to the core language but also to all other extensions.

In other words, using a framework like RDF helps prevent your system from  becoming a http://c2.com/cgi/wiki?BigBallOfMud

Posted by Danny at

What I hear is that it essentially provides extensibility for free.  I'd just like to see this promise expressed in tangible terms.

1. Sam, play with Prolog and Versa for a while:

versa
good prolog book

note: Prolog ~= SQL, which leads us to...

2. Beyond the silly XML syntax, reason number one that people are not sold on RDF is that it's a data model without a query language - by "data model" I mean the somewhat grandiose Model of Data. I say this because every time I talk about RDF I am invariably asked - "what can I do with it?". Answer: not a whole lot without a query language. A data model without a query language has limited utility - consider convincing most people to use the relational data model in a world without SQL. Sure, you can do stuff with tools like Jena, but nobody today with an ounce of sense is going to allow their queries against standard format data to be locked into a toolkit, even if that toolkit is open source. It's odd the W3C haven't chartered work for an RDF query language, or some kind of abstraction over the engines.

3. Extensibility today: "we just need to add a new column in the DB", "we just need to create a new XML vocabulary.", "we'll use namespaces". But that's not extensibility, that's stovepiping. Specifically regarding DB queries: of course people have to update their queries. But writing queries is dirt cheap, honestly who cares? If all we had to do was write new queries against new information, that'd be great. The problem is new information often means having to update the database table structures as well. Actually it's much worse than that in the trenches - unifying two databases typically requires professional services. XML + transformations is much better wrt managing integration costs, and we can do much more, cheaply, with XML backed by an RDF infoset. But the benefits won't be realized without a standard/de-facto RDF-QL.

4.

findSentences(S,_,_):-
  rdfSentence(S,P,O),
  rdfSentence(P, owl:sameAs, atom:name).

That's pseudo-Prolog. But it makes the point regarding the RDF proposition: when you switch to foaf:name, you can describe the relationship between atom and foaf 'name' without altering the underlying DB structure (triples), or having to write new glue code to establish the relationships, thereby cutting out a large amount of cost you'd normally expect to pay. You can do this because the notion of a relation is a first-class citizen in RDF. That's extensibility.
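The same idea as a small runnable sketch in Python, offered only as an illustration of the pseudo-Prolog and not as how any RDF engine does it. The sample triples and the helper names `equivalents` and `values_of` are assumptions; the point is that the mapping is just one more triple in the same store:

```python
triples = [
    ("ex:entry1", "foaf:name", "Sam Ruby"),
    # The relationship between the two names is itself data.
    ("foaf:name", "owl:sameAs", "atom:name"),
]


def equivalents(store, prop):
    """All names linked to prop by owl:sameAs statements, in either direction."""
    found = {prop}
    frontier = [prop]
    while frontier:
        current = frontier.pop()
        for s, p, o in store:
            if p != "owl:sameAs":
                continue
            if s == current:
                other = o
            elif o == current:
                other = s
            else:
                continue
            if other not in found:
                found.add(other)
                frontier.append(other)
    return found


def values_of(store, prop):
    """Values stored under prop or under any name declared the same as it."""
    names = equivalents(store, prop)
    return sorted(o for s, p, o in store if p in names)


# A query written against atom:name still finds data published as foaf:name.
print(values_of(triples, "atom:name"))
```

Dropping the owl:sameAs triple makes the old query come back empty, without any change to the table layout or the query code.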

[Btw, the spellchecker is weird. Couldn't see the wood from the trees to preview properly.]

Posted by Bill de hÓra at

hi...I saw 'query' and got interested. Like several others, I have implemented a QL for RDF (squish, co-created with danbri) based on Guha's initial idea (for rdfdb).

What I hear is that it essentially provides extensibility for free

Extensibility is cheap in RDF, I'd say; merging is also cheap. So because RDF has a model and uses unique identifiers, you can do very interesting queries over aggregate sources.

I have a very simple example that shows that if you have a foaf file and RDFiCal files (icalendar as RDF), you can ask things like: 'who do I know that is going to this conference?'- http://planb.nicecupoftea.org/archives/000068.html

You could then extend this by asking: 'find me the projects of the people that I know that are going to this conference', and so on.
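A rough sketch of that kind of question over merged sources, assuming made-up people and a hypothetical `cal:attendee` property (the real RDFiCal vocabulary may name this differently):

```python
# Illustrative data only: identifiers and the cal:attendee property are assumptions.
foaf_triples = {
    ("ex:me", "foaf:knows", "ex:alice"),
    ("ex:me", "foaf:knows", "ex:bob"),
}
calendar_triples = {
    ("ex:conference", "cal:attendee", "ex:alice"),
    ("ex:conference", "cal:attendee", "ex:carol"),
}

# Merging the FOAF file and the calendar file is a set union.
graph = foaf_triples | calendar_triples


def known_attendees(triples, me, event):
    """Who do I know that is going to this event?"""
    known = {o for s, p, o in triples if s == me and p == "foaf:knows"}
    attending = {o for s, p, o in triples if s == event and p == "cal:attendee"}
    return sorted(known & attending)


print(known_attendees(graph, "ex:me", "ex:conference"))
```

The join works only because both files use the same identifier for the same person, which is the caveat about identifying people and things.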

Whether these sorts of questions will be answered depends on the vocabularies people decide to use and the ways of identifying people and things. However, RDF does provide ways for vocabularies to be created in a distributed way, and linked to existing vocabs. This means that you can get fast adoption and integration of multiple vocabularies for specific useful purposes, as and when the idea takes someone. Then interesting applications can be quickly created using RDF QLs.

Hope that makes sense...

Posted by libby at

RE: RDF questions

Danny wrote

Let's say you want to add something new (e.g. calendar info?) to your syndication feed. You probably namespace it to keep it separate. With a vanilla XML format you'd have to define how the new elements related to your existing elements. RDF's basic model of resources (using URIs) and the relationships between them means that this part is already done for you.

As an aggregator author all I can say is talk is cheap. Instead of dealing in hypotheticals why don't you just prove this? There already exists an RDF based syndication format today and there are Open Source aggregators (including RSS Bandit) which you can modify to become learning, autonomous agents if this is truly possible with RDF.

Instead of talking, just show us the code.

Message from Dare Obasanjo at


Dare, agreed.  I've always said, the first RSS 1.0 aggregator that actually uses an RDF library will enable the world...

It still requires the information to be there, though.  RSS 1.0 is just one stream of information (current entries).

Libby's example is a good one.  If either an RSS feed or a FOAF file referred to an event, an aggregator could display a calendar of all events, and the subscribed-to people participating in them.

The difference between XML and RDF is what Bill de hÓra is pointing out: with RDF, the events on the calendar day and the participants in that event are straightforward queries.

Posted by Ken MacLeod at

Not that I want to peddle a proprietary solution (because I don't) but to give you an idea of it in ITQL:

select $creator
  subquery(
  select $type
  from <rss_schemas>
  where $type <http://www.w3.org/2002/07/owl#sameAs>
    <http://purl.org/dc/elements/1.1/creator>
  )
from <rss_feeds>
where $creator $type 'rubys@intertwingly.net';

Where <rss_schemas> is a model mapping RSS models to one another and <rss_feeds> are URIs to files, RSS feeds or other models.

As Danny Ayers has shown with RDQL and inferencing it gets a lot less verbose and a lot more interesting.

Posted by Andrew Newman at

Dare, I'm working on a RDF-backed tool (called IdeaGraph) that will do aggregation and a whole lot more, but because of that 'whole lot more' it's taking a while ;-)

But provoked by Sam's question and these comments (and some i/o stuff I need to update anyway) I'm going to spend a few days trying a spike at a bare-bones RDF-backed aggregator.

I'll make loud noises when I've something to report ;-)

Posted by Danny at

Just stumbled on a couple of things Libby had probably forgotten about, querying RSS etc :

http://swordfish.rdfweb.org:8085/rdfquery/squish.html

http://ilrt.org/discovery/2000/11/rss-query/

Posted by Danny at

Say I stored stuff in a separate XML namespace in the calendar proposal, couldn't I just provide inference snippets instead of using full-fledged RDF? That is, in the calendar example, use RDF to specify that calendar:name is equivalent to dc:author, or a subclass of it, or similar.

In this way applications not wanting to use the inference capabilities can ignore the snippet, whereas the RDF parser needs to do more work. It's of course ambiguous how exactly the XML would be transferred to RDF itself to merge with the inference snippet, but lifting a tree to a graph, once a convention for what to do with attributes is specified, is pretty much unambiguous...

In these apps the RDF inference graph may even be incomplete, in the sense that we may not 'type' all branches in the tree, but the advantage would be that this allows each application to use its own snippet. The disadvantage of that, of course, is proliferation of inference snippets and typing. But from a market perspective, that may be a good thing.

Posted by Rahul Dave at

Rahul, none of the techniques for making XML inferable seem to have taken off (RDF or no), unless you include Google, but then the accuracy leaves a lot to be desired.

It's not that it's hard, it's just that when you're done, you realize you still have a garbage-in/garbage-out problem or you've reinvented RDF again (or something like it).

Posted by Ken MacLeod at

Ken, can you express that in tangible terms?  Looking at a typical RSS 1.0 feed or a FOAF file, I see the process of doing a merge between documents relying heavily on joins on non URL values.

Posted by Sam Ruby at

Doing joins on non-URI/non-key fields isn't a flaw, just as it isn't in relational databases, either, although there may be other issues to consider.

One of Mark's b-links, The Rise of Relational Databases has a description I find interesting:

Both the IBM and the Codasyl products were sometimes called navigational databases because they required the user to program or navigate around a data set. [...] Codd found existing and new database technologies "taking the old-line view that the burden of finding information should be placed on users. . . . [In this view, the database management system] should only recognize simple commands and it would be up to the users to put together appropriate commands for finding what was needed."  Codd laid out a new way to organize and access data. What Codd called the "relational model"

XML is the navigational database where RDF is the relational model.  This is because for any particular "record" of information, common XML schemas each have unique, local structure that must be navigated to access the fields of the record.

Rahul's suggestion is therefore a common one: develop a mapping technique then get everyone to use it.

Thus, my comment.

Posted by Ken MacLeod at

XML is the navigational database where RDF is the relational model

That's a mite simplistic, IMHO.  With XQuery, I can do joins, with XPath expressions serving as the moral equivalent of column names.  In either case, you seem equally exposed to the "garbage in/garbage out" problem.  (Example: how many different ways is Les Orchard referred to on my weblog?).

I might be missing something obvious, but the mapping of dc:creator to foaf:name seems to me to be an example of develop a mapping technique then get everyone to use it.

Trust me, I am really trying to listen here.  Here's what I have gathered so far:

Feel free to correct either or both of these, and/or to propose new differences that I don't yet see.

Posted by Sam Ruby at

RE: RDF questions

Sam wrote


RDF encourages subjects and objects to be expressed as URIs, even opaque ones.  This ensures uniqueness, and therefore collisions are unlikely to be an accident and therefore convey real information.

This is actually a bogus claim by the RDF folks and the cause of much discussion on the WWW-TAG mailing list. A simple proof is to ask yourself whether an RDF statement about the URI "http://www.25hoursaday.com/weblog" is about me, about my weblog or about the HTML returned by doing an HTTP GET on that URI. Depending on the context it could be all 3.

URIs in RDF are just like words, words have name collisions and tend not to have unique meaning by themselves but instead depend on context. How do you pronounce "read"? What is a "tie" ? What exactly is a "boot" or a "trunk" for that matter? Without seeing those words in context you get the wrong answer. URIs in RDF are just like that.

Message from Dare Obasanjo at


Let's start by stipulating some commonalities:

Those commonalities stipulated, the biggest difference between the XML model and the RDF model is their structure.

The statement, "XPath expressions serving as the moral equivalent of column names", is apt.  In RDF, the predicate, a single name, is the equivalent of a column name.  In XML, an XPath (possibly several names positionally related to each other) is the equivalent of a column name.  In RDF, the subject URI is the equivalent of a "row id" or "primary key" in an RDB.  XML has no equivalent to a row-id or primary key currently specified as such.  In RDF, you "know" when you have a "record", all the triples with the same subject are a record.  In XML, what constitutes a "record" is application dependent (in the XML sense of "application").
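A small sketch of the "all the triples with the same subject are a record" idea, using triples quoted earlier in this thread. The `records` helper is hypothetical and prefixes are abbreviated for readability:

```python
from collections import defaultdict

triples = [
    ("http://www.intertwingly.net/blog/", "rdf:type", "rss:channel"),
    ("http://www.intertwingly.net/blog/", "dc:creator", "rubys@intertwingly.net"),
    ("genid:ARP10653", "rdf:type", "foaf:Person"),
    ("genid:ARP10653", "foaf:name", "Sam Ruby"),
]


def records(store):
    """Group triples by subject: subject = row id, predicate = column name."""
    table = defaultdict(dict)
    for s, p, o in store:
        # A predicate may repeat for one subject, so each column holds a list.
        table[s].setdefault(p, []).append(o)
    return dict(table)


for subject, columns in records(triples).items():
    print(subject, columns)
```

No knowledge of the vocabulary is needed to find the record boundaries, whereas in XML the grouping rule would have to come from the application.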

When I used the term "mapping", I was referring primarily to the structure of the models; data types and data definitions can be more or less common between the two.

To me, the purpose behind mapping anything to RDF is to take advantage of RDF's similarities to the relational model over larger data sets, with the modern bonus of column and record type polymorphism.  (Inferencing is something I see layered over those.)

When someone says, "we can map XML to RDF", what I hear is that, given the right annotation of the XML, all of the benefits of the relational model over larger data sets (with or without the polymorphism bonus) are available directly to XML.

(Note: I don't place any special relevance or favor on the relational data model, or even RDF.  Any good model would do.)

Posted by Ken MacLeod at

Dare, saying that's "bogus" is a pretty strong claim too ;-)

There's no cause for a URI to be ambiguous, particularly where human languages tend to be (word definitions).

Whether a URI can "represent" a physical object without "being" that physical object, or a URI can represent a logical resource, separate from the bits that come across the wire when one dereferences that URI, are both valid concerns.  I suspect they'll get it sorted out as soon as enough of the concerned folks agree on what the concern actually is.

Posted by Ken MacLeod at

Atom and RDF?

At the beginning of October it was finally decided to call it Atom.

And now I want to know about whether it’s in RDF format (or very close) or not.

... [more]

Trackback from phil.wilson at

XML, RDF, Topic Maps, and naming things with URIs

Mark Baker posted a very small, very simple example of why you get more from RDF for cheap than you do from XML. Both links from the comments at Sam Ruby's weblog, an informative dialog from some knowledgeable people. It......

Excerpt from MishMash at
