It’s just data

Quantifying the "RDF tax"

There have been assertions of an "RDF Tax".  Not having an opinion on the subject, I decided to do a little investigating.  In particular, I sought to identify the highest potential "ceiling" to the RDF tax.

So, with the help of a number of people on IRC (in particular, Mark Pilgrim, Ken MacLeod, Shelley Powers, and Sean Palmer), I developed an XSLT transform from the Atom 0.2 snapshot into the most comprehensive RDF equivalent.  Some argued for me to simplify in the process... but the methodology I wanted to apply here was first to seek to understand, and only then to simplify.

The initial results are that the maximal Atom 0.2 snapshot balloons from 47 non-blank, non-comment lines to a whopping 61 lines.  And that is before simplification.  Clearly, some readability was lost in the process.  This, too, needs to be addressed.
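
For anyone who wants to reproduce the measurement, here is a minimal sketch of one way to count "non-blank, non-comment lines" and to express the growth as a percentage. The function names and the tiny sample document are illustrative assumptions, not the script actually used for the figures above, and the comment stripping is deliberately rough (a comment embedded mid-line across a line break would merge two lines).

```python
import re


def count_significant_lines(xml_text: str) -> int:
    """Roughly count lines that are neither blank nor pure XML comment."""
    # Remove <!-- ... --> comments, including ones spanning several lines.
    without_comments = re.sub(r"<!--.*?-->", "", xml_text, flags=re.DOTALL)
    return sum(1 for line in without_comments.splitlines() if line.strip())


def percent_increase(before: int, after: int) -> float:
    """Relative growth from `before` to `after`, in percent."""
    return (after - before) / before * 100


# Hypothetical sample, only to exercise the counter.
sample = """<feed>
  <!-- a comment line -->

  <title>Example</title>
</feed>
"""

print(count_significant_lines(sample))      # significant lines in the sample
print(round(percent_increase(47, 61), 1))   # growth from 47 to 61 lines
```

Going from 47 to 61 lines adds 14 lines, which works out to a little under 30%.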

A number of observations:

Once we have a sufficient understanding of the "proper" way to apply RDF, then we can move on to exploring the "practical" way to apply RDF.


As an alternative to using FOAF terminology, which would then impose a 'foaf tax', it might be better to supply an RDF snippet asserting equivalence to a canonical Atom RDF.

Posted by Rahul Dave at

Sam,
I'm still waiting for anyone to give a good reason for Atom being RDF compatible besides buzzword compliance. Since you seem interested in making this happen can you point out the concrete benefits of doing this that don't contain the phrase "Semantic Web"?

PS: What was hard about writing the transform? The XSLT looks fairly straightforward, so I assume the difficulty was in understanding the various RDF vocabularies such as FOAF, Dublin Core, etc.

Posted by Dare Obasanjo at

Marvellous work, though "Quantifying the XSLT Tax" might have been a more accurate title ;-)

I don't think the 25% increase is anything to worry overmuch about - I'm sure it can be brought down significantly (I notice an rdf:Description in there for a start). Some human readability might have been lost in the process, but a lot of machine-readability has effectively been gained.

Also I don't think your experiences here have very much to do with the "RDF Tax". If you had been using RDF from the start, I'm sure things would have been considerably less taxing - XSLT is hard! It's also worth remembering that a transform like this only has to be written once.
But RDF can certainly be hard work, and there can be proportionately a fair bit of extra work needed for the simpler apps. The benefit is that once the initial bits have been done, the data becomes immediately available for loads of other tools - the tax rebate.

FOAF terminology would be great, as long as the definitions were the same - what is the foaf tax, if the definitions are the same?

Posted by Danny at

Sean Palmer -- that's who that was. Makes sense.

Good approach Sam. Will be interesting to follow this from more than just a Pie/Echo/Atom perspective.

I'm sorry I had to back out early, but I had to finish the taxes (wince -- owe -- wince -- big time -- wince). And I'm not much for IRC, the pace is too fast. Still, interesting experience.

Dare -- if we filter based on "Semantic Web", then we might as well filter on single use XML vocabularies. Each of us comes into this with different interests. It's important to respect that where you're interested in single use XML vocabulary, I'm interested in the SW. The key is to find middle ground, not exclude or filter based on interest.

I think that's what Sam is doing with this effort -- see how difficult it is to transform, seeing how complicated or simple RDF has to be, what's the value of using FOAF over DC, etc. Most importantly, he caught that there were a lot of assumptions about data and behavior based on the feed originally being pure XML. We need to separate that behavior and look at it individually outside the syntax. Not depend on what I still call accidental grouping because of syntactic placement.

A lot of words, and I used semantic web. Boy, I bet you hate my new weblog, Semantic Web of Poets grin

BTW, Sam, I love your proper vs practical thing bigger grin

Posted by Shelley at

Since RDF seems to be all the latest craze: Why?  What kind of applications will people be able to produce that they wouldn't be able to produce with the lovely simplicity of the Atom 0.2 snapshot? 

Human readability is underestimated. If this atrocious RDF syntax is chosen, you are going to scare off relatively non-technical people from producing an Atom feed. You can argue from here 'till eternity that "It's just applying a transformation with XSLT", but remember that many (most) people even have problems understanding a concept as simple as CSS.

Unless the Atom syndication format is going to lie in the same forgotten pool of mud that W3C RSS is, I believe RDF is best forgotten.

Posted by Arve Bersvendsen at

Shelley,
  If I am being forced to learn RDF (Yes, I am being forced to learn RDF because I am an aggregator writer and blog developer who will use Pie/Echo/Atom/Whatever) then I'd like a good reason for it. The pipe dream that is the Semantic Web does not count as a good reason.

  If there isn't one then I'll join those that vote for adding a canonical transform of the core Atom syntax to RDF to the Atom spec as opposed to baking it into core.

Posted by Dare Obasanjo at

I'm with Dare on this one - I'm not sure what the benefit is here.

And the syntax is arguably more "complex-looking", which like it or not, will hurt adoption.  Look at RSS - RSS 2.0 looks "nice and easy", and RSS 1.0 looks pretty complicated at first glance.  And when you see new feeds coming online, generated by custom systems, what are they using?  Usually 2.0. 

Simplicity matters.

Posted by Greg Reinacker at

Just to clarify a point, in case it wasn't clear.... this was an experiment.  Only an experiment.

It is my expectation that what we will end up with will require only a small delta from Atom 0.2.  Aaron showed that a small delta was all that was needed.  This led to wild assertions that there were probably nine hundred and ninety nine steps to go.  Today, we've reduced that estimate by an order of magnitude, if not more.  Hopefully over the next few days, we will do that again.  If we can do that one more time, we've really achieved something.

Will we get to the point where Atom is transform free to pure RDF?  I don't know.  I certainly would not be willing to give up on the "view source simplicity" that Atom currently has in order to solve the last mile problem.  But I do know that we can reduce the impedance mismatch.  And that Atom will become better specified in the effort to do so.  And hopefully more compatible with a number of vocabularies.  Not just in the RDF sense, but in the XQuery sense too.

Is that worthwhile?  IMHO, yes.

Posted by Sam Ruby at

Good points Sam.

Dare, and other non-RDF supporters: I try and 'prove' RDF is a viable option for more than just Pie/Echo/Atom on a relatively frequent basis -- without 'forcing' it on people. I'll continue in my weblogs and other writing, but I'm reluctant to debate this issue here because methinks it may not make a different what I say. Is it a possibility I have a good read on this?

Sam, I noticed that Sean talked you into List for contributors. And my tax thing was purely accidental -- no pun intended.

Posted by Shelley at

Grrr. Must. Preview. First. I meant to say, "...it may not make a difference what I say..."

Posted by Shelley at

Dare Obasanjo

Shelley,
  I am fairly open minded on technologies. If you can prove how RDF provides benefit to consumers and producers of ATOM feeds then I'm game. If all it does is bring ATOM closer to being part of the mythical Semantic Web that people have been hyping for the past few years with no concrete benefits then I'd rather it didn't complicate ATOM since thanks to the beauty of XML and XSLT, ATOM as RDF is a transform away.

Message from Dare Obasanjo at

Sam,


This led to wild assertions that there were probably nine hundred and ninety nine steps to go.


I am being quoted out of context. My point above is related to this statement


I asked that whatever technique you come up with for mixing namespace vocabularies should be able to solve the problem of automatically converting an XSLT processor to an EXSLT processor since this is "simply" a case of mixing namespace vocabularies. I specifically stated that I don't believe this problem can be feasibly solved and neither do I believe RDF is a stepping stone to solving this problem.


where I state that using RDF is 999 steps away from being able to solve the arbitrary XML vocabulary mixing problem in the general case and has nothing to do with how hard or easy it is to make ATOM become RDF compliant.

Posted by Dare Obasanjo at

Marvellous work, though "Quantifying the XSLT Tax" might have been a more accurate title ;-)

I don't think the 25% increase is anything to worry overmuch about - I'm sure it can be brought down significantly (I notice an rdf:Description in there for a start). Some human readability might have been lost in the process, but a lot of machine-readability has effectively been gained.

Also I don't think your experiences here have very much to do with the "RDF Tax". If you had been using RDF from the start, I'm sure things would have been considerably less taxing - XSLT is hard! It's also worth remembering that a transform like this only has to be written once.
But RDF can certainly be hard work, and there can be proportionately a fair bit of extra work needed for the simpler apps. The benefit is that once the initial bits have been done, the data becomes immediately available for loads of other tools - the tax rebate.

FOAF terminology would be great, as long as the definitions were the same - what is the foaf tax, if the definitions are the same?

Posted by Danny at

(Sorry Sam, remind me not to do a browser refresh late in the day, forgetting there's still junk in the edit box...)

I reckon what's being done here is right on. It should certainly be possible for Atom feeds to be used directly as XML, without using the full RDF/XML syntax (or RDF model). But it should also be possible to make it easy for people that do use RDF to use the feeds easily as well. The steering looks on course for achieving both of these with only minor compromises either way. Fingers crossed ;-)

Posted by Danny at

Seems to me there's a lot of room for simplification.  Most of the blowup seems to be from dc and foaf stuff, if you're counting lines.  If you start counting attributes, that's different.

Here are some alternate statistics, FWIW. The stats from running the two files through Xerces' SAXCounter yield:

atom02maximal.rdf: 297 ms (48 elems, 18 attrs, 0 spaces, 992 chars) - note that chars here is character data, not the size of the file

atom02maximal.xml: 266 ms (41 elems, 6 attrs, 0 spaces, 1382 chars)
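
For readers without Xerces at hand, comparable counts can be gathered with a few lines of SAX code. This is only a sketch using Python's standard `xml.sax` module, not SAXCounter itself; the handler class, the `count_xml` helper and the sample document are made up for illustration, and the numbers will not necessarily match SAXCounter's (for instance, namespace declarations are counted as attributes here and whitespace is counted as character data).

```python
import xml.sax


class CountingHandler(xml.sax.ContentHandler):
    """Tally elements, attributes and characters of text content."""

    def __init__(self):
        super().__init__()
        self.elems = 0
        self.attrs = 0
        self.chars = 0

    def startElement(self, name, attrs):
        self.elems += 1
        self.attrs += len(attrs)

    def characters(self, content):
        # Character data only, not the size of the file.
        self.chars += len(content)


def count_xml(xml_bytes: bytes) -> CountingHandler:
    handler = CountingHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler


# Hypothetical sample document, not one of the two files measured above.
sample = b'<feed version="0.2"><title>Hi</title><entry id="1"/></feed>'
stats = count_xml(sample)
print(stats.elems, stats.attrs, stats.chars)
```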

Posted by Ted Leung at

XML to RDF transformation

Sjoerd Visscher: Sam Ruby made an XSLT transformation from Atom to RDF. He said it was hard to do. On the #echo IRC channel I said that XR would probably make it a lot easier. Then Sam asked me to make an XR transformation that would do the same ...

Pingback from Sam Ruby: XML to RDF transformation

at

Sam, nice of you to do this.  BTW, you would have had a much easier time if you used XSLT to 'extract' RDF triples instead of 'transforming' Atom into RDF format.

I also appreciate Sjoerd doing the same in XR, I feel that XR was the right idea implemented wrong in the same way as I outlined above. 

Chopping a tree into a stack of firewood is much easier, and the end product more useful, than trying to build a tree out of firewood.

Posted by Don Park at

I don't know if this will be any use as an example of coping with RDF in XSLT, but I did a FOAF-to-XML and vice-versa transform a few weeks ago:

http://simonstl.com/projects/foaf-xml/

On a grander scale, Norm Walsh's RDF Twig work is definitely worth exploring:

http://rdftwig.sourceforge.net/

I think Norm's approach is definitely capable of dealing with more RDF variations, though I don't think those variations are something you necessarily want to support in Atom.

Posted by Simon St.Laurent at

Dare, I kind of look at it this way -- I can either spend my time writing articles and tutorials about the technology and working on a couple of unfinished applications, which I think might be useful and perhaps even a little 'cool', all built using RDF. Or I can spend time talking with the Pie/Echo/Atom folks about why the group should consider using RDF. I think my time is better spent on the more productive option, don't you?

When Sam starts simplifying that proper model into something practical, I'll help. And answer specific questions. Or provide specific examples. But I'm not going to spend my time answering questions that begin with "Shelley, prove to me..."

Posted by Shelley at

Considering XMI

The recent considerations about RDF leave me wondering if we could consider XMI as a viable alternative to the oftentimes bloated and overly complex RDF syntax. XMI documents, done properly, are only mildly more complex than vanilla XML and...... [more]

Trackback from snellspace

at

RDF/XML is not the only fruit

Sam Ruby has come up with an XSLT stylesheet, which transforms Atom XML into RDF/XML. Sam said it was hard,...... [more]

Trackback from Raw Blog

at

To RDF with XSLT...

Sam Ruby has come up with an XSLT stylesheet which transforms Atom XML into RDF/XML. Sam said it was hard, which led to Sjoerd Visscher's creating a simpler version which takes advantage of his generic XML -> RDF mapping system......

Excerpt from Formerly Echo at

RE: Quantifying the "RDF tax"

Shelley,
Do whatever makes you happy. You don't owe me anything. Even if ATOM decides to become fully RDF compliant it probably won't be any more trouble to consume in an aggregator than RSS 1.0 currently is. It just means I'll have to avoid extending the format as much as possible so as not to trip over the RDF bits.

Message from Dare Obasanjo at

Don, I considered extracting triples, but that is really less readable, because often triples share the same subject, and the object of one triple often also becomes the subject of other triples.

I do use a very strict subset of RDF, so that subject, property and object are always elements (when the object isn't a literal). So each part of each triple is an element (or a textnode), which makes the triples almost directly visible.

Posted by Sjoerd Visscher at

Dare, whatever rules we come up with will have validator support.  This will be true whether these rules are required by underlying choices of technology or perhaps only inspired by another technology.

Even if we whole-hog adopt RDF in all of its glory, people won't have to read the RDF specifications any more than they have to read the full suite of XML specifications.  Instead, they merely need to emulate working examples, and then try to validate the results.

What this means is that if somebody were to attempt to define an extension in the format of exhibit A and then tried to validate an instance using this extension, the validator would identify the use of an attribute without a qualifying namespace as an issue, and would suggest a corrective course of action.

The overall style and format of the message would look something like this.  And, like that page, would contain a link to relevant introductory materials and/or authoritative reference materials.

Posted by Sam Ruby at

Sam, what about on the other side -- the aggregator? What would the RDF tax be there? The concern is that you'd never be done writing a PEAW aggregator, that there would always be new twists people could throw at you, kind of like the problem with RSS and people using elements in namespaces to do roughly the same thing as core elements. Shouldn't the goal be to make the barrier to entry as low as possible for aggregator developers? Or is that one of your goals?

Posted by Dave Winer at

BTW, I ask because I saw a feed the other day (not naming names) that appeared to be coded so that it wouldn't work in a specific aggregator. I know from past experience in other protocols that people actually do things like this. It's bad for the Internet of course, counter to its philosophy. That's why simpler is better. It's harder to hide that kind of stuff. It's one of the reasons I'm such a broken record about "simple."

Posted by Dave Winer at

Dave, this feed can be easily parsed with regular expressions.  It can be easily parsed with an XML parser.  It can be easily parsed with an RDF parser. 
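
To make the first two claims concrete, here is a small sketch that pulls entry titles out of a feed once with a regular expression and once with an XML parser. The feed below is a simplified, hypothetical stand-in (no namespaces, not the actual Atom 0.2 snapshot), and the helper names are invented for the example. The RDF parser case is left out because it would need a third-party library.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified feed used only for illustration.
FEED = """<feed>
  <title>Example feed</title>
  <entry><title>First post</title></entry>
  <entry><title>Second post</title></entry>
</feed>"""


def entry_titles_regex(text):
    """Grab the title that directly follows each <entry> start tag."""
    return re.findall(r"<entry>\s*<title>(.*?)</title>", text, flags=re.DOTALL)


def entry_titles_xml(text):
    """Do the same by walking the parsed element tree."""
    root = ET.fromstring(text)
    return [entry.findtext("title") for entry in root.findall("entry")]


print(entry_titles_regex(FEED))
print(entry_titles_xml(FEED))
```

Both approaches give the same list for this sample. The regular expression depends on the exact layout of this one document, while the parser version only depends on the element structure.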

It is all about simplicity.  And leaving doors open rather than closing them.

As to your other issue: one of the goals of the Atom project is to cleanly and thoroughly specify what is valid in a feed.  When problems arise, one should be able to unambiguously determine whether a feed is broken, or an aggregator is broken, or if the spec itself is broken.

Posted by Sam Ruby at

Sam, sure one specific feed can easily be parsed with regular expressions. But aggregator developers have to write code that will read all feeds.

If all the doors are left open, you end up with nothing. If you don't make choices, you can never finish writing an aggregator. There's always some twist someone can throw at you. And then you'll need one aggregator to read one brand of weblog, and another to read the BBC, etc etc.

I've been wanting RSS to have the exact opposite philosophy. I want an aggregator developer to be able to focus on things that have a clear benefit for users, not running an endless race with people throwing new stuff at them. Ideally they'd start writing their aggregator in the morning and finish it by the end of the day. I don't mean finish it modulo the feeds that don't work, I mean finish.

I described this over on Shelley's with an analogy. I use this stuff to build the Golden Gate Bridge. But my bridge can't fly to the moon. I think that's okay. The bridge was built a long time ago, and someone just drove a car across it that uses GPS. The bridge held up just fine, so did the car. That's the kind of flexibility I want.

In my world, GPS is the work that Chris Pirillo is doing. Or the folks at Amazon or Rolling Stone. People who find new applications for syndication, not people who find new ways to make the same thing more complicated. I don't mean this in a negative way. As I watch you deal with all this stuff, I'm impressed with how flexible your mind is. That's cool. I just think we are interested in vastly different things.

I think it's good to get clear on what the differences are between RSS and what you're doing here. The more clear distinctions there are, the less confusion there will be. I know that sounds kind of obvious, but sometimes the obvious needs to be clearly stated.

Yogi Berra said you can observe a lot just by watching. Same kind of thing.

BTW, I did some editing here, but not a lot. I ask that people not read this as spec text, I am not a lawyer, your mileage may vary, all other disclaimers apply, etc.

Posted by Dave Winer at

Sam Ruby has an interesting thread on quantifying the RDF Tax. I posted a comment, a rare thing for me these days. ...

Excerpt from Scripting News at

Tax? Or Precision

I read the comments about "RDF tax" and how we must "prove" RDF's worth, yet when I look at so many plain XML feeds, all I can see is the improvement that could be added because of the precision of using RDF/XML. Not all XML feeds, but any that are... [more]

Trackback from Practical RDF

at

Meta-models all the way up: and why RDF is useful anyway

I've been away for a week and it looks like there's a lot going on in the RSS/RDF world; in grand blog tradition, time to comment on the comments :) Spike Solution... [more]

Trackback from Bill de hÓra

at

You can only get clean separation between bridge, car and GPS by clearly defining how each interacts with its immediate environment.

You maybe don't want any objects over 100 tons going across the bridge. One approach may be to list every item that might conceivably cross the bridge, and note its weight. Could take a while. It's also a little hard to predict every future item's weight.

Same with data for syndication. If you have to define the boundaries individually for every object or syndication module (as in RSS 2.0) then not only are you making a lot of unnecessary work, you're also leaving the door to problems open. You can't be certain that the next application's not going to break the aggregator.

A framework like RDF can help because it can define the boundaries systematically.

You can have your aggregator by lunchtime, but that's not all, you won't have to completely rewrite it when a new module comes along tomorrow.

Posted by Danny at

If RDF is a tax, it's alleged to be one that interconnects our roads. As I've written before, "Design is Like a Mortgage":

http://goatee.net/2002/09.html#_18we

"""... Think of purchasing an old fixer-upper home: you can select from a couple of properties on the market. First, you want something with a sound footing and an inexpensive price. Also, you'll probably need a mortgage. The smaller the down payment, the larger the total cost. So ideally, you want your down payment to be as large a portion of the total price as possible. But, your initial cash reserve is limited, so you commit to your down payment and then you can at least move in and start fixing the house and increasing its value. Same thing with applications! In the end you want to move in and improve where most needed, but you also want something with a sound architectural footing. That's a balancing act, though sometimes there's design principles and technologies that lessen (win/win) immediate and future costs. RDF has a great architectural footing -- those who don't like it are doomed to reinvent it poorly -- but an immediate/localized cost of comprehension. For example, in RSS 1.0 the order semantic of RDF sequences imposes a cost without much benefit. It's a sequence, but you don't know what sort of sequence: a mandatory RDF artifact for an optional feature doesn't make much sense to me..."""

Posted by Joseph Reagle at

RDF, about recent documents found

A number of comments reflect on the activities described in Sam Ruby's Quantifying the "RDF tax" and XML to RDF transformation.... [more]

Trackback from the iCite net development blog

at

I've made some arguments on the wiki against using RDF for Atom: NoToRDF.  Of course, they may be updated, but the two arguments I've made are that Atom doesn't need RDF, and that application semantics don't come for free with meta-language parsers. Enjoy.

Posted by Sean B. Palmer at

Sjoerd, no problem.  If RDFers want triples, they can grunt for themselves.

Sam, there are doors you don't want to open.  Look at the example feed and tell me with a straight face that those rdf:parseType and rdf:RDF cost nothing in clarity.

Posted by Don Park at

Don: I will admit that I don't see the overwhelming harm in the rdf:RDF marker - after all, it is only one element.

On the other hand, I will freely admit that I see interleaving rdf:parseType throughout the document as an unreasonable burden.  However, take a close look at minAtom.rdf.  If you are doing so from IE, make sure that you view source.  No rdf:parseTypes to be found.  Why are they there before you view source?  Because they have been magically inserted there in the proper places by the DTD.
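
The mechanism at work is the attribute default: an ATTLIST declaration in the DTD can supply a value for an attribute the author never typed, and a parser that reads the declaration reports the attribute as if it were present. The sketch below illustrates this with Python's standard expat binding. It is an assumption-laden toy, not minAtom.rdf: the declaration sits in an internal DTD subset so that the example is self-contained, and the choice of `author` as the element carrying the default is hypothetical.

```python
import xml.parsers.expat

# Hypothetical document: the source text never writes rdf:parseType,
# the internal DTD subset declares it as a defaulted attribute.
DOC = """<!DOCTYPE feed [
  <!ATTLIST author rdf:parseType CDATA "Resource">
]>
<feed><author><name>Example</name></author></feed>"""


def collect_attributes(xml_text):
    """Return {element name: attributes as reported by the parser}."""
    seen = {}
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        seen[name] = dict(attrs)

    parser.StartElementHandler = start
    parser.Parse(xml_text, True)
    return seen


attrs = collect_attributes(DOC)
print(attrs["author"])  # attributes the parser reports for <author>
print(attrs["feed"])    # no default was declared for <feed>
```

Note that a non-validating parser is not obliged to read an external DTD, so defaults declared there may or may not show up depending on the parser in use.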

Posted by Sam Ruby at

RDF Syntax is Readable

Here's an interesting discussion of whether RDF/XML is readable, compared to a custom-made XML syntax for Atom. My first reaction to this discussion was that when I looked through the example RDF/XML on that page, it seemed pretty straightforward,... [more]

Trackback from About Kim

at

A Useless Comment on Atom 0.2, RDF Style

Some people have suggested that Atom could be made more RDF-friendly; others object. Most helpfully, we now have a suggested RDF version of the Atom 0.2 example. After looking at it, all I can say is how ugly. It's not that I don't...

Excerpt from Y. B. Normal at

Sam, I thought about mentioning the possibility of using default attributes before, but I concluded that:

a) RDF applications would have to use validating parsers,

b) it is ultimately a hack, and

c) it would weaken my position among people who don't share my bias against RDF.

Posted by Don Park at

If you didn't get my joke, never mind.

Posted by Don Park at

XML to RDF transformation for Atom with XR

Sam Ruby made an XSLT transformation from Atom to RDF. He said it was hard to do. On the #echo IRC channel I said that "XR" would probably make it a lot easier. Then Sam asked me to make an XR transformation that would do the same thing. I did, and...

Excerpt from Sjoerd Visscher's weblog at

What’s wrong with RDF?

Yesterday I stumbled again on Ruby’s post: Quantifying the “RDF Tax” . Sam is a very practical guy and I will give him that the Atom XML format is short and sweet, but I think Sam, Dare and others are optimizing too much how much tax they want to...

Excerpt from Planet RDF at

Quantifying the “RDF tax”

This core problem (i.e. lines of code) starting this discussion seems to be more about an XML tax than anything else. I am sure I can apply my Lisp knowledge to come up with a more concise (and readable IMHO) notation for the RDF that uses fewer LOC.

Posted by Patrick Logan at

Quantifying the “RDF tax”

I am sure I can apply my Lisp knowledge

[link]

Posted by Mark at

What a Tangled Web we Weave

I’ve said it before, and I’ll say it again, use RDF/XML at your own risk. Ian Davis gives us 16 different RDF/XML serializations of three triples. danbri adds more to the pot. I think Danny Ayers hits the nail on the head when he says "HTML is a...

Excerpt from Mike Graves's Work Blog at

RDF: needs more showing less telling

I meant to post this blog entry quite a while ago, but never did. Then Stephen went off on the old show versus tell riff, and I thought... let's not waste the words. A representative of the Resource Description Framework (RDF) crowd asked: What’s...

Excerpt from James Governor's MonkChips at

Sam Ruby's Earlier attempt at an XSLT transform from Atom 0.2

chimezie: In his words it’s a transform into “the most comprehensive RDF equivalent”; chimezie: Makes use of FOAF, DCTerms, DC, and MetaVocab vocabulary; DanC: hope to look into this; the approach sounds interesting; chimezie: See: Sam’s original...

Excerpt from Semantic Web Interest Group Scratchpad at

Semantic Web book recommendations?

A friend asked me for a recommendation on a good book on Semantic Web. Do you recommend one? I really can’t think of one. I learned everything I know about the Semantic Web on irc.freenode.net/#swig (a.k.a the sw ‘hood). If I were to...

Excerpt from Elias Torres at
