There have been assertions of an "RDF Tax". Not having an
opinion on the subject, I decided to do a little
investigating. In particular, I sought to identify the
highest potential "ceiling" to the RDF tax.
The initial results are that the maximal Atom 0.2 snapshot
balloons from 47 non-blank, non-comment lines to a whopping 61
lines. And that is before simplification.
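For anyone who wants to reproduce this kind of count, here is a minimal sketch of one way to tally non-blank, non-comment lines. It is not the script used for the numbers above; the function names and the sample document are made up for illustration, and it assumes "comment" means an XML `<!-- ... -->` comment.

```python
import re


def count_significant_lines(xml_text: str) -> int:
    """Count lines that are neither blank nor made up only of XML comments."""
    # Strip comments, but keep the newlines inside multi-line comments so
    # that the remaining text still splits into the same physical lines.
    without_comments = re.sub(
        r"<!--.*?-->",
        lambda m: "\n" * m.group(0).count("\n"),
        xml_text,
        flags=re.DOTALL,
    )
    return sum(1 for line in without_comments.splitlines() if line.strip())


def percent_increase(before: int, after: int) -> float:
    """Relative growth from `before` to `after`, in percent."""
    return (after - before) / before * 100


# Hypothetical sample, not the real Atom 0.2 snapshot.
sample = """<feed>
  <!-- a comment line -->

  <title>Example</title>
</feed>
"""

print(count_significant_lines(sample))      # 3
print(round(percent_increase(47, 61), 1))   # 29.8
```

Applied to the figures quoted above, going from 47 to 61 lines works out to an increase of a little under 30%.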
Clearly, some readability was lost in the process. This, too,
needs to be addressed.
A number of observations:
Leaving the XSLT as an "exercise for the student" ends up
sweeping a lot under the rug. This was hard, and I'm
not sure I'm close to being done.
This effort provided an alternate insight into this data, which
surfaced a number of questions I never pondered before. For
example: is the order of contributors significant? This needs
to be answered and documented.
A number of possible synergies also arose. For example,
independent of RDF, might it make sense to re-use FOAF
terminology?
Once we have a sufficient understanding of the "proper" way to
apply RDF, then we can move on to exploring the "practical" way
to apply RDF.
As an alternative to using FOAF terminology, which would then impose a 'foaf tax', it might be better to supply an RDF snippet asserting
equivalence to a canonical Atom RDF.
Sam,
I'm still waiting for anyone to give a good reason for Atom being RDF compatible besides buzzword compliance. Since you seem interested in making this happen can you point out the concrete benefits of doing this that don't contain the phrase "Semantic Web"?
PS: What was hard about writing the transform? The XSLT looks fairly straightforward so I assume the difficulty was in understanding the various RDF vocabularies such as FOAF, Dublin Core, etc.
Marvellous work, though "Quantifying the XSLT Tax" might have been a more accurate title ;-)
I don't think the 25% increase is anything to worry overmuch about - I'm sure it can be brought down significantly (I notice an rdf:Description in there for a start). Some human readability might have been lost in the process, but a lot of machine-readability has effectively been gained.
Also I don't think your experiences here have very much to do with the "RDF Tax". If you had been using RDF from the start, I'm sure things would have been considerably less taxing - XSLT is hard! It's also worth remembering that a transform like this only has to be written once.
But RDF can certainly be hard work, and there can be proportionately a fair bit of extra work needed for the simpler apps. The benefit is that once the initial bits have been done, the data becomes immediately available for loads of other tools - the tax rebate.
FOAF terminology would be great, as long as the definitions were the same - what is the foaf tax, if the definitions are the same?
Good approach Sam. Will be interesting to follow this from more than just a Pie/Echo/Atom perspective.
I'm sorry I had to back out early, but I had to finish the taxes (wince -- owe -- wince -- big time -- wince). And I'm not much for IRC, the pace is too fast. Still, interesting experience.
Dare -- if we filter based on "Semantic Web", then we might as well filter on single use XML vocabularies. Each of us comes into this with different interests. It's important to respect that where you're interested in single use XML vocabulary, I'm interested in SW. The key is to find middle ground, not exclude or filter based on interest.
I think that's what Sam is doing with this effort -- see how difficult it is to transform, seeing how complicated or simple RDF has to be, what's the value of using FOAF over DC, etc. Most importantly, he caught that there were a lot of assumptions about data and behavior based on the feed originally being pure XML. We need to separate that behavior and look at it individually outside the syntax. Not depend on what I still call accidental grouping because of syntactic placement.
A lot of words, and I used semantic web. Boy, I bet you hate my new weblog, Semantic Web of Poets grin
BTW, Sam, I love your proper vs practical thing bigger grin
Since RDF seems to be all the latest craze: Why? What kind of applications will people be able to produce that they wouldn't be able to produce with the lovely simplicity of the Atom 0.2 snapshot?
Human readability is underestimated. If this atrocious RDF syntax is chosen, you are going to scare off relatively non-technical people from producing an Atom feed. You can argue from here 'til eternity that "It's just applying a transformation with XSLT", but remember that many (most) people even have problems understanding a concept as simple as CSS.
Unless the Atom syndication format is going to lie in the same forgotten pool of mud that W3C RSS is, I believe RDF is best forgotten.
Shelley,
If I am being forced to learn RDF (Yes, I am being forced to learn RDF because I am an aggregator writer and blog developer who will use Pie/Echo/Atom/Whatever) then I'd like a good reason for it. The pipe dream that is the Semantic Web does not count as a good reason.
If there isn't one then I'll join those that vote for adding a canonical transform of the core Atom syntax to RDF to the Atom spec as opposed to baking it into core.
I'm with Dare on this one - I'm not sure what the benefit is here.
And the syntax is arguably more "complex-looking", which like it or not, will hurt adoption. Look at RSS - RSS 2.0 looks "nice and easy", and RSS 1.0 looks pretty complicated at first glance. And when you see new feeds coming online, generated by custom systems, what are they using? Usually 2.0.
Just to clarify a point, in case it wasn't clear.... this was an experiment. Only an experiment.
It is my expectation that what we will end up with will require only a small delta from Atom 0.2. Aaron showed that a small delta was all that was needed. This led to wild assertions that there were probably nine hundred and ninety nine steps to go. Today, we've reduced that estimate by an order of magnitude, if not more. Hopefully over the next few days, we will do that again. If we can do that one more time, we've really achieved something.
Will we get to the point where Atom is transform free to pure RDF? I don't know. I certainly would not be willing to give up on the "view source simplicity" that Atom currently has in order to solve the last mile problem. But I do know that we can reduce the impedance mismatch. And that Atom will become better specified in the effort to do so. And hopefully more compatible with a number of vocabularies. Not just in the RDF sense, but in the XQuery sense too.
Dare, and other non-RDF supporters: I try and 'prove' RDF is a viable option for more than just Pie/Echo/Atom on a relatively frequent basis -- without 'forcing' it on people. I'll continue in my weblogs and other writing, but I'm reluctant to debate this issue here because methinks it may not make a difference what I say. Is it a possibility I have a good read on this?
Sam, I noticed that Sean talked you into List for contributors. And my tax thing was purely accidental -- no pun intended.
Shelley,
I am fairly open minded on technologies. If you can prove how RDF provides benefit to consumers and producers of ATOM feeds then I'm game. If all it does is bring ATOM closer to being part of the mythical Semantic Web that people have been hyping for the past few years with no concrete benefits then I'd rather it didn't complicate ATOM since thanks to the beauty of XML and XSLT, ATOM as RDF is a transform away.
This led to wild assertions that there were probably nine hundred and ninety nine steps to go.
I am being quoted out of context. My point above is related to this statement:
I asked that whatever technique you come up with for mixing namespace vocabularies should be able to solve the problem of automatically converting an XSLT processor to an EXSLT processor since this is "simply" a case of mixing namespace vocabularies. I specifically stated that I don't believe this problem can be feasibly solved and neither do I believe RDF is a stepping stone to solving this problem.
where I state that using RDF is 999 steps away from being able to solve the arbitrary XML vocabulary mixing problem in the general case and has nothing to do with how hard or easy it is to make ATOM become RDF compliant.
(Sorry Sam, remind me not to do a browser refresh late in the day, forgetting there's still junk in the edit box...)
I reckon what's being done here is right on. It should certainly be possible for Atom feeds to be used directly as XML, without using the full RDF/XML syntax (or RDF model). But it should also be possible to make it easy for people that do use RDF to use the feeds easily as well. The steering looks on course for achieving both of these with only minor compromises either way. Fingers crossed ;-)
Seems to me there's a lot of room for simplification. Most of the blowup seems to be from dc and foaf stuff, if you're counting lines. If you start counting attributes, that's different.
Here are some alternate statistics, FWIW. The stats from running the two files through Xerces' SAXCounter yield:
atom02maximal.rdf: 297 ms (48 elems, 18 attrs, 0 spaces, 992 chars) - note that chars here is character data, not the size of the file
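For readers without Xerces at hand, a rough equivalent of those counters can be sketched with Python's standard `xml.sax` module. This only approximates what SAXCounter reports: the class name, the `count` helper and the sample document are hypothetical, whitespace is lumped in with character data, and namespace declarations are counted as ordinary attributes.

```python
import xml.sax


class CountingHandler(xml.sax.ContentHandler):
    """Tally elements, attributes and character data seen during a parse."""

    def __init__(self):
        super().__init__()
        self.elems = 0
        self.attrs = 0
        self.chars = 0

    def startElement(self, name, attrs):
        self.elems += 1
        # In the default (non-namespace) mode, xmlns declarations are
        # reported as attributes too, so they are included here.
        self.attrs += len(attrs)

    def characters(self, content):
        # Character data only, not the size of the file.
        self.chars += len(content)


def count(xml_bytes: bytes) -> CountingHandler:
    """Parse `xml_bytes` and return the handler holding the totals."""
    handler = CountingHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler


# Hypothetical sample, not one of the two files measured above.
stats = count(b'<feed version="0.2"><title>Hi</title></feed>')
print(stats.elems, stats.attrs, stats.chars)  # 2 1 2
```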
Sjoerd Visscher: Sam Ruby made an XSLT transformation from Atom to RDF. He said it was hard to do. On the #echo IRC channel I said that XR would probably make it a lot easier. Then Sam asked me to make an XR transformation that would do the same ...
Sam, nice of you to do this. BTW, you would have had a much easier time if you used XSLT to 'extract' RDF triples instead of 'transforming' Atom into RDF format.
I also appreciate Sjoerd doing the same in XR, I feel that XR was the right idea implemented wrong in the same way as I outlined above.
Chopping a tree into a stack of firewood is much easier, and the end product more useful, than trying to build a tree out of firewood.
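To make the 'extract' idea concrete, here is a minimal sketch in Python rather than XSLT. It is an illustration only: the sample entry, the use of bare element names as predicates, and the `_:bN` blank-node labels are all assumptions, and no mapping to FOAF or Dublin Core is attempted.

```python
import itertools
import xml.etree.ElementTree as ET


def extract_triples(element, subject, counter=None):
    """Yield (subject, predicate, object) tuples for the children of `element`."""
    counter = counter if counter is not None else itertools.count(1)
    for child in element:
        if len(child):
            # Nested structure: the object of this triple is a blank node
            # that becomes the subject of the triples found inside it.
            node = f"_:b{next(counter)}"
            yield (subject, child.tag, node)
            yield from extract_triples(child, node, counter)
        else:
            yield (subject, child.tag, (child.text or "").strip())


# Hypothetical, simplified entry; not the real Atom 0.2 snapshot.
entry = ET.fromstring(
    "<entry>"
    "<id>tag:example.org,2003:1</id>"
    "<title>Hello</title>"
    "<author><name>Jane Doe</name></author>"
    "</entry>"
)
triples = list(extract_triples(entry, entry.findtext("id")))
for triple in triples:
    print(triple)
```

Each leaf element yields one triple, and a nested element such as the author yields a blank node whose own children hang off it.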
I think Norm's approach is definitely capable of dealing with more RDF variations, though I don't think those variations are something you necessarily want to support in Atom.
Dare, I kind of look at it this way -- I can either spend my time writing articles and tutorials about the technology and working on a couple of unfinished applications, which I think might be useful and perhaps even a little 'cool', all built using RDF. Or I can spend time talking with the Pie/Echo/Atom folks about why the group should consider using RDF. I think my time is better spent on the more productive option, don't you?
When Sam starts simplifying that proper model into something practical, I'll help. And answer specific questions. Or provide specific examples. But I'm not going to spend my time answering questions that begin with "Shelley, prove to me..."
The recent considerations about RDF leave me wondering if we could consider XMI as a viable alternative to the oftentimes bloated and overly complex RDF syntax. XMI documents, done properly, are only mildly more complex than vanilla XML and......
Sam Ruby has come up with an XSLT stylesheet which transforms Atom XML into RDF/XML. Sam said it was hard, which led to Sjoerd Visscher's creating a simpler version which takes advantage of his generic XML -> RDF mapping system......
Shelley,
Do whatever makes you happy. You don't owe me anything. Even if ATOM decides to become fully RDF compliant it probably won't be any more trouble to consume in an aggregator than RSS 1.0 currently is. It just means I'll have to avoid extending the format as much as possible so as not to trip over the RDF bits.
Don, I considered extracting triples, but that is really less readable, because often triples share the same subject, and the object of one triple often also becomes the subject of other triples.
I do use a very strict subset of RDF, so that subject, property and object are always elements (when the object isn't a literal). So each part of each triple is an element (or a textnode), which makes the triples almost directly visible.
Dare, whatever rules we come up with will have validator support. This will be true whether these rules are required by underlying choices of technology or perhaps only inspired by another technology.
Even if we whole-hog adopt RDF in all of its glory, people won't have to read the RDF specifications any more than they have to read the full suite of XML specifications. Instead, they merely need to emulate working examples, and then try to validate the results.
What this means is that if somebody were to attempt to define an extension in the format of exhibit A and then tried to validate an instance using this extension, the validator would identify the use of an attribute without a qualifying namespace as an issue, and would suggest a corrective course of action.
The overall style and format of the message would look something like this. And, like that page, would contain a link to relevant introductory materials and/or authoritative reference materials.
Sam, what about on the other side -- the aggregator? What would the RDF tax be there? The concern is that you'd never be done writing a PEAW aggregator, that there would always be new twists people could throw at you, kind of like the problem with RSS and people using elements in namespaces to do roughly the same thing as core elements. Shouldn't the goal be to make the barrier to entry as low as possible for aggregator developers? Or is that one of your goals?
BTW, I ask because I saw a feed the other day (not naming names) that appeared to be coded so that it wouldn't work in a specific aggregator. I know from past experience in other protocols that people actually do things like this. It's bad for the Internet of course, counter to its philosophy. That's why simpler is better. It's harder to hide that kind of stuff. It's one of the reasons I'm such a broken record about "simple."
As to your other issue: one of the goals of the Atom project is to cleanly and thoroughly specify what is valid in a feed. When problems arise, one should be able to unambiguously determine whether a feed is broken, or an aggregator is broken, or if the spec itself is broken.
Sam, sure one specific feed can easily be parsed with regular expressions. But aggregator developers have to write code that will read all feeds.
If all the doors are left open, you end up with nothing. If you don't make choices, you can never finish writing an aggregator. There's always some twist someone can throw at you. And then you'll need one aggregator to read one brand of weblog, and another to read the BBC, etc etc.
I've been wanting RSS to have the exact opposite philosophy. I want an aggregator developer to be able to focus on things that have a clear benefit for users, not running an endless race with people throwing new stuff at them. Ideally they'd start writing their aggregator in the morning and finish it by the end of the day. I don't mean finish it modulo the feeds that don't work, I mean finish.
I described this over on Shelley's with an analogy. I use this stuff to build the Golden Gate Bridge. But my bridge can't fly to the moon. I think that's okay. The bridge was built a long time ago, and someone just drove a car across it that uses GPS. The bridge held up just fine, so did the car. That's the kind of flexibility I want.
In my world, GPS is the work that Chris Pirillo is doing. Or the folks at Amazon or Rolling Stone. People who find new applications for syndication, not people who find new ways to make the same thing more complicated. I don't mean this in a negative way. As I watch you deal with all this stuff, I'm impressed with how flexible your mind is. That's cool. I just think we are interested in vastly different things.
I think it's good to get clear on what the differences are between RSS and what you're doing here. The more clear distinctions there are, the less confusion there will be. I know that sounds kind of obvious, but sometimes the obvious needs to be clearly stated.
Yogi Berra said you can observe a lot just by watching. Same kind of thing.
BTW, I did some editing here, but not a lot. I ask that people not read this as spec text, I am not a lawyer, your mileage may vary, all other disclaimers apply, etc.
I read the comments about "RDF tax" and how we must "prove" RDF's worth, yet when I look at so many plain XML feeds, all I can see is the improvement that could be added because of the precision of using RDF/XML. Not all XML feeds, but any that are...
Meta-models all the way up: and why RDF is useful anyway
I've been away for a week and it looks like there's a lot going on in the RSS/RDF world; in grand blog tradition, time to comment on the comments :) Spike Solution...
You can only get clean separation between bridge, car and GPS by clearly defining how each interacts with its immediate environment.
You maybe don't want any objects over 100 tons going across the bridge. One approach may be to list every item that might conceivably cross the bridge, and note its weight. Could take a while. It's also a little hard to predict every future item's weight.
Same with data for syndication. If you have to define the boundaries individually for every object or syndication module (as in RSS 2.0) then not only are you making a lot of unnecessary work, you're also leaving the door to problems open. You can't be certain that the next application's not going to break the aggregator.
A framework like RDF can help because it can define the boundaries systematically.
You can have your aggregator by lunchtime, but that's not all, you won't have to completely rewrite it when a new module comes along tomorrow.
"""... Think of purchasing an old fixer-upper home: you can select from a couple of properties on the market. First, you want something with a sound footing and an inexpensive price. Also, you'll probably need a mortgage. The smaller the down payment, the larger the total cost. So ideally, you want your down payment to be as large a portion of the total price as possible. But, your initial cash reserve is limited, so you commit to your down payment and then you can at least move in and start fixing the house and increasing its value. Same thing with applications! In the end you want to move in and improve where most needed, but you also want something with a sound architectural footing. That's a balancing act, though sometimes there are design principles and technologies that lessen (win/win) immediate and future costs. RDF has a great architectural footing -- those who don't like it are doomed to reinvent it poorly -- but an immediate/localized cost of comprehension. For example, in RSS 1.0 the order semantic of RDF sequences imposes a cost without much benefit. It's a sequence, but you don't know what sort of sequence: a mandatory RDF artifact for an optional feature doesn't make much sense to me..."""
I've made some arguments on the wiki against using RDF for Atom: NoToRDF. Of course, they may be updated, but the two arguments I've made are that Atom doesn't need RDF, and that application semantics don't come for free with meta-language parsers. Enjoy.
Sjoerd, no problem. If RDFers want triples, they can grunt for themselves.
Sam, there are doors you don't want to open. Look at the example feed and tell me with a straight face that those rdf:parseType and rdf:RDF cost nothing in clarity.
Don: I will admit that I don't see the overwhelming harm in the rdf:RDF marker - after all, it is only one element.
On the other hand, I will freely admit that I see interleaving rdf:parseType throughout the document as an unreasonable burden. However, take a close look at minAtom.rdf. If you are doing so from IE, make sure that you view source. No rdf:parseTypes to be found. Why are they there before you view source? Because they have been magically inserted there in the proper places by the DTD.
Here's an interesting discussion of whether RDF/XML is readable, compared to a custom-made XML syntax for Atom. My first reaction to this discussion was that when I looked through the example RDF/XML on that page, it seemed pretty straightforward,...
Some people have suggested that Atom could be made more RDF-friendly; others object. Most helpfully, we now have a suggested RDF version of the Atom 0.2 example. After looking at it, all I can say is how ugly. It's not that I don't...
Sam Ruby made an XSLT transformation from Atom to RDF. He said it was hard to do. On the #echo IRC channel I said that "XR" would probably make it a lot easier. Then Sam asked me to make an XR transformation that would do the same thing. I did, and...
Yesterday I stumbled again on Ruby’s post: Quantifying the “RDF Tax” . Sam is a very practical guy and I will give him that the Atom XML format is short and sweet, but I think Sam, Dare and others are optimizing too much how much tax they want to...
This core problem (i.e. lines of code) starting this discussion seems to be more about an XML tax than anything else. I am sure I can apply my Lisp knowledge to come up with a more concise (and readable IMHO) notation for the RDF that uses fewer LOC.
I’ve said it before, and I’ll say it again, use RDF/XML at your own risk. Ian Davis gives us 16 different RDF/XML serializations of three triples. danbri adds more to the pot. I think Danny Ayers hits the nail on the head when he says "HTML is a...
I meant to post this blog entry quite a while ago, but never did. Then Stephen went off on the old show versus tell riff, and I thought... let's not waste the words. A representative of the Resource Description Framework (RDF) crowd asked: What’s...
Sam Ruby's earlier attempt at an XSLT transform from Atom 0.2
chimezie: In his words it’s a transform into “the most comprehensive RDF equivalent”; chimezie: Makes use of FOAF, DCTerms, DC, and MetaVocab vocabulary; DanC: hope to look into this; the approach sounds interesting; chimezie: See: Sam’s original...
A friend asked me for a recommendation on a good book on Semantic Web. Do you recommend one? I really can’t think of one. I learned everything I know about the Semantic Web on irc.freenode.net/#swig (a.k.a the sw ‘hood). If I were to...