Yaron Goland: The Emperor stands quietly, completely hidden in his black robes, while Darth Sudsy, covered in his black carapace, leads in Luke Restafarian in a battle-torn uniform, long dreadlocks dragging and heavy fatigue evident in his face and stance. Guards stand by the door at stiff attention.
The hero in this dystopian tale is the spec (especially §3.2.1, §3.3, §4.1.1.1); and its trusty sidekick, the Feed Validator. Note the messages the latter produces on this feed, and think about how much more useful the feed would be to RSS Bandit if these warnings were heeded.
It is true that if you start with a custom vocabulary that nobody understands, and add voids to it, what you end up with is darkness that benefits no one.
But if you add meaningful ids, titles, summaries, and dates, what you have can be minimally consumed by existing consumers AND by imperial forces alike. If you add a little more, you can even get bi-directional, asynchronous synchronization, if you are into that kind of thing.
I will sadly report, however, that the battle to recommend that textual summaries be present when the content isn’t a text construct left a small scar in the spec that can be exploited. In the face of arguments made by some to allow the omission of all content and summary elements, a compromise of sorts was reached that fell short of recommending textual summaries be present in entries with XML content. I mention this as an area that dark forces can exploit and an area where vigilance amongst Jedi everywhere is required.
I don’t know how often you get this but THANKS A LOT FOR THE FEED VALIDATOR!!!
You guys are doing a great service to the Web.
THANKS A LOT FOR THE FEED VALIDATOR!!!
FYI: the Feed Validator is scheduled for a significant upgrade on Monday. Here’s a sneak preview.
Hmmm.. funny, I can’t seem to find a single application that’s capable of using that “pure interoperable data” in Yaron’s post. I am, however, able to find lots of applications that can parse the Atom version. I also find it humorous that the post used such a contrived and silly conversion into the Atom syntax. Why not use something like this instead:
<entry xmlns="http://www.w3.org/2005/Atom">
  <id>tag:example.org,2005/someuser/profile</id>
  <title>Some User's Profile</title>
  <updated>2000-01-01T00:00:00Z</updated>
  <author><name>Profile System</name></author>
  <content type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml">
      <div class="profile">
        <div class="section" id="professional"> ... </div>
        <div class="section" id="personal"> ... </div>
        <div class="section" id="clothingPreferences"> ... </div>
      </div>
    </div>
  </content>
</entry>
Oooo... feel the force just ooze out of that
Sam, the reason the items were left out in the example is that the data needed to fill them in generally wouldn’t exist. I think I need to do a better job of explaining the scenarios I’m worried about which are machine to machine communication. E.g. no human anywhere in the immediate loop.
James, Actually anyone with an XML parser and a HTTP stack that can execute GET can understand the data structure I posted. Also turning this data structure into XHTML doesn’t seem productive because the data is explicitly not intended for human readability. It is intended for machine to machine communication of rich structured data to which XHTML adds little. My basic point, that I will expand upon in my next post, is that ATOM forces data into a record oriented format that requires the data to be restructured. But if one is moving around structured data then it doesn’t make any sense to me to force these contortions to fit the data into ATOM’s required structure.
First off, let’s be absolutely clear about the fact that if the functionality Atom provides is not required by an application, there’s absolutely no reason to restructure anything... just don’t use Atom. Second, there’s also no reason I can see that the following wouldn’t also work:
<entry xmlns="http://www.w3.org/2005/Atom">
  <id>tag:example.org,2005/someuser/profile</id>
  <title>Some User's Profile</title>
  <summary>Some User's Profile</summary>
  <updated>2000-01-01T00:00:00Z</updated>
  <author><name>Profile System</name></author>
  <content type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml">
      <profile xmlns="http://example.com">
        <professional>
          <workTitle>…</workTitle>
          ...
        </professional>
        <personal>
          <spouseName> ... </spouseName>
          ...
        </personal>
        <clothingPreferences>
          <favoriteColors>
            <shirts>...</shirts>
            ...
          </favoriteColors>
          ...
        </clothingPreferences>
        ...
      </profile>
    </div>
  </content>
</entry>
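A quick sketch of the point being made here: any generic XML toolchain can read the Atom metadata out of such an entry while a payload-aware consumer digs into the native vocabulary. The entry below is abbreviated from the example above, and the `workTitle` value is a made-up placeholder:

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
XHTML = "{http://www.w3.org/1999/xhtml}"
EX = "{http://example.com}"

entry_xml = """\
<entry xmlns="http://www.w3.org/2005/Atom">
  <id>tag:example.org,2005/someuser/profile</id>
  <title>Some User's Profile</title>
  <updated>2000-01-01T00:00:00Z</updated>
  <content type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml">
      <profile xmlns="http://example.com">
        <professional><workTitle>Engineer</workTitle></professional>
      </profile>
    </div>
  </content>
</entry>"""

entry = ET.fromstring(entry_xml)

# A generic Atom consumer reads the standard metadata...
entry_id = entry.findtext(ATOM + "id")
updated = entry.findtext(ATOM + "updated")

# ...while a profile-aware consumer unwraps the xhtml div
# and works with the embedded vocabulary directly.
profile = entry.find(ATOM + "content").find(XHTML + "div")[0]
work_title = profile.findtext(EX + "professional/" + EX + "workTitle")
```

Nothing in the payload had to be restructured; the Atom envelope and the native data coexist in one document.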
Sure, there’s no reason for the id, title, summary and author elements but if you need the functionality that Atom provides, they’re really not that difficult to boilerplate.
Also, FWIW, this is precisely why Atom and Atompub make it trivial to work with data in its native form. There’s no reason at all why the data would need to be restructured or reformatted in order to work with Atom.
<content type="application/xml">
“the reason the items were left out in the example is that the data needed to fill them in generally wouldn’t exist”, a.k.a. “these are not the droids you are looking for”. I will, however, note that you tend to blog on subjects related to Astoria, and as such you are making assertions about data not being present in schemas that you aren’t the one defining.
What I imagine you are really saying is that you are building a generalized service and would like to place as few constraints on your users as possible. If so, I will merely point out that I’ve dealt with formats and protocols which are “explicitly not intended for human readability” and I’ve learned to avoid them.
James Snell wrote
First off, let’s be absolutely clear about the fact that if the functionality Atom provides is not required by an application, there’s absolutely no reason to restructure anything... just don’t use Atom.
I completely agree with this. My canonical question is "What are the benefits and costs of Facebook switching friends.get from its current form to an APP centric approach?". Sure, I can do what GData does or any one of the sterling examples James has posted but I question the “benefits” of this relative to the increased complexity that is being pushed on the several thousand developers who are using the Facebook platform.
Am I just completely missing the point here? I don’t see how having a spec and validators addresses the problem that Yaron is describing. In a nutshell, it looks to me like the problem is that “Atom” tells you nothing about the structure of the data, only about its presentation. This is like the situation you’re in if you have “XML” but no schema. You know the syntax, but you can’t do any semantic manipulations.
Saying “This is a (valid) Atom feed” doesn’t address the interoperability problem, and Yaron seems to be suggesting that it actually obscures it. Metaphorically, just because something is a valid feed doesn’t mean you know how to digest it (that is, extract useful stuff from it). And the result he’s warning about--non-interoperability masquerading under the banner of interoperability--seems entirely plausible.
friends.get is not a full APP (GET, POST, PUT, DELETE), it is just GET, so this question reduces down to... should friends.get return data in a feed format like Atom?
Hmmm, this reminds me of a question that I never got around to answering...
I will say that whenever I design data sources these days, I now try to capture all the essential information whenever possible and as early as possible.
Yaron: “I need to do a better job of explaining the scenarios I’m worried about which are machine to machine communication.”
You really do. Start by defining the machines that will communicate with each other. Then you can start on their interlingua.
Dare: “My canonical question is "What are the benefits and costs of Facebook switching friends.get from it’s current form to an APP centric approach?”. Sure, I can do what GData does or any one of the sterling examples James has posted but I question the “benefits” of this relative to the increased complexity that is being pushed on the several thousand developers who are using the Facebook platform."
The interlingua you actually want for m2m - Yaron’s stated concern - exist, but do not get deployed en masse. Such as RDF, which you used to be fond of criticizing. Primitive interlingua, things like internet protocols, are deployed heavily. The more widely these are deployed, the easier it is for machines to communicate; in general data moves more freely which has its own value. Under that worldview, even simple protocols like Atom Protocol and HTTP, or formats like XHTML and JSON, offer big wins compared to local approaches.
One positive side effect of protocols/interlingua is that they make what is not interoperable clearer. IOW, Atompub/Atom shines light on the fact we have serious issues around sharing and “understanding” structured content. Complaining about the source of light isn’t sensible.
So, to me, this is like the web services debates rehashed half a decade later. The only interesting difference in this thread is the level the argument is happening at - around the content/payload/mediatype instead of wire/transfer. If an SNS or big web property defined a custom hypertext transfer protocol that would be evidently silly; we already understand the value of HTTP. Maybe it’s not as obviously random at this time to argue that per-silo data access formats are a good idea.
On that basis, feel free to explain the claim of “increased complexity”; I see less code to write, less operational overhead and frankly better understanding of where I need to spend time understanding and munging the data. A cogent argument would be to say that we’re not ready to standardise these concerns - which as it happens the Atompub WG did for querying, batching, and synchronization. I could even be prepared to listen to an argument that said a big silo like Facebook is its own world and has no reason to use a generic format. But what’s been presented so far isn’t holding water.
[Fwiw, “technically”, I don’t see why facebook don’t serve class-laden XHTML and document the attributes.]
James/Bill - I’m running into numerous people who think that if you just sprinkle some magic ATOM pixie dust on something then it suddenly becomes a standard and is interoperable. That is clearly nonsense. But the problem is that there does not exist any crisp definition I am aware of that delineates the kind of problems that ATOM should be used for and the kind of problems that ATOM shouldn’t be used for.
For example, if I need a simple application transport protocol that handles request/response really well and can move around variable payloads (not to mention punch holes in any firewall) then I know to use HTTP.
If I need structured hierarchical data that can be read far and wide then I need to serialize using XML (although JSON is looking better every day).
What are the criteria that identify a problem that ATOM is a good solution for? And even more importantly what are some key flags to look for that identify a problem that ATOM is probably a bad idea for?
With clarity around these kinds of guidelines it becomes much easier to help steer people in the right direction in terms of using ATOM. What I’m trying to avoid is the early period of HTTP’s popularity when the answer to any and all protocol problems was “Use HTTP POST”. We spent a good decade digging out of that mess.
So my hope is that we can figure out some guidelines about good and bad uses of ATOM.
Sam - By ‘not human readable’ I only meant that there is no RSS Bandit or other APP reader anywhere in this scenario, it is purely machine to machine. However the data itself needs to be both easily human-readable and writable in order to allow for simple programming and debugging. See here.
What are the criteria that identify a problem that ATOM is a good solution for? And even more importantly what are some key flags to look for that identify a problem that ATOM is probably a bad idea for?
My take: Atom is good for circumstances where data can be organized into “chunks” that you can identify by title, location, who made the last change, when that change was made, and the data itself is either textual or a brief textual summary can be obtained/synthesized.
When is it bad? While the above list suggests several pieces of data that should be present, these requirements don’t tend to be equally weighted. The most fundamental pieces of information in my experience are ones that allow a client to answer the following two questions: have I seen this information before, and did it change? Where can I find it and did it meaningfully change are next.
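Those two questions can be answered mechanically from atom:id and atom:updated alone. A minimal sketch, assuming the client keeps a simple id-to-timestamp cache (the cache structure and function name here are hypothetical, not anything from the spec):

```python
def classify(entry_id, updated, cache):
    """Answer the two fundamental questions for an incoming entry:
    have I seen this before, and did it change?
    Keyed on atom:id, with atom:updated as the change marker."""
    if entry_id not in cache:
        cache[entry_id] = updated
        return "new"
    if cache[entry_id] != updated:
        cache[entry_id] = updated
        return "changed"
    return "unchanged"

cache = {}
classify("tag:example.org,2005/1", "2000-01-01T00:00:00Z", cache)
```

With RSS 0.91-era items there was no id and no update stamp, so the client had to guess at both answers; Atom makes them a dictionary lookup.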
I see this as a progression. “Use HTTP POST” was a symptom of a greater problem, namely the problem of not using URIs to identify all important resources. Beyond that, a litmus test I have often used to see if people “get” HTTP is whether or not they can support “304 Not Modified” status — I tend to use ETag as the “handle” for these discussions.
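The ETag litmus test, sketched as client-side logic. The `fetch` callable stands in for the real HTTP layer, and the simulated server is purely illustrative; in practice the client sends `If-None-Match` over the wire and the origin decides whether to return 304:

```python
def conditional_get(url, etags, fetch):
    """Issue a GET, sending If-None-Match when we already hold an
    ETag for this URL. `fetch(url, headers)` returns
    (status, etag, body)."""
    headers = {}
    if url in etags:
        headers["If-None-Match"] = etags[url]
    status, etag, body = fetch(url, headers)
    if status == 304:
        # Not Modified: reuse the locally cached representation.
        return None
    etags[url] = etag  # remember the new validator
    return body

def fake_fetch(url, headers):
    # Simulated origin: the resource's current ETag is "v1".
    if headers.get("If-None-Match") == '"v1"':
        return 304, '"v1"', None
    return 200, '"v1"', "payload"
```

The first call transfers the payload; every subsequent call costs one round trip and no body until the resource actually changes.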
In the early days of syndication (think: RSS 0.91), items were titles and links and an optional description, and it was entirely up to the client to infer whether it had seen a given item before and whether or not it had changed. We’ve learned a lot since then.
Magic pixie dust? Not quite. But my experience is that being able to answer “did this resource change?” in both machine and human terms is essential for proper use of HTTP and Atom respectively.
I think we can agree that ETags all on their lonesome effectively answer the question “Has it changed?” So this just leaves the question - have I seen it before? I would argue that in the 99% case if you have seen a particular URL then you have seen the resource behind it. I realize this is not required by the HTTP object model but in practice it is typically true. And for the other 1% case there is resource-id.
So I think your first line sounds like the right starting point in identifying something that is uniquely ATOM (as opposed to just good HTTP practice): “Atom is good for circumstances where data can be organized into “chunks” that you can identify by title, location, who made the last change, when that change was made, and the data itself is either textual or a brief textual summary can be obtained/synthesized.”
I hope you don’t mind me stealing that line (with attribution of course :) for my upcoming blog post which is morphing into guidelines for when to use ATOM and when not to use ATOM. My own take is that ATOM is at its best in two particular cases - when discussing items out of their native context (e.g. blogs are organizations of web pages, or using collections to make comments on pictures or returning search results) and it’s especially powerful when you need to be able to discuss potentially non-end user readable data to end users (e.g. here are the configuration changes made to your account with pointers to those changes and ATOM elements to explain what the heck is going on). But my own thinking is still evolving.
if you have seen a particular URL then you have seen the resource behind it.
A single feed often contains more than one entry.
P.S. It is Atom.
“James/Bill - I’m running into numerous people who think that if you just sprinkle some magic ATOM pixie dust on something then it suddenly becomes a standard and is interoperable.”
Now you know how I’ve felt dealing with RPC/SOA/WS/ESB proponents for at least the last half decade, aside from being considered a troublemaker/idiot for saying the Web/REST is the architecture to shoot for - there’s a reason some of us are sensitive to what Paul Downey calls its “brand dilution”. Frankly some of the REST hype makes me cringe.
As for when Atom and Atompub is not appropriate; I’m honestly not sure. I’m surprised at how useful Atom in particular is outside blogging scenarios as a general purpose format for any kind of “event”.
When I read Dare’s original post on Atompub, I provided two actual issues:
- Update resumption
- Batch and multi-part uploads
that I felt were tricky. But in the following debate, I suspect the irony of the post’s title was lost on a lot of people. It’s quite something to see sudden, rapid adoption from people who were utterly convinced that WS-* and its ilk were going to fix both EAI and the Web.
It’s also worth pointing out things that were excluded from Atompub:
- batch upload
- querying
- synchronisation
For Atom itself, I’d suggest it’s not good for physically representing hierarchies. This seems to be a problem with XML based formats in general; they all appear to have problems with that.
But what I read from your post is that the root issue is a general interop problem with structured content; delivery using Atom/Atompub highlights this, but is not a cause of it. It’s always been there.