It seems that both Blogdex and KeepMedia think that they each have exclusive rights to the domain of small positive integers. I wonder what part of Globally Unique Identifier people have difficulty understanding? Seems clear enough to me.
While neither RSS 1.0 nor Atom can absolutely prevent collisions, both rdf:about and atom:id are defined as URIs. This means that such identifiers (by virtue of not containing a colon) will be evaluated as relative to the source page, which means that they only need to be locally unique.
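A quick way to see that resolution rule in action, using Python's standard library (the feed URL here is made up):

    from urllib.parse import urljoin

    # An identifier with no colon is a relative URI reference, so it is
    # resolved against the document that contains it:
    print(urljoin("http://blogdex.example/feed.rdf", "12345"))
    # -> http://blogdex.example/12345

A locally unique string becomes globally unique once it is grounded in the feed's own URI.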
Furthermore, both rdf:about and atom:id are defined as being required. Which is a good thing.
If you're an MT user, I wish you the best of luck with atom:id.
Thanks for adding this to the validator. On a related note, I'd like to thank you and Mark for writing the feed validator. I've lost count of the number of times I've gotten mail or bug reports about some feed not working in RSS Bandit, issues that were quickly resolved by sending the person to the Feed Validator.
Is there a particular reason for this decision or did you guys just overlook this issue?
Dare: binary doesn't, by itself, mean executable. So, from that perspective, whether the mode is xml, escaped, or base64 should make no difference. What does, however, make a difference is whether the data, once unescaped or decoded, actually contains a script or not.
And, in case you are wondering, the validator does contain explicit code to detect scripts encoded in base64 in Atom feeds. Here is a testcase and here are the results.
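For the curious, the shape of such a check is easy to sketch in Python; this is an illustration, not the validator's actual code:

    import base64
    import re

    def base64_payload_contains_script(encoded):
        # Decode the base64 content, then scan for an embedded script tag.
        # A real check would also catch event handlers, javascript: URIs, etc.
        decoded = base64.b64decode(encoded).decode("utf-8", errors="replace")
        return re.search(r"<script\b", decoded, re.IGNORECASE) is not None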
If you have other conditions you would like to see checked for, please let us know, preferably by opening a bug report or feature request on sourceforge.
Am I missing something? What on earth is the point of giving GUIDs to different posts? Isn't that what URLs are for? Can't any client generate a GUID just by hashing the post content? And, personally, I don't buy the scenario of moving your blog across different domains. I'd say that if the URL of your blog changes, you have a new blog, and more than likely you want people to treat your blog as a completely new and separate entity. These kinds of fuzzy 'these-two-are-the-same-but-really-they're-not' situations always cause huge headaches in distributed systems for very little gain.
Bo, as far as the reason, reference the Globally Unique Identifier link in Sam's post, esp. "It's up to the source of the feed to establish the uniqueness of the string." Sounds like the validator is following the spec. How is hashing a post "globally unique"? For all you know, that's how Blogdex and KeepMedia generate guids. And if the URL of a blog changes but the content moved with it, shouldn't the GUIDs stay the same? To me, "moving a blog" is the same as relocating the content. For example, the URL of Sam's blog has changed from when it was hosted at Userland, but the content moved with it. I haven't checked, but I assume that the guids stayed the same with the move.
Bo, I think that there are a number of aspects to this that need to be teased apart.
The best place to look for the primary reason why GUIDs were introduced into RSS 2.0 is in the comments section of that spec. The primary use case seems to be to allow "aggregators to not repeat items, even if there have been editing changes." Editing changes. Changes which would affect the value of a hash.
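A quick illustration of why a content hash makes a poor identifier (the post text is made up):

    import hashlib

    v1 = hashlib.sha1(b"My frist post").hexdigest()  # original, with a typo
    v2 = hashlib.sha1(b"My first post").hexdigest()  # after the typo is fixed

    # The digests differ, so a hash-based guid would make every edited
    # entry reappear in aggregators as a brand-new item.
    print(v1 == v2)  # False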
A second use case for guids is to enable the identification of blog entries that have been syndicated. Everything I say shows up in Planet Apache. It looks like weblogs.asp.net is a similar service. There are many others. In such circumstances, there is the potential for uniqueness to span feeds, and this presumably is the motivation for guids being defined as Globally Unique Identifiers.
A third use case deals with portability of weblogs. Quite frankly, that use case remains speculative and controversial. It will be interesting to see how that discussion turns out.
Bottom line, while it may be tolerable for the same blog entry to occasionally be issued a new guid / rdf:about / atom:id value, it is never tolerable for two different blog entries to have the same guid / rdf:about / atom:id. That is what the feedvalidator is trying to guard against.
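The check itself amounts to something like this sketch (not the feedvalidator's actual code):

    def duplicate_guids(guids):
        # Two different entries sharing an identifier is the one case
        # that can never be tolerated, so flag any repeats.
        seen, dupes = set(), set()
        for guid in guids:
            if guid in seen:
                dupes.add(guid)
            seen.add(guid)
        return dupes

    print(duplicate_guids(["1", "2", "1"]))  # {'1'}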
Most binary content is dangerous, regardless of whether it is executable or not. Accepting arbitrary MP3 files or Word documents can lead to as many security issues as accepting arbitrary HTML fragments with embedded script, depending on what tools you are using.
I'll probably file a bug report on SF later today. Thanks for the response.
I generally agree that entries should have globally unique identifiers, and on the web they are URIs. There is an alternative: for systems to determine an entry's identity through a combination of its characteristics (as done by FOAF), but without a framework in which to use this (e.g. you-know-what) I think it would be too much work.
Still, the MT problem Phil points to is tricky.
Just as a strawman, might there be a way of using URIRefs rather than plain URIs? i.e. use frag IDs to look after post versioning, so the original version of a post would be distinguished from later revisions by its frag ID, along these lines:
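Something like the following, with hypothetical addresses:

    http://weblog.example.org/2004/01/some-entry     (the post itself)
    http://weblog.example.org/2004/01/some-entry#0   (the original version)
    http://weblog.example.org/2004/01/some-entry#1   (a later revision)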
re. dangerous code - I'm afraid Dare's probably right: once you start deciding things are potentially dangerous, there's a vast array of material to take into account. Personally, I'd leave it to the client to figure out (i.e. if it automatically runs executables or scripts, then it will rapidly become unpopular!)
Sam, do you know if anyone has done a feed of exploits to test aggregators against, and also to test whether we are right that they are exploitable? I threatened to do it back at the time of the Great Platypus Attack, but I still haven't gotten around to it, and I'm beginning to suspect that in a number of cases our assumptions about what is and isn't dangerous could use some refining by actual test.
Hey Sam, thanks for pointing this out. I'd just like to mention that I have filed a patent application for all integers up to 1 billion, but until that application comes back I've switched the GUIDs to fully qualified URLs. I had originally used just the internal Blogdex ID for brevity's sake and thought I had changed it a while back. It should be working now.