Jorgen Thelin: Adding in the regexp suggested by Sam for RFC-822 format dates to the RSS 2.0 schema, I have come to the conclusion that I must be missing the point somewhere... although I get the syntactic validation of the data, I have lost the semantic meaning of the schema type model. Am I missing something obvious here?
IMHO, schema only captures higher level syntax. Title and Description are both strings... what semantics can one infer from that? Meanwhile, you did define a simpleType that can be profitably reused.
On a related note: Mark Nottingham is looking for a new element that has the same semantics but is a proper W3C datetime. One can be found in the Dublin Core module.
Are there any other types for which a regex is desired?
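To make the "syntax only" point concrete, here is a minimal sketch of what a regex-restricted string type buys you. The pattern is a simplified, hypothetical stand-in written for this illustration; it is not the regex Sam suggested, and `looks_like_rfc822_date` is an invented helper name.

```python
import re

# Simplified, hypothetical pattern for RFC 822 style dates.
# It is NOT the regex from the thread, only an illustration of the idea.
RFC822_DATE = re.compile(
    r"((Mon|Tue|Wed|Thu|Fri|Sat|Sun), )?"                                   # optional day of week
    r"\d{1,2} (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{2}(\d{2})? "  # day month year
    r"\d{2}:\d{2}(:\d{2})? "                                                # hh:mm with optional :ss
    r"(UT|GMT|[ECMP][SD]T|[A-IK-Z]|[+-]\d{4})"                              # zone name or numeric offset
)

def looks_like_rfc822_date(value: str) -> bool:
    """Return True if the whole string has the shape of an RFC 822 date."""
    return RFC822_DATE.fullmatch(value) is not None

print(looks_like_rfc822_date("Sat, 19 Apr 2003 21:15:33 GMT"))  # well-formed
print(looks_like_rfc822_date("2003-04-19T21:15:33Z"))           # ISO 8601, wrong shape
print(looks_like_rfc822_date("Sat, 99 Apr 2003 25:61:00 GMT"))  # right shape, impossible date
```

The last call is the interesting one: a pattern like this accepts anything with the right shape, including a 99th of April, and what comes out the other end is still just a string.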
I agree with you that there isn't much "semantic meaning" in a schema in the first place (which is kinda the point of my rant at http://www.kuro5hin.org/story/2003/4/19/211533/168 ) but this doesn't change the fact that something is lost by having to use an xs:string instead of an xs:dateTime.
Specifically, technologies that use the PSVI to provide typed access to validated XML, such as XQuery and object<->XML data binding, would treat the value as a string rather than a date, which makes processing such dates a lot more tedious.
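A rough sketch of the extra step involved, using Python's standard library rather than any schema-aware tooling. The date values are made up for illustration, and the variable names are hypothetical. A consumer that only knows the value is a string has to pick a parser and apply it by hand before it can do any date arithmetic.

```python
from datetime import datetime, timedelta
from email.utils import parsedate_to_datetime

# Illustrative values only; neither string comes from a real feed.
pub_date_text = "Sat, 19 Apr 2003 21:15:33 GMT"   # RFC 822 style, as in pubDate
dc_date_text = "2003-04-19T21:15:33+00:00"        # ISO 8601 / W3C datetime, as in dc:date

# The RFC 822 string needs the mail-header parser from email.utils.
pub_date = parsedate_to_datetime(pub_date_text)

# The ISO 8601 string maps directly onto datetime's own parser.
dc_date = datetime.fromisoformat(dc_date_text)

# Once both are real datetime objects they can be compared or subtracted.
print(pub_date == dc_date)
print(dc_date - pub_date)
```

Either way the work is small, but it is work the application does itself, because nothing in a string-typed schema tells a data-binding layer that a date is inside.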
Dare, would it be fair to say that you would prefer something like dcterms:issued over pubDate and dcterms:lastmodified over lastBuildDate?
What I would most like to see happen here is that a bunch of people who produce or consume RSS feeds express their preferences on subjects like this, and that this be collected into a set of best practices, and that the validator be updated to provide guidance in areas such as these.
Do you have any other preferences?
Sam,
Not really. I'm not currently using a schema to convert RSS feeds into objects or to get typed access to specific values, so it doesn't matter to me whether a date is in RFC 822 or ISO 8601 format; the code I use to process them is indifferent to such issues.
I personally prefer pubDate because that's what my code currently supports and that's what's in most of the feeds that I've seen. Thus less work for me. ;)
PS: Exactly what am I supposed to do with the lastBuildDate info?
James: you never heard of relative urls?
Now that the DNS changes have had a chance to catch back up, I've gone back to full urls.
Of course I've heard of relative urls. However, if your link (feed level) is set to
/blog
and items to things like
/blog/1368.html
and the feed url is
http://www.intertwingly.net/blog/index.rss
then by eyeballing it, the relative url is obvious. In code however, it could easily be guessed as either:
http://www.intertwingly.net/blog/
or http://www.intertwingly.net/
as the base url. See what I mean about guessing?
James, see the documentation for URIs and XML:
Sam,
My position is that I would prefer to keep pubDate because it means I don't have to change my code. :)
Just because I can see some edge cases where RFC 822 dates would take a relatively trivial amount of extra work to process compared with ISO 8601 doesn't mean I think they should be replaced.
Dare, so it probably is a good thing that Jorgen's schema defines a simpleType with a regex, wouldn't you think?
P.S. My rss2 feed uses dc:date.
Sam,
I wouldn't go as far as calling it a good thing but "good enough" probably accurately describes how I feel about it.
PS: RSS Bandit supports both pubDate and dc:date so I already had you covered.
Sam,
By my reading of those docs, the way your urls were set up was incorrect, i.e., the only possible base url was the feed link, which ended in /blog/blah.html
Now the relative url looked like
/blog/blah2.html
which, as I read the docs, would result in
/blog/blog/blah2.html
which isn't correct. Am I reading them wrong? I don't think so...
James, The RSS feed itself is at http://www.intertwingly.net/blog/index.rss. Evaluating /blog/1368.html relative to that URL would result in http://www.intertwingly.net/blog/1368.html.
I've made the relative URL cited above a hypertext link. Try clicking on it to see where your browser takes you. View source if you'd like to verify.
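For anyone who would rather check this in code than in a browser, here is a small sketch using `urljoin` from Python's standard library, which follows the usual URI reference resolution rules. The variable names are just for illustration, and the `blah.html` base is the placeholder from the comment above, not a real page.

```python
from urllib.parse import urljoin

# Base is the feed's own URL; the item link starts with "/".
feed_url = "http://www.intertwingly.net/blog/index.rss"
item_link = "/blog/1368.html"

# A reference starting with "/" replaces the entire path of the base.
resolved = urljoin(feed_url, item_link)
print(resolved)

# Same rule with the placeholder base from the earlier comment:
# the result does not gain a doubled "/blog/blog/" segment.
other_base = "http://www.intertwingly.net/blog/blah.html"
resolved_other = urljoin(other_base, "/blog/blah2.html")
print(resolved_other)
```

Because both references begin with a slash, only the scheme and host of the base survive, so it makes no difference which document under /blog/ is used as the base.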
http://intertwingly.net/blog/ + /test.html = http://intertwingly.net/test.html
If I define this in HTML:
[BASE HREF="http://intertwingly.net/blog/"]
then I do this:
[A HREF="test.html"]
that points to http://intertwingly.net/blog/test.html
However, if I do this on the same page:
[A HREF="/test.html"]
that points to http://intertwingly.net/test.html
Try it yourself.
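The same two cases can be tried outside a browser. This is a minimal sketch with Python's standard `urljoin`, treating the BASE HREF value as the base URL; the variable names are hypothetical.

```python
from urllib.parse import urljoin

# The value given in the BASE HREF example above.
base = "http://intertwingly.net/blog/"

# No leading slash: resolved inside the base's directory.
relative_to_dir = urljoin(base, "test.html")

# Leading slash: resolved from the root of the host.
relative_to_root = urljoin(base, "/test.html")

print(relative_to_dir)
print(relative_to_root)
```

The only difference between the two inputs is the leading slash, and that alone decides whether the /blog/ part of the base is kept.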