Sam Ruby

Feed explorations

2003-07-30T09:04:38-04:00

All LiveJournal test users now have Atom feeds. Here is an example. I happened to pick one that was in Russian, demonstrating the use of UTF-8. Also notable is that issued dates don't have time zones - something that LiveJournal doesn't track for this particular field. I'm particularly looking forward to the point in time when LiveJournal starts exploring the serialization of threaded comments.

Simon Fell's Atom feed demonstrates explicitly marking the namespace on every item. More interestingly, he makes use of xml:base to have relative urls in links, ids, and even content.

Gordon Weakliem has created feeds for Amazon.com, and is exploring issues with dates and times.

RE: Feed explorations

2003-07-30T09:20:52-04:00

Sam,
Do you or anyone else have any stats about which XML parsers on various platforms support xml:base? For instance neither MSXML (Microsoft's COM-based XML parser) nor the parsers in the System.Xml namespace of the .NET Framework support xml:base.

Using xml:base puts a burden on developers using Microsoft's XML parsers to write their own implementations of the xml:base recommendation.

Feed explorations

2003-07-30T09:31:23-04:00

Too bad about the missing time zones, since it makes the dates invalid.

Regarding the xml:base usage, I wonder what'll happen when an entry from a feed like this is aggregated into another feed, like what happens at Feedster. Should the aggregator then be responsible for resolving the relative links in the various elements and content? What about any (as yet) unknown extension elements: how does one know when their contents contain a link?

Feed explorations

2003-07-30T09:54:23-04:00

Dare: suggestions? It would seem to me that any other alternative would require users to write their own implementation too, so why not go with the standard?

Morten: I believe that such dates are valid according to ISO 8601 and XML Schema Datatypes. What these dates don't do is conform to a profile submitted to the W3C by Reuters Limited. Do you have any other suggestions as to how LJ could represent the data that they have?
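
To make the distinction concrete, here is a minimal Python sketch (not LiveJournal's code; the helper name `has_time_zone` is made up for illustration). It parses an ISO 8601 date-time with the standard library and reports whether the value carries a UTC offset. A value without an offset parses without complaint, but it does not identify a single instant.

```python
from datetime import datetime

def has_time_zone(stamp: str) -> bool:
    """Return True if an ISO 8601 date-time string carries a UTC offset.

    Hypothetical helper for illustration only.
    """
    parsed = datetime.fromisoformat(stamp)
    return parsed.tzinfo is not None

with_zone = "2003-07-30T09:04:38-04:00"   # offset present
without_zone = "2003-07-30T09:04:38"      # same wall-clock time, no offset

print(has_time_zone(with_zone))
print(has_time_zone(without_zone))
```

Both strings are accepted by the parser; only the first can be compared reliably with timestamps from other time zones.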

Note: my intent in surfacing these issues in my weblog is not to propose a specific solution, but to solicit the widest possible amount of feedback. The only thing I would like to note is that "here is a suggestion" is much more helpful feedback than "here is a problem".

Feed explorations

2003-07-30T09:58:11-04:00

Each entry can have its own xml:base attribute, so it's easy to put items from different feeds into one feed.

Attributes and elements that are of type xs:anyURI are links. So extensions must have a schema. And the feed using the extensions should point to a schema that includes both the default Atom schema and the extension schema.

So it would be complicated to automate that. But then there's hardly anything you can do automatically with unknown extensions.

Feed explorations

2003-07-30T10:33:45-04:00

Sjoerd is right; schemas tell you datatypes, data does not. Introspecting into data values to "determine" that an unknown element is a link is fraught with peril. [id], for example, is not meant to be a clickable link. Some people repeat the [link] value, others construct an http URI that would generate a 404 if followed, others use urn: or tag: values.

Feed explorations

2003-07-30T10:59:34-04:00

For reference, the wiki page for xml:base is RelativeLinks.

Dare, xml:base isn't a parser-level spec, it's an application level spec. So, yes, that does place the burden on all developers to support xml:base.

RE: Feed explorations

2003-07-30T13:16:48-04:00

Ken,
Considering the fact that Base URI is a property in the XML infoset, is affected by specs like XInclude, and is an accessor in the XQuery/XPath 2.0 data model, I'd consider xml:base a parser-level spec. Saying xml:base is not a parser-level spec is like saying XML namespaces is not a parser-level spec.

Sjoerd,
The only way folks can discover the XSD types of elements and attributes is to use an XML parser that performs validation and type augmentation according to W3C XML Schema rules. I'm not sure how widespread parsers that provide access to the PSVI actually are in practice. I know the DOM in MSXML does this while the DOM in System.Xml does not, although the XmlValidatingReader can be used to grab this info during the parse. In my opinion, requiring a schema when processing a feed, especially with the potential proliferation of extension elements and attributes, is unwise.

Sam,
I'm just describing the current state of the predominant XML parsers on Microsoft platforms. If you guys decide to go with xml:base and I eventually get around to supporting your format I'll have to hack something up.

Feed explorations

2003-07-30T13:33:45-04:00

Re: namespaces.

I thought you had to give an xmlns:e="???" so that the XML parser would know what you were talking about when you used the "e" namespace prefix.

Is that wrong? If so, how do I tell the difference between your "e" and someone else's?

Re: xml:base

Would it be reasonable to provide an option in an XML parser such that the parser would automatically try to resolve relative URIs using xml:base?

Feed explorations

2003-07-30T13:50:45-04:00

Dare, why so confrontational? Simply suggest something else that addresses how to resolve relative URIs in a manner that is more conducive to the predominant parsers that are in existence, and we will explore that.

Mark, Simon's feed contains xmlns:e="http://example.com/newformat#". View source to see it.
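
A small Python sketch of why the declaration matters, using the standard library's ElementTree. The `note` element and the second namespace URI are invented for illustration; only the first URI comes from Simon's feed. A namespace-aware parser replaces the prefix with the declared URI, so two documents that both use "e" are told apart by what the prefix is bound to, and an undeclared prefix is rejected.

```python
import xml.etree.ElementTree as ET

# Same prefix "e", bound to two different namespace URIs.
doc_a = '<feed xmlns:e="http://example.com/newformat#"><e:note>hi</e:note></feed>'
doc_b = '<feed xmlns:e="http://example.org/other#"><e:note>hi</e:note></feed>'  # made-up URI

tag_a = ET.fromstring(doc_a)[0].tag
tag_b = ET.fromstring(doc_b)[0].tag

print(tag_a)           # the prefix has been expanded to its URI
print(tag_b)
print(tag_a == tag_b)  # different URIs, so different element names

# Without an xmlns:e declaration the document is not namespace-well-formed.
try:
    ET.fromstring("<feed><e:note>hi</e:note></feed>")
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
print(undeclared_ok)
```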

RE: Feed explorations

2003-07-30T13:51:52-04:00

Mark,
For XML namespaces and xml:base to work you have to have an XML parser that supports both features. My comment was to note that the primary XML parsers on Microsoft platforms support the former but not the latter.

Feed explorations

2003-07-30T14:07:44-04:00

Perhaps a more direct question will elicit a more productive answer: how do you handle relative links in HTML content with RSS feeds now?

RE: Feed explorations

2003-07-30T14:12:33-04:00

Sam,
I don't see where I'm being confrontational unless you consider me not providing suggestions as seeking confrontation. Before I can offer a suggestion the question is where you expect to handle relative links? Relative links in content or relative links in permalink URLs?

The former requires digging into the content to pull out the link either way but I don't see why it can't use the location of the feed, blog URL, or the permalink URL as the base URI. I use the URL in RSS Bandit and haven't had any complaints so far. Since the latter will be generated by tools I see no reason why it can't be mandated that this is always an absolute URI.

Feed explorations

2003-07-30T14:14:43-04:00

Oops looks like my markup got eaten. That should say I use the link or guid[permalink="true"] URL.

Feed explorations

2003-07-30T14:27:00-04:00

Dare, Tim Bray's RSS feed currently has 9 relative references. Your RSS feed is on a different domain than your site. xml:base seems the most carefully thought out proposal to date which provides the ability to handle both, in that in the normal case, URLs should be relative to the feed, but in the exceptional case there is a relatively straightforward and standard attribute to look for. It seems to me that any other solution to this problem can't require any less work than a single attribute with a default that works for the overwhelming majority of cases.
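
The default-plus-override rule can be sketched in a few lines of Python. This is an illustration under assumptions, not text from any spec draft: the URLs and the `resolve` helper are hypothetical. References resolve against the feed's own URL unless an xml:base value is in scope, in which case that value (itself resolved against the feed URL) becomes the base.

```python
from urllib.parse import urljoin

FEED_URL = "http://example.com/feeds/index.atom"  # hypothetical feed location

def resolve(reference, feed_url, xml_base=None):
    """Resolve a relative reference; xml_base overrides the feed URL if given."""
    base = urljoin(feed_url, xml_base) if xml_base else feed_url
    return urljoin(base, reference)

# Normal case: the reference is relative to the feed itself.
default_case = resolve("2003/07/30/post", FEED_URL)

# Exceptional case: the site lives on a different domain than the feed.
override_case = resolve("2003/07/30/post", FEED_URL,
                        xml_base="http://example.org/blog/")

print(default_case)
print(override_case)
```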

If you don't want to support xml:base, or this format for that matter, that's fine. But phrases like "if... eventually... hack" don't serve to further productive discourse.

Evan Martin

2003-07-30T14:36:23-04:00

Brad and I are going to California for another meeting, again about the new indeterminately-named weblog syndication/storage format/API/kitchen sink. I’ve been working on LiveJournal support for it. We may get to meet jwz. Thinking back, I’m pretty...

RE: Feed explorations

2003-07-30T14:52:24-04:00

Sam,
You have a weird idea of what confrontation or productive discourse is. I stated

"If you guys decide to go with xml:base and I eventually get around to supporting your format I'll have to hack something up."

which you turn into

"if...eventually...hack"

IF: I didn't realize that going with xml:base was a done deal and I assume you guys were open to comments. My apologies.

EVENTUALLY: I'm sorry I'm busy and can't make concrete commitments as to whether I'll support an XML format that hasn't even been written yet. One of the biggest problems with standardization is when people agree to support standards sight unseen, which has led to several messes in the XML world. Of course, this is old news given that James Gosling was complaining about the same thing over a decade ago in http://java.sun.com/people/jag/StandardsPhases/

HACK: xml:base is a parser-level feature. In fact, both the XmlReader and the XmlNode classes in the System.Xml namespace expose a BaseURI property. Short of implementing my own XML parser (which would be the proper way to do it) I will have to hack something in by tracking the currently in-scope xml:base attribute in the various bits of code that manipulate feed entries. There are a lot of places in my code where I deal with feed entries, from passing them to XSLT stylesheets for theming, to IBlogExtension, to just parsing them the first time around. This, of course, gets even hairier if the content can have nested xml:base within it, in which case the best bet does seem to be to subclass the XmlTextReader and add xml:base support myself.

Since I don't plan to subclass the XmlTextReader class anytime soon I was planning to "hack in" support. My apologies if the term offends you.

Feed explorations

2003-07-30T15:12:22-04:00

Dare: it was the second half of the "AND" clause that I was specifically referring to. The more feedback that can be obtained by people prototyping early, the better the standard will be.

I agree that ideally xml:base would be a parser-level feature. Presumably, it will eventually be in all major parsers. Until then, it is a well-documented attribute in a standard namespace that can be queried by the application, with a simpler set of heuristics than the one you described.

Just to be sure, I tried it myself.

What am I missing?

RE: Feed explorations

2003-07-30T15:27:13-04:00

Sam,
The example you showed is fairly trivial. If I am parsing a feed using the XmlReader I need to have a stack of xml:base attributes seen so I know what the current one in scope is. If I'm using the DOM then I need to ensure that I run a query such as (ancestor-or-self::node()/@xml:base)[1] on the node I am interested in and if nothing comes up then use the BaseURI property of the node.

Of course these are the simple cases where you are parsing stuff at the level of feeds or entries. The problems become much hairier if there could be arbitrary xml:base attributes in XHTML content and in such a case I'd rather have xml:base support in the parser than try to track it myself.
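
The stack-based tracking described for a streaming reader can be sketched as follows. This uses Python's ElementTree `iterparse` as a stand-in for XmlReader, which is an assumption made for illustration; the sample feed, its URLs, and the `resolved_links` helper are all made up. One base is pushed per element start and popped on the matching end, so nested xml:base attributes inside content are handled the same way as those on entries.

```python
import io
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree reports xml:base under the reserved XML namespace URI.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# Made-up sample feed with xml:base on the feed, an entry, and nested content.
FEED = b"""<feed xml:base="http://example.com/blog/">
  <entry><link href="a.html"/></entry>
  <entry xml:base="http://example.org/other/">
    <link href="b.html"/>
    <content xml:base="img/"><a href="c.png"/></content>
  </entry>
</feed>"""

def resolved_links(source, document_uri):
    """Yield absolute href values, tracking the in-scope base with a stack."""
    bases = [document_uri]
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            # Push this element's base: its own xml:base resolved against the
            # parent's base, or the parent's base unchanged if it has none.
            bases.append(urljoin(bases[-1], elem.get(XML_BASE, "")))
            href = elem.get("href")
            if href is not None:
                yield urljoin(bases[-1], href)
        else:
            bases.pop()

links = list(resolved_links(io.BytesIO(FEED), "http://example.com/feed.atom"))
for link in links:
    print(link)
```

The sketch only looks at `href` attributes; deciding which other attributes or element contents hold URIs is the separate problem discussed earlier in the thread.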

Formerly Echo

2003-07-30T16:37:00-04:00

All LiveJournal test users now have Atom feeds. Here is an example. Simon Fell's Atom feed demonstrates explicitly marking the namespace on every item. More interestingly, he makes use of xml:base to have relative urls in links, ids, and even......

Feed explorations

2003-07-30T17:36:17-04:00

Sam (et al),

You are of course right, I was too hasty declaring the dates invalid; a better word would have been useless, even if a little strong. The point is that the W3C profile is there for a reason: to make it possible to define an instant in time precisely. Otherwise the time part doesn't have any meaning and should be left out, which is what I propose they do if the correct time zone can't be found. If this is not valid according to either specification, something should change.

Regarding xml:base, I realize that the elements or attributes could be defined in the schema as URIs, but that hardly seems enough - should all parsers be validating parsers?

If there was a vote regarding relative links, I'd say no - or suggest picking a framework that can tell the difference between URIrefs and text... :)

And yes, I agree on your point regarding problems and solutions, but I also feel it's important to point out and discuss the problems before advising solutions, otherwise they may end up half-baked.

Gordon Weakliem's Weblog

2003-07-30T19:55:56-04:00

A couple updates to yesterday's news: The query string was broken in the example I posted, it's been fixed. My sample feed nearly validates, except for the lack of dates. I tried an experiment using SSI to generate my stylesheet, but I can't ...

Feed explorations

2003-07-31T03:13:13-04:00

Dare Obasanjo: What alternative to xml:base do the parsers/other tools you work with support?

If none, what do you suppose is a better solution than using xml:base?

Finally, a bit of an OT: why not subclass it? Tracking nested xml:base should be easy as pie with a stack-type of queue. Right? Or am I missing something?

RE: Feed explorations

2003-07-31T03:57:30-04:00

Tomas,
Based on your comments and those of others (e.g. http://radio.weblogs.com/0106046/2003/07/30.html#a301), some of you folks seem to think that I am asking you not to use xml:base. I couldn't care less, since whatever option is picked I could probably implement, as long as it isn't infeasible.

All I'm doing is pointing out that it shafts the average Microsoft developer who wants to parse Pie/Echo/Necho/Atom feeds if they have to hack up their own implementation of xml:base as Greg Reinacker did (http://www.rassoc.com/gregr/weblog/archive.aspx?post=630#comments).

I'm not here to propose alternatives or brainstorm solutions since it isn't a big deal to me either way especially when people throw backhanded flames my way for attempting to contribute by pointing out the reality of the situation for XML developers on Microsoft platforms.

The FuzzyBlog!

2003-07-31T10:36:14-04:00

Feedsterlicious ! Heh. If you track our Feedster Stats page at all (and yeah its slow to display and I know I need to make a cached version) then you'll see something interesting: 160,000 + feeds now. That's right. Last night we...

Feed explorations

2003-07-31T10:47:18-04:00

Morten, having talked directly to one of the authors of the XML Schema specifications, I can assure you that the inclusion of dates without timezones was an equally intentional and difficult decision. The essential issue is one of whether you cling to ideals and exclude some important real world usage, or if your goal is to model the real world as it is knowing that this makes things more difficult downstream. Not an easy choice.

P.S. regarding your comment on mixing xml:base from multiple sources, there actually is a related, and frankly much bigger issue: mixing of character encodings from multiple sources.

Feed explorations

2003-07-31T12:08:47-04:00

re: "I'm not here to propose alternatives or brainstorm solutions." Well, the information about the state of Microsoft tools was very useful. Everything else, less so.

The fact is that RSS does not specify how to handle relative links. You do it one way, the next guy does it some other way, and you're both guaranteed to be wrong at least part of the time because there's no way to handle all cases properly (as Sam pointed out, your feed and Tim's feed are two mutually exclusive cases).

Atom will specify a way to handle relative links. It will be unambiguous. It will handle all cases. We are currently discussing how to specify it. This will require you writing new and different code than you have already written, because the code you have already written is necessarily broken. This is in no way your fault; you're doing the best you can with the data you've been given.

Atom will give you more data, so that you at least have the chance to write code that correctly handles all cases. Whether you choose to do so is, of course, entirely up to you.

The FuzzyStuff: aaaaFeedster

2003-07-31T14:40:40-04:00

Feed explorations

2003-07-31T20:18:48-04:00

1. On flaming: is it really necessary, Sam, Dare, Mark, that we get another childish negative discussion in between the more serious stuff? Come on, be professionals.

2. On xml:base and parser compatibility: for a project which is just underway and has, in terms of vocabulary, not produced much more than we already have with RSS2, it seems rather presumptuous to brush away the mainstream Microsoft parsers. There will not be a near future for Atom if Atom feeds can't be properly parsed by msxml and .NET.
As it stands now it is fairly easy to transform all three vocabularies in one go (see http://cybarber.ath.cx/dummy1.xml), but with xml:base being used, such transformations will be a no-go for several years unless there is sufficient influence at MS to have their parsers updated.

Cybarber

Feed explorations

2003-08-01T01:52:34-04:00

In all fairness, implementing xml:base functionality with .NET's XML parsers isn't hard...and it's certainly no worse than any other relative link mechanism I've seen.

I'm voting for xml:base - pre-existing art, and it seems to fit the bill for what we need.

Feed explorations

2003-08-01T10:52:36-04:00

Cybarber: Atom feeds can be properly parsed by msxml and .NET.

Greg hits the nail on the head: "implementing xml:base functionality with .NET's XML parsers isn't hard...and it's certainly no worse than any other relative link mechanism I've seen."

Feed explorations

2003-08-01T11:27:12-04:00

For reference, expat, SAX, DOM Levels 1 and 2, and XSLT all do not support XML Base "in the parser." Which is to say, most of the world's parsers do not have native support for XML Base.

The deployment of XML Base is through normative reference by new specifications, for example XLink and the XML Infoset. Applications and specifications built upon these new technologies will natively support XML Base. The behavior of xml:base attributes in applications based on specifications that do not have direct or indirect normative reference to XML Base is undefined. -- XML Base

As a new specification, if Atom supports XML Base it will be normative and its behavior defined -- at the application level.

Given that, could someone more precisely state what the issue is? Is it "don't allow relative links" or "don't use XML Base to implement relative links"?

Feed explorations

2003-08-01T11:58:21-04:00

Ken,
I'm not sure either of these has been raised as an issue. The discussion seems to have circled around the fact that I pointed out that Microsoft's XML parsers don't natively support xml:base but as you point out that is a myopic view. The truth is in general most XML APIs and XML technologies (like XSLT) do not have support for xml:base.

This means xml:base typically has to be implemented at the application level. In this case, the wisest thing for people parsing feeds that contain xml:base to do is to do a first parse to normalize all the links in the feed to absolute URIs since if there is an XML pipeline or fragments of the feed are reused in other contexts (e.g. styling with XSLT or aggregation of multiple feeds) then this information is lost. If not, this means every layer of the pipeline or everywhere the feed fragments are reused has to have some xml:base logic which IMHO is onerous.
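
A minimal sketch of such a normalizing first pass, written in Python with ElementTree rather than any particular aggregator's code. The sample feed, its URLs, the `absolutize` helper, and the choice of `href` and `src` as the only link-bearing attributes are assumptions for illustration. After the pass, every link is absolute and the xml:base attributes are gone, so later pipeline stages need no base-URI logic.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
LINK_ATTRS = ("href", "src")  # assumption: only these attributes hold URIs

def absolutize(elem, base):
    """Rewrite link attributes to absolute URIs and drop xml:base, in place."""
    # Remove this element's xml:base (if any) and fold it into the base.
    base = urljoin(base, elem.attrib.pop(XML_BASE, ""))
    for name in LINK_ATTRS:
        if name in elem.attrib:
            elem.set(name, urljoin(base, elem.get(name)))
    for child in elem:
        absolutize(child, base)

# Made-up feed: a relative xml:base nested inside an absolute one.
feed = ET.fromstring(
    '<feed xml:base="http://example.com/blog/">'
    '<entry xml:base="2003/"><link href="07/30.html"/></entry>'
    '</feed>'
)
absolutize(feed, "http://example.com/feed.atom")
normalized = ET.tostring(feed, encoding="unicode")
print(normalized)
```

URIs carried as element text (an id, for example) are not touched by this sketch; whether they should be is a policy question the thread leaves open.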

As to whether relative links should be allowed, I can understand saying that relative links should be supported in content since folks often type relative links in by hand when posting but don't see any reason why they should be supported in permalink URLs or within extension elements/attributes since these are likely to be autogenerated so there isn't any reason why the full URI couldn't just be created.

Feed explorations

2003-08-01T12:07:24-04:00

re: "but don't see any reason why they should be supported in permalink URLs"

+1. That's just weird.

Feed explorations

2003-08-01T13:34:09-04:00

Agree with most of Dare's analysis as to difficulty (and no real need for relatives in permalinks). But I think that relative URIs in content are a big win, and I used to think that it was obvious what the right base URI ought to be; but then people showed me corner cases where the feed and the entries have different bases, and then an entry's quoting a paragraph from somewhere else, yecch.

So it seems like xml:base provides a really clean solution that leaves no ambiguity. It might be worthwhile saying "in the absence of xml:base, the base URI of anything in a feed is the URI of the feed." Which would work in some cases.

If enough implementors put up their hands and say "that's too much work, we're not going to do it" the best fallback position is probably "No relative URIs".

Feed explorations

2003-08-01T16:30:01-04:00

There are several ways to simplify xml:base:

relative urls are only allowed in unescaped content
if the content contains relative links, xml:base is required
xml:base attributes are only allowed on content elements, and therefore have to appear on each content element that contains relative links
xml:base values can be restricted: no fragment IDs, no query-strings, maybe even require it to end in a "/"
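
The last restriction in the list above could be checked mechanically. A rough Python sketch follows; the function name `is_simple_base` and the sample URLs are made up, and it treats the "ends in /" rule as required even though the list only says "maybe".

```python
from urllib.parse import urlsplit

def is_simple_base(value: str) -> bool:
    """Check a candidate xml:base value against the proposed restrictions:
    no fragment ID, no query string, and a path that ends in "/".
    """
    parts = urlsplit(value)
    return (
        not parts.fragment
        and not parts.query
        and parts.path.endswith("/")
    )

print(is_simple_base("http://example.com/blog/"))
print(is_simple_base("http://example.com/blog/index.html"))
print(is_simple_base("http://example.com/blog/?page=2"))
print(is_simple_base("http://example.com/blog/#top"))
```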

Stefano Demiliani WeBlog

2004-02-13T04:11:58-05:00

Google supports Atom feeds......

对牛乱弹琴 | Playin' with IT

2004-05-06T11:15:29-04:00

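Thanks to Antonio Cavedoni for providing an online Atom 0.3 to RSS 1.0 conversion tool. It converts the Atom 0.3 feeds used by Google/Blogger and LiveJournal into RSS 1.0 feeds, so you can subscribe to Blogger blogs without having to switch to an RSS reader that supports Atom. Another benefit is that, by going through this online conversion, you also effectively get around the blocking of Blogspot within China. ...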