I was curious about the uptake of
autodiscovery among the
weblogs. The good news is that about 80% have autodiscovery
links. But looking deeper, I found some surprising results
(all reports other than the last one rounded to the nearest
Hmm. I’m not certain about the rest of it, but I have my suspicions about a now-404ed site hosted on Blog*Spot, with what appears to be a person’s name associated with it, a name which only appears in Google associated with multi-hyphenated-keyword-stuffed-domain-names.
I tried to look for a random example Blog*Spot site the other day by hitting the “next blog” link in their standard banner, but after twelve straight spam blogs I gave up.
It looks like Blogger’s finally starting to get more active in combating BlogSpot spam blogs. Look what’s just appeared on the navbar: The “What does this mean?” link leads to a Blogger help page explaining the Flag...
I think it means Scott Johnson and team worked really hard to get this working.
One way that they did exactly that is that they seem to consider http://joelonsoftware.com/rss.xml and http://www.joelonsoftware.com/rss.xml to be the same. I don’t (yet), which explains some of the disparity. I’m working to refine this.
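A minimal sketch of that kind of comparison, assuming that "the same" means the two URLs differ only in case of the host or in a leading "www.". The function names here are made up for illustration; this is not Feedster's actual rule, and it ignores the scheme and port.

```python
from urllib.parse import urlsplit


def canonical_feed_key(url):
    """Build a comparison key that ignores host case and a leading 'www.'.

    Assumption: scheme and port are ignored; path and query must match exactly.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path or "/"
    return (host, path, parts.query)


def same_feed(a, b):
    """True when both URLs reduce to the same comparison key."""
    return canonical_feed_key(a) == canonical_feed_key(b)


if __name__ == "__main__":
    print(same_feed("http://joelonsoftware.com/rss.xml",
                    "http://www.joelonsoftware.com/rss.xml"))
```

This is deliberately conservative: it says nothing about two different paths that happen to serve the same content, which would need a look at the feeds themselves.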
If you took it from there you have a bug in your results. The feed that is listed on the Feedster Top 500 is the same feed I link from my home page. It is the first feed I link to, even. (Still Atom 0.3 though. I need more time.)
Sam Ruby was smart enough to look at the State of Autodiscovery in the Feedster Top 500: 20% do not have autodiscovery; 30% have autodiscovery, and top500 is tracking the preferred feed; 10% have multiple autodiscovery links, but the top500 is tracking...
Unless you are talking about “this instance of the list, which I cached at this time” it has a bit of a Heisenberg problem: several of the most obvious things, like listing Jenny as “YACCS Comments for The Shifted Librarian,” that I noticed yesterday, are fixed now.
My first guess, that their choice of feed was “the largest feed with a channel/link or feed/link@rel="alternate" pointing to the site” (to choose a full-content feed), doesn’t work terribly well with Pepys’ Diary, which only lists full.rdf in autodiscovery, though the list chose brief.rdf.
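For anyone who wants to check what a page actually lists in autodiscovery, here is a rough sketch using only the Python standard library. The set of accepted MIME types and the example.org URLs are assumptions for illustration, and the code only reports the links in document order; it does not try to guess which one a list like the top500 would pick.

```python
from html.parser import HTMLParser

# Assumption: these are the MIME types worth treating as feeds.
FEED_TYPES = {"application/rss+xml", "application/atom+xml", "application/rdf+xml"}


class AutodiscoveryParser(HTMLParser):
    """Collect (type, href) pairs from <link rel="alternate"> elements."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        mime = attr.get("type", "").strip().lower()
        href = attr.get("href", "")
        if "alternate" in rels and mime in FEED_TYPES and href:
            self.feeds.append((mime, href))


def find_feeds(html):
    """Return autodiscovery links in the order they appear in the page."""
    parser = AutodiscoveryParser()
    parser.feed(html)
    parser.close()
    return parser.feeds


if __name__ == "__main__":
    page = ('<html><head>'
            '<link rel="alternate" type="application/rdf+xml" '
            'href="http://example.org/full.rdf">'
            '<link rel="stylesheet" href="style.css">'
            '</head></html>')
    print(find_feeds(page))
```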
One obvious conclusion, from this and from the way Yahoo tried (or maybe is still trying) to supplement blo.gs pings by listing as updated any site where they saw an update to a feed that claims to be of that site, is that partial feeds like comments or linklogs shouldn’t link to the site page, even if you don’t really want to create a separate HTML page for them. I’ve never really wanted an HTML page of context-free comments, but I guess I need one, to let me have a different channel/link.
I’ve never really wanted an HTML page of context-free comments, but I guess I need one, to let me have a different channel/link.
I have always wanted that on your site. It’s the easiest way to track conversations on your site, where content (and ensuing discussion) appears in spurts, separated by long stretches of spam and nothingness.
Those who are interested can track today’s progress here.
I am getting the latest OPML each time. I tend to not think of it as uncertainty, but rather as a virtuous feedback loop. My focus is on increasing the signal to noise ratio. Example: with simple pattern matches, I can likely identify when the difference is only the format, not the content.
I do think that Kevin’s conclusion is very premature. For example, the problem with Fleshbot and Gizmodo is that their autodiscovery links are incorrect.
And the problem with Joi is that his /jp/index.xml feed lies, and says that its channel/link is joi.ito.com rather than joi.ito.com/jp/. Always something new to worry about, in bits of syntax you never paid much attention.
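A quick way to spot that sort of mismatch, sketched for plain RSS 2.0 only (no namespaces, so not RSS 1.0 or Atom). The sample feed below is made up for the test, not copied from Joi's site, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def channel_link(rss_text):
    """Return the text of channel/link from an RSS 2.0 document, or None."""
    root = ET.fromstring(rss_text)
    link = root.find("channel/link")
    if link is None or not link.text:
        return None
    return link.text.strip()


def claims_site(rss_text, site_url):
    """True when the feed's channel/link matches site_url, ignoring a trailing slash."""
    link = channel_link(rss_text)
    return link is not None and link.rstrip("/") == site_url.rstrip("/")


if __name__ == "__main__":
    # Made-up sample: a feed for /jp/ that claims to belong to the site root.
    sample = ("<rss version='2.0'><channel>"
              "<title>Sample</title>"
              "<link>http://joi.ito.com/</link>"
              "</channel></rss>")
    print(claims_site(sample, "http://joi.ito.com/jp/"))
```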
The feedburner.com vs. redirected URI problem is getting ready to bite Firefox’s butt: adding a live bookmark is going to be done (is done, in nightlies) from the content of an XSL transform of the displayed feed, so we’ll discover me.com/index.xml, load it, follow the temporary redirect to feedburner.com/Mecom, and then save that as the feed URI, rather than the correct originally discovered URI. I should probably re-find the bug for that, and take a stab at fixing it, since it’s just the sort of thing that nobody will care about until it’s far too late.
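To make the intended behavior concrete, here is a small sketch of the policy, assuming the usual HTTP reading that 301 and 308 are permanent and everything else is temporary. It is not Firefox's code; the function name and the list-of-hops input are invented for the example.

```python
# Assumption: only these status codes mean "update your stored URI".
PERMANENT = {301, 308}


def uri_to_store(discovered_uri, redirects):
    """Pick the URI to save for a subscription.

    redirects is the list of (status_code, location) hops, in the order
    they were followed. Permanent hops move the stored URI; the first
    temporary hop stops the walk, so anything after it is ignored.
    """
    stored = discovered_uri
    for status, location in redirects:
        if status not in PERMANENT:
            break
        stored = location
    return stored


if __name__ == "__main__":
    hops = [(302, "http://feedburner.com/Mecom")]
    print(uri_to_store("http://me.com/index.xml", hops))
```

Stopping at the first temporary hop is a design choice: a permanent redirect reached only by way of a temporary one says nothing lasting about the URI that was originally discovered.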
I think you’re commenting on the exact problem I’m having with autodiscovery: what is the correct “definitive” link to a feed? Is it the HTML page that contains the <link> elements or is it the URI pointed to by the <link> element?
In my aggregator, should I store the original HTML page and periodically sample it to see if the <link> element is still “valid” or “preferred”? Or should I only monitor elements within the feed I download to see if the URL I use still matches? And which elements?
What I want is a feed that corrects itself during changes instigated by the producer, with minimal interaction from the user: “minimal” being defined ideally as “none”, possibly as “yes/no” dialogs, and at most as a “choose the feed” multiple-choice dialog.
How do I do this today? In the future?
Greg Smith
Author, FeederReader - Pocket PC direct RSS text, audio, video, podcasts
www.FeederReader.com - Download on the Road