It’s just data

State of Autodiscovery

I was curious about the uptake of autodiscovery among the Feedster top500 weblogs.  The good news is that about 80% have autodiscovery links.  But looking deeper, I found some surprising results (all reports other than the last one rounded to the nearest 5%):

I’m not yet certain what this all means.


Hmm. I’m not certain about the rest of it, but I have my suspicions about a now-404ed site hosted on Blog*Spot, with what appears to be a person’s name associated with it, a name which only appears in Google associated with multi-hyphenated-keyword-stuffed-domain-names.

I tried to look for a random example Blog*Spot site the other day by hitting the “next blog” link in their standard banner, but after twelve straight spam blogs I gave up.

Posted by Phil Ringnalda at

Phil: take a look at the navbar now... spam? flag it.

(well, for blogs that have been (re)published)

Posted by Greg Stein at

Out-freakin'-standing! I actually thought about that, but then I figured the marketing issues would be too big a bar.

Posted by Phil Ringnalda at

I think it means Scott Johnson and team worked really hard to get this working :)

Posted by Randy Charles Morin at

Sam Ruby: State of Autodiscovery

The state of autodiscovery. Quite surprising results...

Excerpt from del.icio.us/tag/atom at

Well this is interesting

It looks like Blogger’s finally starting to get more active in combating BlogSpot spam blogs. Look what’s just appeared on the navbar: The “What does this mean?” link leads to a Blogger help page explaining the Flag...

Excerpt from James Kew: Resident Alien at

How do I know which feed they are tracking that is not listed?

Posted by Anne at

On a side note: How many of those 500 feeds validate?

Posted by Breyten at

Randy Charles Morin:

I think it means Scott Johnson and team worked really hard to get this working

One way that they did exactly that is that they seem to consider http://joelonsoftware.com/rss.xml and http://www.joelonsoftware.com/rss.xml to be the same.  I don’t (yet), which explains some of the disparity.  I’m working to refine this.
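As a rough illustration of that kind of equivalence, here is a minimal Python sketch. It assumes that ignoring a leading "www." and the case of the host name is safe, which is not true for every site; `normalize_feed_url` and `same_feed` are hypothetical helper names, not anything Feedster or this report is known to use.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_feed_url(url: str) -> str:
    """Return a comparison key for a feed URL (hypothetical helper).

    Assumption: "www.example.com" and "example.com" serve the same feed.
    """
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    # Keep an explicit port only when it is not the default for the scheme.
    default_ports = {"http": 80, "https": 443}
    if parts.port and parts.port != default_ports.get(parts.scheme):
        host = f"{host}:{parts.port}"
    path = parts.path or "/"
    # Drop the fragment; it never reaches the server.
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


def same_feed(a: str, b: str) -> bool:
    """True when two URLs normalize to the same comparison key."""
    return normalize_feed_url(a) == normalize_feed_url(b)


print(same_feed("http://joelonsoftware.com/rss.xml",
                "http://www.joelonsoftware.com/rss.xml"))
```

The key is only meant for comparison; the URL that is actually fetched and stored should stay exactly as it was published.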

Anne:

How do I know which feed they are tracking that is not listed?

I used the OPML file.  Feedster also links to each feed on this page with the text “rss” (further contributing to the dilution of the term).

Breyten:

On a side note: How many of those 500 feeds validate?

An interesting question, for another day, perhaps.

Posted by Sam Ruby at

If you took it from there you have a bug in your results. The feed that is listed on the Feedster Top 500 is the same feed I link from my home page. It is the first feed I link to, even. (Still Atom 0.3 though. I need more time.)

Posted by Anne at

Sam Ruby was smart enough to look at the State of Autodiscovery in the Feedster Top 500: 20% do not have autodiscovery; 30% have autodiscovery, and top500 is tracking the preferred feed; 10% have multiple autodiscovery links, but the top500 is tracking...

Excerpt from Kevin Burton's Feed Blog at

Unless you are talking about “this instance of the list, which I cached at this time” it has a bit of a Heisenberg problem: several of the most obvious things, like listing Jenny as “YACCS Comments for The Shifted Librarian,” that I noticed yesterday, are fixed now.

My first guess, that their choice of feed was “the largest feed with a channel/link or feed/link@rel="alternate" pointing to the site” (to choose a full-content feed), doesn’t work terribly well with Pepys’ Diary, which only lists full.rdf in autodiscovery, though the list chose brief.rdf.

One obvious conclusion, from this and from the way Yahoo tried (or maybe is still trying) to supplement blo.gs pings by listing as updated any site where they saw an update to a feed that claims to be of that site, is that partial feeds like comments or linklogs shouldn’t link to the site page, even if you don’t really want to create a separate HTML page for them. I’ve never really wanted an HTML page of context-free comments, but I guess I need one, to let me have a different channel/link.

Posted by Phil Ringnalda at

I’ve never really wanted an HTML page of context-free comments, but I guess I need one, to let me have a different channel/link.

I have always wanted that on your site.  It’s the easiest way to track conversations on your site, where content (and ensuing discussion) appears in spurts, separated by long stretches of spam and nothingness.

Posted by Mark at

Those who are interested can track today’s progress here.

I am getting the latest OPML each time.  I tend to not think of it as uncertainty, but rather as a virtuous feedback loop.  My focus is on increasing the signal to noise ratio.  Example: with simple pattern matches, I can likely identify when the difference is only the format, not the content.
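One guess at what such a pattern match could look like, as a crude Python sketch: blank out the tokens that usually name a feed format and see whether two URLs become identical. The token list and the `differs_only_in_format` name are assumptions made for illustration, not the matching actually used for the report.

```python
import re

# Assumed list of tokens that tend to name a feed format in a URL.
# "rss2" comes before "rss" so the longer token wins.
FORMAT_TOKENS = re.compile(r"atom|rss2|rss|rdf|xml", re.IGNORECASE)


def format_neutral(url: str) -> str:
    """Replace every format-looking token with a placeholder."""
    return FORMAT_TOKENS.sub("*", url)


def differs_only_in_format(a: str, b: str) -> bool:
    """True when two distinct URLs match once format tokens are blanked."""
    return a != b and format_neutral(a) == format_neutral(b)


print(differs_only_in_format("http://example.org/index.rdf",
                             "http://example.org/index.xml"))
print(differs_only_in_format("http://example.org/index.xml",
                             "http://example.org/comments.xml"))
```

A heuristic like this only flags candidates. It says nothing about whether the two feeds really carry the same entries, so a match would still need a look at the content.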

I do think that Kevin’s conclusion is very premature.  For example, the problem with Fleshbot and Gizmodo is that their autodiscovery links are incorrect.

Posted by Sam Ruby at

And the problem with Joi is that his /jp/index.xml feed lies, and says that its channel/link is joi.ito.com rather than joi.ito.com/jp/. Always something new to worry about, in bits of syntax you never paid much attention to.

The feedburner.com vs. redirected URI problem is getting ready to bite Firefox’s butt: adding a live bookmark is going to be done (is done, in nightlies) from the content of an XSL transform of the displayed feed, so we’ll discover me.com/index.xml, load it, follow the temporary redirect to feedburner.com/Mecom, and then save that as the feed URI, rather than the correct originally discovered URI. I should probably re-find the bug for that, and take a stab at fixing it, since it’s just the sort of thing that nobody will care about until it’s far too late.
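The rule at stake can be sketched in a few lines of Python. This is only an illustration of the usual HTTP convention (a permanent redirect moves the subscription, a temporary one does not), not Firefox's code; `uri_to_store` is a hypothetical name, and the redirect chain is passed in as plain data so nothing is fetched.

```python
PERMANENT = {301, 308}  # status codes that say "update your bookmark"


def uri_to_store(discovered_uri, redirect_chain):
    """Pick the URI a subscription should remember.

    redirect_chain is a list of (status_code, location) hops, in the
    order they were followed. Hypothetical helper for illustration.
    """
    stored = discovered_uri
    for status, location in redirect_chain:
        if status in PERMANENT:
            stored = location
        else:
            # Temporary redirect: keep what we have and stop trusting
            # anything that comes after it.
            break
    return stored


print(uri_to_store("http://me.com/index.xml",
                   [(302, "http://feedburner.com/Mecom")]))
```

The content is still read from wherever the chain ends; only the remembered URI stays put.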

Posted by Phil Ringnalda at

Phil,

I think you’re commenting on the exact problem I’m having with autodiscovery: what is the correct “definitive” link to a feed? Is it the HTML page that contains the <link> elements or is it the URI pointed to by the <link> element?

In my aggregator, should I store the original HTML page and periodically sample it to see if the <link> element is still “valid” or “preferred”? Or should I only monitor elements within the feed I download to see if the URL I use still matches? And which elements?
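For the first option, re-sampling the HTML page, a minimal Python sketch of reading the autodiscovery links might look like the following. It assumes the common convention of `<link rel="alternate">` with a feed MIME type; the set of accepted types and the names `AutodiscoveryParser` and `discover_feeds` are choices made for this example.

```python
from html.parser import HTMLParser

# Assumed set of MIME types that mark a <link> as a feed.
FEED_TYPES = {
    "application/rss+xml",
    "application/atom+xml",
    "application/rdf+xml",
}


class AutodiscoveryParser(HTMLParser):
    """Collect href values of feed autodiscovery <link> elements."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        mime = attr.get("type", "").strip().lower()
        if "alternate" in rels and mime in FEED_TYPES and attr.get("href"):
            self.feeds.append(attr["href"])


def discover_feeds(html: str) -> list:
    """Return autodiscovery hrefs in document order."""
    parser = AutodiscoveryParser()
    parser.feed(html)
    return parser.feeds


page = """<html><head>
<link rel="stylesheet" type="text/css" href="/site.css">
<link rel="alternate" type="application/atom+xml" href="/index.atom">
<link rel="alternate" type="application/rss+xml" href="/index.rss" />
</head><body></body></html>"""

feeds = discover_feeds(page)
print(feeds)
print("/index.atom" in feeds)  # is the stored feed still advertised?
```

Treating the first listed link as the “preferred” one is a common reading rather than a guarantee, and relative hrefs would still need resolving against the page URL.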

What I want is a feed that corrects itself during changes instigated by the producer, with minimal interaction from the user: “minimal” being defined ideally as “none”, possibly as “yes/no” dialogs, and at most as a “choose the feed” multiple-choice dialog.

How do I do this today? In the future?

Greg Smith
Author, FeederReader - Pocket PC direct RSS text, audio, video, podcasts
www.FeederReader.com - Download on the Road

Posted by Greg Smith at

Gees, talk about getting caught with yer pants down...
I’ll be fixing Joi’s templates shortly. ;)

Posted by Boris Anthony at

ongoing · Atom 1.0

ongoing · Atom 1.0. Time for me to support Atom. I’ll do that this weekend, along with autodiscovery....

Excerpt from Keith's Weblog at
