intertwingly

It’s just data

Widening the Net


The excerpting function seems to be working, and by now I guess that all the people I could have encouraged to add RSS AutoDiscovery information to their websites have done so.

So now it is time to widen the net a bit. No, I am not going to include the Ultra-liberal RSS locator, because I feel that it would be morally wrong to do so. Scouring several (possibly dozens of) sites for information after a human enters text into an entry field is one thing; doing so automatically once an hour for each referrer is another.

So here is what I have implemented so far. If I retrieve a page and it has no appropriate <link> tag, I scan for <a> tags whose hrefs point to the same site and end with a file name that is commonly used for RSS. The ones I have come up with so far are: rss.xml, index.xml, index.rdf, and ?flav=rss. The first match I encounter is used, so there will only be one attempt to fetch an RSS feed per site per hour.
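
For the curious, the fallback described above might look roughly like the following Python sketch. This is not the code running on this site; the names guess_feed and AnchorCollector are made up for illustration, and treating a <link rel="alternate" type="application/rss+xml"> as the "appropriate" link tag is an assumption on my part.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Conventional feed names listed in the post.
FEED_SUFFIXES = ("rss.xml", "index.xml", "index.rdf", "?flav=rss")


class AnchorCollector(HTMLParser):
    """Hypothetical helper: records RSS <link> hrefs and <a> hrefs in document order."""

    def __init__(self):
        super().__init__()
        self.link_feeds = []
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        if tag == "link":
            # Assumption: an "appropriate" link tag is rel=alternate with an RSS type.
            rel = (attrs.get("rel") or "").lower().split()
            mime = (attrs.get("type") or "").lower()
            if "alternate" in rel and mime == "application/rss+xml":
                self.link_feeds.append(href)
        elif tag == "a":
            self.anchors.append(href)


def guess_feed(page_url, html):
    """Return one candidate feed URL for the page, or None if nothing matches."""
    parser = AnchorCollector()
    parser.feed(html)

    # Prefer the autodiscovery link when the page has one.
    if parser.link_feeds:
        return urljoin(page_url, parser.link_feeds[0])

    # Otherwise take the first same-site anchor ending in a conventional name.
    site = urlparse(page_url).netloc.lower()
    for href in parser.anchors:
        absolute = urljoin(page_url, href)
        if urlparse(absolute).netloc.lower() != site:
            continue
        if absolute.endswith(FEED_SUFFIXES):
            return absolute
    return None
```

Because the function returns at the first match, a caller that runs it once an hour per referrer makes at most one feed request per site in that hour. The plain suffix test is deliberately loose (it would also accept something like myrss.xml), so a stricter check on the final path segment may be worth adding.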

If you know of another common convention, leave a comment. If your site doesn't follow a common convention, consider adding a <link> tag to your site.