It’s just data

Poisoned Cache

For the past month, eight feeds hosted by blogs.sun.com were not updated on planet.intertwingly.net, a victim of a poisoned httplib2 cache.  A victim of a permanent redirect.  The evidence can be found here.

Misconfigured Wi-Fi providers make this problem common enough in desktop clients that most aggregators have dealt with it.  Presumably most of the larger online aggregators have countermeasures in place too.  Venus, when configured by default, does too.  It will only respect a permanent redirect if the redirect was to a feed which contained at least one entry.
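As a rough illustration of that rule, here is a minimal sketch. It is not Venus's actual code; the function name and its parameters are hypothetical, and it assumes the caller already has the HTTP status and the list of parsed entries.

```python
def next_feed_uri(original_uri, redirect_uri, status, entries):
    """Pick the URI to remember for the next fetch (hypothetical helper).

    A permanent redirect (301) is honored only when the redirect target
    parsed into at least one entry; otherwise the original URI is kept.
    """
    if status == 301 and len(entries) > 0:
        return redirect_uri
    return original_uri


# A maintenance page that answers 301 but yields no entries is ignored.
kept = next_feed_uri("http://example.org/feed", "http://example.org/maint", 301, [])
```

With this shape, a redirect to a page that is not a usable feed never overwrites the subscription.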

When configured to support multiple threads, however, Venus will make use of httplib2, which has its own cache.  In this case, the problem is compounded by the fact that the maintenance CGI application that is the target of the redirect supports ETags well enough to produce a 304 response.  Venus treats 304 responses as an indication of “success” after a period of server errors or timeouts.

Eventually, such feeds would have been viewed as inactive for 90 days, but luckily in this case I caught the problem earlier.

For now, I’ve merely committed enough code to help diagnose the problem.  Feeds that all redirect to the same place, or to places that return “success” but no data, will now be flagged with a message and, typically, a dashed red underline, though the details are configurable with CSS.  But I really need to find a more permanent solution.
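The diagnostic check might look something like the sketch below. This is an assumption-laden illustration rather than the committed code: the function name and the shape of the `results` mapping are hypothetical.

```python
from collections import defaultdict


def find_suspect_feeds(results):
    """Return the subscribed URIs that deserve a warning (hypothetical helper).

    results maps each subscribed URI to a tuple of
    (final URI after redirects, number of entries parsed).
    """
    # Group redirected subscriptions by where they ended up.
    by_target = defaultdict(list)
    for subscribed, (final, _count) in results.items():
        if final != subscribed:
            by_target[final].append(subscribed)

    suspects = set()
    # Several feeds redirecting to one place is suspicious.
    for sources in by_target.values():
        if len(sources) > 1:
            suspects.update(sources)
    # So is a "successful" fetch that carried no entries.
    for subscribed, (_final, count) in results.items():
        if count == 0:
            suspects.add(subscribed)
    return suspects


results = {
    "http://a.example/feed": ("http://m.example/maint", 0),
    "http://b.example/feed": ("http://m.example/maint", 0),
    "http://c.example/feed": ("http://c.example/feed", 3),
}
suspects = find_suspect_feeds(results)
```

Here the first two feeds are flagged and the third is not.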


I’ve logged this as a bug:

  [link]

My first instinct is to create a user-settable number of days that
a 301 has to be in place before it gets recorded in the cache, but I need
to think about it some more.
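One way that idea could be sketched, purely as an assumption (nothing like this exists in httplib2, and the function and parameter names are made up for illustration):

```python
from datetime import date, timedelta


def redirect_is_settled(first_seen, today, min_days):
    """True once a 301 has been observed continuously for min_days (hypothetical).

    first_seen is the date the 301 was first observed; the caller would
    reset it whenever the redirect disappears or changes target.
    """
    return today - first_seen >= timedelta(days=min_days)


# Seen for four days with a seven-day threshold: not recorded yet.
too_soon = redirect_is_settled(date(2008, 1, 1), date(2008, 1, 5), 7)
```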

Posted by Joe at

Shortly after fetching, Venus knows the original URI used to fetch the request and whether or not the feed parser can find any entries in the response.  If it could cancel the permanent redirect taking place at that point (this could be as simple as deleting the cached response), I would have what I need.
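A minimal sketch of that step, assuming only that the cache object offers a delete() method keyed by URI.  The in-memory cache class and the helper name are hypothetical stand-ins, and whether deleting the entry is really enough to cancel the redirect is not confirmed here.

```python
class MemoryCache:
    """Stand-in cache with get/set/delete methods, for illustration only."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value

    def delete(self, key):
        self._store.pop(key, None)


def forget_empty_response(cache, original_uri, entries):
    """Drop the cached response for original_uri when no entries were found."""
    if not entries:
        cache.delete(original_uri)
        return True
    return False


cache = MemoryCache()
cache.set("http://example.org/feed", "cached 301 response")
dropped = forget_empty_response(cache, "http://example.org/feed", [])
```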

Posted by Sam Ruby at

Thanks Sam. I have notified the blogs.sun.com folks about the erroneous HTTP 301 status code being returned by the blogs maintenance page.

Posted by Dave Johnson at

It’s a bit awkward, but you could construct an httplib2.FileCache object with the
directory you were using for the cache, and then call delete() with the given uri.

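import httplib2
c = httplib2.FileCache(".cache")  # the directory already in use for the cache
c.delete("http://example.org")    # remove the cached entry for this URI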


Posted by Joe at

[from hublicious] Sam Ruby: Poisoned Cache

clip: “Venus ... will only respect a permanent redirect if the redirect was to a feed which contained at least one entry.”...

Excerpt from del.icio.us/network/TomC at
