Now you see it...
For the past couple of nights, I noticed a strange thing. I’d go to bed with my entries showing up on Planet Intertwingly. I’d wake up with a number of them missing.
After each run, I was overwriting the log of the previous run, but last night I made a change so that each run's log gets a unique file name. This morning, I awoke to find:
DEBUG:planet:Removed expired or replaced item <tag:intertwingly.net,2004:2248>
DEBUG:planet:Removed expired or replaced item <tag:intertwingly.net,2004:2247>
Looking at the code for Planet, it seems that there is some logic to remove entries from the cache that have been dropped from the feed. So if, say, one request returned entries for Monday, Wednesday, and Friday, and the next request returned only Monday and Friday, the code assumes that Wednesday was intentionally pulled, and it hides the entry.
Nothing in any spec says that this is the way servers are expected to operate. This is entirely an assumption. One that, at first blush, seems reasonable.
The problem is that both my weblog and the Feed Parser support what Bob Wyman refers to as RFC3229 with “feed”.
In a nutshell, based on the ETag that the client sends, the server can determine which entries the client has already seen. By sending back only those entries that have changed, it can save bandwidth and perhaps even some client processing.
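A minimal Python sketch of the server side of that idea. It is not the code behind this weblog. The `Entry` class, the `select_entries` function, and the choice to encode the newest revision number directly in the ETag are all assumptions made for illustration; a real server is free to derive its ETag any way it likes. The `A-IM: feed` request header is how clients are generally understood to opt in to this behavior.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    id: str
    updated: int  # hypothetical: a revision counter that grows on every change


def select_entries(entries, if_none_match=None, a_im=None):
    """Return (status, entries_to_send, etag) for one feed request.

    Assumption for this sketch: the ETag is just the highest revision
    number the client has seen, wrapped in quotes.
    """
    latest = max((e.updated for e in entries), default=0)
    etag = f'"{latest}"'

    # Clients that did not ask for "feed" deltas always get the full feed.
    accepted = [token.strip() for token in (a_im or "").split(",")]
    if "feed" not in accepted or not if_none_match:
        return 200, list(entries), etag

    try:
        seen = int(if_none_match.strip('"'))
    except ValueError:
        # Unrecognized ETag: fall back to the full feed.
        return 200, list(entries), etag

    changed = [e for e in entries if e.updated > seen]
    if not changed:
        return 304, [], etag   # nothing new since the client's ETag
    return 226, changed, etag  # partial response: changed entries only
```

With entries for Monday, Wednesday, and Friday at revisions 5, 3, and 7, a client that presents `"5"` along with `A-IM: feed` gets back only Friday, with status 226.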
See the problem?
What happened in this instance is that comments were received on some entries, which caused those entries to be updated in the Atom feed to reflect the current number of comments present. As the others weren’t changed, they weren’t sent back.
What was sent back, however, was a different status code, namely 226. Based on this code, the client should know that this response only contains a partial update, i.e., the client should not attempt to read anything into the lack of repetition of unchanged entries.
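A minimal sketch, in Python, of how a client cache could take the status code into account. This is not Planet's actual code or the fix being tested. The function name `update_cache` and the plain dictionaries keyed by entry id are hypothetical, chosen only to show the difference between a full response (200) and a partial one (226).

```python
def update_cache(cache, status, entries):
    """Merge one feed response into a cache and return the new cache.

    cache and entries are dicts mapping entry id -> entry content.
    """
    if status == 304:
        # Not modified: nothing to merge, nothing to remove.
        return dict(cache)

    merged = dict(cache)
    merged.update(entries)  # new and changed entries replace cached copies

    if status == 200:
        # Full feed: an entry missing from the response is treated as
        # pulled. (This is the assumption described above, kept here
        # only for complete responses.)
        for entry_id in list(merged):
            if entry_id not in entries:
                del merged[entry_id]

    # Status 226 is a partial update, so absent entries are simply
    # unchanged and must stay in the cache.
    return merged
```

With Monday, Wednesday, and Friday cached, a 226 response containing only an updated Monday leaves Wednesday and Friday in place. The same response body with a 200 status would drop them.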
I’m testing a fix now.
For that matter, I’ve been meaning to deal with gzip encoding as well, but haven’t gotten around to it. I played around with manually sending some requests to Intertwingly’s feed with Accept-encoding: gzip, but I’m not really sure what the best and fastest way of decompressing that content is within Ruby.
Posted by Bob Aman at
Pond Envy
My first recommendation aligns with Aristotle’s: diversify. I didn’t do any Perl in the past week, but I did do Python, PHP, Ruby, and JavaScript. In pond size terms, PHP is huge and growing. Ruby is nascent but exploding. My second recommend...
Trackback from Sam Ruby at
Ahr, nice catch! That’s an assumption from way back, before feedparser supported the partial feeds. Rock on. :-)
Posted by Jeff Waugh at
[from azz] Sam Ruby: Now you see it...
RFC3229+feed: an utterly stupid idea that’s broken various bits of feedparser-based software. Here’s how it broke Planet....
Excerpt from del.icio.us/network/bcc at
Huh, interesting. Honestly, I personally don’t think I’d want that behavior even if “RFC3229+feed” wasn’t an issue.
Actually though, I’ve been meaning to implement “RFC3229+feed” for a while now, but just haven’t gotten around to it. Partly because no one’s requested it yet.
Posted by Bob Aman at