It’s just data

Now you see it...

For the past couple of nights, I've noticed a strange thing.  I'd go to bed with my entries showing up on Planet Intertwingly.  I'd wake up with a number of them missing.

After each run, I had been overwriting the log of the previous run, but last night I made a change so that each run was given a unique file name.  This morning, I awoke to find:

DEBUG:planet:Removed expired or replaced item <,2004:2248>
DEBUG:planet:Removed expired or replaced item <,2004:2247>

Looking at the code for Planet, there is some logic to remove entries from the cache when they have been dropped from the feed.  So if, say, one request returns entries for Monday, Wednesday, and Friday, and the next request returns only Monday and Friday, the code assumes that Wednesday was intentionally pulled, and hides the entry.
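The assumption can be sketched in a few lines.  This is not Planet's actual code, just an illustration of the pruning logic: any cached entry absent from the latest fetch is treated as deleted.

```python
# A sketch (not Planet's real code) of the cache-pruning assumption:
# entries missing from the latest fetch are presumed intentionally removed.

def prune_cache(cache, feed_entry_ids):
    """Drop cached entries that no longer appear in the fetched feed."""
    for entry_id in list(cache):
        if entry_id not in feed_entry_ids:
            del cache[entry_id]  # assumed "expired or replaced"

cache = {"mon": "...", "wed": "...", "fri": "..."}
prune_cache(cache, {"mon", "fri"})  # Wednesday is absent from this fetch
print(sorted(cache))                # Wednesday's entry is now gone
```

With a server that always returns the full feed, this works exactly as intended.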

Nothing in any spec says that this is the way servers are expected to operate.  This is entirely an assumption.  One that, at first blush, seems reasonable.

The problem is that both my weblog and the Feed Parser support what Bob Wyman refers to as RFC 3229 with “feed”.

In a nutshell, based on the ETag that the client sends, the server can determine which entries the client has already seen, and can save bandwidth and perhaps even some client processing if only those entries that have changed are sent back.

See the problem?

What happened in this instance is that comments were received on some entries, which caused those entries to be updated in the Atom feed to reflect the current number of comments present.  As the others weren’t changed, they weren’t sent back.

What was sent back, however, was a different status code, namely 226 (IM Used).  Based on this code, the client should know that the response contains only a partial update; that is, the client should not read anything into the absence of unchanged entries.
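One way to express the fix (a sketch of the idea being tested here, not the actual patch): only treat missing entries as removed when the response is a full representation (200), never when it is a delta (226).

```python
# Sketch of the fix: prune the cache only on a full feed (status 200).
# On 226 IM Used, absent entries are unchanged, not deleted.

def update_cache(cache, status, feed_entries):
    cache.update(feed_entries)
    if status == 200:                 # full representation: pruning is safe
        for entry_id in list(cache):
            if entry_id not in feed_entries:
                del cache[entry_id]
    # status 226: partial update; leave everything else alone

cache = {"mon": 1, "wed": 1, "fri": 1}
update_cache(cache, 226, {"mon": 2})  # delta: only Monday changed
print(sorted(cache))                  # Wednesday and Friday survive
```

With this check in place, a delta response updates the entries it carries and leaves the rest of the cache untouched.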

I’m testing a fix now.