Rogers
Cadenhead: If you're going to point a dead site's feed to
nowhere, why not simply delete it? Any decent aggregator will
eventually dump a feed that results in "file not found"
errors.
If your server supports it, a
410 status code means gone. Whereas a
404 status code may be transient, a 410 is clearly intentional,
so you would think that any decent aggregator would
respect a 410, wouldn't you?
Well, last year, I created a feed for Esther Dyson's
Release 4.0. She
since has moved to another server and produces a
mostly valid RSS 2.0 feed (the problem is the
27 differences between iso-8859-1 and windows-1252).
I've long since removed my scraped feed, and
marked it gone.
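As an aside on the "27 differences" remark above, here is a minimal sketch that counts the bytes which decode differently under the two encodings. It relies only on Python's built-in codecs, so the figure it reports reflects Python's mapping tables rather than anything specific to Esther's feed.

```python
# Compare how every single byte decodes under the two encodings.
differences = []          # bytes defined in both encodings but mapped differently
undefined_in_cp1252 = []  # bytes that Python's windows-1252 codec refuses to decode

for value in range(256):
    raw = bytes([value])
    latin1_char = raw.decode("iso-8859-1")  # iso-8859-1 defines all 256 bytes
    try:
        cp1252_char = raw.decode("windows-1252")
    except UnicodeDecodeError:
        undefined_in_cp1252.append(value)
        continue
    if cp1252_char != latin1_char:
        differences.append((value, ord(latin1_char), ord(cp1252_char)))

print(len(differences), "bytes decode differently")
for value, latin1_cp, cp1252_cp in differences:
    print(f"0x{value:02X}: iso-8859-1 U+{latin1_cp:04X}, windows-1252 U+{cp1252_cp:04X}")
```

All of the differing bytes fall in the 0x80 to 0x9F range, where iso-8859-1 has control characters and windows-1252 has printable ones such as curly quotes and dashes.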
I've seen Dave Winer pointing to another way to express a feed's death, using XML, just in case a weblogger has no control over HTTP responses. What about that idea? As far as is known, only Radio and NetNewsWire support that.
The problem with Bloglines is that we weren't catching the 410 error. That's been fixed and these feeds should be removed from the database as per our normal policy, which is after 14 consecutive days receiving the error. Once we receive this type of error on a given day, we stop polling that feed for the rest of the day. So, the feed should only be polled 14 more times for each instance in the database. The 14 day number is what we also use for permanently re-writing redirects in the database. We do this because we've seen numerous instances of temporary server misconfigurations.
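The policy described in this comment can be sketched roughly as follows. This is only an illustration of the rule as stated (one counted error per day, removal after 14 consecutive error days); the `FeedRecord` class and both function names are hypothetical and are not Bloglines code.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

REMOVAL_THRESHOLD_DAYS = 14  # the figure quoted in the comment above


@dataclass
class FeedRecord:  # hypothetical record, for illustration only
    url: str
    consecutive_error_days: int = 0
    last_error_day: Optional[date] = None
    removed: bool = False


def should_poll(feed: FeedRecord, today: date) -> bool:
    """Skip removed feeds and feeds that already returned an error today."""
    return not feed.removed and feed.last_error_day != today


def record_result(feed: FeedRecord, status: int, today: date) -> None:
    """Update the error streak after a poll; remove the feed at the threshold."""
    if status in (404, 410):
        if feed.last_error_day == today:
            return  # already counted for today
        if feed.last_error_day == today - timedelta(days=1):
            feed.consecutive_error_days += 1
        else:
            feed.consecutive_error_days = 1  # streak starts (or restarts)
        feed.last_error_day = today
        if feed.consecutive_error_days >= REMOVAL_THRESHOLD_DAYS:
            feed.removed = True
    else:
        # Any non-error response breaks the streak.
        feed.consecutive_error_days = 0
        feed.last_error_day = None
```

Counting days rather than requests is what protects against a temporary server misconfiguration: a burst of errors within one day only ever counts once.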
Mark, shouldn't that policy only apply to 404 errors? A 410 means gone, so when you get that response back you (almost) certainly know the feed was removed and will never return.
Anne, I'm reluctant to make that change because of all the misconfigurations we've seen. Deleting a feed is about as permanent a change as you can make, and I want to make sure that we don't delete feeds based on an error. That said, I'd be curious to hear what other people think about this policy.
1) I'm making a change to BottomFeeder to mark feeds reporting a 410 as bad, and leaving it up to the user as to whether it should be deleted. There may be old posts being saved, so deleting it automatically seems like the wrong thing.
2) IMHO, having a 410 reported is the wrong result. A 301 (permanent move) would have been a far better choice for this. Aggregators that support that (BottomFeeder being one) would follow it and update the feed silently. A 410 slams the door shut in this case, and - since there's a new location - is demonstrably the wrong answer.
410 is not an error code. It's a status code. You have to make an effort to get a server to serve it. The same goes for a 301. Hence, they should take effect immediately. 404 is meant to be temporary, but 410 is not. 410 is dead, gone, not here no more and never will be.
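To make the "you have to make an effort" point concrete, here is a minimal sketch of a server that deliberately answers 410 for a retired feed and 301 for a moved one, using only Python's standard library. The paths, the target URL, and the `resolve` and `serve` helpers are made up for illustration; a real site would more likely configure this in its web server.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical paths, chosen only for this example.
GONE = {"/scraped-feed.rss"}
MOVED = {"/old-feed.xml": "http://example.org/new-feed.xml"}


def resolve(path):
    """Return (status, extra_headers) for a request path."""
    if path in GONE:
        return 410, {}  # intentionally and permanently gone
    if path in MOVED:
        return 301, {"Location": MOVED[path]}  # permanently moved
    return 404, {}  # merely not found, possibly transient


class FeedStatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, headers = resolve(self.path)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.send_header("Content-Length", "0")
        self.end_headers()


def serve(port=8000):
    """Run the sketch locally; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), FeedStatusHandler).serve_forever()
```

Calling `serve()` and requesting `/scraped-feed.rss` should produce a 410 response with an empty body.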
RANT:
It always amazes me how the developers who talk so much about the sanctity of their own specifications treat those that they base theirs on as optional fripperies. We get arguments about correct mime types, about character encoding in XML, about repeated elements in rss1, about GUIDs and about namespaces, all the while ignoring that the higher primal requirements of HTTP, XML, RDF, logic and XML again not only answer these questions, but mandate an answer.
With respect to any syndication format served over HTTP, the status codes and the conditional-gets provided by HTTP1.1 are the only way to go. Not through some form of fashionable thinking, but because you're serving over http. If your application can't deal with these codes, or if it deals with them in a way that differs from the standard, then your application is wrong. It's just as wrong as if it treated a description element as a link, or tried to make a dc:creator tag blink.
For a feed author, 410 and 301 are really the only way to go if you have any form of control over the server at all: that there are major http based applications that don't conform to the http standard with respect to these status codes is just out of order.
Sure, there may be edge cases where the user has no control of the server, but in those cases it is up to the author of the application they are using to provide the ability to instate the 410 or 301 and use etags and provide enough information for conditional gets. The weblog author probably doesn't have any clue what this conversation is about, but the developer of the application that author is using should. In fact, if he's releasing a major publishing tool into the wild and controlling every aspect of the serving process he has an obligation to be up to spec. This isn't some two week old point release of RSS we're talking about here, after all: it's HTTP.
From the spec: "Clients with link editing capabilities SHOULD delete references to the Request-URI after user approval."
Are continued retrieval and feed archiving synonymous? I could see saving them in a "dead feeds" area and letting the user check them manually, if they decided to.
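One way an aggregator might separate "stop retrieving" from "throw away the archive" is sketched below. The `Action` names and the `decide` function are hypothetical; the mapping simply follows the distinctions drawn in this thread (301 is permanent, 404 may be transient, 410 is intentional) and leaves deletion to the user, as the quoted spec text suggests.

```python
from enum import Enum
from typing import Optional, Tuple


class Action(Enum):  # hypothetical action names
    PARSE = "parse the feed body"
    UNCHANGED = "nothing new; keep the cached copy"
    UPDATE_URL = "rewrite the subscription to the new URL"
    FOLLOW_ONCE = "follow the redirect but keep the old URL"
    RETRY_LATER = "possibly transient; try again later"
    MARK_DEAD = "stop polling, keep saved posts, ask the user about deletion"


def decide(status: int, location: Optional[str] = None) -> Tuple[Action, Optional[str]]:
    """Map an HTTP status (and optional Location header) to an aggregator action."""
    if status == 200:
        return Action.PARSE, None
    if status == 304:
        return Action.UNCHANGED, None
    if status == 301 and location:
        return Action.UPDATE_URL, location
    if status in (302, 307) and location:
        return Action.FOLLOW_ONCE, location
    if status == 410:
        return Action.MARK_DEAD, None
    # 404 and anything unexpected: treat as possibly temporary.
    return Action.RETRY_LATER, None
```

Under this sketch a 410 never deletes anything by itself; it only stops the polling and flags the feed for the user.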
How about a simple module to express this? It could be called XFR/D (XML-based File Redirection/Deprecation). This could be used for any XML based file, not just limited to syndicated feeds.
For instance, you could have <xfrd:status status="deprecated" />, meaning in no way, shape, or form would this file be updated again. Or you could have <xfrd:status status="relocated" type="application/rss+xml" href="http://www.site.com/rss.xml" title="New Feed" />, where type specifies the MIME type of the new file, href represents the new location of the file (where to redirect the aggregator to), and title is an optional element, naming the new file. Any thoughts on this?
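For readers weighing this proposal, here is a rough sketch of what reading such an element might look like on the client side. The comment above does not define a namespace URI, so the `http://example.org/ns/xfrd` value, the sample document, and the `read_xfrd_status` function are all assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the proposal above does not specify one.
XFRD_NS = "http://example.org/ns/xfrd"

SAMPLE = """<rss version="2.0" xmlns:xfrd="http://example.org/ns/xfrd">
  <channel>
    <title>Old Feed</title>
    <xfrd:status status="relocated" type="application/rss+xml"
                 href="http://www.site.com/rss.xml" title="New Feed" />
  </channel>
</rss>"""


def read_xfrd_status(feed_xml):
    """Return the first xfrd:status element's attributes as a dict, or None."""
    root = ET.fromstring(feed_xml)
    element = next(root.iter(f"{{{XFRD_NS}}}status"), None)
    if element is None:
        return None
    return {
        "status": element.get("status"),
        "type": element.get("type"),
        "href": element.get("href"),
        "title": element.get("title"),
    }


print(read_xfrd_status(SAMPLE))
```

Note that the client has to download and parse the whole document before it learns the feed has moved, which is part of what the HTTP-level objections in this thread are about.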
Ben Hammersley, a reporter for The Guardian, objects to the idea of XML-based redirects. I've heard this before of course, but the fact is, a lot of people can't change how their HTTP server works. Still it's important that they be able to tell...
"It always amazes me how the developers who talk so much about the sanctity of their own specifications treat those that they base theirs on as optional fripperies. We get arguments about correct mime types, about character encoding in XML, about repeated elements in rss1, about GUIDs and about namespaces, all the while ignoring that the higher primal requirements of HTTP, XML, RDF, logic and XML again not only answer these questions, but mandate an answer...."
Excellent rant, start to finish, Ben.
(Even if you are a reporter for the Guardian smile and wink)
interesting how dave always seems to bag on ben, even when ben makes clear, rational points. ben, you are right on on this one. http should be somewhere near ground zero.
"a lot of people can't change how their HTTP server works"
What is this scripting language or RSS output application that can't set an HTTP header? I think we should be told. Alternatively, perhaps we could be told why the HTTP server would need a change to the way it works.
If you're FTPing plain text files to the server, you can either simply delete the old RSS file if it's gone or put a 5 line html file with a meta refresh line. So is that the problem? RSS readers can't cope with an HTML refresh?
How is HTML content with a "meta refresh" (which btw isn't standardized at all) relevant to a generic RSS client (that may not even know how to parse HTML)? And how will deleting a file through FTP instruct any given HTTP server to send a 410 instead of 404?
Would a Radio Userland user, a TypePad user, and a Blogger (or BlogSpot user) be able to get their server to return the correct HTTP code? I'm just guessing, but I'd bet that there are some cases for all three of these tools where an XML-level directive would be required, due to lack of user access to the server's HTTP return codes.
a concerned developer,
What you are pointing out are missing features or some would even say bugs in Radio Userland, TypePad and Blogger. HTTP has an established mechanism for indicating that a resource should be redirected or is gone. The fact that weblog hosting providers do not explicitly follow these rules does not mean we should reinvent HTTP (especially since sometimes it leads to interesting problems such as [link]) in every document format used on the Web. It means developers of aggregators and content management systems should live up to their responsibilities by paying attention to what the various specifications say.
What Ben said. The Radio/TypePad/Blogger user shouldn't have to worry about any codes - just tell the app their intention to move or close the feed. The app should deliver the appropriate messages. Sure, there will be situations where it's beyond the control of anyone publisher-side, but in most (if not all) of those situations there won't be the opportunity to send some XML either. So aggregators should be sensitive to 404's.
Adding another XML-level protocol gains nothing, just makes extra work.
This is probably a little harder to implement but anyway...Why not make aggregators aware of the average feed update frequency and tune up the scan frequency to that? This in general would save bandwidth. Setting a max to 1 month would also allow the re-birth of almost dead feeds. (More details on my weblog)
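A rough sketch of that idea: derive the polling interval from the average gap between recent posts and cap it at one month. The one-hour floor, the behaviour for feeds with fewer than two known posts, and the `poll_interval` name are assumptions added for the example, not part of the suggestion itself.

```python
from datetime import datetime, timedelta

MIN_INTERVAL = timedelta(hours=1)  # assumed floor, not from the comment
MAX_INTERVAL = timedelta(days=30)  # the suggested maximum of one month


def poll_interval(post_times):
    """Return how long to wait between polls, given known post timestamps."""
    if len(post_times) < 2:
        # Not enough history to average; assume the slowest schedule.
        return MAX_INTERVAL
    ordered = sorted(post_times)
    average_gap = (ordered[-1] - ordered[0]) / (len(ordered) - 1)
    return max(MIN_INTERVAL, min(average_gap, MAX_INTERVAL))


daily_posts = [datetime(2004, 6, day) for day in range(1, 8)]
print(poll_interval(daily_posts))
```

Because the interval never exceeds the cap, a feed that has gone quiet still gets checked occasionally, which is what would allow the "re-birth" mentioned above.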
Why not use the Syndication Module from RSS 1.0? An updateFrequency of 0, regardless of the updatePeriod, could mean that the feed is dead, and aggregators should not request it anymore.
The additional module/tag idea has one major logical flaw: for such a facility to exist in a typepad/radio/blogger style environment, it would require a typepad/radio/blogger developer to add it in. If they're going to even bother opening a code editor, they might as well fix their http implementation instead - it would be more compliant, save them money, and wouldn't require reader application developers to learn yet another specification. Or for that specification's syntax to be argued over for the rest of the summer.
CONTINUING THE RANT:
http1.1 is a lovely and venerable old girl, with recognised status codes, gzip, mime-types and etags. Half of the problems that RSS faces, from scalability to dead feeds to automatic subscribing in a desktop reader are covered by these - and half of the arguments about the spec are when this massive amount of prior art is ignored.
It's not hard. Treat RSS like any other proper document: serve it gzipped, with a mime type set to an application/x type affair and with etags set. Applications can then use standard http libraries (in every language near you today!) to query, ask for it if updated, unzip, and pass the feed to the necessary desktop application, just as if it was a jpeg, a pdf, an mpeg, a Real stream, or any other fully paid up member of the internet.
Hell, just agreeing on an RSS specific mime type would allow all the desktop apps to register it - Click on an XML button, and it will fire up your desktop app. Just like when I click on a .rm link, and it fires up RealPlayer - But instead what do we have? Serve it as text/xml so that the three hundred people who care can view source in MSIE (which is like serving QuickTime as text/plain so that the Apple developers can debug their video codec), and then have endless debates about user-education and the unchangeable nature of the little orange button. There are even third-party web applications to help you subscribe to feeds with software on your own machine. All because ten years of prior art are being ignored.
You don't need distributed reading, or fancy processing instructions that themselves require namespace support (which is another story altogether), or a network of superhashed-megacrypto mirrors around the globe. You just have to remember that a syndication feed is an XML document served over HTTP. And then read the HTTP specification. If a scumbag journo like me can do it, I'm sure some elite developers can spare the time.
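The "standard http libraries" point in this rant can be illustrated with a short sketch of a conditional, gzip-aware fetch using Python's `urllib`. The function names are made up for the example, and it assumes the server sends `ETag` or `Last-Modified` headers that the client has stored from an earlier fetch.

```python
import gzip
import urllib.error
import urllib.request


def build_conditional_request(url, etag=None, last_modified=None):
    """Build a GET request that asks for the feed only if it has changed."""
    request = urllib.request.Request(url)
    request.add_header("Accept-Encoding", "gzip")
    if etag:
        request.add_header("If-None-Match", etag)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    return request


def fetch_if_changed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None when nothing changed."""
    request = build_conditional_request(url, etag, last_modified)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            body = response.read()
            if response.headers.get("Content-Encoding") == "gzip":
                body = gzip.decompress(body)
            return (
                body,
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as error:
        if error.code == 304:  # Not Modified: keep the validators we already have
            return None, etag, last_modified
        raise  # 404, 410 and the rest are left for the caller to interpret
```

A caller would store the returned validators next to the subscription and pass them back on the next poll, so an unchanged feed costs only a short 304 response.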
Well, you need a MIME type and you need agreement on an element or attribute in the feed that contains the URI for the feed itself, since when you click on a link to something with a registered MIME type, the handler only gets a local copy that the browser already downloaded, not the URI. Don't they teach you anything at that school where you give up your soul to the Guardian?
An element that contains the URI for the feed itself. Like rdf:about you mean? :-)
Ok, to prevent going down that road, let's just agree that adding in a single element to the root section of a feed to identify it is entirely trivial compared to tens of variants of
Where the thisFeed element contains the URI of the feed itself. Such feeds should be served with a mime type of application/rss+xml. Registered UAs should offer the user a chance to review and subscribe to a feed thus presented using the URI within the thisFeed element.
While this thread has gone off on an interesting tangent, a number of user agents have requested the non-existent feed over the past twelve hours. Notably, Bloglines has been all but eliminated, and a core RssBandit developer has indicated that he views the lack of support for the HTTP 410 status code as a bug to be fixed.
Independent of the relative merits of various alternative proposals, I have seen nothing which indicates that a client which continues to poll for updates without notifying the user after having received an HTTP 410 status response is not buggy.
Dave, and similarly, HTTP 410 works with any format, by design.
Can we agree that while it may be reasonable for some clients to implement a fallback to what is supported in HTTP in order to accommodate "differently abled" servers, any client which attempts to implement HTTP but does not fully support HTTP status codes has a bug?
Dave: absolutely it does. I agree. It's simple and clear and human readable. It's just that I think that the feed itself is not the place for this sort of thing. Redirects can and should be relegated to the HTTP layer. Any effort spent in implementing that proposal, as clear and straightforward as it is, would be better spent implementing an HTTP status code response instead (a 301 in this case).
If not for general internet citizenship, then for the compelling reason that Google respects them - and carries pagerank over to the new URL. Useful to know, that.
There is a beautiful, beautiful rant from Ben Hammersley well into the comments of something a bit more substantive than the usual Orange vs. Blue shenanigans. Ignore the specious aspersions being cast upon Ben’s professional integrity and...
Sorry - to expand my previous now the kettle isn't boiling - Phil's example of a relative URI within the rdf:about is not a problem, because (and here's a fine example of going back to the primal specifications) the RDF spec says that when you are presented with a relative URI you must be able to expand it out with either the URI of the document, or an expressed xml:base element.
As RSS 1.0 documents must be valid RDF documents, and as the publisher will know that the document's URI will not be passed on by the mime handler, then if he insists on using relative URIs then he must also use xml:base, or the feed would be invalid. Invalid feeds can't expect correct behaviour.
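The resolution rule described above can be sketched with Python's `urllib.parse.urljoin`. The `resolve_about` function and the example URIs are hypothetical; the sketch only shows that a relative `rdf:about` needs either an `xml:base` or the document's own URI before it means anything.

```python
from urllib.parse import urljoin, urlparse


def resolve_about(about, document_uri=None, xml_base=None):
    """Expand an rdf:about value against xml:base, else the document URI."""
    if urlparse(about).scheme:
        return about  # already absolute
    base = xml_base or document_uri
    if base is None:
        # This is the MIME-handler case: only a local copy, no URI, no xml:base.
        raise ValueError("relative rdf:about with no xml:base or document URI")
    return urljoin(base, about)


print(resolve_about("index.rdf", xml_base="http://example.org/blog/"))
```

When a helper application is handed a downloaded copy with no URI, `document_uri` is unavailable, which is why the publisher would have to supply `xml:base` in that situation.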
Shrook 1.x automatically stopped polling. Shrook 2 doesn't yet (though it does put an exclamation mark next to the feed that, when clicked, says the resource no longer exists). I'll try to improve the situation in the next version.
I would agree that in a perfect world all clients would use all features available in all protocols, but I also know that developers are busy, overworked and underappreciated people.
It's easier to persuade with a feather than with a hammer.
Yes, people who are busy, overworked, and underappreciated often write software with bugs. We have a number of aggregator developers who are in that camp, several of whom have accepted this as a bug report, and one of whom has already made a fix.
Oh, and thanks, for the bug report. I've corrected the spelling of Esther Dyson's name.
Over at Sam's the continuing discussion about what to do with feeds that are no longer being updated. Respect for the status codes of http is the obvious answer. (I don't see that anyone disagrees with that...) The answer put forth for those who...
It's easier to persuade with a feather than with a hammer.
There are only so many times you can hit someone over the head with a feather before you decide that perhaps a hammer would be a better tool for the job.
I don't understand how one can be too busy, overworked, and underappreciated to implement a correct use of the underlying transport spec of your application, but not too busy, overworked, and underappreciated to take the time to completely invent a new spec for the text that flows over that transport, and on top of that, not be too busy, overworked, and underappreciated to take the time to implement that new spec in your application.
Of course, people are also 410ing on that Kylie video too
As part of the Great Sorting My Shit Out month, I have, as you might know, been rebuilding this site, adding a lot of content, moving things, fixing links and replacing files. To help with this task, I wrote some......
HTML 4, from around the same time as HTTP 1.1, has recognized that many people do not have access to modify their server's HTTP responses and has http-equiv as an answer:
How would any application that published static files over FTP be able to implement 410 or 301? Blogger comes to mind, Radio, CityDesk. Even if the server is technically able to send these responses, like Apache, a user might not be allowed to use .htaccess. In PHP modifying headers only works if PHP is compiled as an apache module to send headers, and before PHP 4.3.0 there was no good cross-platform way to send response codes. I'm chasing a bug now where when we send the correct MIME headers it causes errors on seemingly random setups. This is PHP and Apache, not some proprietary app not implementing all of HTTP 1.1. It seems to me there are good reasons for a document-level redirect mechanism.
Or is it that anyone without the technical ability to execute this wouldn't care about it in the first place?
Ben: "The additional module/tag idea has one major logical flaw: for such a facility to exist in a typepad/radio/blogger style environment, it would require a typepad/radio/blogger developer to add it in."
That's not true, assuming the Typepad/Radio/Blogger style environment in question supports user-modifiable templates. For example, one of my users could implement Dave's solution without waiting on me to write any code.
OTOH, it just took me all of five minutes to add a "response" attribute to one of JournURL's template tags, meaning the same user can now return 410s and 301s just as easily as an XML-based redirect. So while I disagree with the reasoning, I guess I agree with the sentiment.
Jäger 1.2.4 for Windows is now available and supports: feed redirection, 410 status handling, and RSS 2 enclosures. Feed redirection is a way for syndication feed providers (i.e. bloggers) to tell you that they've moved their feeds somewhere else –......
Blogger: no template for RSS/Atom. Radio: been a while, but I think you can only change it in code, not in a template. Typepad: no idea, though the way people talk about adding RSS 2.0 by upgrading to a higher level of service so they can add an arbitrary template makes me think not, at least not for the default feed.
http-equiv is an amusing parallel, since it was supposed to be server-side: "HTTP servers use this attribute to gather information for HTTP response message headers." From what I hear, no servers wanted to parse every outgoing HTML file for possible headers, so instead clients pretend that they got it as a header (with sometimes nasty results).
That's not true, assuming the Typepad/Radio/Blogger style environment in question supports user-modifiable templates.
Isn't the issue here partially that the user responsible for the feed may just up and leave, stop using the software in question, and thus not be able to click any magical buttons or add to a template to indicate their feed is dead?
I would imagine the people who would have to mark feeds as dead most often are server administrators. Which is easier for said sysadmins, adding a 410 redirect rule, or figuring out how to add a proper line of XML to a feed file?
Furthermore: Sam mentioned in the original post that despite Esther's feed long since being gone, he's still being polled for it by a large number of aggregators. If one of the goals here is to decrease the bandwidth consumed by hits for a dead feed, which makes more sense: returning a 410, or by adding another line to the feed file? And when older aggregators continue to poll the dead file due to a lack of understanding (Sam's list of aggregators hitting Esther's feed is frightening in its variety), won't it become more pronounced?
I concur with Danny: aggregators should be respectful of 404s, and observe 410s.
http-equiv is an amusing parallel, since it was supposed to be server-side: "HTTP servers use this attribute to gather information for HTTP response message headers." From what I hear, no servers wanted to parse every outgoing HTML file for possible headers, so instead clients pretend that they got it as a header (with sometimes nasty results).
+1
HTTP-EQUIV is an example of exactly why this is a bad idea. The HTML working group did not intend for clients to actually parse a document and then pretend they got the information in a header (exactly how many HTTP headers are actually supported by the average client anyway?). However, it turned out that this feature was infeasible to implement on the server, for obvious reasons, and we ended up with the hack where clients peek at the document to discover these pseudo-headers, sometimes with unexpected results.
This hack is an unfortunate occurrence in the WWW architecture, not something that is worthy of emulation as an example of good design.
I wonder if we can use any of these 410 or 301 ideas to handle the weblogs that have suddenly been pulled from weblogs.com. I bet these webloggers could use something. Unfortunately, though, they don't have a place to put their RSS files.
I know, Shelley -- I find it truly fascinating that, instead of returning 404s or 410s or anything even remotely logical, when you request the RSS page from any of the old weblogs.com sites right now, Dave's returning the RSS file for his take-your-site-away-from-here stand-in page. How does that even approximate user-friendly?
Torsten beat me to it and added HTTP 410 support in RSS Bandit this morning (this afternoon for him). The code is checked into CVS. For anyone that cares, the bug history is at [link]
This situation of moved/gone RSS/Atom files seems similar to the situation with moved/gone HTML files:
if one has enough control over the server, set a correct 410/301 status
if one doesn't, fall back on the (default for missing file) 404 status
Since an automated redirect is always a client-side option, it is always good to have some help info for the site user about what is happening. One does have the option of putting a helpful-to-the-user message in the status page that is returned.
Especially where it's not possible to set 410/301 status, I think it would be an option to return an RSS/Atom formatted file that includes a post whose summary/description gives the user more info on what is going on.
It's true that automated agents presumably won't handle a "notice" file the same way they would a 404, but it does give the user some more clues as to how to handle it.
(P.S. with my iCite net project, I want to create a distributed catalog of old URLs and corresponding new ones, or confirmation that they are permanently gone. Regardless of whether I build this, in general, any third party could create a service like this. And, for example, feed readers could query this service when they come across 410s/301s/404s with no other info.)
The problem with many server implementations is that they don't let users use the full potential of HTTP, which is wrong. It would be easy to create an interface which helps the user to do the right thing.
A bit late into this discussion, and this is probably the wrong place, but here's a suggestion:
1) This is the metadata-in-data problem. It seems to show up in all formats (content sniffing, spam detectors adding their data to the subject line, html http-equiv.)
2) Since it's a demonstrated real problem with no likely server solution in the near future for most users, it has to be solved in the data. Or ignored.
3) That being the case, just use the HTML meta http-equiv style tag. Add some requirements: it must be a processing instruction. The processing instruction must be before the root node of the document. And the name should probably not just be meta, but a pseudo-namespaced name. Also there may need to be two items, a header line and also an HTTP status code.
Now instead of re-inventing the wheel, you've re-used a well known if somewhat problematic format.
Also, servers (if they want to do the right thing) and clients don't have to understand XML, just enough to read processing instructions (which are flat) and stop at the first tag they see.
A new version of Jäger 1.2.2.6 for Macintosh (Beta) is now available. This version is a fairly significant upgrade and has: better (but not perfect) support for internationalized characters; attachment/RSS 2 enclosure support, including a hack to directly launch......
Dave Winer : "All of a sudden I'm hosting hundreds of inactive RSS feeds, and since this base of sites was where RSS was bootstrapped, there are a fair number of subscribers. I need a way to tell the aggregators, forget it, these sites are in...
SharpReader 0.9.5.0 is now available at sharpreader.net. Changes since the last version are: Bugfix: filter ending with "\" previously caused an exception to be thrown. Autocomplete in textboxes, implemented using LaMarvin Autocomplete Tool. Changed the threading model to fix Threadpool issues and hangups with very large number of feeds. Read/unread counts in subscriptions-pane in blue (like outlook). Read feeds with......
Dave Winer asks about automatic unsubscribing from a feed, and it's an interesting question. If a feed is no longer updated, how should the publisher tell aggregators to unsubscribe from it? I think the simplest solution is for the server to return...
From dave’s site. John Lennon said the Beatles were more popular than Jesus No argument, it was true, they were. Well, even though the vast majority of people have never heard of Steve or myself, we’re more influential than John Lennon or Bob Dylan...
Following up on my idea of using short-lived feeds to cover breaking news, as Sam Ruby points out, after a feed disappears, some aggregators don’t stop trying to download them. This wouldn’t cause a site that had only published a few short-lived...
410 Just over a year ago, I permanently redirected all my feeds to their Atom 1.0 equivalents. Several months later, I quietly converted all those redirects to 410 Gone. Checking back to see how effective this has been, here is a list of...
HTTP Error 410: Gone : I found this page today when searching for a refresher on the 410 status code. It means “gone.” Forever. Not just “not found” right now, but forever more. Gone, baby. We should use status code 410...
VirtualVitriol: Serendipity and the 301 HTTP status code
I had just finished reading Sam Ruby’s “Gone, really I mean it” post, and decided to use “301 Permanently Moved” HTTP codes for my feed redirects. I noticed Planet Python in my access log around this time and went to look it up and check that, even...