It’s just data

Triage

Dare Obasanjo: Sam Ruby has asked whether the WCF RSS Toolkit supports ETags, which is really a proxy for asking whether WCF supports manipulating HTTP headers directly. In my conversations with WCF folks like Yasser & Doug, the answer is that although the WCF RSS Toolkit doesn’t support ETags, this was due more to time constraints than to any limitations in WCF.

Close.  As Gordon would say allowing HTTP headers to be manipulated directly is an idea that could work if people would just follow policy.

No, the question is more of a proxy for asking if the WCF RSS Toolkit supports caching.  It’s a scalability thing.  And it typically exposes whether the underlying framework is a dumb router/transformer or is an active participant in the state transfer.

In your experience with syndication, do feeds tend to be polled relentlessly, whether or not they have changed?  Passing a request all the way to the application whether or not the data has changed may tend to limit the overall scalability of the application.  In servers such as Apache, 304 responses tend to be highly optimized.
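To make the 304 path concrete, here is a minimal sketch of a conditional GET check.  It is not taken from the WCF sample or from Apache; the names respond and render are hypothetical, and it assumes the server already knows the current ETag without having to rebuild the feed.  Splitting If-None-Match on commas is a simplification that is good enough for tags that contain no commas.

```python
def _opaque(tag):
    """Strip whitespace and the weak-validator prefix for a weak comparison."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def respond(request_headers, current_etag, render):
    """Return (status, headers, body) for a feed request.

    render() is the expensive step (hypothetical); it is only called
    when the client's cached copy does not match current_etag.
    """
    headers = {"ETag": current_etag}
    raw = request_headers.get("If-None-Match", "")
    candidates = [_opaque(t) for t in raw.split(",")]
    if "*" in candidates or _opaque(current_etag) in candidates:
        # Client copy is still good: no body, no rendering.
        return 304, headers, b""
    return 200, headers, render()
```

The point of the sketch is that the unchanged case never reaches render at all, which is where the scalability comes from.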

Additionally, knowing which features a given team views as important, and which features don’t make the cut due to “time constraints”, provides a lot of insight into which features the team views as crucial, and which they view as tickie marks, and this in turn provides a lot of insight into the priorities of the product team, doncha think?

As does the mapping of both names and email addresses to the same field.

As does the use of the term RSS generically.

Mind you, none of these indicators are conclusive, but taken together…

Meanwhile, people will take a look at these samples, and emulate them.  And build up an impression of WCF based on the aspects of WCF that these samples expose.

Eventually, they are bound to notice that “Our WCF [DataContract] doesn’t support attributes. That’s a deliberate choice”.  Pssst.  A little secret: RSS (any version) requires attributes.

I hope I’m wrong (I haven’t even downloaded the sample, so all this is based on hearsay and conjecture), but all signs point to someone wanting to hitch a ride on the RSS Bandwagon, who ended up not producing anything that even MSDN would want to use.

Am I wrong?


I admit to being a bit curious about what “not producing anything that even MSDN would want to use” means. I’m not a MSFT employee, but I’m doing contract work for MSDN. I’m guessing that either

1) MSDN is doing something heinous with RSS.
2) You made an honest mistake and actually meant “MSFT”, not “MSDN”.
3) You were simply implying that MSDN is an eager internal adopter of all MSFT technologies. (This is almost completely true.)

I’m hoping it’s #2 or #3, but if it’s #1, I’d love to hear about it so I can use what little influence I have to get it fixed.

Posted by Craig Andera at

I’ve been thinking about some of this stuff in a broader context, lately.

I was going to write up an elaborate explanation, but decided to do it on my own weblog instead.

Posted by Manuzhai at

I was so busy moaning about lack of support for basic HTTP authentication that I completely missed the lack of support for ETags. Nice to see “a casualty of time/resources vs. demand” becoming a common refrain though.

Lame.

Posted by Charlie Wood at

Craig: definitely #3.  All other things being equal, I would expect MSDN to want to highlight Microsoft favored technologies.  Absolutely nothing wrong with that.  However, even with this slight “bump”, I would presume that MSDN has a higher responsibility for such things as response time and availability under high load conditions.

Manuzhai: good stuff.  One thing I would like to add is that ETags are deceptively simple.  If you design your system right so that there is a correspondence between what you expose externally and something that is versioned properly internally, it is a piece of cake.  However, what I expect a number of people to end up doing is producing suboptimal Atom feeds with this sample.

Posted by Sam Ruby at

Re Charlie Wood’s comment: the saddest part of that email is not that Microsoft apparently didn’t have the resources to build a feed processor supporting authenticated feeds, but that they did have the resources to build two feed processors that don’t.

Posted by Adam Fitzpatrick at

Charles & Adam,
  The WCF RSS toolkit is a code sample which shows how you would generate RSS/Atom feeds using the Windows Communication Foundation (aka Indigo). It is completely different from the Windows RSS platform which is a C++/COM library for consuming RSS/Atom feeds.

Posted by Dare Obasanjo at

Sam you said:
Passing a request all the way to the application whether or not the data has changed may tend to limit the overall scalability of the application.

I understand this, but even if you do use ETags the request has to be passed to the application in order to validate the cached response (with If-None-Match and so on). And the application (e.g. a PHP script) that created the response will probably have to regenerate the response and compute the new ETag (if the ETag is something like an MD5 sum of the response).
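To make the concern concrete, here is a minimal sketch (in Python rather than PHP) of a handler whose ETag is an MD5 digest of the response body. The names etag_for, handle and generate are hypothetical. Note that generate() runs on every request, so the 304 saves bandwidth but not the work of producing the response.

```python
import hashlib


def etag_for(body: bytes) -> str:
    """Build a quoted ETag from the MD5 digest of the response body."""
    return '"' + hashlib.md5(body).hexdigest() + '"'


def handle(if_none_match, generate):
    """Return (status, etag, body) for one request.

    if_none_match is the validator sent by the client, or None.
    generate() is the (hypothetical) expensive step that builds the feed.
    """
    body = generate()       # the full response is regenerated every time
    etag = etag_for(body)   # ...just to find out whether it changed
    if if_none_match == etag:
        return 304, etag, b""
    return 200, etag, body
```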

According to my understanding of the HTTP spec, this trip to the origin server is avoided only if the response contains some expiration information. So I don’t see the usefulness of ETags for boosting scalability if they are used alone, with no Expires or other information that lets a user agent or other cache reuse the cached resource without hitting the origin server.
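As an illustration of the expiration information meant here, this is a small sketch that pairs an ETag with Cache-Control and Expires headers. The function name caching_headers and the 60 second lifetime in the usage note are assumptions chosen for the example, not recommendations.

```python
import time
from email.utils import formatdate


def caching_headers(etag, max_age, now=None):
    """Return response headers that let a cache reuse the response
    for max_age seconds before it has to revalidate with the ETag."""
    now = time.time() if now is None else now
    return {
        "ETag": etag,
        "Cache-Control": "max-age=%d" % max_age,
        # HTTP dates are always expressed in GMT.
        "Expires": formatdate(now + max_age, usegmt=True),
    }
```

With something like caching_headers('"v1"', 60), a cache can answer from its own copy for a minute and only then send If-None-Match to the origin server.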

Comments are welcome!

Posted by Stelios Sfakianakis at

I understand this, but even if you do use ETags the request has to be passed to the application in order to validate the cached response (with If-None-Match and so on). And the application (e.g. a PHP script) that created the response will probably have to regenerate the response and compute the new ETag (if the ETag is something like an MD5 sum of the response).

This weblog produces Atom feeds with a relatively unoptimized Python script.  When it runs, it places the entire document which was produced into a file.  From that point on, Apache serves the feed statically, which includes the management of all HTTP headers, and status codes.  Simply entering a comment on this weblog causes two Atom feeds to become stale - the feed for that entry, and the overall feed for the weblog (which contains comment counts).  Both files are simply deleted, and the next request that comes in will cause the script to run again.
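The pattern can be sketched roughly as follows.  This is not the actual script behind this weblog; serve, invalidate and generate are hypothetical names, and serve stands in for the part that Apache handles when it serves the file statically.

```python
import os


def serve(path, generate):
    """Return the cached feed, regenerating it only if the file is missing."""
    try:
        with open(path, "rb") as f:
            return f.read()
    except FileNotFoundError:
        body = generate()
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(body)
        # Rename into place so readers never see a half-written file.
        os.replace(tmp, path)
        return body


def invalidate(*paths):
    """Delete stale cache files; the next request will rebuild them."""
    for p in paths:
        try:
            os.remove(p)
        except FileNotFoundError:
            pass  # already gone, nothing to do
```

Posting a comment would then amount to a call to invalidate with the entry feed and the weblog feed, and nothing else happens until somebody asks for one of them.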

The same logic applies to my HTML pages, so my server could sustain a fairly high load and remain responsive.

At no point do I need to predict the future and guess when the next comment will be entered; however, that option is there for me if even higher scalability is required.

Posted by Sam Ruby at


This weblog produces Atom feeds with a relatively unoptimized Python script.  When it runs, it places the entire document which was produced into a file.  From that point on, Apache serves the feed statically, which includes the management of all HTTP headers, and status codes.  Simply entering a comment on this weblog causes two Atom feeds to become stale - the feed for that entry, and the overall feed for the weblog (which contains comment counts).  Both files are simply deleted, and the next request that comes in will cause the script to run again.

Okay, this is very nice when you have control over the update of the data, i.e. in your case you know when a new comment was added (a new comment is posted to [link]). On various occasions when I wanted to have a feed, the data was located in a database and was updated outside of my control, by other applications (this is a more “enterprise-y” use of feeds :-)). So I don’t know when a new row is inserted in the database ("advanced" triggers anyone?), and either I have to check (poll), let’s say every half an hour, in order to produce a static Atom feed, or I have every access to my feed go to a PHP/ASP/Servlet/... that checks for modifications and responds accordingly (e.g. with a 304). I usually prefer the latter solution so that no client is given a stale feed.

This is the case that I had in my mind and I acknowledge that it’s something beyond the usual use of feeds (but should it?)

thanks
Stelios

Posted by Stelios Sfakianakis at

This is the case that I had in my mind and I acknowledge that it’s something beyond the usual use of feeds (but should it?)

As long as your feed has basic things like stable ids and dates and some human readable content, this is a perfectly reasonable use of feeds.

Backing up, a perfectly reasonable answer to the question “does your application support ETags” is “No” as long as that was a conscious choice and that choice was based on some analysis and preferably some measurements.

Note that there are options other than the ones you list.  For example, you could check every 30 seconds instead of every thirty minutes.  This could dramatically reduce the number of times that you have to hit the database, and yet consistently provide data that is reasonably fresh.
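A rough sketch of that option: keep the last result around and only go back to the database once it is older than some limit.  The class name TimedFeed and the query callable are hypothetical, and the 30 second default is simply the figure from the paragraph above.

```python
import time


class TimedFeed:
    """Cache the output of query() and refresh it at most once per max_age seconds."""

    def __init__(self, query, max_age=30.0, clock=time.monotonic):
        self._query = query        # hypothetical callable that hits the database
        self._max_age = max_age
        self._clock = clock        # injectable so the behaviour can be tested
        self._body = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._max_age:
            self._body = self._query()
            self._fetched_at = now
        return self._body
```

However many clients poll the feed, the database sees at most one query per max_age seconds, and no client ever gets data older than that.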

Finally, this whole discussion exposes a difference in world views and focus.  I’ve met Clemens and he seems like a reasonable person, but we do have fundamentally different perspectives.  From what I can see, Clemens cares most about what the application sees, and views the data that goes across the wire as a mere implementation detail.  To me, what goes across the wire is the most important thing, and how the application chooses to view it is an implementation detail.

I bring this up as your description of enterprisy feeds smacks to me of “this is the data I have, and therefore this is what I will provide”, which frankly seems inside out to me.  Instead, I prefer a more outside-in approach of starting with what is going across the wire, and figuring out how to provide it.

Posted by Sam Ruby at

As long as your feed has basic things like stable ids and dates and some human readable content, this is a perfectly reasonable use of feeds.

Absolutely! I was very happy when I thought about using syndication for non-blogging content (basically using feeds as a means to implement pull event notification protocols over HTTP) and then saw that other people also had similar ideas.

Backing up, a perfectly reasonable answer to the question “does your application support ETags” is “No” as long as that was a conscious choice and that choice was based on some analysis and preferably some measurements.

Just to make it clear: I DO provide etags in all my feeds and also Last-Modified whenever I can somehow deduce the last modification date from my data.

I bring this up as your description of enterprisy feeds smacks to me of “this is the data I have, and therefore this is what I will provide”, which frankly seems inside out to me.  Instead, I prefer a more outside-in approach of starting with what is going across the wire, and figuring out how to provide it.

I totally agree with you! That’s why I try to generate ETags even if this includes the (re)generation of content and the computation of its digest (usually MD5). I’ll reconsider, though, the polling implementation you suggested. Thanks!

Posted by Stelios Sfakianakis at

I see some follow up regarding ETags here:
[link]

Posted by Dilip at

Heuristics

Sam Ruby has been talking about projects supporting HTTP ETags; he’s trying to use this as a heuristic to measure the “cluefulness” of developers working on syndication projects. I think it’s an interesting approach, which I’ve also been...

Excerpt from [ manuzhai ] — because I care at
