Tolerance
Steve Jones: To use a physical engineering analogy, you check that the wing is put on properly as soon as it is attached, not by looking out of the window at 30,000ft to see if it is still there.
Clearly, requiring validation for airplane wings prior to use is sane. Leaping from that point to requiring validation for all web-based services is not only overkill; it actively creates an unacceptable level of impedance. That’s why most engineers talk in terms of tolerance. I would also assert that acceptable levels of tolerance differ depending on whether or not a given operation is safe.
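To make the safe/unsafe distinction concrete, here is a minimal Python sketch of one possible policy. It is only an illustration, not something proposed in this post. It assumes the RFC 7231 notion of safe methods (GET, HEAD, OPTIONS, TRACE). The names `accept_fields` and `Rejected` are hypothetical. Safe requests quietly drop fields they don't understand. Unsafe requests are refused if they carry anything unexpected, because acting on a misunderstood request can change server state.

```python
# Hypothetical policy: how strictly to treat unrecognised input depends on
# whether the HTTP method is "safe" in the RFC 7231 section 4.2.1 sense.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}


class Rejected(Exception):
    """Raised when an unsafe request carries fields the handler doesn't know."""


def accept_fields(method: str, fields: dict, known: set) -> dict:
    """Return the subset of `fields` the handler will act on.

    Safe methods: unknown fields are silently dropped (tolerant).
    Unsafe methods: any unknown field causes rejection (strict), because
    acting on a misunderstood request could change server state.
    """
    if method.upper() in SAFE_METHODS:
        return {k: v for k, v in fields.items() if k in known}
    unknown = sorted(set(fields) - known)
    if unknown:
        raise Rejected(f"unexpected fields for {method}: {unknown}")
    return dict(fields)
```

For example, `accept_fields("GET", {"q": "x", "junk": 1}, {"q"})` returns `{"q": "x"}`. Sending the same fields with POST raises `Rejected`.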
That’s why I get irritated when people implicitly tightly couple the need for distributed extensibility with draconian processing. There will always be long-tail requirements, and while I realize that such requirements may be addressed in due course, that approach clearly doesn’t scale.
While I draw some inspiration from Google Mobile Services, the right next step is to prototype my ideas and then get the browser vendors to implement them first.
All roads lead to tolerance
There is an interesting conversation about Validation going on at the moment. Mark Baker started it, basically saying that XML validation techniques like DTD, XSD and RelaxNG are too time dependent to be used in web scale architectures...
Excerpt from Base4 Ideas
Sam, loved a lot of your stuff, but here I’m going to have to differ: contracts can give huge amounts of flexibility when rigidly enforced, as Apache superbly demonstrates. Laissez-faire might be great for French foreign policy, but it’s a crap way to run an IT infrastructure.
Posted by Steve Jones
I’m not going to argue about the value of adhering to rigid specifications with a guy who has 933 validation errors on his home page.
Posted by Mark
Steve: take a look at this example.
Can you point to anybody who has suggested that Apache or Firefox should accept random packets? If so, I will certainly be right by your side and say that such a position would be silly.
But if you, or anybody else, suggests that the only possible alternative to accepting all random packets is to rigidly enforce full schema validation on all requests, then I would say that you are guilty of having created a false dilemma.
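Here is a sketch of the middle ground between "accept random packets" and "full schema validation". It assumes a hypothetical `<order>` document with `item` and `quantity` elements, and `read_order` is likewise a made-up name. The consumer still insists on well-formed XML, because Python's `xml.etree.ElementTree.fromstring` raises `ParseError` otherwise. Beyond that, it checks only the elements it actually uses and ignores everything else.

```python
import xml.etree.ElementTree as ET


def read_order(xml_text: str) -> dict:
    """Hypothetical consumer: extract only the elements it needs.

    ET.fromstring raises ET.ParseError on input that is not well-formed,
    so random bytes are still rejected. Unknown elements and attributes
    are ignored rather than checked against a schema.
    """
    root = ET.fromstring(xml_text)
    item = root.findtext("item")
    qty = root.findtext("quantity")
    if item is None or qty is None:
        raise ValueError("missing required element")
    # int() raises ValueError if quantity is not an integer: we validate
    # what we use, and only that.
    return {"item": item.strip(), "quantity": int(qty)}
```

In this design an extra `<giftwrap/>` element is ignored rather than rejected. A non-well-formed input such as `<order><item>widget</order>` still fails to parse.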
Posted by Sam Ruby
Versioning Does Not Make Validation Irrelevant
...
Trackback from Dare Obasanjo aka Carnage4Life
Versioning Does Not Make Validation Irrelevant
Mark Baker has a blog post entitled Validation considered harmful where he writes: “We believe that virtually all forms of validation, as commonly practiced, are harmful; an anathema to use at Web scale. Specifically, our argument is this; Tests of...”
Excerpt from TopXML Reblogger XML News
Sam,
What I’m saying about validation is that it should be rigidly enforced to the specification of what that piece is. Thus Firefox accepts elements only to a specification (HTTP) and doesn’t allow anything else. Its rendering engine then accepts HTML but doesn’t require it to be rigid, in no small part because it’s rare to find rigid HTML, and as it’s human-readable these glitches can be partially ignored. The engine also accepts XML + XSLT, where there is an increase in validation (XML must be well-formed).
This is my point about contracts: they give flexibility for the purpose of that contract. Thus for XML documents where there is a known contract (the schema) and a “rigid” element producing it (i.e. auto-produced by a server), the schema should be enforced, as this is the contract (à la HTTP for Firefox/Apache) that has been agreed between the two parties.
My point on Firefox is that it does rigidly enforce a specific contract (HTTP), and it is exactly that enforcement that enables the immense flexibility of HTML-over-HTTP rendering, as it means that Firefox can communicate with Apache/IIS/Tomcat/Jetty/MyFirstHTTPServer. I raised “random packets” precisely because no-one has ever suggested that as a smart idea, but that is (IMO) exactly what is being suggested by the “late as possible” validation crowd. Firefox and Apache validate HTTP at entry, anything not in HTTP is rejected.
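For contrast, here is a minimal sketch of what rigid "validate at entry" checking of an HTTP/1.x request line could look like. The regular expression loosely follows the request-line grammar in RFC 7230: a method token, single spaces, `HTTP/x.y`, and a CRLF terminator. The request target is simplified to "any run of non-whitespace", and `parse_request_line_strict` is a hypothetical name. Note that this sketch rejects a bare LF line ending, even though RFC 7230 section 3.5 says a recipient MAY accept one.

```python
import re

# Hypothetical strict check of an HTTP/1.x request line, in the spirit of
# "validate HTTP at entry, anything not in HTTP is rejected".
# method = RFC 7230 token characters; target = any non-whitespace run
# (a simplification); exactly one space between parts; must end in CRLF.
REQUEST_LINE = re.compile(
    r"(?P<method>[!#$%&'*+\-.^_`|~0-9A-Za-z]+) "
    r"(?P<target>\S+) "
    r"HTTP/(?P<major>\d)\.(?P<minor>\d)\r\n"
)


def parse_request_line_strict(line: str) -> tuple:
    """Return (method, target, (major, minor)) or raise ValueError."""
    m = REQUEST_LINE.fullmatch(line)
    if m is None:
        raise ValueError(f"not an HTTP request line: {line!r}")
    return m["method"], m["target"], (int(m["major"]), int(m["minor"]))
```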
As Pirelli say “power is nothing without control”.
Posted by Steve Jones
“Firefox and Apache validate HTTP at entry, anything not in HTTP is rejected.”
This is a far, far cry from XML schema validation; in fact it is considerably weaker than XML’s “must be well-formed”. In the Firefox case it really is only a slight bit stronger than “must contain a blank line”.
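To show how little "must contain a blank line" demands, here is a hypothetical sketch of a head/body splitter that requires nothing more. It finds the first empty line, accepting CRLF, bare LF, or a mix of the two, and splits the head into lines. It does not check the status line or headers at all. The name `split_head_and_body` is made up for illustration.

```python
import re

# A blank line is two line terminators in a row, each CRLF or bare LF.
BLANK_LINE = re.compile(rb"\r?\n\r?\n")


def split_head_and_body(raw: bytes) -> tuple:
    """Split a raw HTTP message into (head_lines, body).

    The only hard requirement: a blank line must appear somewhere.
    """
    m = BLANK_LINE.search(raw)
    if m is None:
        raise ValueError("no blank line: head incomplete")
    head, body = raw[: m.start()], raw[m.end():]
    return re.split(rb"\r?\n", head), body
```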
Posted by Sam Ruby
“Thus Firefox accepts elements only to a specification (HTTP)”
This is so far from the truth that I can only assume it is meant as satire.
- [link] “NCSA/1.5.2 has a bug in which it fails to send a version number if the request version is HTTP/1.1, so we fall back on HTTP/1.0”
- [link] “tolerate some junk before the status line”
- [link] “HTTP/1.0 servers have been known to send erroneous Content-Length headers. So, unless the connection is persistent, we must make allowances for a possibly invalid Content-Length header.”
- [link] “Although ‘Pragma: no-cache’ is not a standard HTTP response header (it’s a request header), caching is inhibited when this header is present so as to match existing Navigator behavior.”
- [link] “Special case these headers [including Set-Cookie] and use a newline delimiter to delimit the values from one another as commas may appear in the values of these headers contrary to what the spec says.”
- [link] “Ignore wacky headers too... this one is for MS servers that send ‘Content-Length: 0’ on 304 responses”
- [link] “We skip over mal-formed headers in the hope that we’ll still be able to do something useful with the response.”
- [link] “RFC2616 section 19.6.2 states that the ‘Connection: keep-alive’ and ‘Keep-alive’ request headers should not be sent by HTTP/1.1 user-agents. Otherwise, problems with proxy servers (especially transparent proxies) can result. However, we need to send something so that we can use keepalive with HTTP/1.0 servers/proxies. We use ‘Proxy-Connection:’ when we’re talking to an http proxy, and ‘Connection:’ otherwise.”
- [link] “IIS implementation requires extra quotes”
- [link] “For .gz files, apache sends both a Content-Type: application/x-gzip as well as Content-Encoding: gzip, which is completely wrong. In this case, we choose to ignore the rogue Content-Encoding header. We must do this early on so as to prevent it from being seen up stream. The same problem exists for Content-Encoding: compress in default Apache installs.”
- [link] “if "*”, then assume response would vary. technically speaking, “Vary: header, *” is not permitted, but we allow it anyways."
- [link] “if the response depends on the value of the "Cookie” header, then bail since we do not store cookies in the cache. ... this implementation is obviously not fully standards compliant, but it is perhaps most prudent given the above issues."
- [link] “If the cached response does not include expiration information, then we must validate the response, despite whether or not this is the first access this session. This behavior is consistent with existing browsers and is generally expected by web authors.”
- [link] “From RFC2617 section 1.2, the realm value is defined as such: ... but, we’ll accept anything after the the ‘=’ up to the first space, or end-of-line, if the string is not quoted.”
- [link] “some servers give junk after the charset parameter, which may include a comma, so this check makes us a bit more tolerant.”
- Don’t even get me started on all the different variations of dates that Firefox accepts! See [link] which is called from [link] among other places.
I found all of those in 15 minutes of poking through code I’ve never seen before. And remember, those are just the ones that are commented!
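To give a flavour of what code like this looks like, here is a Python sketch that applies a few of the tolerances quoted above to an already-split response head. It is modeled loosely on those comments and is not Mozilla's implementation. `parse_response_head` and `STATUS_LINE` are hypothetical names. Only four of the listed workarounds are covered:

- junk before the status line;
- skipped malformed headers;
- newline-delimited Set-Cookie values;
- ignoring a rogue `Content-Encoding: gzip` on `application/x-gzip`.

```python
import re

# Status line: "HTTP/<major>.<minor> <3-digit code>", anything after is ignored.
STATUS_LINE = re.compile(r"HTTP/(\d+)\.(\d+)\s+(\d{3})")


def parse_response_head(lines: list) -> tuple:
    """Return ((major, minor), status, headers) from decoded head lines.

    Header names are lower-cased; repeated headers are combined.
    """
    i = 0
    while i < len(lines) and not STATUS_LINE.match(lines[i]):
        i += 1  # tolerate junk before the status line
    if i == len(lines):
        raise ValueError("no status line found")
    m = STATUS_LINE.match(lines[i])
    version = (int(m.group(1)), int(m.group(2)))
    status = int(m.group(3))

    headers = {}
    for line in lines[i + 1:]:
        name, sep, value = line.partition(":")
        name = name.strip()
        if not sep or not name:
            continue  # skip malformed header lines instead of failing
        key, value = name.lower(), value.strip()
        if key in headers:
            # Commas can appear inside cookie values, so join Set-Cookie
            # with a newline; other repeated headers are comma-joined.
            joiner = "\n" if key == "set-cookie" else ", "
            headers[key] += joiner + value
        else:
            headers[key] = value

    # Misconfigured servers label .gz files with both of these; drop the
    # encoding so the body is not decompressed by mistake.
    if (headers.get("content-type", "").lower().startswith("application/x-gzip")
            and headers.get("content-encoding", "").lower() == "gzip"):
        del headers["content-encoding"]
    return version, status, headers
```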
Posted by Mark
It would be interesting and useful to have a reality-annotated version of RFC 2616.
Posted by Henri Sivonen
Sam Ruby: Tolerance
acceptable levels of tolerance differ depending on whether or not a given operation is safe or not...
Excerpt from Public marks with tag rest
Every webservice is an island
I like where Jason is going with his network oriented programming post. If we think of the network as the new platform, IO and transport is not really a problem as HTTP is pretty well proven. However there is one nasty gotcha, tolerance. Today you...
Excerpt from Base4 Ideas
Yep:
metasoup
Posted by Bill de hOra