It’s just data

Defying classification

What is Planet Apache?  What is Technorati?  These questions fascinate me.

Mailing list vs comments as a venue for dialog?  To me, that's like comparing board meetings vs pubs as venues for making business decisions.

I am told that using information at the IP layer to affect behaviour two layers up isn't going to work?  And yet I run firewall programs that filter outgoing connections based on the path of the main program which issued the request.  Despite coming up with "perl.exe" for a fair number of commands, it does seem to catch a lot of spyware.

Classifications are so impossible, yet so darn useful.

Heuristics DO work for the common case, and they make your job easier. The problem is, in the cases where they don't, they fail spectacularly, and when used on a network, they tend to hurt the other guy.

Think cache freshness heuristics in HTTP. Think virus scanners. Think Net Nanny-style Web blockers. Think racial profiling.

To me, use of heuristics almost invariably signals a design flaw in a system that's been temporarily patched. It will eventually fail to address the real problem, and in the meantime things have got much worse.

If we want to use heuristics to stem the tide of comment spam, fine, but we shouldn't believe for a moment that it's a long-term solution.

Posted by Mark Nottingham at


have you seen much in the way of these DDoS attacks in blog comments? 

my site got hit pretty hard yesterday, and just curious if it seems to be a new wave of comment spamming...


Posted by jen at

I may be going out on a limb here, but I'll bet that mnot runs some sort of spam filter based on heuristics.  Is email broken?  You betcha.  So is the thought of having an open comment system on the Internet.

Jen, so far I have avoided being hit by most of the automated attacks, presumably because my weblog is not based on the same software that most other people are running.  Here are some simple changes that you can try.

Posted by Sam Ruby at

Sure do, and I still have to run through the spam mailboxes manually about once a day, because I don't trust the filters (with good reason; there's always one or two).

I have hope because there's a lot of interesting work going into re-engineering the protocols underlying e-mail to make it viable again, without using heuristics; SPF is only the most recent. Spam filters are only a stopgap.

There are a lot of ways we could do similar things with comment systems. I've turned off URLs and forced a preview in my blog, thereby removing one kind of benefit from abusive commenting, and have had very good results. Until Google fixes their system, that's probably going to remain my approach.

I could also imagine a system that, whenever someone (as identified by their e-mail address) comments on a site, generates an e-mail with a one-click approve URL in it. The first time your e-mail address comments, it gets sent to the owner; thereafter, the e-mail gets sent to whoever made the comment.  This approach distributes the cost of authenticating and authorising comments in a reasonable fashion; I'd be interested to see how well it works.
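A minimal sketch of that flow in Python, purely as an illustration. The class name, the in-memory storage, the `outbox` list standing in for real mail delivery, and the `/approve/<token>` URL shape are all assumptions; so is the choice to treat an address as "known" only once one of its comments has been approved.

```python
import secrets


class CommentModerator:
    """Hypothetical sketch: route one-click approval mail to owner or commenter."""

    def __init__(self, owner_email, base_url):
        self.owner_email = owner_email
        self.base_url = base_url
        self.known_addresses = set()  # addresses with at least one approved comment
        self.pending = {}             # token -> (commenter address, comment text)
        self.published = []           # approved (address, text) pairs
        self.outbox = []              # (recipient, approve_url); stand-in for sending mail

    def submit(self, email, text):
        """Hold a comment and queue an approval e-mail; return the token."""
        token = secrets.token_urlsafe(16)
        self.pending[token] = (email, text)
        # First comment from this address: the site owner approves it.
        # Later comments: the commenter confirms their own comment.
        recipient = email if email in self.known_addresses else self.owner_email
        self.outbox.append((recipient, f"{self.base_url}/approve/{token}"))
        return token

    def approve(self, token):
        """Handle a click on the approve URL; return True if a comment was published."""
        entry = self.pending.pop(token, None)
        if entry is None:
            return False  # unknown or already-used token
        email, text = entry
        self.known_addresses.add(email)
        self.published.append((email, text))
        return True
```

Nothing is published until somebody clicks; the only thing that changes after the first approval is who receives the link.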

Posted by Mark Nottingham at

re: "I could also imagine a system that, whenever someone (as identified by their e-mail address) comments on a site, generates an e-mail with a one-click approve URL in it."

Here's an example: securing a room. Option one: convert the room into an impregnable vault. Option two: put locks on the door, bars on the windows, and alarm everything. Option three: don't bother securing the room; instead, post a guard in the room who records the ID of everyone entering and makes sure they should be allowed in.

Option one is the best, but is unrealistic. Impregnable vaults just don't exist, getting close is prohibitively expensive, and turning a room into a vault greatly lessens its usefulness as a room. Option two is the realistic best; combine the strengths of prevention, detection, and response to achieve resilient security. Option three is the worst. It's far more expensive than option two, and the most invasive and easiest to defeat of all three options. It's also a sure sign of bad planning; designers built the room, and only then realized that they needed security. Rather than spend the effort installing door locks and alarms, they took the easy way out and invaded people's privacy.


Posted by Mark at

How exactly does Schneier's analogy apply here, Mark? He's talking about protecting civil liberties from government intrusion, not the ability to comment on a private Web page anonymously.

Posted by Mark Nottingham at

distributes the cost of authenticating and authorising comments in a reasonable fashion

We clearly have a different idea of what acceptable costs are here.

Take a look at this comment.  Do you really believe that this individual would have registered?  How many times have you been faced with a registration form, and decided... nah.  I know I have.

This person took a look at my warnings, and decided that they did not apply to him.  Meanwhile, over the past two days, I captured six comments that were previewed but never were submitted - each of which I would have manually marked as spam.

Posted by Sam Ruby at

Where do I say anything about a registration form?

If you don't like that system, fine, there are other approaches; the point I'm making is that heuristics aren't a good basis for a long-term solution.

Posted by Mark Nottingham at

off to the windy city

You know you must have done something very wrong in a past life when your business trips in January take you from Rochester to Chicago. Brrrrrr. (Not to mention the annual summer pilgrimage to Alabama; fire and ice, baby.) But duty calls, so I’m off to O’Hare this afternoon (weather permitting), back on Saturday. This is the first in a long series of trips from here ‘til the end of March. I won’t have broadband on this trip, so I expect blogging will be curtailed for a bit. On the plus side, I’ll be dining with AKMA, Margaret, and Pippa...... [more]

Trackback from mamamusings

