It’s just data

Comment spam

Don Box: To date, the few times I've wanted comments, I've pointed people over to Sam's place and let the peanut gallery operate on Sam's bandwidth dime. I think this time around I'll do it mainly because I want to experiment with some anti-troll/anti-spam ideas.

Comment spam has gone from a curiosity to an irritant to an amusement of mine. Why an amusement? It is fun seeing greedy spammers who can't limit themselves to two consecutive comments earn a 24-hour ban on posts from their IP address and on posts which contain the URL that they are trying to pimp. This ban is later upgraded to 72 hours once I manually verify that the comments are in fact spam.

Today somebody tried to post ten times from a variety of IP addresses before they finally gave up. The next step may be to automate the removal of the initial two comments on such attempts, making this process totally painless for me.
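For anyone who wants to try something similar, here is a minimal sketch of the ban policy described above. It is not the code running on this weblog: the `CommentGate` class, its method names, and the in-memory storage are all hypothetical. It also assumes that "consecutive" means no other address commented in between, and that the 72 hours are counted from the start of the original ban.

```python
from dataclasses import dataclass, field

HOUR = 3600
MAX_CONSECUTIVE = 2        # comments allowed in a row from one address
INITIAL_BAN = 24 * HOUR    # automatic ban
CONFIRMED_BAN = 72 * HOUR  # after a human confirms the comments are spam


@dataclass
class CommentGate:
    """Hypothetical in-memory gate for incoming comments."""
    last_ip: str = ""
    streak: int = 0
    bans: dict = field(default_factory=dict)  # ip or url -> (start, duration)

    def is_banned(self, key, now):
        start, duration = self.bans.get(key, (0, 0))
        return now < start + duration

    def accept(self, ip, urls, now):
        """Return True if the comment may be posted at time `now` (seconds)."""
        keys = [ip, *urls]
        if any(self.is_banned(key, now) for key in keys):
            return False
        # Count how many comments in a row came from the same address.
        self.streak = self.streak + 1 if ip == self.last_ip else 1
        self.last_ip = ip
        if self.streak > MAX_CONSECUTIVE:
            # Ban both the address and every URL the comment contains.
            for key in keys:
                self.bans[key] = (now, INITIAL_BAN)
            return False
        return True

    def confirm_spam(self, keys):
        """Extend existing bans once the comments are verified as spam."""
        for key in keys:
            if key in self.bans:
                start, _ = self.bans[key]
                self.bans[key] = (start, CONFIRMED_BAN)
```

Banning the URL as well as the address is what defeats a spammer who rotates through several IP addresses: the second address is rejected because of the link it carries.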

The best solution to trolls that I have found to date is trackback and pingback. It is amusing to me how many people who get sanctimonious about their rights to my bandwidth go silent when I suggest that they post their comments on their own weblog.

If you do implement comments, one thing I suggest is that each comment should have a permalink.  Next on my list would be feeds for, and search over, comments.

I've turned off comments at my blog, because I couldn't be bothered to implement code to sanitize them (yet). Some day I may turn them back on, but it won't be in the context of Movable Type.

One of the major reasons spammers post comment spam on a page is that the page ranks well on Google. I've considered the idea that, instead of having the comments on the SAME page as the original article, I could include a link over to some SEPARATE comments system. This could be a forum system, maybe (where the topics are auto-created by the system when you post), or even a free-for-all Wiki. The more I think about it, the Wiki actually seems best: let the readers prune out what's objectionable, because they'll see it before I do.

Anyway, I do miss the comments on my blog, but not a lot. When things are important, bloggers post about them, and get their opinions out there. There are plenty of free blogging solutions for people to use. There's really no reason why people can't have their own voice, and something more permanent and collected than a bunch of comments on various blogs.

And here I am commenting. What a hypocrite I am. :-p

Happy Sunday!

Posted by Brad Wilson at

Have you programmed the search engines to ignore your comments with robots.txt? This is the obvious defense against comment spam. I did this on, and announced it, and the spamming stopped.

Posted by Dave Winer at

Post hoc, ergo propter hoc.  It is extremely doubtful that your announcement caused spammers to stop.  Spammers do not read weblogs, they only write to them.

Anyway, Sam's comments are displayed on each entry's page.  There is no way to get search engines to ignore them without ignoring his entire weblog, which is no solution at all, and still wouldn't stop the comment spam anyway.  Spammers don't respect robots.txt; they don't even check it.
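The point about robots.txt working only on whole URLs can be sketched with Python's standard `urllib.robotparser`. The host name and both paths below are hypothetical, chosen only to contrast an entry page that carries its comments inline with a separate comments page.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt rule can only name URL paths, never a region inside a page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /comments/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs: an entry page with inline comments, and a separate
# comments-only page for the same entry.
entry_page = "http://weblog.example/blog/1234.html"
comments_page = "http://weblog.example/comments/1234.html"

print(parser.can_fetch("*", entry_page))
print(parser.can_fetch("*", comments_page))
```

Under these rules the entry page stays fetchable and the separate page is excluded. Comments shown inline on the entry page are therefore indexed along with it, and none of this affects a spammer's script, which never consults the file.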

Posted by Mark at

Much of the valuable content on this weblog comes from the comments.  Removing them from the indexable web would throw the baby out with the bathwater, IMHO.

Posted by Sam Ruby at

I agree with Sam... if the comments aren't indexed, then that lowers the value a lot. The key is keeping the comments separated from the main page, so they don't get indexed (and ranked) together. Then you should, in theory, only get comment spam on COMMENTS pages that have a high link factor, which I would expect to be pretty rare.

Posted by Brad Wilson at

Brad: I would agree that the reward to spammers would be lower if I required people to do an additional click to get to the comments, but spammers are used to low rates of return (how many email spams did you receive today?  How many did you respond to?).

For that reason, I doubt many spammers remember Dave's announcement or have observed my weblog long enough to see that I am highly diligent about cleaning up the abuse.

Spammers simply see a weblog with a respectable Google ranking, unleash either a script or a person in the third world, and don't look back.

Posted by Sam Ruby at

Well, I hate to be the bearer of good tidings, but the spam on has stopped. Maybe we haven't yet attracted the truly virulent kind of spammer (this isn't comment spam, it's ping spam) and I'm still blocking the big keywords (the obvious choices) so who knows why they stopped, but thankfully they did.

Posted by Dave Winer at

Dave, if your original question was whether or not I had updated my robots.txt to block my referrers from being indexed, the answer to that question is yes.

Update: the comment spammer of yesterday was back this morning. Five more posts were blocked; humorously, one was aimed at this very blog entry.

Posted by Sam Ruby at

re: "I'm still blocking the big keywords [on]..."

I was unaware that was moderated.  Is there documentation somewhere that states the moderation rules?

Posted by Mark at

Oddly enough, after my emails to a few folks about equating excessive spam comments and DoS, I haven't had a blitzer. As for the others, if I turn off comments on the old posts they hit, I rarely get comment spam (and those entries were old enough that I got no legit comments). Finally, not talking about the spammers seems to be the last little cog -- I haven't had a comment spam in close to two weeks. Oh, I know I'll get them, but they're more a minor nuisance for me now.

Perhaps I've lost my popularity and rank and that's why they no longer come by. Ah well, every cloud and silver linings.

Mark, I thought you knew that Dave filters This has been talked about before -- he deliberately excluded one weblog I know of because he was offended at the name. I was sure you were around when we talked about it. Fancy.

Posted by Shelley at

That is entirely possible, but I had forgotten.  The original question remains, what are the moderation rules for  I'm not looking for a point-by-point blacklist (which would just serve to give the spammers something to route around); a general statement of principles would be fine.  URLs with objectionable words?  URLs of commercial sites?  What kind of commercial sites?  Sites that excessively ping without cause?  How frequently?

This may be listed somewhere already, but the FAQ does not mention it, and it points to a mailing list that does not, to the best of my search abilities, appear to mention it either.

Posted by Mark at

Through a combination of MT-BlackList, turning off comments on older items and some manual effort, I've been able to keep comment spam in check fairly well on my weblog.

I still believe the best solution would be to attack the spammers' Google ranking by changing their links (as opposed to just deleting the spam), as this would force the spammers to blacklist those who implement this technique (since their Google ranking would otherwise go down instead of up). Unfortunately this would only work if a large number of weblogs implemented it, and considering the rather minimal response I got to my blog entry on this subject, it doesn't look like that's gonna happen :-(

Posted by Luke Hutteman at

Last month I updated MT on my site, and along with the update, I added the blacklist and renamed my mt-comments.cgi. I then added a few more regular expressions.

Now I'm in the process of writing some code to modify my .htaccess file when I get several blacklist failures in a row.
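A rough sketch of what such code might look like follows. It is only an illustration: the function names and the threshold of three are assumptions (the plan above only says "several"), the streak is counted per address, and the output uses the classic Apache `Order`/`Allow`/`Deny` access directives rather than writing to a real .htaccess file.

```python
from collections import defaultdict

FAILURE_THRESHOLD = 3  # assumed value for "several"


def ips_to_block(events, threshold=FAILURE_THRESHOLD):
    """Return addresses that hit the blacklist `threshold` times in a row.

    `events` is an iterable of (ip, hit_blacklist) pairs in arrival order.
    """
    streaks = defaultdict(int)
    blocked = []
    for ip, hit_blacklist in events:
        # A comment that passes the blacklist resets that address's streak.
        streaks[ip] = streaks[ip] + 1 if hit_blacklist else 0
        if streaks[ip] == threshold and ip not in blocked:
            blocked.append(ip)
    return blocked


def htaccess_rules(ips):
    """Render access-control lines that deny the given addresses."""
    lines = ["Order Allow,Deny", "Allow from all"]
    lines += [f"Deny from {ip}" for ip in ips]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    log = [("10.0.0.1", True), ("10.0.0.1", True), ("10.0.0.1", True)]
    print(htaccess_rules(ips_to_block(log)), end="")
```

Generating the rules as text first makes it easy to review them before anything touches the live file, which matters because a malformed .htaccess can take the whole site down.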

I also intend to automate the updating of my blacklist.

Along with this, I've modified my .htaccess using Mark Pilgrim's suggestion on 'How to block spambots ...'

Finally, I've complained upstream when tempted to use wget for evil instead of for good. Surprisingly, many responses are good.

Now? I get about two comment spams per week, a small enough number to make it worth my while to chase them down like the dogs they are.

I figure if ALL of us use a variety of tactics, we'll succeed in making comment spam too expensive, and the spammers will move on to something easier (and more profitable) like email spam.

Of course, your mileage may vary.

Posted by Mean Dean at

