It’s just data

Comment Throttle

One of the major topics du jour is spam comments.  There are those who are wildly optimistic and others who are wildly pessimistic...  (I tend to the optimistic side myself - the trick is to keep the cost/benefit ratio in your favor.)

This turns out to be rather timely, given that I was just hit by 143 spams from a single individual over a thirteen hour period.  By all indications this was not automated.

The removal, however, was.  All it took to wipe all these comments out was a single command (which I had to issue twice: once before I flew out to ApacheCon, and once after I landed to clear out the ones created while I was in flight).  This is not much trouble for me, but it does tend to get noticed by people who are subscribed to my comments feed.

So... I've implemented a throttle.  The code is straightforward, but the policy is difficult to put into words.  Suffice it to say that no one can put in three consecutive comments within the period of a day or put in three comments total within a five minute period.
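For readers who want something concrete, here is a minimal sketch of the five-minute half of that policy. It is not the code running on this site. The class name `RateThrottle`, the `key` argument (whatever identifies a commenter, for example an IP address) and the explicit `now` timestamp are all assumptions made for illustration.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 5 * 60   # the "five minute period" from the policy
MAX_IN_WINDOW = 2         # a third comment inside the window is refused


class RateThrottle:
    """Hypothetical per-commenter throttle; not the site's actual code."""

    def __init__(self):
        # key -> timestamps of accepted comments still inside the window
        self._recent = defaultdict(deque)

    def allow(self, key, now):
        """Return True and record the comment if `key` may post at time `now`."""
        stamps = self._recent[key]
        # Forget accepted comments that have aged out of the window.
        while stamps and now - stamps[0] >= WINDOW_SECONDS:
            stamps.popleft()
        if len(stamps) >= MAX_IN_WINDOW:
            return False  # this would be the third comment in five minutes
        stamps.append(now)
        return True
```

Rejected attempts are not recorded in this sketch, so a refused commenter can post again once the earlier comments age out of the window.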


RE: Comment Throttle

What do you consider consecutive comments?

PS: Now that I know I'll be at XML 2003 we should have a planned hanging out session.

Message from Dare Obasanjo at


If I am reading the code correctly (and it's entirely possible that I am not at 3 AM), this is IP-based.  If so, it can be defeated with rotating HTTP proxies.

Posted by Mark at

1

Posted by Danny at

What do you consider consecutive comments?

To put it more concretely, no comment will be accepted if it would result in three of the last four comments in this view having the same title (viewable as hover text) on the "by" or "from" link.
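As a rough sketch of that rule (not the actual implementation), the check could look like the following. The function name `would_exceed` and its parameters are hypothetical, and the sketch assumes that only the title of the incoming comment needs to be counted.

```python
def would_exceed(recent_titles, new_title, window=4, limit=3):
    """Return True if accepting new_title would leave `limit` comments
    carrying that title among the last `window` comments in the view.

    recent_titles: titles on the "by"/"from" links of existing comments,
    oldest first. All names here are illustrative assumptions.
    """
    # The view as it would look with the new comment appended,
    # trimmed to the most recent `window` entries.
    last = (list(recent_titles) + [new_title])[-window:]
    return last.count(new_title) >= limit
```

With the defaults, a commenter who already holds two of the last three slots is refused, while one who holds only one of them is accepted.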

Posted by Sam Ruby at

Pingback from WordPress — Dev Blog

at

Wouldn't this headache be solved by not putting hyperlinks in comments (eg. you see the commenter's name and their URL next to it rather than being linked)? Would there be a point to spam with plaintext URLs?

Posted by Confused at

Confused: such a solution would not only penalize the spammers, it would penalize all comments.

Posted by Sam Ruby at

Confused: that also wouldn't solve the problem.  Spammers have it in their heads now that weblog comments are a vector to exploit.  They don't look at individual results and tweak their software to stop bothering individuals.  They write generic software that works with millions of sites and goes after them en masse.  So you would end up with just as much spam, it would just be displayed with unlinked URLs.

Spammers don't read blogs; they just write to them.

Posted by Mark at

Spammers don't read blogs; they just write to them.

Which is the only reason they can be defeated. A few simple steps, and I have been delightfully spam-free.

Unlike SMTP, or NNTP, the "protocol" for comment submission can be varied in numerous ways. With enough variations deployed, writing a general-purpose 'bot can be made infeasible.

Posted by Jacques Distler at

Jacques: last month I was visited by a human spammer.

Posted by Sam Ruby at

You sure it was human?

See my response.

Posted by Jacques Distler at

The posts varied from one to three minutes apart - some were simplistic responses of an Eliza quality, but another specifically cited Dare by name (Dare is not a common name; in fact, it is a common English verb).  The response to my Atkins post was as follows:

The atkins diet certainly works. The 2 women I know that each read the book. both felt better and lost weight - not that that is a scientific study...

The user agent was IE.

Not conclusive, but it certainly does not appear to me to have been automated.

Posted by Sam Ruby at

A trivial test: did the "human" download your CSS stylesheet? Robots generally don't bother. (I know it can be cached; you may need to look back in your logs.)
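A sketch of how one might run that test against a server log follows. It assumes log lines in the common Apache access-log layout. The function name `fetched_stylesheet` and the default stylesheet path are assumptions for illustration only, so adjust both to your own setup.

```python
import re

# host ident user [timestamp] "METHOD path protocol" status ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(\S+) (\S+)[^"]*" (\d{3}) ')


def fetched_stylesheet(log_lines, css_path="/blog.css"):
    """Map every client that POSTed anything to whether it also
    requested css_path. Lines that do not parse are skipped."""
    posted, styled = set(), set()
    for line in log_lines:
        m = LOG_LINE.match(line)
        if not m:
            continue
        host, method, path, _status = m.groups()
        if method == "POST":
            posted.add(host)
        elif method == "GET" and path == css_path:
            styled.add(host)
    return {host: host in styled for host in posted}
```

A `False` entry is only a hint: as noted above, a browser with the stylesheet already cached will not request it again.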

What was the REFERER on the atkins diet post? As (some of?) these "crawler" spambots seem to come in via links from other blogs, or via google searches on some keyword, I would not be surprised if the comment was vaguely on-topic.

All the 'bots I have seen claim to be IE, so that means nothing.

The fact that the posts were spaced from 1 to 3 minutes apart makes it more likely that it was a spambot than a human.

A human would be cutting and pasting into your comment-entry form, and would be trying to get through the process as quickly as possible. A 'bot would be hitting hundreds of different weblogs simultaneously, and would prefer to space-out its HTTP requests, so as not to set off any alarm bells. (Look at how the better search-engine crawlers behave.)

I can't prove you were hit by a 'bot. But, from everything you've said, it's far more likely than not.

Posted by Jacques Distler at

I don't keep my logs that far back, but from memory, the initial referrer was a google query, and the favicon.ico and blog.css were downloaded.  Subsequent posts included previews, some pages were visited without leaving a comment, etc.

While I no longer have the logs, I do have the actual spams.

Posted by Sam Ruby at

Very puzzling, then.

If it was a human, then they were working very inefficiently. If they are going to post manually and waste all that time while doing so, they don't have a bright future in the comment-spam 'biz.

If it was a 'bot, then it went to extraordinary lengths to act "human-like" (downloading your favicon.ico file!?).

I wonder why.

Could it be something really stupid, like a "spambot" written in VBScript, driving IE?

Posted by Jacques Distler at

Very interesting ideas forthcoming in the Blog anti-spam debate

I have a couple of vested interests in eradicating SPAM from my blog and from the rest of the Blogosphere. There are some interesting discussions (and disagreements) brewing in the various listservs and dev-blogs that I regularly visit or subscribe ...

Pingback from Mindful Musings :: Very interesting ideas forthcoming in the Blog anti-spam debate

at

Comment spam

Don Box:  Comment spam has gone from a curiosity to an irritant to an amusement of mine.  Why an amusement?  It is fun seeing greedy spammers who can't limit... [more]

Trackback from Sam Ruby

at

Comment Spam

Sam Ruby has some stuff on comment spam. I've written before about weblog comment spam and why I don't think it will be a long term problem. Sam's comment throttling is an example of how we have so many more approaches to deal with weblog comment...

Excerpt from Keith's Weblog at

Preview required?

This is a  trial balloon.  What I am trying to explore is what would happen if I were to convert the act of posting a comment into request/response interaction.  I would very much like to do this in a way that does not significantly inhibit the  sponte... [more]

Trackback from Sam Ruby

at

Preview prototyped

OK, an initial implementation of my preview required functionality is complete.  Other than requiring a preview, most of you should not see any different behavior.  I've also relaxed my spam throttle to allow three comments - this allows the first to g... [more]

Trackback from Sam Ruby

at

Spam Update

Based on the lively discussions of the past few days, it certainly appears that requiring a preview does not impede the flow of discussion.  Cool. Spam also is way down, despite my having removed and relaxed a number of other defenses.  Notably, my spa... [more]

Trackback from Sam Ruby

at

Preview prototyped

Nice. OK, an initial implementation of my preview required functionality is complete. Other than requiring a preview, most of you should not see any different behavior. I've also relaxed my spam throttle to allow three comments - this allows the...

Excerpt from deeje @ BloggerJack at

Beware of Strangers

If they don't come back, it is not possible to have a two way conversation, is it?  Robert Castelo:  Um, the fact that you are getting paid is supposed to make me feel better?  I don't think so.  And I have to agree here with what Doc said about conten... [more]

Trackback from Sam Ruby

at

Simon Willison: Solving comment spam

There are two main schools of thought concerning comment spam: the optimists and the defeatists. Optimists believe that comment spam can be beaten with technology; defeatists (maybe I should call them pessimists) believe that comments are as doomed ...

Pingback from Simon Willison: Solving comment spam

at

The Free Market Reacts

Google, MSN and Yahoo support the use of the rel="nofollow" attribute to limit comment spammers.... [more]

Trackback from Soapbox

at

Why rel="nofollow"? I want rel="spammer"!

Google and a bunch of the blog vendors have introduced a way of anesthetizing URLs in blog comments so that they don't add PageRank. Just put rel="nofollow" in your link, and it won't count. (See, for instance, Google leads the...... [more]

Trackback from Peter Kaminski

at

Considering the nofollow Attribute

The world is afire this morning with talk of the announcements by Six Apart and the three major search engines (Google, MSN and Yahoo) to support a new HTML attribute named nofollow (in full, rel="nofollow"). By adding this attribute to your link anchors, the search engines will no longer consider the linking page as a component of the linked page's...... [more]
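To illustrate what adding the attribute to a link anchor looks like in practice, here is a minimal sketch of rendering a commenter's name as a link carrying rel="nofollow". The helper name `commenter_link` is hypothetical and not taken from any particular blogging tool.

```python
from html import escape


def commenter_link(name, url):
    """Render a commenter's name as a link that search engines
    supporting rel="nofollow" are asked not to count.
    Both values are escaped before being placed in the markup."""
    return '<a href="%s" rel="nofollow">%s</a>' % (
        escape(url, quote=True),
        escape(name),
    )
```

The link stays clickable for readers; only the way supporting search engines treat it changes.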

Trackback from Don't Back Down

at

rel="nofollow"

Reading between the lines (which in this case isn't particularly hard), this and this (don't forget to view source) suggest that Google are soon to announce that they won't be calculating PageRank for links with a rel="nofollow" attribute. Finally,...

Excerpt from Simon Willison's Weblog at

Google rel=nofollow Roundup

After a couple of rumours banging around, Google finally announced the decision to fight blog comment spam by ignoring links that had the “rel=nofollow” attribute. MSN and Yahoo quickly jumped on board. Technorati started an official...

Excerpt from Planet PHP at

Add rel="nofollow" to .Text comments

Today Google, Yahoo, MSN Search, and other search operators announced their support of the rel="nofollow" attribute for <a href="..." /> tags. Adding this attribute indicates the search crawlers, that the specific links should not contribute...

Excerpt from Thomas Freudenberg's Blog at

Phil Ringnalda

Whyever not? For some sites, like corporate brochureware, having policy pages is handy. You want to optimize for the main page, and a TOS or privacy page gives you another way to have every page link to something that links back to the front page....

Excerpt from phil ringnalda dot com: Best use of nofollow by a commercial site: Comments at

Why no-nofollow

In a prior post someone commented: Wow - I hadn’t heard of the nonofollow movement. It seems to be predominantly peopled by SEO monkeys. Why are you joining up?

Excerpt from randomthoughts at

Solving comment spam

There are two main schools of thought concerning comment spam: the optimists and the defeatists. Optimists believe that comment spam can be beaten with technology; defeatists (maybe I should call them pessimists) believe that comments are as doomed...

Excerpt from Ultimate Anti Spam Tools at

Justin Mason: Blog Spam, and a ‘nofollow’ Post-Mortem

An interesting article on blog-spam countermeasures — Google’s embarrassing mistake. Quote: I think it’s time we all agreed that the ‘nofollow’ tag has been a complete failure. For those of you new to the concept,...

Excerpt from Planet Apache at
