It’s just data

Comment Spam

Now that I am back home and rested, it is time to share an amusing story... as Randy noticed, I got some comment spam on Monday, all referencing an online gambling site.

32 comments in the course of 65 minutes.  The last 9 were not seen by anybody, as I had blocked the IP address by then.

65 minutes to create.  Carefully crafted to appear to be on topic.  10 seconds to wipe out.


Hi Sam, I wonder what your thoughts are about engaging in a coordinated response.  I went ahead and published my list of comment spam IPs and started talking with Scott Johnson about how to do a central registry of banned IPs.  Others have written MT plugins.  Others have worked on rigging robots.txt and using JavaScript to prevent spammer messages from showing up in search engines.

Others may disagree, but this strikes me as an important problem that we should meet now rather than later.

What do you think?

Posted by Andrew Grumet at

Andrew, I'm not sure I have any concrete suggestions at this point (blacklists are merely a bandaid), but I'm now subscribed to your weblog and will participate.

At the moment, this hasn't yet become a big problem for me, but I am considering preventing comments from unapproved IP addresses from appearing in feeds until I approve of them.

Posted by Sam Ruby at
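
The idea of holding back comments from unapproved IP addresses could be sketched roughly as follows. This is only an illustration, not the code behind this weblog: the `Comment` class and the `feed_comments` and `pending_comments` helpers are hypothetical names, and the addresses in the usage note come from the reserved documentation ranges.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    # Hypothetical comment record; the field names are assumptions.
    author: str
    ip: str
    body: str


def feed_comments(comments, approved_ips):
    """Return the comments that may appear in feeds right away."""
    return [c for c in comments if c.ip in approved_ips]


def pending_comments(comments, approved_ips):
    """Return the comments still waiting for the weblog owner's approval."""
    return [c for c in comments if c.ip not in approved_ips]
```

With `approved_ips = {"192.0.2.1"}`, a comment posted from `198.51.100.7` would land in the pending list and stay out of the feed until its address is approved.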

Take a look at James Seng's Bayesian comment spam filter, it's pretty cool and comprehensive. It covers trackbacks as well.

http://james.seng.cc/archives/000152.html

Posted by Brendyn Alexander at

Nice. Makes you want to ask the spammer: «Was it worth it?». I think not.

Posted by Asbjørn Ulsberg at

Pingback from hatch.org : Is Comment Spam Cost Effective? : Steven Hatch's weblog

at

I don't think that IP banning will be very effective, unless the information is centralized in a service, and controlled in a reasonable manner. It's just too easy for a spammer to change to a different IP source.

The Bayesian approach is going to work better with large chunks of input text, and blog comments are typically fairly small. Plus, the comment spammers are already specifically tailoring spams to appear 'hammy'.

The main target to filter on is going to be the URLs that the spammers are trying to promote. Those can be blacklisted much more effectively. Another idea I'm kicking around is to obfuscate URLs in comments by randomly replacing characters with their numeric entity equivalents. Also, this may push more blogs towards encouraging users to go through some sort of optional registration/verification process. A verified user would bypass the filtering.

The WordPress team is looking at these issues and will probably implement several of these strategies to work in concert. I invite further discussion on my War on Spam site:

Posted by Dougal Campbell at
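
Two of the ideas in the comment above, blacklisting promoted URLs and obfuscating URLs with numeric entities, might look something like this minimal sketch. It is not WordPress or MT-Blacklist code: the function names, the simple URL pattern, and the `rate` parameter are all assumptions made for illustration.

```python
import random
import re
from urllib.parse import urlparse

# Deliberately simple URL pattern (an assumption); real comments need more care.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def promoted_hosts(text):
    """Collect the host names of every URL found in a comment."""
    hosts = set()
    for url in URL_RE.findall(text):
        try:
            host = urlparse(url).hostname
        except ValueError:
            continue  # malformed URL, nothing to check
        if host:
            hosts.add(host.lower().rstrip("."))
    return hosts


def is_blacklisted(text, blacklist):
    """True if the comment links to a blacklisted domain or one of its subdomains."""
    for host in promoted_hosts(text):
        if any(host == d or host.endswith("." + d) for d in blacklist):
            return True
    return False


def obfuscate(url, rate=0.5, rng=random):
    """Randomly replace characters with their numeric character references."""
    return "".join(f"&#{ord(ch)};" if rng.random() < rate else ch for ch in url)
```

A browser renders the obfuscated URL exactly like the original, so readers see no difference. Whether search engines or spammers' own tools are actually thrown off by the entities is an open question that this sketch does not answer.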

Nice one Sam.

Dougal has the Achilles heel : "the URLs that the spammers are trying to promote".

Posted by Danny at

Fun ways of tackling comment spam

Sam Ruby : I got some comment spam on Monday, all referencing an online gambling site. 32 comments in the...... [more]

Trackback from Raw

at

As Dougal noted, blocking the spam URLs is the key.  MT-Blacklist, a new plugin by Jay Allen, does just this. It seems to work great.  I have created a blog spam database where people can share information about spam URLs and other spam sources.

Posted by Mark Carey at

Spam in Comments

Grr, now the comment spam mafia is arriving here too, something I had read about but had not yet seen on Spanish-language sites. Hello from the USA. My Spanish is not so good...... [more]

Trackback from Mató Tu Onda!

at

Comment Throttle

I've implemented a throttle on comments to prevent runaway spammers. One of the major topics du jour is spam comments.  There are those who are wildly optimistic and others who are wildly pessimistic...  (I tend to the optimistic side myself - the tr... [more]

Trackback from Sam Ruby

at
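
The excerpt above does not say how the throttle works, so the following is only a guess at the general shape: a sliding-window limit on how many comments are accepted in a given period. The `CommentThrottle` class, its `limit` and `window` parameters, and their defaults are hypothetical.

```python
from collections import deque


class CommentThrottle:
    """Accept at most `limit` comments in any `window` seconds (sliding window)."""

    def __init__(self, limit=5, window=3600.0):
        self.limit = limit
        self.window = window
        self._times = deque()  # timestamps of accepted comments, oldest first

    def allow(self, now):
        """Return True and record the comment if it fits within the limit."""
        # Forget accepted comments that have aged out of the window.
        while self._times and now - self._times[0] >= self.window:
            self._times.popleft()
        if len(self._times) >= self.limit:
            return False
        self._times.append(now)
        return True
```

In practice `now` would come from `time.time()`. Passing it in explicitly keeps the sketch easy to test, and keeping one throttle per IP address instead of a single global one would be a natural variation.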

Is Comment Spam Cost Effective?

I'm getting my fair share of comment spam like many other bloggers, but I can't imagine that the cost/time ratio is actually worth it. I think Sam Ruby sums it up best: "65 minutes to create. Carefully crafted to appear...... [more]

Trackback from hatch.org

at
