Adam Gessaman: It appears that these sites, using a clean little
weblog as a front, are hosting a large amount of porn.
Wow. To me, it looks like a lot of effort for so little
gain, but clearly I am missing something. But in any case,
the comments on idly's blog entry are worth reading, as is the
link (scroll down to see it) to Mark Pilgrim's blog entry on spam.
Sam: In case you haven't noticed, the Gessaman link has a typo in it.
As I understand it, the intention of these spammers (and SEOers) is to raise the PageRank of their pages through linkbacks from high-PageRank blogs. So the solution is (IMO) easy. Don't give Google (or any well-known search engine) links from comments.
By "grey listing" search engines and not delivering them links in comments, we nullify any benefits spammers and SEOers get from their abuse. They don't care about the blog author, nor its readers. Google is the only thing that matters to them.
Of course, hiding comments from Google doesn't have to be global. If there's a trust-based mechanism for comment posters, it's possible to use this to establish which links Google does see.
Of course, the downside is that statically generating a blog site isn't possible, but this can probably be worked around using mod_rewrite to transparently redirect only Google to the "cleaned" comments.
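The mod_rewrite workaround might look something like the following sketch, assuming Apache, a hypothetical /comments-clean/ tree holding link-stripped copies of each comments page, and a crawler list that is purely illustrative:

```apache
# Serve search-engine crawlers a link-stripped copy of each comments page.
# The /comments/ and /comments-clean/ paths and the crawler names here are
# assumptions, not part of any real setup described above.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (Googlebot|Slurp|msnbot) [NC]
RewriteRule ^comments/(.*)$ /comments-clean/$1 [L]
```

Regular readers still get the normal pages with live links; only the listed crawlers are rewritten to the cleaned copies.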
If the spammer's just looking for PageRank, then cleaning the comments promptly, as Sam does, should be equally effective. Google doesn't come by so often that cleaning out spam 2x daily should prevent the spammer getting any PageRank out of it. Of course, we have better things to do than edit comments, and not everyone is as on the ball as Sam.
I think that the fact that you're publishing links to the spammer's site to anyone who subscribes to comments is probably an equal motivator. I'll often click on the link given by the commenter if they say something interesting or controversial (the latter is usually when I'm thinking "who the heck is this nut?"). The fact that it's a lot of work for little return isn't a big deal - remember, these are spammers. The whole name of the game is doing a lot of work for a few responses.
Specifically giving Google (or any other search engine) different content may not be such a good idea. I think I read before that Google even has detection mechanisms in place that will lower your page rank if it discovers that you are doing so.
Better to detect the spamvertized links on the front end and handle them there. If you had some sort of scoring mechanism in place, you could automatically block known spam, automatically whitelist known URLs, and handle "gray" links by simply not hyperlinking them (presenting a text link which would have to be cut-n-pasted), or by using some other encoding method that search engines wouldn't recognize. I'm currently working on an experiment to test that second idea.
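The three-way scoring idea above could be sketched roughly like this; the blacklist and whitelist contents, and the idea of returning the bare URL as the "gray" case, are assumptions for illustration:

```python
# Rough sketch of the block / whitelist / gray-link policy described above.
# Domain lists are hypothetical placeholders.
from urllib.parse import urlparse

BLACKLIST = {"casino-pills.example"}    # known spam domains: drop entirely
WHITELIST = {"markpilgrim.example"}     # known-good domains: real hyperlink

def render_link(url):
    """Return HTML for a commenter-supplied URL: blocked, linked, or plain text."""
    host = urlparse(url).hostname or ""
    if host in BLACKLIST:
        return ""                                    # block known spam outright
    if host in WHITELIST:
        return '<a href="%s">%s</a>' % (url, url)    # trusted: give the link
    # "Gray" link: show the text so humans can cut-n-paste it,
    # but give search engine spiders nothing to follow.
    return url
```

The gray case is the interesting one: readers can still reach the site, but the commenter gets no PageRank out of it.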
SEO and blog comment spammers - an idea for a solution
Comment spamming is one of the biggest gossip subjects on blogs in the last couple of months. It's annoying and abuses the open and free spirit of blogging. Comments are one place where readers can be part of the community, where ideas and viewpoints...
I’m sorry but this is just sad. It appears that John Kerry has joined the merry group of hard core sex sites that regularly spam my web logs with so-called referrer spam. How this works is that you generally create some sort of spider that...
Just noticed the converse of this: a friend created a blogger.com account, and his profile links his home webpage as www.blogger.com/r?URL . I picked up on this as "they've cheated you of some googlejuice!", but maybe that's deliberate.
I suspect (but haven't checked) that taking the legit browser user via a "302 Moved Temporarily" will throw the googlebot off the external link... thereby rendering referrer and comment spam futile.
Worst case is you have to robots.txt-block googlebot from /r on your server. Maybe also call your redirect URL "untrusted-url-redirect" or something to make it clear why you do this. The paranoid can encrypt the arg to the CGI script, or just put the URLs in a DB tinyurl.com style.
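A minimal sketch of such a /r redirect endpoint, written here as a WSGI app; the path name and the "url" query parameter follow the blogger.com example above, but everything else is an assumption:

```python
# Sketch of a /r redirect endpoint as a WSGI app. The 302 status and the
# robots.txt block together keep crawlers from crediting the target URL.
from urllib.parse import parse_qs

def redirect_app(environ, start_response):
    qs = parse_qs(environ.get("QUERY_STRING", ""))
    target = qs.get("url", [""])[0]
    # 302 Moved Temporarily: the crawler should not treat this as a
    # permanent endorsement of the target.
    start_response("302 Moved Temporarily", [("Location", target)])
    return [b""]
```

Pairing this with a robots.txt entry of `Disallow: /r` for the relevant user agents, as suggested above, covers the worst case where the 302 alone isn't enough.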
Has anyone noticed this before? Sorry if I'm behind the times. I live in a sleepy corner of the 'net where referrer spam happens to other people.