It’s just data

Wiki Spam Update

In the past 72 hours, over two hundred updates to the Atom wiki have been turned away as spam.

There are a number of different types of spammers.  Of little concern are the curious (is it true that anybody can update a page?  yes).  Nor are the defacers (let's update the pages to call everybody "gay". hehehe) much of a problem.

The overwhelming majority of spammers are the cropdusters: sprinkling wide areas with links to gambling, porn, and pharmaceutical sites.  Due to the addition of nofollow attributes on the links, these provide no benefit to the perpetrator; but there is increasing evidence that most of these spammers are not literate, at least not in the English language.
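
For readers unfamiliar with the mechanism: nofollow is a value placed in the rel attribute of anchor tags in the rendered pages, telling search engines not to count those links toward rankings.  A minimal sketch in Python of how a wiki renderer might apply it (the function and regex here are illustrative, not the actual code running on this site):

    import re

    def add_nofollow(html):
        """Add rel="nofollow" to every anchor tag in rendered wiki HTML.

        Illustrative sketch only: a real renderer would use an HTML parser
        and skip anchors that already carry a rel attribute."""
        return re.sub(r'<a\s+(?=[^>]*href=)', '<a rel="nofollow" ', html)

    print(add_nofollow('<a href="http://example.com/">casino</a>'))
    # <a rel="nofollow" href="http://example.com/">casino</a>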

One such spammer periodically comes in from a private page on this site and edits a number of pages one by one, apparently unable to read the English message text that accompanies the 403 Forbidden response to each POST.  To reduce effort for both sides, I'm now blocking GETs from that site.

I now employ a number of blocking techniques: requiring login on certain pages; blocking based on IP address, user agent, or referer; blacklisting words in the content of the update; and throttling the rate of updates.
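
A rough sketch of how such layered checks might be chained together (all of the names, lists, and thresholds below are invented for illustration; this is not the actual code running on this site):

    import time

    # Hypothetical blocklists; real ones would live in configuration.
    BLOCKED_IPS      = {"192.0.2.7"}
    BLOCKED_AGENTS   = {"EvilBot/1.0"}
    BLOCKED_REFERERS = {"casino.example"}
    BAD_WORDS        = {"holdem", "texas-hold-em"}
    MIN_SECONDS_BETWEEN_UPDATES = 60

    last_update_time = 0.0

    def reject_update(ip, user_agent, referer, content):
        """Return a reason for rejecting the update, or None to accept it."""
        global last_update_time
        if ip in BLOCKED_IPS:
            return "blocked IP address"
        if user_agent in BLOCKED_AGENTS:
            return "blocked user agent"
        if any(r in referer for r in BLOCKED_REFERERS):
            return "blocked referer"
        if any(word in content.lower() for word in BAD_WORDS):
            return "blacklisted word in content"
        now = time.time()
        if now - last_update_time < MIN_SECONDS_BETWEEN_UPDATES:
            return "updates arriving too quickly"
        last_update_time = now
        return None

Each rejected request would then get the 403 response mentioned above.  (Requiring login on specific pages is omitted here, since that depends on the wiki's authentication layer.)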

But the most effective is a relatively recent addition that relies on the greed of the cropduster: any update that adds more than ten new external links to a page is rejected.  Only a handful of existing pages contain that many external links in total, so any attempt to add that many at once is highly suspect.
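
The heuristic fits in a few lines.  A sketch, assuming links are counted with a simple URL regex (the actual implementation may count links differently):

    import re

    URL = re.compile(r'https?://\S+')
    MAX_NEW_EXTERNAL_LINKS = 10  # threshold described above

    def too_many_new_links(old_text, new_text):
        """Reject an edit that adds more than ten external links at once."""
        added = len(URL.findall(new_text)) - len(URL.findall(old_text))
        return added > MAX_NEW_EXTERNAL_LINKS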


Should you really employ nofollow in a wiki where there is no distinction between author and commenter?

Posted by Ross Mayfield

Re: Wiki Spam Update

The real question is whether nofollow should be applied at all. Using nofollow seems like a scorched-earth approach; it says "Even if you manage to sneak some spam by me, you don't get any PageRank(tm). Nyah, Nyah!" Unfortunately, the fact would remain that the wiki would still have been spammed.

Posted by Dare Obasanjo

Ross, probably with even more reason: wikis mostly contain internal references, especially wikis such as Sam's that concentrate on a single subject.

A wiki is introverted, a blog is extroverted.  Using nofollow on links in a wiki probably does not have much impact on the world...

Posted by Janne Jalkanen

Dare, while I remain skeptical overall about the spam-reducing value of nofollow, I turned it on for two reasons: primarily to give it a chance to prove itself, and also because spam breeds more spam.  A fair number of spammers run searches on gambling, porn, or pharmaceutical terms, and correctly conclude that sites already containing them are worth targeting.

Perhaps, someday, I'll try removing the nofollow attributes (it is just one line of code, and takes effect immediately).  But just not yet.

Posted by Sam Ruby

Dare, is that because spammers don't care, or because rel="nofollow" isn't implemented widely enough for spammers to care? The experiment with rel="nofollow" has only just started; it might be too soon to draw conclusions.

Posted by Anne

Sam, the "spam breeds more spam" argument does not make much sense here, does it? The text of the links will still be indexed, as will the text surrounding them. I do not see what rel="nofollow" has to do with that.

Posted by Anne

Anne - good point.

Posted by Sam Ruby

Spam breeds more spam because spammers use the "link:domainname.com" syntax to search Google for pages that have already been hit.  Example: [link]

I don't know if rel="nofollow" prevents this, but the situation has certainly gotten out of control.

Sam, could you add "spammers" to your spell-checker?

Posted by Mark

Sam may be correct that spammers may not be English-literate (or fluent, or techspeak-fluent), but I think a big part of it is that they just don't care enough to read.... [more]

Trackback from The 80/20 Solution

Simon Willison: Wiki Spam Update - Sam Ruby suggests blocking changes that add 10 or more new links....

Excerpt from HotLinks - Level 1

Wiki Spam

[link] Shit, of course the wiki is plagued by spam, and I just posted a link pointing there. :-( sigh In the past 72 hours, over two hundred updates to the Atom wiki have been turned away...

Excerpt from Red.Cube

nofollow will never stop spam because spammers don't care.  They don't know which sites implement it and which don't, and they don't bother checking to find out.  Using nofollow in a wiki is really worthless and actually does more harm than good; you'd be better off implementing better server-side filtering (perhaps a Bayesian filter combined with a few other techniques) so the spam doesn't even get published.

Posted by Lachlan Hunt
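
For the curious, a minimal sketch of the kind of Bayesian filter Lachlan suggests, trained on past edits labeled spam or ham; this is purely illustrative and not something running on this site:

    import math
    import re
    from collections import Counter

    spam_counts, ham_counts = Counter(), Counter()
    spam_total = ham_total = 0  # number of training documents per class

    def tokens(text):
        return re.findall(r'[a-z0-9]+', text.lower())

    def train(text, is_spam):
        global spam_total, ham_total
        counts = spam_counts if is_spam else ham_counts
        for t in tokens(text):
            counts[t] += 1
        if is_spam:
            spam_total += 1
        else:
            ham_total += 1

    def spam_score(text):
        """Log-odds that an edit is spam, under naive independence assumptions."""
        score = math.log((spam_total + 1) / (ham_total + 1))
        for t in tokens(text):
            p_spam = (spam_counts[t] + 1) / (sum(spam_counts.values()) + 2)
            p_ham = (ham_counts[t] + 1) / (sum(ham_counts.values()) + 2)
            score += math.log(p_spam / p_ham)
        return score  # positive means more likely spam

    train("cheap casino holdem pills", True)
    train("atom syndication format draft", False)
    print(spam_score("casino holdem links"))  # positive: flagged as likely spam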

Sam Ruby: Wiki Spam Update

[link]...

Excerpt from del.icio.us/leinster

We run a wiki and haven't really had too many problems with wiki spammers, apart from one persistent guy from China who occasionally comes in and adds dozens of links to about 10 pages every few weeks. Blocking by IP wasn't enough, since they had access to quite a large block of addresses. We now block edits that contain Chinese characters in the anchor text, or that add more than 20 links relative to the previous revision. Since almost all the spam appears to be done manually rather than by a robot, we added a sleep(20) to the code before printing out the error message denying the edit - that should be frustrating enough for a persistent spammer, while hopefully not too annoying to a legitimate submitter who hits it once.

Posted by John McPherson
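
A sketch of the two checks John describes, plus the tarpit (anchor-text extraction is wiki-syntax-specific and assumed to happen elsewhere; the character range is a simplification):

    import re
    import time

    URL = re.compile(r'https?://\S+')
    CJK = re.compile(r'[\u4e00-\u9fff]')  # common Chinese ideographs

    def reject_edit(old_text, new_text, anchor_texts):
        """Return True (after a 20-second tarpit) if the edit looks like spam."""
        added_links = len(URL.findall(new_text)) - len(URL.findall(old_text))
        if added_links > 20 or any(CJK.search(t) for t in anchor_texts):
            time.sleep(20)  # cheap for the server, frustrating for a manual spammer
            return True
        return False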
