In the past 72 hours, over two hundred updates to the Atom wiki
have been turned away as spam.
There are a number of different types of spammers. Of
little concern are the curious (is it true that anybody can
update a page? yes). Nor are the defacers
(let's update the pages to call everybody "gay". hehehe)
much of a problem.
The overwhelming majority of spammers are the cropdusters:
sprinkling wide areas with links to gambling, porn, and
pharmaceutical sites. Due to the addition of
nofollow attributes on the links, these provide no
benefit to the perpetrator; but there is increasing evidence that
most of these spammers are not literate, at least not in the
English language.
One such spammer periodically comes in from a private page on
this site and one by one
edits a number of pages, apparently unable to read the English
message text that accompanies the 403 Forbidden status
code in the response to each POST. To reduce
effort for both sides, I'm now blocking GETs from that site.
I now employ a number of blocking techniques, ranging from
requiring login on a number of pages, blocking based on IP address
or user agent or referer, blacklists on words in the content of the
update, and a throttle on the rate of updates.
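As an illustration of the last item, here is a minimal sketch of a per-client update throttle. It is not the code running on this site; the class name, the limit of five updates per minute, and the idea of keying on the client's IP address are all assumptions made for the example.

```python
import time
from collections import defaultdict, deque


class UpdateThrottle:
    """Allow at most `max_updates` per client within `window` seconds.

    The defaults are illustrative assumptions, not values from the post.
    """

    def __init__(self, max_updates=5, window=60.0, clock=time.monotonic):
        self.max_updates = max_updates
        self.window = window
        self.clock = clock  # injectable so the throttle can be tested
        self._history = defaultdict(deque)  # client key -> recent timestamps

    def allow(self, client):
        """Return True and record the update, or False if over the limit."""
        now = self.clock()
        stamps = self._history[client]
        # Discard timestamps that have aged out of the window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.max_updates:
            return False
        stamps.append(now)
        return True
```

A handler for POST requests could call `allow(remote_address)` first and answer with a 403 when it returns False.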
But the most effective is a relatively recent addition that
relies on the greed of the cropduster: any update that adds more
than ten external links to a page is rejected.
Only a handful of existing pages contain such a number of external
links in total, so any attempt to add such a number of links
all at once is very suspect.
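The link-count rule can be sketched roughly as follows. This is an assumption-laden illustration rather than the actual filter: it treats every absolute http(s) URL as an external link, and the function names and regular expression are invented for the example. Only the threshold of ten comes from the description above.

```python
import re

# Assumption: any absolute http(s) URL counts as an external link.
EXTERNAL_LINK = re.compile(r"https?://[^\s\"'<>\]]+", re.IGNORECASE)
MAX_NEW_LINKS = 10  # from the post: more than ten added links is rejected


def count_external_links(text):
    """Count absolute http(s) URLs in a page's text."""
    return len(EXTERNAL_LINK.findall(text))


def added_links(old_text, new_text):
    """Net number of external links the edit adds over the old revision."""
    return max(0, count_external_links(new_text) - count_external_links(old_text))


def is_cropdusting(old_text, new_text, limit=MAX_NEW_LINKS):
    """True when the edit adds more than `limit` external links."""
    return added_links(old_text, new_text) > limit
```

Because this compares only totals, an edit that swaps existing links for spam links without raising the count would slip through; comparing the sets of URLs would be stricter.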
Should you really employ nofollow in a wiki where there is no distinction between author and commenter?
The real question is whether nofollow should be applied at all. Using nofollow seems like a scorched earth approach: it says "Even if you manage to sneak some spam by me you don't get any PageRank(tm). Nyah, Nyah!". Unfortunately, the fact would remain that the wiki would still have been spammed.
Dare, while I remain skeptical overall about the spam-reducing value of nofollow, I turned it on for two reasons: primarily to give it a chance to prove itself, but also because spam breeds more spam. A fair number of spammers search for sites using gambling, porn, or pharmaceutical terms, and correctly conclude that the sites they find are worth targeting.
Perhaps, someday, I'll try removing the nofollow attributes (it is just one line of code, and takes effect immediately). But just not yet.
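For readers wondering what such a one-line switch might look like, here is a hypothetical sketch; it is not the wiki's actual code. The flag `ADD_NOFOLLOW` and the function `render_external_link` are names made up for this example.

```python
from html import escape

ADD_NOFOLLOW = True  # hypothetical switch; flipping it is the "one line"


def render_external_link(url, text, nofollow=ADD_NOFOLLOW):
    """Render an external link, optionally marked rel="nofollow"."""
    rel = ' rel="nofollow"' if nofollow else ""
    return '<a href="%s"%s>%s</a>' % (escape(url, quote=True), rel, escape(text))
```

Since the attribute is added at render time rather than stored in the page, a change like this would apply to every page as soon as it is deployed.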
Dare, is that because spammers don't care, or because |rel="nofollow"| isn't implemented widely enough for spammers to care? The experiment with |rel="nofollow"| has only just started; it might be too soon to draw conclusions.
Sam, the "spam breeds more spam" argument does not make much sense here, does it? The text of the links will still be indexed, as will the text surrounding them. I do not see how |rel="nofollow"| relates to that.
Sam may be correct that spammers may not be English-literate (or fluent, or techspeak fluent), but I think a big part of it is that they just don't care enough to read....
[link] Shit, the wiki is of course plagued by spam, and I have just posted that link to it. :-( sigh In the past 72 hours, over two hundred updates to the Atom wiki have been turned away...
nofollow will never stop spam because spammers don't care. They don't know which sites implement it and which don't and they don't bother checking to find out. Using nofollow in a wiki is really worthless and actually does more harm than good, so you'd be better off implementing some better server side filtering (perhaps a Bayesian filter combined with a few other techniques) so the spam doesn't even get published.
We run a wiki and haven't really had too much of a problem with wiki spammers, apart from one persistent guy from China who occasionally comes in and adds dozens of links to about 10 pages every few weeks. Blocking by IP wasn't enough since they had access to quite a large block. We now block edits that contain Chinese characters in the anchor text, or that add more than 20 links compared with the previous revision. Since almost all the spam appears to be done manually rather than by a robot, we added a sleep(20) to the code before printing out the error message denying the edit - that should be frustrating enough for a persistent spammer, while hopefully not too annoying to a legitimate submitter getting it once.
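The approach in the comment above could look something like the sketch below. It is a guess at the shape of such a filter, not that wiki's code: it assumes the edit is available as rendered HTML, treats only the CJK Unified Ideographs block as "Chinese characters", and uses invented function names. The 20-link limit and the 20-second delay are the figures given in the comment.

```python
import re
import time

ANCHOR_TEXT = re.compile(r"<a\b[^>]*>(.*?)</a>", re.IGNORECASE | re.DOTALL)
LINK = re.compile(r"<a\b[^>]*\bhref=", re.IGNORECASE)
# Assumption: "Chinese characters" means the CJK Unified Ideographs block.
CJK = re.compile(r"[\u4e00-\u9fff]")


def rejection_reason(old_html, new_html, max_new_links=20):
    """Return a reason string if the edit should be denied, else None."""
    for text in ANCHOR_TEXT.findall(new_html):
        if CJK.search(text):
            return "Chinese characters in anchor text"
    added = len(LINK.findall(new_html)) - len(LINK.findall(old_html))
    if added > max_new_links:
        return "too many links added"
    return None


def handle_edit(old_html, new_html, delay=20, sleep=time.sleep):
    """Accept the edit, or stall for `delay` seconds before rejecting it."""
    reason = rejection_reason(old_html, new_html)
    if reason is None:
        return "accepted"
    sleep(delay)  # make a manual spammer wait before seeing the error
    return "rejected: " + reason
```

Note that this checks every anchor in the new revision, so a page that already contains Chinese anchor text would reject all later edits; a real filter would probably look only at anchors the edit introduces.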