It’s just data

Wiki Spam Update

In the past 72 hours, over two hundred updates to the Atom wiki have been turned away as spam.

There are a number of different types of spammers.  Of little concern are the curious (is it true that anybody can update a page?  yes).  Nor are the defacers (let's update the pages to call everybody "gay". hehehe) much of a problem.

The overwhelming majority of spammers are the cropdusters, who sprinkle wide areas with links to gambling, porn, and pharmaceutical sites.  Because the links carry nofollow attributes, they provide no benefit to the perpetrator; but there is increasing evidence that most of these spammers are not literate, at least not in the English language.

One such spammer periodically comes in from a private page on this site and edits a number of pages one by one, apparently unable to read the English message text that accompanies the 403 Forbidden status code in the response to each POST.  To reduce effort for both sides, I'm now blocking GETs from that site.

I now employ a number of blocking techniques: requiring login on a number of pages; blocking based on IP address, user agent, or referer; blacklists on words in the content of the update; and a throttle on the rate of updates.
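As a rough illustration of the last two techniques (not the code running on this wiki), a word blacklist and an update throttle might be sketched as follows.  The blacklist entries, the limits, and the names contains_blacklisted_word and UpdateThrottle are all assumptions made up for this example.

```python
import re
import time
from collections import defaultdict, deque

# Hypothetical blacklist; a real list would be tuned to the spam actually seen.
BLACKLIST = {"casino", "viagra"}


def contains_blacklisted_word(content, blacklist=BLACKLIST):
    """Return True if any whole word in the update is on the blacklist."""
    words = set(re.findall(r"[a-z0-9]+", content.lower()))
    return not words.isdisjoint(blacklist)


class UpdateThrottle:
    """Allow at most max_updates per client within a sliding time window.

    The defaults (5 updates per 60 seconds) are assumed values for illustration.
    """

    def __init__(self, max_updates=5, window_seconds=60.0, clock=time.monotonic):
        self.max_updates = max_updates
        self.window_seconds = window_seconds
        self.clock = clock
        self._history = defaultdict(deque)  # client -> timestamps of accepted updates

    def allow(self, client):
        now = self.clock()
        history = self._history[client]
        # Forget updates that have aged out of the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_updates:
            return False  # over the limit; the caller would answer with a 403
        history.append(now)
        return True
```

The client key could be an IP address or a login name; which one works better depends on how the spammers arrive.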

But the most effective is a relatively recent addition that relies on the greed of the cropduster: any update that adds more than ten external links to a page is rejected.  Only a handful of existing pages contain that many external links in total, so any attempt to add that many all at once is very suspect.
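A minimal sketch of such a check, assuming the wiki can hand over the page text before and after the edit.  The URL pattern is deliberately crude and the function names are hypothetical; only the threshold of ten comes from the rule described above.

```python
import re

# Threshold from the rule above: more than ten new external links is rejected.
MAX_NEW_EXTERNAL_LINKS = 10

# Crude pattern for absolute http(s) URLs; a real wiki would reuse its own link parser.
URL_PATTERN = re.compile(r'https?://[^\s<>")\]]+', re.IGNORECASE)


def external_links(text):
    """Return the set of distinct absolute URLs found in the page text."""
    return set(URL_PATTERN.findall(text))


def adds_too_many_links(old_text, new_text, limit=MAX_NEW_EXTERNAL_LINKS):
    """Return True if the edit introduces more than `limit` URLs not already on the page."""
    added = external_links(new_text) - external_links(old_text)
    return len(added) > limit
```

Comparing against the old text rather than counting links outright means the handful of existing link-heavy pages remain editable.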