Version 5 (modified by chip, 12 years ago)

How do I deal with spam comments and other content on my web site?

It started slowly: a few commercial links posted in brief, innocuous comments, sometimes barely related to the topic. The pace soon quickened. Our website was being hit by backlinkers and spammers, and dealing with them became a significant drain on admin time. As GearHack notes, his list of banned IPs runs to over 9 million. So how do conscientious admins approach these problems?

First, a few terms.

Backlinks:

"Sometimes called inbound links, backlinks are the lifeblood of Search Engine Positioning. In order for a website to be on the top of search engines like Google without the webmaster having to pay big money for advertising, the website has to have a large number of backlinks. Backlinks are links on one website that lead back to another website. The more established, high quality, and high Page Rank the website that contains the link has, the more power it has to help the linked website with its search engine position. If the website that is getting backlinks gets many High Quality and High Page Rank backlinks, the better the chance it has of being in a high position on the Search Engines."

Backlinkers may be hawking their own products, or they may be hired to hawk someone else's. Either way, the game is the same: use your site to raise their standing in search engines. Many operate through a link farm. Link farms associate with one another, often selling lists of easy-target websites, which makes stopping backlinking difficult and time-consuming for admins.

Scraper site:

"A Link: scraper site is a website that copies all of its content from other websites using web scraping. No part of a scraper site is original. A search engine is not a scraper site: sites such as Yahoo and Google gather content from other websites and index it so that the index can be searched with keywords. Search engines then display snippets of the original site content in response to a user's search.

In the last few years, and due to the advent of the Google Adsense web advertising program, scraper sites have proliferated at an amazing rate for spamming search engines. Open content sites such as Wikipedia are a common source of material for scraper sites."
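Because a scraper site copies text wholesale, one way to spot a scraped copy of your pages is to compare overlapping word n-gram "shingles". The sketch below is illustrative, not from the article; the shingle size of five words is an arbitrary choice.

```python
# Hedged sketch: estimate how much of our original text reappears in a
# suspect page by comparing sets of 5-word shingles. The shingle size
# and the idea of a fixed threshold are illustrative assumptions.

def shingles(text, n=5):
    """Return the set of n-word shingles in text (case-folded)."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_ratio(original, suspect, n=5):
    """Fraction of the original's shingles that also occur in the suspect."""
    a, b = shingles(original, n), shingles(suspect, n)
    if not a:
        return 0.0
    return len(a & b) / len(a)
```

A ratio near 1.0 suggests a verbatim copy; near 0.0, unrelated text. Real scrapers often shuffle or truncate content, so a lower threshold (say 0.3) may be a more practical flag.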


Spamdexing is an umbrella term for a whole family of search engine spam that admins may need to recognize and contend with, including content and link spam, abuse of world-writable pages, mirror websites, URL redirection, and cloaking.


Spammers use electronic means to advertise, collect email addresses, infiltrate mail systems for "free" mass messaging, and much more. A few spammers work manually and methodically, but the majority use programs to send their spam automatically. As SixApart notes:

"The real problem is automated comment spamming, driven by scripts or software written specifically for the purpose of producing comment spam. Such software can submit thousands of spam comments in a very short period of time, to many pages on many weblogs."

Directory Harvest Attack:

"A Directory Harvest Attack or DHA is a technique used by spammers in an attempt to find valid/existent e-mail addresses at a domain by using brute force."

There's much more to backlinking and spamming, but those are some fundamentals.

First off, do whatever you can to automate the process of identifying spammers and backlinkers. That includes CAPTCHA (though it has been cracked) and Akismet to catch what you can automatically. Some admins may also want to check out Honey Pot and its new project, Project Honey Pot Quick Links.
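Akismet exposes its check as a simple HTTP API: you POST a form-encoded comment to a `comment-check` endpoint and it answers with the literal body "true" (spam) or "false" (not spam). A sketch of building that request; the key, blog URL, and comment values below are placeholders, and only a subset of Akismet's documented fields is shown.

```python
from urllib.parse import urlencode

# Hedged sketch of an Akismet "comment-check" request. The endpoint and
# field names follow Akismet's published REST API; api_key and blog are
# placeholders you would replace with your own credentials.

def build_comment_check(api_key, blog, user_ip, user_agent, content):
    url = "https://%s.rest.akismet.com/1.1/comment-check" % api_key
    body = urlencode({
        "blog": blog,
        "user_ip": user_ip,
        "user_agent": user_agent,
        "comment_type": "comment",
        "comment_content": content,
    })
    return url, body
```

The returned URL and body would then be sent as a POST with Content-Type `application/x-www-form-urlencoded`; the network call is left out here so the construction can be shown on its own.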

Some sites may want to do manual checking. Here are some ways to do that:

· From Administer, check user management for new users.

· Run email addresses through a service like the Stop Forum Spam search page.

· If there are no hits but the registrant still looks suspicious based on their nick, state, or zip entry, run the IP through the SFS advanced search.

· If still suspicious with no hits, use the Google search on the SFS search page; it often reveals information that helps you decide.

· Cross-match the zip code a new user claims against the state that zip code actually belongs to. (I've had foreign users claim NY as their state with a CA zip code.)

· Run the IP address through an IP lookup to find the server location: IP Lookup

· Bear in mind that many spammers and backlinkers disguise their origin.
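Two of the checks above are easy to script. Stop Forum Spam documents a query API that accepts an email or IP and a `json` flag, and the state/zip cross-match can be approximated from the first digit of a US ZIP code. The sketch below builds the SFS query URL (without sending it) and checks the claimed state; the ZIP prefix table is a tiny illustrative subset, not a complete mapping.

```python
from urllib.parse import urlencode

# Hedged sketch of two manual checks: building a Stop Forum Spam API
# query, and cross-matching a claimed US state against the first digit
# of the claimed ZIP code. The prefix table is an illustrative subset.

ZIP_FIRST_DIGIT_STATES = {
    "0": {"CT", "MA", "ME", "NH", "NJ", "RI", "VT"},
    "1": {"DE", "NY", "PA"},
    "9": {"AK", "CA", "HI", "OR", "WA"},
}

def sfs_query_url(email=None, ip=None):
    """Build an SFS lookup URL; the caller performs the HTTP GET."""
    params = {"json": ""}
    if email:
        params["email"] = email
    if ip:
        params["ip"] = ip
    return "https://api.stopforumspam.org/api?" + urlencode(params)

def zip_matches_state(zip_code, state):
    """False only when the ZIP's first digit clearly contradicts the state."""
    states = ZIP_FIRST_DIGIT_STATES.get(zip_code[:1])
    return states is None or state.upper() in states
```

With this, the NY-state-with-CA-zip case mentioned above comes back as a mismatch automatically; prefixes missing from the table are treated as "can't judge" rather than as a contradiction.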

By the time I've gone through this process, I have a pretty fair basis for deciding whether to grant a new registrant access to the site or leave them blocked.


Stop Forum Spam Freebies

Idea to stop spam bots

Botscout code for servers

BotScout

GearHack's List of Tools & Related Links: well worth a visit!

Search engine SPAM detector

Mollom