Dumb Submitter Bots
One of the most common spam bots crawling the web is what I call the "dumb submitter bot." These bots seek out web forms on pages, fill every editable field with spam, and submit the form. They don't care what the form is for or where it goes.
Because of dumb submitter bots, every web form, from a home-made guestbook to a "contact the webmaster" form, is vulnerable to being pumped full of spam and submitted.
I've found a clever trick that stops virtually all dumb submitter bots. By "virtually all" I mean that, since I implemented this system, none of my forms have been spammed by a bot. Before that, my homemade guestbook form was constantly getting spammed.
How the Bots Work
The first step in learning how to defeat a dumb submitter bot is to understand how it works. A submitter bot crawls the web like a search engine does, but pages aren't what it's after: it's looking for HTML forms. If the bot stumbles upon a guestbook, it sees the HTML form for submitting a new entry and fills every editable field with spam. This tends to be the pattern:
- Text fields get filled up with links to spam sites.
- Checkboxes usually get checked.
- Radio buttons and select boxes usually get submitted with their default value.
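The pattern above can be sketched as a small Perl routine, which is handy for feeding bot-like input to a form handler while testing it. This only illustrates the behavior described here and is not taken from any real bot; the field description format, the function name fill_like_dumb_bot, and the spam URL are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical description of a form: one hash per field.
# "value" is whatever the HTML already carries for that field
# (the default text, the checkbox value, the selected option).
my @fields = (
    { name => 'name',      type => 'text',     value => '' },
    { name => 'message',   type => 'textarea', value => '' },
    { name => 'subscribe', type => 'checkbox', value => 'yes' },
    { name => 'color',     type => 'select',   value => 'blue' },
);

# Return the name => value pairs a dumb submitter bot would likely post.
sub fill_like_dumb_bot {
    my ($spam, @form) = @_;
    my %posted;
    for my $field (@form) {
        if ($field->{type} eq 'text' || $field->{type} eq 'textarea') {
            # Text fields get filled up with spam links.
            $posted{ $field->{name} } = $spam;
        }
        else {
            # Checkboxes get checked; radio buttons and select boxes
            # go out with their default. Either way the bot sends the
            # value that was already present in the HTML.
            $posted{ $field->{name} } = $field->{value};
        }
    }
    return %posted;
}

my %posted = fill_like_dumb_bot('http://spam.example/', @fields);
print "$_ = $posted{$_}\n" for sort keys %posted;
```

The important point for what follows is that the bot cannot tell a real text field from a decoy: anything that looks editable gets overwritten.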
Trapping the Bots
The most effective method I've found to trap the bots:
Adding Trap Fields
For example, say you have a guestbook script on your site. It has only two fields: one for the visitor's name and one for their message. Spam bots keep posting messages full of spam to your guestbook, so you implement a trap for them.
Just create some fake form fields to trap the bots. Human users never see them because they're rendered invisible.
    <form name="guestbook" action="/cgi-bin/guestbook.pl" method="post">
    Your name:<br>
    <input type="text" name="name" value=""><p>
    Your message:<br>
    <textarea cols="40" rows="6" name="message"></textarea><p>
    <input type="submit" value="Submit Message">

    <!-- The following code is invisible to humans and contains some trap text fields -->
    <div style="display: none">
    If you can read this, don't touch the following text fields.<br>
    <input type="text" name="address" value="http://"><br>
    <input type="text" name="contact" value=""><br>
    <textarea cols="40" rows="6" name="comment"></textarea>
    </div>
    <!-- End spam bot trap -->
    </form>
Because the trap fields sit inside a div with the style attribute display: none, a web browser renders the contents of that div invisible. Spam bots, however, only see the HTML code. They don't make any sense of the style, so they read on and find our trap text fields.
Note that I included a bit of text inside the hidden div instructing the user not to touch the fields. On the off chance that your user doesn't have a CSS-enabled browser, they'll see the hidden fields, so leave a note telling them to keep their hands off.
Our form has two legitimate fields: name and message. The trap fields are named address, contact, and comment. I usually give the trap fields "tempting" names like these. Spam bots like to seek out guestbooks and forums in particular, because of the likelihood that somebody will see the spam they post, so fields named like this are extra bait.
Note: I gave one of the trap fields (address) a default value of "http://". This way, you can verify that your trap fields were actually submitted and that they haven't been modified from their original values.
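If several forms share the same trap, the hidden block can be generated from a single list of trap fields, so the HTML and the server-side check are less likely to drift apart. What follows is a minimal sketch rather than part of the original guestbook: the helper names trap_fields_html and escape_html are hypothetical, and the field names and defaults are the ones from the example form.

```perl
use strict;
use warnings;

# Trap fields from the guestbook example: name, kind, default value.
my @traps = (
    [ 'address', 'text',     'http://' ],
    [ 'contact', 'text',     ''        ],
    [ 'comment', 'textarea', ''        ],
);

# Escape the characters that are special inside HTML attributes and text.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build the hidden div holding every trap field (hypothetical helper).
sub trap_fields_html {
    my (@fields) = @_;
    my $html = qq{<div style="display: none">\n}
             . qq{If you can read this, don't touch the following text fields.<br>\n};
    for my $field (@fields) {
        my ($name, $kind, $default) = map { escape_html($_) } @$field;
        if ($kind eq 'textarea') {
            $html .= qq{<textarea cols="40" rows="6" name="$name">$default</textarea>\n};
        }
        else {
            $html .= qq{<input type="text" name="$name" value="$default"><br>\n};
        }
    }
    return $html . "</div>\n";
}

print trap_fields_html(@traps);
```

The same @traps list can then be reused by the script that receives the form, which only needs each field's name and default value.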
Validating the Trap
To finish our trap, some server-side work is needed. In the CGI script that the guestbook form submits to, you need to validate that your trap fields have been delivered unharmed. Here's an example in Perl that goes along with our guestbook form:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use CGI;

    my $q = CGI->new;

    # Get the legitimate fields' values.
    my $name    = $q->param('name')    // '';
    my $message = $q->param('message') // '';

    # Get the trap fields' values. A field that was never submitted
    # comes back undefined, so fall back to an empty string.
    my $trap_address = $q->param('address') // '';
    my $trap_contact = $q->param('contact') // '';
    my $trap_comment = $q->param('comment') // '';

    # If the traps were tampered with, don't post the guestbook entry.
    if ($trap_address ne "http://" || $trap_contact ne "" || $trap_comment ne "") {
        # Show them an error page.
        print "Status: 403 Forbidden\n";
        print "Content-Type: text/html\n\n";
        print "<h1>403 Forbidden</h1>\n";
        exit(0);
    }

    # Save the guestbook entry.
    open(my $log, '>>', 'guestbook.txt') or die "Can't open guestbook.txt: $!";
    print $log "$name|$message\n";
    close($log);

    # Redirect them to a thankyou page.
    print "Location: /thankyou.html\n\n";
    exit(0);
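The check itself doesn't depend on CGI, so it can be pulled into a function that takes the submitted values as a plain hash and is easy to test on its own. This is a minimal sketch: the function name traps_untouched and the %expected table are assumptions introduced here, while the field names and defaults come from the example form.

```perl
use strict;
use warnings;

# Expected value of each trap field, as written in the form's HTML.
my %expected = (
    address => 'http://',
    contact => '',
    comment => '',
);

# True only if every trap field came back exactly as it was sent out.
# $params is a hash reference mapping submitted field names to values.
sub traps_untouched {
    my ($params, $traps) = @_;
    for my $field (keys %$traps) {
        # A field that is missing entirely counts as an empty string.
        my $value = defined $params->{$field} ? $params->{$field} : '';
        return 0 if $value ne $traps->{$field};
    }
    return 1;
}

# A human leaves the hidden fields alone; a bot fills them with spam.
my %human = (name => 'Alice', message => 'Hi!',
             address => 'http://', contact => '', comment => '');
my %bot   = (name => 'x', message => 'x',
             address => 'http://spam.example/', contact => 'x', comment => 'x');

print traps_untouched(\%human, \%expected) ? "accept\n" : "reject\n";
print traps_untouched(\%bot,   \%expected) ? "accept\n" : "reject\n";
```

In a CGI script, the hash passed in could be filled by calling $q->param once for each trap field name. Keeping the expected values in one table means adding a new trap field is a one-line change.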
Summary
Dumb submitters will just pump your web forms full of spam and post them. By giving them some trap fields to fill in and validating those fields on the server side when the form is submitted, you can quickly and easily tell that a submission came from a spam bot and not a real human.
Will bots ever get around this? I doubt it. For a spam bot to realize that the trap fields are actually hidden, it would need to be able to render HTML and style sheets. It would then need to compare what's visible on the page with the underlying HTML code and figure out which fields it needs to leave alone. And with the infinite possibilities of HTML syntax, this doesn't seem likely to happen.
Other Applications
In addition to protecting every form on Cuvou.com with this, I also implemented it in my YaBB forum on RiveScript.com.
YaBB forums (and other popular forum software) get frequented by a different type of spam bot. Dumb submitter bots aren't sophisticated enough to post spam on a forum, which usually requires registering a username. Instead, a more specialized bot is needed, one written for a specific type of web forum.
Even then, this technique can keep the specialized bots off our forums too: remember how we had a trap field with a default value? A specialized bot won't know about our extra fields, since they aren't standard for the particular type of forum, and won't submit them with the rest of the form. We can see if the field that should have a default value is in fact empty, and know it was a spam bot that way, too.
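Telling the two cases apart can be sketched in a few lines of Perl. This is only an illustration under stated assumptions: the function name check_default_trap, its return values, and the forum field names subject and message are hypothetical, and the trap field address with its "http://" default is carried over from the guestbook example.

```perl
use strict;
use warnings;

# Classify a submission by looking at one trap field that has a default.
# Returns 'ok', 'missing' (never submitted, or emptied) or 'modified'.
sub check_default_trap {
    my ($params, $field, $default) = @_;
    my $value = $params->{$field};
    return 'missing'  if !defined $value || $value eq '';
    return 'modified' if $value ne $default;
    return 'ok';
}

# A forum-specific bot posts only the fields it knows about,
# so the trap field never arrives at all.
my %forum_bot = (subject => 'Cheap pills', message => 'http://spam.example/');
print check_default_trap(\%forum_bot, 'address', 'http://'), "\n";
```

Anything other than 'ok' can be treated as a bot. Keeping 'missing' and 'modified' separate just makes it easier to see in a log which kind of bot is visiting.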
~Kirsle
2008/09/01