Dumb Submitter Bots

One of the most common spam bots that crawl the web is the one I call the "dumb submitter bot." These bots seek out web forms on pages, fill every editable field full of spam, and submit the form. They don't care what the form was for, or where it goes.

Because of dumb submitter bots, every web form from a home-made guestbook to a "contact the webmaster" form is vulnerable to being pumped full of spam and submitted.

I've found a clever trick that stops virtually all dumb submitter bots. By "virtually all" I mean that none of my forms have been spammed by a bot since I implemented this system; before that, my homemade guestbook form was constantly getting spammed.

How the Bots Work

The first step in learning how to defeat a dumb submitter bot is to understand how it works. A submitter bot crawls the web like a search engine does, but it isn't after pages; it's looking for HTML forms. If the bot stumbles upon a guestbook, it sees the HTML form for submitting a new entry and fills every editable field with spam. This tends to be the pattern:

Text fields get filled up with links to spam sites.
Checkboxes usually get checked.
Radio buttons and select boxes usually get submitted with their default value.
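
If you want to test your own forms against this pattern, here is a rough sketch of it in Python (not the Perl used for the guestbook script later on). The field descriptions, the function name fill_like_dumb_bot, and the spam.example address are all made up for illustration; real bots obviously vary.

```python
# Placeholder payload standing in for a link to a spam site (hypothetical).
SPAM_TEXT = "http://spam.example/"

def fill_like_dumb_bot(fields):
    """Return the values a dumb submitter bot would likely send.

    `fields` is a list of dicts with the keys "name", "type" and "default".
    """
    submission = {}
    for field in fields:
        kind = field["type"]
        if kind in ("text", "textarea"):
            # Text fields get filled up with spam links.
            submission[field["name"]] = SPAM_TEXT
        elif kind == "checkbox":
            # Checkboxes usually get checked.
            submission[field["name"]] = "on"
        else:
            # Radio buttons and select boxes keep their default value.
            submission[field["name"]] = field["default"]
    return submission

guestbook = [
    {"name": "name", "type": "text", "default": ""},
    {"name": "message", "type": "textarea", "default": ""},
    {"name": "subscribe", "type": "checkbox", "default": ""},
    {"name": "rating", "type": "select", "default": "5"},
]

print(fill_like_dumb_bot(guestbook))
```

The point to notice is that nothing in this logic asks what a field is for, which is exactly the weakness the trap exploits.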

Trapping the Bots

The most effective method I've found to trap the bots: use input fields that are hidden from humans but that bots will still find. If these fields come back modified, then you know a bot has spammed your form.

Adding Trap Fields

For example, say you have a guestbook script on your site. You only have two fields: one for their name, and one for their message. You have a problem with spam bots posting messages full of spam to your guestbook. You implement a trap for them.

Just create some fake form fields to trap bots, and hide them with CSS so that human users never see them.

HTML Code:

<form name="guestbook" action="/cgi-bin/guestbook.pl" method="post">

Your name:<br>
<input type="text" name="name" value=""><p>

Your message:<br>
<textarea cols="40" rows="6" name="message"></textarea><p>

<input type="submit" value="Submit Message">

<!-- The following code is invisible to humans and
     contains some trap text fields                -->

<div style="display: none">
If you can read this, don't touch the following text fields.<br>

<input type="text" name="address" value="http://"><br>
<input type="text" name="contact" value=""><br>
<textarea cols="40" rows="6" name="comment"></textarea>
</div>

<!-- End spam bot trap -->

</form>

Because the trap fields are inside a div with the style attribute display: none, a web browser doesn't render the contents of that div. Spam bots, however, only read the raw HTML code. They don't interpret the style, so they read on and find our trap text fields.
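
To see the form the way a bot might, here is a small sketch using Python's standard html.parser module. It is only my guess at how a simple bot reads a page (I haven't seen any bot's source code): it collects every named field and never looks at the style attribute. The FieldCollector class and the fields_seen_by_parser function are names I made up, and the HTML is a trimmed copy of the guestbook form.

```python
from html.parser import HTMLParser

class FieldCollector(HTMLParser):
    """Collects the name of every form field, ignoring CSS entirely."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag in ("input", "textarea", "select"):
            name = dict(attrs).get("name")
            if name:  # the submit button has no name, so it is skipped
                self.names.append(name)

FORM_HTML = """
<form name="guestbook" action="/cgi-bin/guestbook.pl" method="post">
<input type="text" name="name" value="">
<textarea cols="40" rows="6" name="message"></textarea>
<input type="submit" value="Submit Message">
<div style="display: none">
<input type="text" name="address" value="http://">
<input type="text" name="contact" value="">
<textarea cols="40" rows="6" name="comment"></textarea>
</div>
</form>
"""

def fields_seen_by_parser(html):
    """Return the field names a markup-only reader finds in `html`."""
    collector = FieldCollector()
    collector.feed(html)
    return collector.names

print(fields_seen_by_parser(FORM_HTML))
```

The trap fields show up in the list right alongside the real ones, because hiding them is the browser's job and the parser never does that job.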

Note that I included a bit of text inside the hidden div instructing the user not to touch the fields. On the off chance that your user doesn't have a CSS-enabled browser, they'll see the hidden fields, so leave a note telling them to keep their hands off.

Our form has two legitimate fields: name and message. The trap fields are named address, contact, and comment. I usually give the trap fields "tempting" names like these. Spam bots like to seek out guestbooks and forums in particular because of the likelihood that somebody will see the spam they post, so fields named like this are extra bait.

Note: I made one of the trap fields (address) have a default value of "http://". This way, you can verify that your trap fields were submitted, and that they haven't been modified from their original values.

Validating the Trap

To finish our trap, some server-side work is needed. In your CGI script that the guestbook form submits to, you need to validate that your trap fields have been delivered unharmed. Here's an example in Perl that goes along with our guestbook script:

Perl Code:

#!/usr/bin/perl

use strict;
use warnings;
use CGI;

my $q = CGI->new;

# Get the legitimate fields' values.
my $name = $q->param('name');
my $message = $q->param('message');

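# Get the trap fields' values. A field that wasn't submitted at all
# comes back undefined, so treat it as an empty string.
my $trap_address = $q->param('address') // '';
my $trap_contact = $q->param('contact') // '';
my $trap_comment = $q->param('comment') // '';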

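# If the traps were tampered with, don't post the guestbook entry.
if ($trap_address ne "http://" || $trap_contact ne "" || $trap_comment ne "") {
	# Show them an error page. The Status header sets the real HTTP
	# status; without it the server would answer "200 OK".
	print "Status: 403 Forbidden\n";
	print "Content-Type: text/html\n\n";
	print "<h1>403 Forbidden</h1>\n";
	exit(0);
}

# Save the guestbook entry. Newlines and pipes are replaced with spaces
# so that each entry stays on one line of the pipe-delimited file.
for ($name, $message) {
	$_ = "" unless defined $_;
	s/[\r\n|]+/ /g;
}
open (my $log, ">>", "guestbook.txt") or die "Can't open guestbook.txt: $!";
print $log "$name|$message\n";
close ($log);

# Redirect them to a thankyou page.
print "Location: /thankyou.html\n\n";
exit(0);

Now, if the form gets submitted and our trap field "address" does not equal its default value of "http://", or if the fields "contact" or "comment" have anything in them at all, the script gives them a nice "403 Forbidden" page and exits without saving the guestbook entry.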
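
The check itself doesn't depend on Perl or CGI. Here is the same rule as a minimal Python sketch, assuming the submitted form has already been parsed into a plain dict by whatever framework you use. The names TRAP_DEFAULTS and traps_tampered are mine, and the sample submissions are invented.

```python
# Trap fields and the values they must still have when the form comes back.
# The names and defaults are taken from the guestbook form above.
TRAP_DEFAULTS = {"address": "http://", "contact": "", "comment": ""}

def traps_tampered(form):
    """Return True if any trap field differs from its default value.

    `form` is a plain dict of submitted field names to values. A missing
    field is treated as an empty string, like the Perl script does.
    """
    return any(form.get(name, "") != default
               for name, default in TRAP_DEFAULTS.items())

# A browser sends the hidden fields back untouched.
human = {"name": "Alice", "message": "Nice site!",
         "address": "http://", "contact": "", "comment": ""}

# A dumb submitter bot fills every text field it finds.
bot = {"name": "x", "message": "x", "address": "http://spam.example/",
       "contact": "http://spam.example/", "comment": "http://spam.example/"}

print(traps_tampered(human))  # False
print(traps_tampered(bot))    # True
```

Keeping the trap names and defaults in one table means the HTML and the check only have to agree in a single place if you rename a trap later.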

Summary

Dumb submitters will just pump your web forms full of spam and post them. By giving them some trap fields to fill in and validating them on the server side when they submit the form, you can quickly and easily tell that the submission came from a spam bot and not a real human.

Will bots ever get around this? I doubt it. For a spam bot to realize that the trap fields are actually hidden, it would need to be able to render HTML and style sheets. It would then need to compare what's visible on the page with the underlying HTML code and figure out which fields it needs to leave alone. And with the nearly endless ways HTML and CSS offer to hide an element, this doesn't seem likely to happen.

Other Applications

In addition to protecting every form on Cuvou.com with this, I also implemented it into my YaBB forum on RiveScript.com.

YaBB forums (and other popular forum software) get frequented by a different type of spam bot. Dumb submitter bots aren't sophisticated enough to post spam on a forum (which usually requires registration of a username). Instead, a more specialized type of bot is needed for a specific type of web forum.

Even then, this technique can keep the specialized bots off our forums too: remember how we had a trap field with a default value? A specialized bot won't know about our extra fields, since they aren't standard for the particular type of forum, and won't submit them with the rest of the form. We can see if the field that should have a default value is in fact empty, and know it was a spam bot that way, too.
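
As a rough Python sketch of that idea, assuming again that the submitted fields arrive as a plain dict: the default-valued trap can be missing, modified, or intact, and only the last one looks like a real browser. The function name check_default_trap and its return labels are my own invention, not anything from YaBB.

```python
def check_default_trap(form, trap_name="address", trap_default="http://"):
    """Classify a submission by the state of the default-value trap field."""
    if trap_name not in form:
        # The sender never saw our extra field: likely a specialized bot
        # posting only the forum software's standard fields.
        return "missing"
    if form[trap_name] != trap_default:
        # The field was overwritten: likely a dumb submitter bot.
        return "modified"
    # The field came back exactly as we sent it, as a browser would do.
    return "intact"

print(check_default_trap({"subject": "Hi", "message": "Hello"}))
print(check_default_trap({"message": "x", "address": "http://spam.example/"}))
print(check_default_trap({"message": "Hello", "address": "http://"}))
```

Only "intact" should be allowed through; the other two results can be handled with the same error page as before.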

~Kirsle
2008/09/01