Context
Since I've now entered the world of blogging, I wanted to start off proactively in what I've heard is a continuous fight against comment spam. With only a handful of comments so far, it's not out of urgency that I've put safeguards in place, but simply out of curiosity. If I weren't in the field of web development, I'd be looking for a door into security (only partly because I'd be able to don a white hat at work).
When researching countermeasures others have implemented, I found a lot of CAPTCHA conversations. Although CAPTCHAs are clearly very effective, and I like the way reCAPTCHA taps human intellect, they still seem an annoyance and an accessibility issue for users, especially the ones that are almost impossible to read even as a human, such as Google's CAPTCHAs.
So if not CAPTCHAs, what other techniques are there? Well, I stumbled across an insightful blog post, "Stopping spambots with hashes and honeypots" by Ned Batchelder. He discusses a series of tricks that can be implemented, and the discussion continues in his readers' comments. I've implemented some of those ideas as a Spam Filter CakePHP behaviour and helper.
Please be aware that Ned Batchelder's article is two years old (I didn't notice until I'd already begun implementing); however, SitePoint also covered these techniques a year ago, so it will be interesting to see (or to hear from you) whether this is still an effective approach.
To the point
The aim is to eliminate form spam; however, elimination is a difficult task because any "black hatter" who really wants to target you will find a way (ironically for Developing in the Dark, they'd probably start by reading this blog post). But as with a lot of security, it's often just about being more secure than your neighbour.
The code and theory below aim to prevent automated spam; they don't prevent manual spam. To address manual spam, I'd recommend reading this article by Jonathan Snook and a CakePHP behaviour implementation by Miles Johnson, both of which use a point system to flag potential spam. If manual spam becomes an issue in the future, I'll definitely be looking to add those ideas to Developing in the Dark.
Get the code
The behaviour and helper have been packaged up. The attached archives contain a README that's, not surprisingly, worth reading: it gives a rundown on implementing and using them.
Without repeating the README, a quick usage example may help. The behaviour is not specific to "Comment" models; it could be used on any model (e.g. user signup).
Inserted in your model.
var $actsAs = array("SpamFilter");
Inserted in a form in the view/element.
echo $spamFilter->honeypot("Comment", "honeypot");
echo $spamFilter->spinner("Comment", "spinner");
Inserted in the controller to display a different message for spam.
if ($this->Comment->save($this->data, array("validate" => true))) {
    $this->Session->setFlash("Successfully added");
} else {
    // If there are invalid fields it's a validation error; otherwise assume it's spam
    if ($this->Comment->invalidFields()) {
        $this->Session->setFlash("Required fields are invalid", "default", null, "error");
    } else {
        $this->Session->setFlash("Spam detected", "default", null, "error");
    }
}
An example of the tmp/logs/spam.log file (email support is also included in the behaviour). Note: "::1" is the IPv6 loopback address, which shows up when testing locally.
2009-11-30 22:32:35 Spam: Comment spam detected from ip: ::1
Spam validation: the honeypot is full
Array
(
[post_id] => 1
[message] => Spam, Spam, Spam, baked beans, Spam, Spam, Spam and Spam
[name] => Spammer
[email] => spam@example.com
[web] => example.com
[honeypot] => asdasdasd
[spinner] => nRF08r4p0g9542N+6enb/iVWM0UkInsYmEgaQbONQQ8TiBX0G1dZmDKjMrdVALoaQoF4kx/91ue/KXnKMAeaETHkQg0ALVxI8+BacXX5hXA+QqHoAEbBWSjtpyvkV8hD
[modified] => 2009-11-30 22:32:35
[created] => 2009-11-30 22:32:35
)
Theory
A Spinner
A spinner is a hidden field containing obfuscated pieces of information that are validated on submit. This information usually includes a timestamp, the IP address, and a secret code.
My implementation of the spinner includes those three items, which are json_encoded and then encrypted so they cannot be tampered with once rendered in the form. On form submission, the process is reversed and the following checks are made:
Time
- Checks that the submission time is not earlier than the render time.
- Checks that the comment took longer than 3 seconds to write (you're a pro typist if you're submitting a valuable comment in under 3 seconds).
- Checks that the comment was written within 36 hours (if you've had the page open for longer than 36 hours, I appreciate how much you love my blog, but I'd be worried that you're poisoning yourself with caffeine).
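The three time rules boil down to a window check on the elapsed seconds between render and submit. Here is a minimal sketch in plain PHP, assuming both timestamps are Unix seconds; the function and constant names are hypothetical and not part of the SpamFilter behaviour's API.

```php
<?php
// Hypothetical helper (not the SpamFilter API): applies the three time rules.
const SPINNER_MIN_SECONDS = 3;             // quicker than this looks automated
const SPINNER_MAX_SECONDS = 36 * 60 * 60;  // forms older than 36 hours are stale

function spinner_time_ok(int $renderedAt, int $submittedAt): bool
{
    $elapsed = $submittedAt - $renderedAt;

    if ($elapsed < 0) {
        return false; // submitted before the form was rendered
    }
    if ($elapsed < SPINNER_MIN_SECONDS) {
        return false; // written in under 3 seconds
    }
    return $elapsed <= SPINNER_MAX_SECONDS; // written within 36 hours
}

var_dump(spinner_time_ok(1000, 1010)); // bool(true): 10 seconds is fine
var_dump(spinner_time_ok(1000, 1001)); // bool(false): too fast
```

Loosening or tightening the filter later is just a matter of changing the two constants.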
IP Address
- Checks that the form is submitted from the same IP address it was rendered for. If a server farm or botnet is HTTP POSTing comments, the zombie PC's IP address will most likely differ from the one that originally captured the form data.
Secret
- A random hash that should be changed every so often (e.g. once a week would be a good interval). Changing it would force the black hatter to re-automate a form playback script.
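To make the spinner round trip concrete, here is a minimal sketch in plain PHP (7.1+ with the OpenSSL extension). It JSON-encodes the timestamp, IP address and secret, then encrypts them. AES-256-GCM is my assumption here, chosen because its authentication tag makes any tampered value fail decryption; the function names, payload keys and key handling are hypothetical, not the behaviour's actual code.

```php
<?php
// Sketch only: cipher choice, payload keys and function names are assumptions.
const SPINNER_CIPHER = 'aes-256-gcm';
const SPINNER_TAG_LEN = 16; // default GCM tag length used by openssl_encrypt

// Build the hidden-field value: JSON-encode the three items, then encrypt.
function spinner_build(string $key, string $ip, string $secret, int $time): string
{
    $json = json_encode(array('t' => $time, 'ip' => $ip, 's' => $secret));
    $iv = random_bytes(openssl_cipher_iv_length(SPINNER_CIPHER));
    $tag = '';
    $cipherText = openssl_encrypt($json, SPINNER_CIPHER, $key, OPENSSL_RAW_DATA, $iv, $tag);
    // Pack IV, authentication tag and ciphertext into one form-safe string.
    return base64_encode($iv . $tag . $cipherText);
}

// Reverse the process on submit. Returns the render timestamp, or null if the
// value was tampered with, came from another IP, or carries an old secret.
function spinner_verify(string $key, string $value, string $ip, string $secret): ?int
{
    $raw = base64_decode($value, true);
    $ivLen = openssl_cipher_iv_length(SPINNER_CIPHER);
    if ($raw === false || strlen($raw) <= $ivLen + SPINNER_TAG_LEN) {
        return null;
    }
    $iv = substr($raw, 0, $ivLen);
    $tag = substr($raw, $ivLen, SPINNER_TAG_LEN);
    $json = openssl_decrypt(substr($raw, $ivLen + SPINNER_TAG_LEN), SPINNER_CIPHER,
        $key, OPENSSL_RAW_DATA, $iv, $tag);
    if ($json === false) {
        return null; // tag mismatch: the field was modified
    }
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['t'], $data['ip'], $data['s'])) {
        return null;
    }
    if ($data['ip'] !== $ip || !hash_equals((string) $data['s'], $secret)) {
        return null; // different IP address, or the secret has been rotated
    }
    return (int) $data['t'];
}
```

The timestamp returned by spinner_verify() can then go through the time checks listed above. Rotating $secret (say, weekly) invalidates any spinner values a bot has recorded.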
Some of these measures could be more restrictive, particularly the time-related ones (e.g. 36 hours is probably a lot longer than needed); however, with no spam coming through at the moment, I'd hate to block real comments.
Honeypot
A honeypot has also been implemented (often referred to as a reverse CAPTCHA). You add a form field that has no value and is hidden by either JavaScript or CSS. On submit, if that field contains any value, you can assume it's a spam attempt. It's important that the field itself is not type="hidden", because bots will usually skip such fields.
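A honeypot needs two pieces: a field people won't see, and a check on submit. The sketch below assumes CSS hiding via off-screen positioning; honeypot_field() and honeypot_is_full() are hypothetical names, not what the SpamFilter helper actually outputs.

```php
<?php
// Hypothetical helpers: render a text input hidden with CSS (deliberately
// not type="hidden") and flag any submission where it contains a value.
function honeypot_field(string $name): string
{
    $safe = htmlspecialchars($name, ENT_QUOTES);
    // Off-screen positioning hides the field from people; bots still see a
    // normal text input. tabindex="-1" keeps it out of the keyboard tab order.
    return '<div style="position:absolute;left:-9999px">'
        . '<label for="' . $safe . '">Leave this field empty</label>'
        . '<input type="text" id="' . $safe . '" name="' . $safe . '" value=""'
        . ' autocomplete="off" tabindex="-1">'
        . '</div>';
}

// Any value at all in the honeypot is treated as a spam attempt.
function honeypot_is_full(array $post, string $name): bool
{
    return isset($post[$name]) && (string) $post[$name] !== '';
}

echo honeypot_field('honeypot'), "\n";
```

The label is there for screen-reader users who might still reach the field, so they know not to fill it in.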
If all that fails
If the above fails to effectively manage automated spam, a restriction on the number of comments from an IP address within a timespan (e.g. no more than 5 in 10 minutes) could be implemented. A cookie check, which sets a value in the user's cookie on render and checks it on submit, would also force the spambot to support cookies, something that, from my reading, only a minority do. You could also periodically rename form fields to random hashes to disrupt prerecorded scripts. Unfortunately, it seems that popular blogs such as Jonathan Snook's still need to lock comments on old posts because comment spam keeps finding a way in.
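The IP rate limit is simple to sketch as a sliding window. This assumes an in-memory store and a hypothetical CommentRateLimiter class; a real site would need to persist the timestamps between requests (for example, by querying recent comments by IP address).

```php
<?php
// Sketch of "no more than 5 comments in 10 minutes" per IP address.
// Storage is an in-memory array purely for illustration.
final class CommentRateLimiter
{
    /** @var array<string, int[]> accepted submission times keyed by IP */
    private $history = array();
    private $limit;
    private $window;

    public function __construct(int $limit = 5, int $windowSeconds = 600)
    {
        $this->limit = $limit;
        $this->window = $windowSeconds;
    }

    // Returns true (and records the attempt) if the IP is under the limit.
    public function allow(string $ip, int $now): bool
    {
        $cutoff = $now - $this->window;
        // Keep only submissions that are still inside the window.
        $recent = array_filter(
            isset($this->history[$ip]) ? $this->history[$ip] : array(),
            function ($t) use ($cutoff) { return $t > $cutoff; }
        );
        if (count($recent) >= $this->limit) {
            $this->history[$ip] = array_values($recent);
            return false;
        }
        $recent[] = $now;
        $this->history[$ip] = array_values($recent);
        return true;
    }
}
```

Rejected attempts are deliberately not recorded, so a blocked IP can post again once its older comments age out of the window.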
That's all folks
I have possibly been slightly over-keen in writing a post about preventing spam without having received a single spam comment to date. However, it will be an interesting experiment, and I will keep you posted if any automated spam attempts are detected. Let the battle begin.