Whitelist, Blacklist, Greylist

I recently got into a spirited discussion about Akismet. What is Akismet?

When a new comment, trackback, or pingback comes to your blog it is submitted to the Akismet web service which runs hundreds of tests on the comment and returns a thumbs up or thumbs down.

Akismet is awfully coy about the "tests" they run to distinguish between spam and everything else. I believe Akismet is essentially the same as the old mt-blacklist plugin I use to block trackback spam. But instead of manually entering blacklist terms, Akismet harnesses the collective knowledge of the intarwebs. As soon as one person blacklists something, it's blacklisted for the entire Akismet community. And it definitely works. It's so effective that some people use it as their only protection against spam comments and trackbacks. I think this is very unwise.

First of all, blacklists aren't a panacea. They have their pros and cons. Just ask Matt Mullenweg, the author of Akismet. He recently left this comment on a blog post:

Unfortunately, the DNS realtime blacklists cause an unusually high false positive rate, which is why we don't use them anymore.

Interesting. And if you're going to keep a blacklist, you might as well keep a greylist and whitelist, too:

whitelist, greylist, blacklist

These three lists have been around as long as spam itself:

   Items on the Blacklist are never allowed through. They are either held in a moderation queue, or deleted.

   Items on the Whitelist are always allowed through.

   Items on the Greylist are held for human moderation.

Akismet also offers a moderation queue, so it has aspects of a greylist as well. Instead of spending time maintaining a blacklist, you spend time staring down a greylist moderation queue. I'm not so sure that's an improvement. If you consider Akismet a success because you ignore the moderation queue entirely, have you really succeeded?

It's also quite possible to use whitelist attacks on blacklists, where spammers use innocent and legitimate URLs in their spam. I've had a few of these myself. Even if you don't have a whitelist, attacks like this greatly reduce the effectiveness of a blacklist-- legitimate domains end up blacklisted through collateral damage.

But let's forget, for a moment, all the problems I just described with blacklists, whitelists, and greylists. The core problem is relying on a single method of defense against spam. Relying only on Akismet means:

  1. You've added an external dependency to your website. I hate dependencies, and I always strive to keep the number of dependencies I accept to an absolute minimum.
  2. If Akismet goes down, you either get inundated by spam while the floodgates are open, or nobody can comment/trackback. Neither scenario is desirable.
  3. I get 75 spam trackbacks per hour on this blog. Multiply that the number of blogs on the internet, and you get an astronomically large number. Why should Akismet have to check every single one of those? Does Akismet have the capacity to scale that large? And is it reasonable to expect them to?

I can understand making the choice to use Akismet exclusively for trackbacks, where our options for combating spam are severely limited. But for comments, abandoning CAPTCHA in favor of Akismet is unforgivable. Engtech explains some of the problems with this approach in a recent comment:

[Akismet] has been pretty effective, but there's been a few interesting cases:

  • compliment spam ("great post!" with website field linking to their p-rn/adsense splog site)
  • only attacking blogs that appear to still have the default post as the first post -- less likely to monitor spam.
  • one p-rn spammer who finds political/pop culture keywords in a post and inserts human crafted messages. Like: "Some people say Matt Damon isn't that good of an actor, I really liked him in Talented Mr. Ripley" whenever it finds a post with "Matt Damon"

The one thing it has absolutely sucked at is spammers-to-be. People who are just testing out spam generation algorithms that have no payload. So you'll get random gibberish from an IP address and it will take a few days for Akismet to learn.

Hearing this pains me greatly. All the of the above could have been completely eliminated by using both methods: CAPTCHA to validate that it's a human, then Akismet to validate that it's not human-entered spam.

Akismet is a fine addition to our anti-spamming toolkit. But that doesn't mean it's a good idea to outsource your entire anti-spam effort to a single website, either. Anti-spam security starts at home. For best results, use defense in depth and combine local anti-spam measures, such as CAPTCHA, with Akismet as a backup.

Read more

Stay Gold, America

We are at an unprecedented point in American history, and I'm concerned we may lose sight of the American Dream.

By Jeff Atwood · · Comments

The Great Filter Comes For Us All

With a 13 billion year head start on evolution, why haven't any other forms of life in the universe contacted us by now? (Arrival is a fantastic movie. Watch it, but don't stop there - read the Story of Your Life novella it was based on

By Jeff Atwood · · Comments

I Fight For The Users

If you haven't been able to keep up with my blistering pace of one blog post per year, I don't blame you. There's a lot going on right now. It's a busy time. But let's pause and take a moment

By Jeff Atwood · · Comments

The 2030 Self-Driving Car Bet

It's my honor to announce that John Carmack and I have initiated a friendly bet of $10,000* to the 501(c)(3) charity of the winner’s choice: By January 1st, 2030, completely autonomous self-driving cars meeting SAE J3016 level 5 will be commercially available for passenger

By Jeff Atwood · · Comments