Solving Spam, Fussy Logic Style

By | 2012-04-16

Here’s the thing about spammers: they send millions of emails. They are able to do so because the marginal cost of sending is very small. I have heard it said that if they get one hit in a million, then they can be in profit. Some fools repeatedly suggest that a charge per email is the solution (some even want it to be a tax — spit). I say “fools” because there is no practical way to do it: there is no universal payment system (until Bitcoin rules the world); no centralised email system which you would pay to; and besides why punish all the non-spammers for the sins of the spammers?

My solution is an idea I’ve had for a while. It’s a modification of the method Bitcoin uses to secure the block chain. In short: proof of work, used to make writing emails and commenting on blogs expensive for spammers.

The proof of work system for Bitcoin works by calculating a hash of some data plus a nonce (a number used once, whose only purpose is to change the output of the hash). Bitcoin sets a threshold for this hash, and a miner keeps adjusting the nonce until it calculates a hash that is lower than the threshold. Obviously, the lower you set the threshold, the fewer “acceptable” answers there are in the search space and the harder an acceptable nonce is to find.
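To make the threshold search concrete, here is a minimal sketch in Python. It is a simplification, not Bitcoin's actual scheme: it assumes a single SHA-256 over the data with the nonce appended as decimal text, and the function name find_nonce is made up for illustration.

```python
import hashlib

def find_nonce(data: bytes, threshold: int):
    """Try nonces 0, 1, 2, ... until SHA-256(data + nonce) falls below threshold.

    Assumption: the nonce is appended as decimal text; real systems define
    their own encoding.
    """
    nonce = 0
    while True:
        digest = hashlib.sha256(data + str(nonce).encode("ascii")).digest()
        value = int.from_bytes(digest, "big")  # treat the hash as a 256-bit number
        if value < threshold:
            return nonce, value
        nonce += 1

if __name__ == "__main__":
    # With a threshold of 2**240, roughly 1 hash in 2**16 is acceptable.
    nonce, value = find_nonce(b"some data", 2**240)
    print(nonce, hex(value))
```

Halving the threshold doubles the expected number of attempts, which is the whole lever the scheme relies on.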

Here is my solution then for email: two extra headers are added to emails for every “to” address (it has to be per to-address as we want the number of recipients to be the limiting factor): a nonce field and a proof-of-work hash. At the moment you press “send”, the hash is calculated over a subset of the email headers (subject, date, to, and from), the email body, and of course the nonce. The sender can set their client to use a particular amount of work to send the email, probably measured in time. Let’s say five seconds per recipient. The client’s job is to find the lowest possible hash in that time. Those of us who send non-spam messages will barely even notice the computation time needed. For a spammer sending to a million recipients, that is 5,000,000 seconds, or about 57.9 days, of computation to add appropriate proofs-of-work.
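The sending side could look roughly like the sketch below. Everything specific in it is an assumption made for illustration: SHA-256 as the hash, a NUL byte as the field separator, and the header names X-PoW-Nonce and X-PoW-Hash, none of which are defined anywhere.

```python
import hashlib
import time

def message_hash(subject, date, sender, recipient, body, nonce):
    """SHA-256 over the covered headers, one recipient, the body and the nonce."""
    fields = [subject, date, sender, recipient, body, str(nonce)]
    material = "\x00".join(fields).encode("utf-8")  # assumed separator
    return int.from_bytes(hashlib.sha256(material).digest(), "big")

def stamp(subject, date, sender, recipient, body, seconds):
    """Try nonces until the time budget runs out; return the lowest hash found."""
    deadline = time.monotonic() + seconds
    best_nonce = 0
    best_hash = message_hash(subject, date, sender, recipient, body, 0)
    nonce = 1
    while time.monotonic() < deadline:
        candidate = message_hash(subject, date, sender, recipient, body, nonce)
        if candidate < best_hash:
            best_nonce, best_hash = nonce, candidate
        nonce += 1
    return best_nonce, best_hash

def days_of_work(recipients, seconds_each):
    """Total computation time on one machine, in days."""
    return recipients * seconds_each / 86_400

if __name__ == "__main__":
    # One stamp per recipient, so the cost scales with the size of the mailing.
    for to in ["alice@example.com", "bob@example.com"]:
        nonce, value = stamp("Lunch?", "Mon, 16 Apr 2012 12:00:00 +0000",
                             "me@example.com", to, "Fancy a sandwich?", seconds=0.2)
        print(f"To: {to}")
        print(f"X-PoW-Nonce: {nonce}")       # hypothetical header name
        print(f"X-PoW-Hash: {value:064x}")   # hypothetical header name
    print(days_of_work(1_000_000, 5))        # the spammer's bill, in days
```

Because the recipient address is part of the hashed material, a stamp computed for one recipient is useless for any other.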

Your email client, upon receipt of a message, confirms the proof-of-work hash and applies whatever threshold of acceptability you want. You can decide that you will only accept proofs-of-work of one second of work on a quad-core computer, say.
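A receiving client's check might be sketched as follows. The hash layout (SHA-256, NUL-separated fields) and the function names are assumptions for illustration, and the hash rate used to turn “one second of work” into a threshold is a number the recipient has to pick for themselves.

```python
import hashlib

HASH_SPACE = 2**256  # number of possible SHA-256 outputs

def message_hash(subject, date, sender, recipient, body, nonce):
    """Recompute the hash exactly as the sender is assumed to have done."""
    fields = [subject, date, sender, recipient, body, str(nonce)]
    material = "\x00".join(fields).encode("utf-8")
    return int.from_bytes(hashlib.sha256(material).digest(), "big")

def threshold_for(seconds, hashes_per_second):
    """Hash value that takes about `seconds` of work to beat, on average.

    Beating a threshold T takes about HASH_SPACE / T attempts, so
    T = HASH_SPACE / attempts.
    """
    attempts = max(1, int(seconds * hashes_per_second))
    return HASH_SPACE // attempts

def verify(subject, date, sender, recipient, body, nonce, claimed_hash, threshold):
    """Accept only if the stamp matches this exact message and is low enough."""
    actual = message_hash(subject, date, sender, recipient, body, nonce)
    return actual == claimed_hash and actual < threshold

if __name__ == "__main__":
    # Assumed figure: a machine managing one million hashes per second.
    print(hex(threshold_for(1, 1_000_000)))
```

The check costs the recipient a single hash, however long the sender worked. The threshold is statistical, though: a sender who honestly spent the time will sometimes be unlucky and land above it, so a real filter would probably want some slack.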

It’s all optional: you can choose not to add these proofs-of-work; but the recipient can choose not to accept emails without them; or more likely their spam detector (which is now much simpler than an artificial intelligence, Bayesian semantic content classifier) will just drop it straight in the bin.

The idea would be the same for web forums and blog comments. The difference would be that the hashing is done in the browser using JavaScript (or, as the idea takes off, by a clever high-efficiency browser add-on). A little meter above the comment could show you the current proof-of-work level as you type; each letter would of course reset the meter. A genuine user will have no problem waiting five seconds to post a message, and you could even hide the calculation time behind a comment preview display, so that the user never notices the proof-of-work time at all. A spammer cannot dedicate enough computing resources to hash the massive quantity of spam they must send to make it worth their while. Obviously the forum software would check the hash and reject any comment that failed its minimum proof-of-work threshold.
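The browser half would be JavaScript, but the meter and the forum-side check can be sketched in Python. The assumptions here are mine: SHA-256 as the hash, the “level” expressed as the number of leading zero bits, and a thread_id mixed into the hash (playing the role the to-address plays for email) so that a stamp cannot be replayed on another post.

```python
import hashlib

def comment_hash(thread_id, text, nonce):
    """SHA-256 over the thread, the comment text and the nonce."""
    material = "\x00".join([thread_id, text, str(nonce)]).encode("utf-8")
    return hashlib.sha256(material).digest()

def work_bits(digest):
    """Proof-of-work level: leading zero bits of the hash.

    Each extra bit means roughly twice as much expected work.
    """
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()

def accept_comment(thread_id, text, nonce, min_bits):
    """Forum-side check: one hash, then compare against the site's minimum."""
    return work_bits(comment_hash(thread_id, text, nonce)) >= min_bits

if __name__ == "__main__":
    # What the meter would do: keep hashing and report the best level so far.
    best = 0
    for nonce in range(50_000):
        best = max(best, work_bits(comment_hash("post-42", "Nice article!", nonce)))
    print("meter:", best, "bits")
```

Any change to the text changes every hash, which is why each keystroke resets the meter to zero.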

Key advantages

  • We can get rid of the need for blog accounts; people who turn off “anonymous” posting to prevent spamming would no longer have to do this
  • CAPTCHAs could be chucked away
  • Spam would vanish; the computing resources required would be too great to make spam comments practical
  • People could leave the proof-of-work calculating as long as they liked to establish whatever level of non-spamminess they wanted

Update from comments:

It’s already been done (for email); see Hashcash.

Update update.

Having read about Hashcash, I think my solution is better. Hashcash requires a database of previously used valid stamps to make reuse of a stamp impossible. My method makes each email self-consistent: the stamp is calculated over the email’s own content, so it is different from the “stamp” on every other email.
