Automating responses to email challenges

Richard Clayton, February 2006

Challenge-Response is one of the spam-fighting ideas that is part of the problem, rather than part of the solution.

The idea is simple enough. When an unsolicited email arrives it is held in a pending area and a "challenge" email is automatically sent to the sender to ask if the email is spam or not. The system designers expect that "real people" sending "real email" will respond. Their email can then be released from the pending area and delivered normally, and in future their email address can be whitelisted so that they are not challenged and their email is not delayed. Conversely, the expectation is that the spammers will find it too expensive to respond -- if indeed they receive and examine the replies at all. Eventually the unwanted spam will time out and be discarded, never bothering the recipient at all.
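The hold / challenge / release cycle can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class name, method names, token format and one-week timeout are all invented for illustration and are not taken from any real challenge-response product.

```python
import secrets
import time


class ChallengeResponseGate:
    """Toy model of a challenge-response filter (all names are hypothetical)."""

    def __init__(self, timeout_seconds=7 * 24 * 3600):
        self.whitelist = set()   # senders who have answered a challenge
        self.pending = {}        # token -> (sender, message, time held)
        self.timeout_seconds = timeout_seconds

    def receive(self, sender, message, now=None):
        """Deliver mail from known senders; hold and challenge everyone else."""
        now = time.time() if now is None else now
        if sender in self.whitelist:
            return ("deliver", message)
        token = secrets.token_hex(8)
        self.pending[token] = (sender, message, now)
        # A real system would now email a challenge quoting `token` to `sender`.
        return ("challenge", token)

    def answer(self, token):
        """A reply quoting a valid token releases the mail and whitelists the sender."""
        held = self.pending.pop(token, None)
        if held is None:
            return None
        sender, message, _ = held
        self.whitelist.add(sender)
        return message

    def expire(self, now=None):
        """Discard held mail whose challenge was never answered; return the count."""
        now = time.time() if now is None else now
        stale = [token for token, (_, _, held_at) in self.pending.items()
                 if now - held_at > self.timeout_seconds]
        for token in stale:
            del self.pending[token]
        return len(stale)
```

Note that `receive` sends the challenge to whatever address the message claims as its sender; nothing in the model checks that this address really sent the email.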

The problem

Ten years ago, challenge-response systems worked moderately well. They were far from perfect: I always seemed to encounter them when I had stayed on after hours at the office to carefully construct an email to a stranger, only to find the challenge in my inbox the following morning, so my late night had not saved a day after all.

They do of course have a significant design problem, in that if two people are both running challenge-response systems then they will never manage to talk to each other, since each system holds the other's challenge pending; but the systems were never so widely deployed that this became a significant issue -- and a little standardisation could mitigate it.

Today, challenge-response systems work spectacularly badly.

The main problem is in the three little words in the description of where the challenge is sent. It is sent "to the sender". But spammers now routinely forge the sender of email -- and so the challenge doesn't go to the spammer at all -- but to some innocent third-party who, up to that point, was blissfully unaware of the spam run.

Recently that's been happening to me.

In fact, recently that's been happening to me a LOT.

And by "a LOT" I mean A LOT ... I currently get four thousand or so incoming emails a day which are reports of some kind to tell me that someone doesn't want to deliver some spam which they think comes from my domain.

I've been doing my best to deal with this automatically.

Unfortunately I'm one of those people who use a lot of email addresses: I'm amazon@ to Amazon, acme@ to Acme and so forth (and yes, I've read what Bruce Schneier had to say about this scheme). But I've forgotten quite which addresses I've used, so I can't just discard email to unusual or unknown addresses.

However, I can discard "bounces" (Delivery Status Notifications etc.) addressed to email addresses I don't use every day -- they are bound to be reports of undeliverable spam; and so if they have a "null" (<>) sender I can pick them out and refuse delivery. That cuts down the gratuitous reports of spam delivered to others to an almost manageable 500 or so emails a day.
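That rule is simple enough to sketch in Python. This is an illustration of the idea rather than the filter actually in use: the function name and the `EVERYDAY_ADDRESSES` set are hypothetical, and a real deployment would apply the test inside the mail server at SMTP time so that delivery can be refused outright.

```python
from email.utils import parseaddr

# Hypothetical: the few addresses that genuinely send mail day to day.
EVERYDAY_ADDRESSES = {"me@example.org"}


def should_refuse(envelope_sender, envelope_recipient):
    """Refuse a bounce (null envelope sender) aimed at an address that never sends mail."""
    # A bounce is recognised by its empty ("<>") envelope sender.
    is_bounce = envelope_sender.strip() in ("", "<>")
    # Strip any angle brackets or display name from the recipient.
    _, recipient = parseaddr(envelope_recipient)
    return is_bounce and recipient.lower() not in EVERYDAY_ADDRESSES
```

A bounce to amazon@ is refused, because that address never sends mail and so cannot have caused a genuine bounce; a bounce to an everyday address is still accepted, as is ordinary mail to any address.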

This 500 comprises all the vacation messages ("I'm not in the office this week, here are the private phone numbers of all my colleagues"), the stupidly configured copies of Postfix and of course all of the challenge-response messages (because a lot of these have non-null senders -- for reasons that quite escape me).

By the way: if you're suddenly feeling guilty about sending any of this stuff, then Spamlinks has a great deal of useful information about how to fix your system. They also have an excellent collection of links to material about challenge-response systems and other commentary about their nuisance and likely (in)effectiveness.

An economic analysis

Analysed from an economic point of view, the people running challenge-response systems are trying to dump their spam-filtering costs onto me. That's not very nice of them -- so I've decided I'm going to take a longer-term view of the costs than they do. I'm investing my time now to make their systems ineffective, in the expectation that they will find their scheme doesn't work and so abandon it.

Shorn of the economics -- I'm answering the challenges. They then get spam delivered to them. They'll conclude the system doesn't work and will move on to the next snake-oil.

So they send me an email saying "are you a human?" I send them one back saying "yes!"

Of course this works better if everyone responds to challenges. So join the crowd. Respond similarly! Assert you are human too!!

Sounds fun -- and it was for a few days! Especially because some of the systems let you send a little message along with the response. I usually say something like "Your tedious challenge-response system sends junk to me whenever you receive spam. Turn it off!" (there's a length limit, so I can't write an essay like this one). This definitely adds to the experience because sometimes people write back (not very inventively with the invective) and discuss what I've done. I like to think my longer explanations have educated them as to their supporting role in the spam problem, because they tend not to write again.

Automating responses

Anyway, to shorten a long and rambling story, after a few days it became rather tedious doing everything manually, and so I started to look at ways to automate the responses.

For quite a number of the challenge-response systems that people are foolish enough to use, it is really easy to respond automatically. All you need to do is to send back an email quoting a long and cryptographic-looking string in the Subject. So a little Perl to pick out the emails and you're away.
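As a rough sketch of that kind of responder (written in Python here rather than Perl, and not the script actually used), the following picks a token out of the Subject and builds a reply that quotes it. The token pattern, the function name and the reply text are assumptions; real systems vary in what they expect, so the pattern would need tuning for each one.

```python
import re
from email import message_from_string
from email.message import EmailMessage

# Assumption: the token is a run of 24 or more hex or base64-style characters.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9+/=_-]{24,}")


def build_reply(raw_challenge, my_address):
    """Return a reply quoting the challenge's Subject token, or None if no token is found."""
    challenge = message_from_string(raw_challenge)
    subject = challenge.get("Subject", "")
    if TOKEN_PATTERN.search(subject) is None:
        return None  # does not look like a challenge we know how to answer
    reply = EmailMessage()
    reply["From"] = my_address
    # Prefer Reply-To when the challenge supplies one, else answer the From address.
    reply["To"] = challenge.get("Reply-To") or challenge.get("From", "")
    # Quoting the whole original Subject carries the token back unchanged.
    reply["Subject"] = "Re: " + subject
    reply.set_content("Yes, I am a human.")
    return reply
```

The returned message could then be handed to `smtplib.SMTP.send_message` for delivery; that step is left out so the sketch has no network side effects.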

However, some of the systems want you to really really really prove you're a human. Here the standard mechanism is to direct you to a web page and then ask you to solve a CAPTCHA. (If you don't know what a CAPTCHA is then this website at Carnegie Mellon will tell you all that you need to know.)

Generally -- when CAPTCHAs are used for Challenge-Response systems -- you're shown an image and asked to type in the characters that it contains. A number of other people have been working on automatically solving these CAPTCHAs and showing that they are often rather weak. Worth looking at are:

OCR Research Team: a Ukrainian group who break weak CAPTCHAs and offer to sell you robust systems that they would find strong.
Brains-N-Brawn: How 2 0wnz blogz using neural nets.
Sam Hocevar: currently has twelve broken schemes and a number of others in progress.

By the way, there's an urban myth that spammers offer "free porn" for solving CAPTCHAs (exploiting the complete lack of linkage between the puzzle and the reward). This story appears to originate in an interview with Luis von Ahn, one of the Carnegie Mellon researchers. This was then recycled onto BoingBoing, from there to Slashdot and it is presently stated as fact in Wikipedia. Luis tells me (2006-02-10) that "I have never seen this actually happen". He says that he's been discussing the possibility in public since 2001 and although he's been told it was actually happening, he's never seen any proof. He now says that it is "allegedly happening", whereas I'd go further and say that it's almost certainly not -- why should any spammer imagine that a porn user could type answers in accurately with just one hand?

These broken CAPTCHA schemes aside -- many of the systems being used to detect humans in challenge-response systems still pose significant difficulties. There are interesting distortions; the characters are not in fixed positions (this is vital because Chellapilla et al. demonstrated at last year's CEAS Conference that computers are better than humans at dealing with the distortions, leaving glyph separation as the only real challenge); and there are often some interesting backgrounds which make it hard to know quite what's a character.

My own CAPTCHA-breaking automation is a work-in-progress of which I shall write more here later :-)

However, one of the schemes that is especially noteworthy -- because it provides an impressive-looking CAPTCHA that turns out to be completely trivial to deal with -- is the one that EarthLink uses. I've a whole web page devoted to what they did wrong, so I suggest you continue reading there.


last modified 11 FEB 2006 -- http://www.cl.cam.ac.uk/~rnc1/cr/index.html