Computer Laboratory

Mail processing

Tools are available for the Lab mail user to write really rather complex instructions for the mail system to interpret before the mail is delivered. This document shows what you can do, and how to do things people commonly want to do.

There are, of course, things you cannot do. A significant one is that you may not “bounce” mail (i.e., refuse the connection): that is something only the system itself is allowed to do.

The filter language

The filter language allows you to write simple linear scripts, with conditional discrimination based on the content of the message (and possibly the external context). The filter resides in the same .forward file that is used for simple forwarding. Obviously the mail system needs to distinguish the two sorts of file, and so a filter file must start with the characters:

# Exim filter

which has the disadvantage that it doesn’t look remarkable enough and is prone to getting deleted. So you are recommended to write:

# Exim filter <<-- do NOT delete this line

(Anything on the line after “exim filter” is treated as a comment, and the words may be in upper or lower case.)
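
A quick way to confirm that the marker line has survived editing is to test the first line of the file. The sketch below is a minimal check in POSIX shell; the function name has_filter_marker is hypothetical, and the pattern is only an approximation of the test the mail system applies (it assumes the marker starts in the first column).

```shell
# Succeeds if the first line of the file named by $1 begins with
# "# Exim filter", in any mixture of upper and lower case.
# Assumption: the marker starts in column one; the mail system's own
# test may be more lenient than this pattern.
has_filter_marker() {
    head -n 1 "$1" | grep -qiE '^#[[:space:]]*exim[[:space:]]+filter'
}
```

It might be used as: has_filter_marker ~/.forward-new || echo "marker line missing"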

You are not recommended to read the filter language specification as an introductory text. Rather, read the rest of this document for an outline of the language and a few useful examples, and then use the specification as a reference guide.

The useful commands in the filter language

The filter language has a simple structure; you typically use an extended conditional tree to discriminate between various types of message, and then perform some action at each leaf node of that tree.

The conditional tree is made up of if, then, elif (meaning “else+if”), else and endif commands. So the general structure of a tree is

if ( condition )
then
  action
elif ( condition )
then
  action
else
  action
endif

The conditions are built up from relations between variables and strings or numbers. The string relations may be “is”, “contains”, “matches”, “is not”, “does not contain” or “does not match”. A variable is an element of the message, or some other important feature of what’s going on. It may be

  • some part of the header, such as “$h_from” or “$h_subject”,
  • a body-part, such as “$message_body” (the first few thousand characters of the message) or “$message_body_end” (the last few thousand characters of the message),
  • a statistic, such as “$message_body_size”,
  • a derived value, such as “$home” (the user’s home directory, of course), “$recipients” (the set of people who will receive this message), and so on,
  • a script variable, such as “delivered”, which records whether anything in the script so far has actually delivered the message.

In the case of “matches” and “does not match”, the thing to match against is a regular expression, written in a syntax that follows Perl’s pretty closely. You can test your regular expressions using the command pcretest:

$ pcretest
PCRE version 3.9 02-Jan-2002

  re> /.*\.cl\.cam\.ac\.uk/
data> jim@shep.cl.cam.ac.uk
 0: jim@shep.cl.cam.ac.uk
data> 
  re> /(.*)@.*\.cl\.cam\.ac\.uk/
data> jim@shep.cl.cam.ac.uk
 0: jim@shep.cl.cam.ac.uk
 1: jim

Note that jim@shep.cl.cam.ac.uk is not a valid mail address…
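
When pcretest is not to hand, simple patterns can be tried non-interactively with grep. This is only a rough stand-in: grep -E uses POSIX extended syntax rather than Perl syntax, so the sketch is reliable only for patterns, like the ones above, that mean the same in both. The function name re_try is hypothetical.

```shell
# Report whether the string $2 matches the pattern $1.
# Assumption: the pattern uses only constructs on which POSIX extended
# regular expressions and Perl syntax agree (literals, ".", "*", "\.").
re_try() {
    if printf '%s\n' "$2" | grep -qE -e "$1"; then
        echo "match"
    else
        echo "no match"
    fi
}

re_try '.*\.cl\.cam\.ac\.uk' 'jim@shep.cl.cam.ac.uk'   # prints "match"
re_try '.*\.cl\.cam\.ac\.uk' 'jim@example.org'         # prints "no match"
```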

Discriminating between messages

Useful discriminators are:

  • Who sent the message — $h_from and $h_sender
  • Who the message was sent to — $h_to and $h_cc
  • Whether this mail was directly addressed to you — builtin condition “personal” detects if the mail was addressed to you, or was one you sent and copied to yourself.
  • The subject of the message — $h_subject
  • The size of the message — $message_body_size

The actions you can take

There are a small number of basic actions you may take, having identified a message class:

save
saves a message to a file. This is in fact the action the mailer takes by default, saving a message into your incoming mail file by the equivalent of

save $home/.mail

The save command marks the message as delivered (see unseen, below). A common way to end a filter script is the command

if not delivered then save .mail endif

meaning “if all else fails, deliver to the ‘default’ place”.

pipe
passes the message to an “approved” command (commands that are available are specifically selected: don’t try random Unix commands that might be helpful).

The pipe command marks the message as delivered (see unseen, below).

rcvstore
is a command you may pipe into, which puts the message into a folder. The process performed by the mh inc command is equivalent to doing

pipe "rcvstore +inbox"

at the delivery of each message.

The message folder you want to rcvstore mail into must exist beforehand; if it doesn’t, mail may be lost (or, if you’re lucky, merely delayed). See Checking the filter, below.

seen
marks the message as delivered, without doing anything else. Uses for this facility are few and far between; the main one is the sequence

seen finish

which abandons the message: processing stops with the message marked as delivered, so it is delivered nowhere. By contrast,

finish

on its own stops processing without marking anything, so a message that no earlier command has delivered still goes to the default place.

unseen
modifies a command to suppress the “mark as delivered” action. So neither

unseen pipe "rcvstore +foo"

nor

unseen save .bar-mail

will mark the message as delivered.

finish
stops processing. You may do this at the end of a condition; an implicit finish is inserted at the end of the filter file if you’ve not put one there yourself.

Common cases

Disposing of spam

External mail coming to the department passes through a computing service ‘mail hub’, which checks for (and removes) viruses, and runs the message through a spam-detector. The spam detector adds header information about its reaction to the message: here’s one from a recent bit of rubbish:

X-Cam-ScannerInfo: http://www.cam.ac.uk/cs/email/scanner/
X-Cam-AntiVirus: No virus found
X-Cam-SpamDetails: scanned, SpamAssassin (score=8.7, FORGED_YAHOO_RCVD 2.29,
        HABEAS_HIL 4.00, HTML_60_70 0.10, HTML_MESSAGE 0.10,
        MIME_HTML_ONLY 0.10, RAZOR2_CHECK 2.06)
X-Cam-SpamScore: ssssssss

The URL on the first line tells you about the scanner, what it does, and what it’s protecting you against. The second line reports that there was no detectable virus in the mail. The third (and fourth and fifth) line(s) tell you what the spam scanner thought of the mail: this isn’t terribly enlightening, unless you’re working on the scanner itself — the important thing is the score, which in this case is 8.7. The final line is that value, coded so as to be easily spotted by a mail filter; there are 8 “s”s there. This particular mail was trapped by a statement:

if ( $h_X-Cam-SpamScore contains "sssssss" )
then
  pipe "rcvstore +spam"
  finish
endif

The condition mentions 7 “s”s, and there were 8 in the message, so the condition was true. In fact, mail with a score below 7 is pretty frequently spam, but once you get down to really low scores, you’re in danger of rejecting legitimate mail. A common way round this is:

if ( $h_X-Cam-SpamScore contains "sssss" )
then
  save .mail.spam
  finish
elif ( $h_X-Cam-SpamScore contains "sss" )
then
  pipe "rcvstore +spam-suspect"
  finish
endif

The first condition spots mail with spam score 5 or greater, and simply saves it to a file. The second condition spots mail with a score of 3 or 4, and puts it in a folder “spam-suspect”; you should scan the folder regularly, to check that important mail is not being missed. (The scan can be pretty straightforward: it’s usually possible to tell that mail is significant on the basis of its author — who it’s from — and its subject.)
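
The order of the two tests matters, because a header that contains five “s”s also contains three. The sketch below mimics the filter above in POSIX shell, so that you can see which branch a given X-Cam-SpamScore value would take; the function name classify_score and the label “inbox” are hypothetical, and the sketch models only the substring tests, not the mail system itself.

```shell
# Mimic the two-threshold filter above for a given X-Cam-SpamScore value.
# The first pattern that matches wins, just as the first true condition
# in an if/elif chain does, so the stricter test must come first.
classify_score() {
    case $1 in
        *sssss*) echo "spam" ;;          # score of 5 or more
        *sss*)   echo "spam-suspect" ;;  # score of 3 or 4
        *)       echo "inbox" ;;         # score below 3: normal delivery
    esac
}

classify_score ssssssss   # prints "spam"
classify_score ssss       # prints "spam-suspect"
classify_score ss         # prints "inbox"
```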

A folder for a mailing list

If you subscribe to a mailing list that you regard only as background reading, you may wish to move messages sent to the list straight into a folder. (There are, of course, other reasons you might care to do this.)

Some mailing lists generously put a tag in every subject line:

Subject: [sibelius-list] Spot the Difference

or

Subject: Re: [sibelius-list] Spot the Difference

In such a case, we use a match on the subject line:

if   ($h_subject  contains "[sibelius-list]")
then pipe "rcvstore +sibelius"
     finish
endif

Another common discriminator is the sender, or the address the mail was sent to:

if ($h_from:$h_sender matches "latex-l@listserv" or
    $h_to:$h_cc matches "latex-team@latex-project.org")
then pipe "rcvstore +latex3"
     finish
endif

In this case, two different mailing lists are going to end up in the same folder.

Checking the filter

Always check your mail filter before you install it. As a general rule, the best bet is to write the filter iteratively, and to run checks on the changes. There are three stages to each check:

  • Ensure that every folder mentioned exists, and that the mail system can see it (see section General precautions).
  • Similarly, take care when you use the save command; always ensure the file exists (as an empty file) before you enable the filter. (The mail system will create your $HOME/.mail file for you, before you even arrive, but it won’t create a file in any other directory.)
  • Check the syntax of the filter with the cl-ckfilter tool. The tool checks the syntax of the filter file in $HOME/.forward-new, and reports some details of what it would do in a “normal” case:

$ cl-ckfilter
/usr/bin/ckfilter -q /homes/rf10/.forward-new gave: 0
Return-path copied from sender
Sender      = rf10@cl.cam.ac.uk
Recipient   = rf10@cl.cam.ac.uk
Testing Exim filter file "/homes/rf10/.forward-new"

Unseen save message to: /anfs/bigdisc/rf10/Mail/save-mail
Unseen deliver message to: robin.fairbairns@gmail.com
Filtering did not set up a significant delivery.
Normal delivery will occur.
$

Note that the line “Filtering did not set up a significant delivery.” is OK; the next line “Normal delivery will occur.” is to be read as “therefore …”. (The filter has an implicit command:

  if not delivered
  then save .mail
  endif

at its end.)
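
The first stage, making sure that every folder the filter mentions exists, can be partly automated. The sketch below is a minimal POSIX shell version under stated assumptions: the function names are hypothetical, the MH directory is passed in by the caller (where yours lives depends on your MH set-up), and only folders written literally as rcvstore +name are found.

```shell
# Print each folder named in a 'pipe "rcvstore +folder"' command in the
# filter file $1, one per line, without duplicates.
filter_folders() {
    sed -n 's/.*rcvstore[[:space:]]*+\([^"[:space:]]*\).*/\1/p' "$1" | sort -u
}

# Report folders that the filter $1 mentions but that do not exist as
# directories under the MH directory $2. Fails if any are missing.
# Assumption: folder names contain no white space.
missing_folders() {
    filter=$1 mh_dir=$2 status=0
    for f in $(filter_folders "$filter"); do
        if [ ! -d "$mh_dir/$f" ]; then
            echo "missing folder: +$f"
            status=1
        fi
    done
    return $status
}
```

It might be used as: missing_folders ~/.forward-new ~/Mail (with ~/Mail standing in for wherever your folders actually live). A clean result says nothing about whether the mail server can see the directory; that still needs checking by hand.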

Once the check is complete (not reporting errors), it’s safe to:

cp ~/.forward ~/.forward-save
cp ~/.forward-new ~/.forward

And finally, get someone else to send you a message. If it bounces, copy .forward-save back to .forward as soon as possible!

It’s easy to imagine that this testing verges on the paranoid; however, when the alternative is the risk of seriously delaying mail, or even of losing it, thorough testing is well worth while.
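
The install and roll-back steps are easy to get wrong in a hurry, so it may help to wrap them up. The sketch below is one way of doing so in POSIX shell; the function names are hypothetical, and it deliberately does not run cl-ckfilter itself, so it should only be used once that check has reported no errors.

```shell
# Install $dir/.forward-new as $dir/.forward, keeping the previous
# filter as $dir/.forward-save. $dir defaults to the home directory.
install_filter() {
    dir=${1:-$HOME}
    if [ ! -f "$dir/.forward-new" ]; then
        echo "no candidate filter at $dir/.forward-new" >&2
        return 1
    fi
    # Keep a copy of the working filter, if there is one, for roll-back.
    if [ -f "$dir/.forward" ]; then
        cp "$dir/.forward" "$dir/.forward-save" || return 1
    fi
    cp "$dir/.forward-new" "$dir/.forward"
}

# Put the saved filter back, for use if the test message bounces.
rollback_filter() {
    dir=${1:-$HOME}
    cp "$dir/.forward-save" "$dir/.forward"
}
```

Refusing to overwrite .forward when the backup copy fails is the point of the design: a broken filter with no saved predecessor is the case the testing is meant to avoid.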

General precautions

Something that’s not always obvious is that mail delivery happens on a central machine that runs the mail server. Even if you always use the same machine yourself, the mail delivery system can’t touch local disc space on that machine. So filter instructions like:

save /local/scratch/gsm10/.mail-suspect-spam

will cause mail to back up in the server, and you’ll eventually hear from an upset systems administrator. The mail server machine just can’t see your attractive looking scratch space. A reasonable alternative is to apply for an allocation on /anfs/bigdisc, and to save your spam there.

The same problem may arise from private servers. Some research groups run /usr/groups directories on group servers; some scratch and other machine-local directories are available via NFS (which in effect makes such machines private servers, too). Save mail to disc on non-central servers such as these, and once again, you’re in danger of systems administrator enragement — systems administrators reluctantly accept the blame when a central system goes down, but when you crash your own workstation and mail backs up, there’s no (recognised) excuse.
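
A filter can be scanned for save targets in places the mail server is unlikely to reach. The sketch below is a rough POSIX shell check, not a parser: the function name unreachable_saves is hypothetical, it simply takes the word following each save, and which directory prefixes count as unreachable is an assumption the caller must supply.

```shell
# Print every "save" target in the filter file $1 that lies under one of
# the directory prefixes given as the remaining arguments.
# Assumption: the caller knows which prefixes the mail server cannot
# see; /local/ and /usr/groups/ below are taken from the text above.
unreachable_saves() {
    filter=$1
    shift
    # Drop double quotes, split the file into one word per line, and
    # keep each word that directly follows the word "save".
    tr -d '"' < "$filter" | tr -s '[:space:]' '\n' |
    awk 'prev == "save" { print } { prev = $0 }' |
    while read -r target; do
        for prefix in "$@"; do
            case $target in
                "$prefix"*) echo "$target" ;;
            esac
        done
    done
}
```

It might be used as: unreachable_saves ~/.forward-new /local/ /usr/groups/ (no output means nothing suspicious was found, which is not the same as proof that every target is visible to the server).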