[LinuxBrit]

Procmail

I highly recommend you check out my mutt page, and my .muttrc, as I use mutt with procmail, and it really is the best way to handle mail.

Procmail is a mail filtering program. Most MTAs (mail transfer agents) come pre-configured to use procmail if a ~/.procmailrc exists. The MTA hands mail over to procmail, which reads your .procmailrc and handles mail accordingly. At its simplest, procmail can be used to put email into different folders depending on where it came from, an invaluable aid to reading mailing lists. But procmail can do much more. My procmailrc contains SPAM filtering, automatic correcting of broken emails, forwarding and logging.

A procmailrc is basically a list of rules. Each rule normally takes the form "Options (locking etc), Regexp match, Action". For each email, procmail tests it against each rule in turn until one matches, and then carries out that rule's action. Some actions can make copies of mail and let the mail pass on to later rules as well, so you can make mail backups and archives easily.

"man procmail" can be daunting, but "man procmailex" is pretty useful. I'll try and talk you through a simple example procmailrc here to get you introduced to it.

First, we set up some variables.

PATH=/bin:/usr/bin:/usr/local/bin
SHELL=/bin/bash
VERBOSE=off
MAILDIR=$HOME/mail
DEFAULT=$MAILDIR/Inbox
LOGFILE=$HOME/.procmail.log
SPAM=SPAM
SPAMMERS=$HOME/.procmail/spammers

This is fairly self-explanatory. MAILDIR is the directory in which mail folders lie. DEFAULT is the default mailbox in which to dump mail which doesn't match any rules in the procmailrc. Here, I set SPAM to be a mailbox called SPAM under $HOME/mail, but if you are confident that your SPAM filtering works, you can set SPAM=/dev/null to really get shot of it :-) The SPAMMERS file will be described shortly.

Now, here's a really useful rule, which I suggest you use if you subscribe to any mailing lists and often have people CC: both you and the list. This leads to duplicate mails, which are annoying. This rule spots duplicates and only allows the first to go through; the second gets sent to $HOME/mail/duplicates. (Again, use /dev/null if confident it's working.) Put this rule at the top of your .procmailrc, after the variables above are set.

:0 Whc: msgid.lock
| formail -D 16384 msgid.cache
:0 a:
duplicates

Now before we go on to sender matching and spam filtering, I usually have a few rules for cleaning up broken emails from bad MUAs. I won't recite them all, you can see them all in my .procmailrc, but here is an example:

# Correct wrong sig-dashes, ie add a space 
# for lines with only "--" in them:
# from: ^--$
# to:   ^-- $
:0 fBw
* ^--$
| sed -e 's/^--$/-- /'
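
Before trusting a filter like that with real mail, it's worth running the sed command by hand. Here's a rough sketch; the fix_sigdashes name and the sample message body are made up for illustration, and the trailing "|" is only there to make the added space visible:

```shell
#!/bin/sh
# Same substitution as the recipe above, wrapped in a (hypothetical) helper.
fix_sigdashes() {
    sed -e 's/^--$/-- /'
}

# Invented sample body; the second sed marks each line end with "|"
# so the space added after "--" can be seen.
printf 'Hello\n--\nFred\n' | fix_sigdashes | sed -e 's/$/|/'
# prints:
# Hello|
# -- |
# Fred|
```

Lines that merely start with "--" are left alone, since the pattern is anchored at both ends.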

Now, a little SPAM filtering. The first (and best) rule is a check against a list of known spammers. The file $SPAMMERS (see the variables above) contains a list (one per line) of email addresses, or part-addresses, of known spammers (eg fred@isp.com to block fred, or aol.com to kill all mail from that domain). You add to this file yourself as you get spam. I have a macro in mutt (check my muttrc) so that when I hit 'S', the current mail is moved to the SPAM folder, and a perl script is run to add the email addy to $SPAMMERS. Here's the rule for SPAMMERS:

:0:
* ? (formail -x From: -x Sender: -x Reply-To: -x Received: | fgrep -iqf $SPAMMERS)
$SPAM
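
To get a feel for how the list matching behaves, you can try the grep half of that rule on its own. This is only a sketch: the temporary file stands in for $SPAMMERS, the addresses are invented, and I've used grep -F here as the modern spelling of fgrep:

```shell
#!/bin/sh
# Hypothetical stand-in for $SPAMMERS: one address or part-address per line.
SPAMMERS=$(mktemp)
trap 'rm -f "$SPAMMERS"' EXIT
printf '%s\n' 'fred@isp.com' 'aol.com' > "$SPAMMERS"

# Exit status 0 if any list entry appears anywhere in the text on stdin.
# -i ignores case, -F treats entries as fixed strings, -q keeps it quiet.
is_spammer() {
    grep -iqFf "$SPAMMERS"
}

if printf 'Someone <someone@AOL.COM>\n' | is_spammer; then
    echo "would be filed in SPAM"
fi
if ! printf 'alice@example.org\n' | is_spammer; then
    echo "would pass on to the next rule"
fi
```

One thing to watch for: a blank line in the list is an empty pattern, which matches every message, so keep the file free of empty lines.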

I then do lots of other SPAM checks, see my procmailrc for all of them, but here's an example:

# Matching by header
# look for an X-Advertisement header, accounting for a possible
# spelling error.  "Nice" spammers use this header.
:0:
* ^X-Adverti[sz]ement:
| formail -A "X-SPAM-RULE: X-Advertisement in header" >> $SPAM
# Matching by subject (The D option forces case sensitivity)
:0 D:
* ^Subject:.*\<XXX\>
| formail -A "X-SPAM-RULE: XXX in subject" >> $SPAM
# Matching by content (the brackets contain one literal tab and one space)
:0 B:
* ^[     ]*dear fr(ie|ei)nd(s)?
| formail -A "X-SPAM-RULE: dear friends opener" >> $SPAM

Note the destination for those rules, they go into $SPAM, but via formail, which adds a header - allowing you to debug/tune your SPAM ruleset later.

Now some forwarding. This forwards SMS mails to my SMS phone, and also copies them to the sms mailbox (the c option makes a copy):

:0 c
* ^TOsms@.*
! gilbertt@sms.genie.co.uk

:0 A:
sms

Now you can match sender and actually file your mail away :-) The trick here is to use the most specific regexp you can, to make procmail run faster. Macros such as TO are best avoided (TO expands to a large regexp covering To:, Cc:, Resent-To:, Apparently-To: and other destination headers, and is therefore slow). For mailing lists, look for a unique header from that list; good lists use the Sender: header to distinguish themselves. For example:

:0:
* ^Sender: linux-kernel-owner@vger.kernel.org
lists/kernel

It's best to order your rules by mail throughput, eg putting high traffic mailing lists at the top (like lkml) saves running lots of wasted regexps.
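
If you're not sure which folders are the busy ones, the LOGFILE set in the variables above can tell you. The sketch below assumes the usual log abstract format, where each delivery leaves a "  Folder: name  size" line; the folder_counts name and the sample log are invented for illustration:

```shell
#!/bin/sh
# Count deliveries per folder in a procmail log, busiest first.
# Assumes log lines of the form "  Folder: <name>   <size>".
folder_counts() {
    grep '^  Folder:' "$1" | awk '{ print $2 }' | sort | uniq -c | sort -rn
}

# Invented sample log, just to show the shape of the output.
LOG=$(mktemp)
trap 'rm -f "$LOG"' EXIT
cat > "$LOG" <<'EOF'
From a@example.org  Mon Jan  1 10:00:00 2001
 Subject: first
  Folder: lists/kernel    1234
From b@example.org  Mon Jan  1 10:05:00 2001
 Subject: second
  Folder: /home/me/mail/Inbox    567
From c@example.org  Mon Jan  1 10:09:00 2001
 Subject: third
  Folder: lists/kernel    2345
EOF

folder_counts "$LOG"
```

Run it against your real log with folder_counts "$HOME/.procmail.log" and put the rules for the folders at the top of the output first. Rules that deliver through a pipe log the command rather than a folder name, so treat those entries with some care.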

There is another method for processing mailing lists, which I use to speed up matching and automate the process. It saves adding each list you subscribe to to your .procmailrc and .muttrc etc. Instead, you detect lists and save into a folder with the name of the list. Keeps stuff nice and organised... Here's how:

:0:
* ^X-Mailing-List:[   ]<\/[^ >`']+
lists/`echo $MATCH | sed -e 's/[\/]/_/g' | tr A-Z a-z`

:0:
* ^X-Mailing-List:[   ]\/[^ `']+
lists/`echo $MATCH | sed -e 's/[\/]/_/g' | tr A-Z a-z`

:0:
* ^Sender:[   ]owner-\/[^ `']+
lists/`echo $MATCH | sed -e 's/[\/]/_/g' | tr A-Z a-z`

:0:
* ^X-BeenThere:[  ]\/[^ `']+
lists/`echo $MATCH | sed -e 's/[\/]/_/g' | tr A-Z a-z`

:0:
* ^Delivered-To:[   ]mailing list \/[^ `']+
lists/`echo $MATCH | sed -e 's/[\/]/_/g' | tr A-Z a-z`

:0:
* ^X-Loop:[   ]\/[^ `']+
lists/`echo $MATCH | sed -e 's/[\/]/_/g' | tr A-Z a-z`

This matches all the mailing list software I come across. Mail from the kernel mailing list will go into lists/linux-kernel@vger.kernel.org, automatically. The sed script and odd matching rule is to prevent $MATCH being "../.bash_profile" or anything equally malicious. No more adding every new list into the rc file :-)
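
You can check what folder name a given header value would produce by running the same sed and tr pipeline by hand. A small sketch (folder_for is a made-up helper name, and printf stands in for echo to keep odd input from being treated as an option):

```shell
#!/bin/sh
# Turn a matched list name into a folder name the way the recipes do:
# slashes (and backslashes) become underscores, upper case becomes lower case.
folder_for() {
    printf '%s\n' "$1" | sed -e 's/[\/]/_/g' | tr A-Z a-z
}

folder_for 'Linux-Kernel@vger.kernel.org'   # prints linux-kernel@vger.kernel.org
folder_for '../.bash_profile'               # prints .._.bash_profile
```

With the slash gone, the second name can only ever be a file inside lists/, not a path leading out of it.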

To make mutt work nicely with this setup, and be completely automatic too, I have this in my muttrc:

unlists *      # remove old entries first
lists `cd ~/mail/lists && echo *`

unsubscribe *
subscribe `cd ~/mail/lists && echo *`

mailboxes !
mailboxes `for file in ~/mail/lists/*; do echo -n "+lists/$(basename $file) "; done`
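
To see what mutt actually receives from that last backtick command, you can run the same loop in a shell. The sketch below uses a scratch directory and invented list names so nothing real is touched, and printf in place of echo -n, which not every shell supports:

```shell
#!/bin/sh
# Scratch mail directory with two made-up list folders.
MAILROOT=$(mktemp -d)
trap 'rm -rf "$MAILROOT"' EXIT
mkdir "$MAILROOT/lists"
touch "$MAILROOT/lists/linux-kernel@vger.kernel.org" \
      "$MAILROOT/lists/example-list@example.org"

# Emit one "+lists/<name> " argument per file under <root>/lists.
mailbox_args() {
    for file in "$1"/lists/*; do
        printf '+lists/%s ' "$(basename "$file")"
    done
}

mailbox_args "$MAILROOT"; echo
```

Each folder procmail has created shows up as a +lists/... argument, which is why new lists appear in mutt without touching the muttrc.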

Give it a shot, it may suit you...

My procmailrc doesn't leave any mails in the local spoolfile, it all goes into my home directory. My muttrc reflects this, as you can see if you take a look at it. I much prefer my email to be contained in one directory in $HOME.

If you wanna know how to do something in procmail, drop me a line. In the meantime, I'll put up more procmail info when I get the time...