Strategy to distribute mail with procmail

I think this is more appropriate here than in Applications.

I am trying to figure out a reliable way to distribute mail to different users with procmail. This is the problem:

  1. Setup: openSUSE 11.1 with sendmail, crm114, procmail on a mailserver connected to the world.

  2. What it does: receives mail for several virtual domains. Each incoming mail is delivered to a special user (just one and always the same) who will feed it to procmail. procmail in turn pipes the mail through crm114 which is acting as a trainable spam detector. According to the classification the mail message will go into the spam bucket or - for unsure classifications - a copy goes to the admin for visual inspection. Sorted mails will be used to train the filter. It is essential that crm114 sees a representative sample from all mails. For this reason every mail is distributed initially to the special user who owns and maintains the crm114 database.

  3. Last step: Each mail which passed the spam threshold must be delivered to the recipient users with procmail. Here comes the problem: Different users may appear in any one of the headers in any combination. We may have:

To: bob@dom.tld, sue@dom.tld, archie@hotmail.com
Cc: ann@dom.tld

and when a recipient is in an Lcc: or Bcc: he will only appear in the topmost Received: header line.

Any procmail recipe like:

:0
* ^(To|Cc):.*(sue|susan)@dom\.tld
! sue@localhost

will fail, because it is a consuming (delivering) recipe: procmail stops processing the message after the first match, so any other users appearing in one of the headers will not receive it. Do you have any ideas how to handle this situation with procmail?

On 2010-07-06 12:36 GMT vodoo wrote:

> 2. What it does: receives mail for several virtual domains. Each
> incoming mail is delivered to a special user (just one and always the
> same) who will feed it to procmail. procmail in turn pipes the mail
> through crm114 which is acting as a trainable spam detector. According
> to the classification the mail message will go into the spam bucket
> or - for unsure classifications - a copy goes to the admin for visual
> inspection. Sorted mails will be used to train the filter. It is
> essential that crm114 sees a representative sample from all mails. For
> this reason every mail is distributed initially to the special user
> who owns and maintains the crm114 database.

This is a complex setup. Very.

A better approach is that the MTA feeds mails to the spam filter tool,
which in turn, feeds the mail back to the MTA for delivery. The MTA
knows how to send mail to multiple recipients, which is your problem
with procmail.

This can be easily done in openSUSE out of the box with postfix and
amavisd-new, but I’m sure that sendmail can do those tricks as well. But
I have forgotten all I knew about sendmail (which wasn’t much, anyway).


Cheers / Saludos,

Carlos E. R.
(from 11.2 x86_64 “Emerald” GM (Elessar))

> This is a complex setup. Very.

Indeed, it is. In fact the initial description of my setup was simplified. But once everything is set up it runs in the background and offers some big advantages. The bulk of tagged spam will never reach the user. And sendmail works together with the j-chkmail milter which will do a lot of independent tests (including dnsbl tests). This allows me to pre-sort questionable mail with procmail into several buckets as ‘probable ham’ or ‘probable spam’, give the admin a copy of these for inspection, and the time required to manually review these mails is greatly reduced.

The drawback is that I have to distribute all mails with procmail as well. My plan is to have these rules for every user:

##################
# Mail for joe doe
##################

:0 c
* ^(To|Cc|Bcc|Resent-To|Apparently-To):.*(joe|joe\.doe)@(virtdom1|virtdom2)
! joe@localhost

:0 cE
* ^Received:.*for\ <(joe|joe\.doe)@(virtdom1|virtdom2)
! joe@localhost

The second rule is only considered when the first one did not match (the E flag). Because every rule delivers only a copy (the c flag), any additional local recipients get their copy as well. Finally, every mail ends up in a big admin folder:

# consuming rule for all mails:
:0:
IN.backup
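
A possible variant of the per-user rules, as a sketch only and not tested on this setup: procmail has a built-in `^TO_` macro that covers the usual destination headers (To, Cc, Bcc, Resent-To, Apparently-To and a few others) and requires that the address is not preceded by another address character, so joe's rule would not fire for something like billyjoe@virtdom1. The user and domain names are the same placeholders as above.

```procmail
# Sketch only: joe, joe.doe, virtdom1 and virtdom2 are placeholders.
# ^TO_ replaces the hand-written (To|Cc|Bcc|Resent-To|Apparently-To) list.
# c = deliver a copy and keep processing, so other recipients still match.
:0 c
* ^TO_(joe|joe\.doe)@(virtdom1|virtdom2)
! joe@localhost

# Fallback for recipients who appear only in a "Received: ... for <address>"
# clause. E = only tried if the rule directly above did not match.
:0 cE
* ^Received:.*for\ <(joe|joe\.doe)@(virtdom1|virtdom2)
! joe@localhost
```

Note that the `^Received:` condition is checked against every Received: header, not just the topmost one, so a mail relayed through another host that recorded a `for <...>` clause could also match.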

I just wonder if this is a valid approach. There may be better ways. Thanks for helping me think or kicking my butt to drive me in the right direction. Any ideas or opinions are welcome.