Extended spam filter: check_local 3.11 for sendmail 8.9

Changes and beta versions are listed in the file CHANGES (last change: Nov 27 13:46).

How to install

  1. Download check_local-3.11.tar.gz and unpack it with 'zcat | tar xf -'. (If the download fails, you may have an old version of this document in your cache. Try "<shift><reload>".) No gzip? Download check_local-3.11.tar.
  2. If you want to use enhanced header checking of Received: lines (option _CHECK_HEADER_RECEIVED_), patch sendmail-8.9.?/src/headers.c with headers.c.patch. For longer header fields, enlarge MAXNAME and MAXATOM in sendmail-8.9.?/src/conf.h:
    #define MAXNAME        1024            /* max length of a name */
    #define MAXATOM        512             /* max atoms per address */
    
  3. If you want to use _SPAM_FRIENDS_ or _SPAM_HATERS_ in header checking (option _HC_SWITCH_), install map_storage as described in map_storage/README.
  4. Make sure your sendmail is compiled and linked with regex support. Refer to sendmail-8.9.?/src/README for further information.
  5. After applying a patch, rebuild sendmail (e.g. ./Build -c -f site.config.m4).
  6. Move check_local.m4 to sendmail-8.9.?/cf/hack.
  7. Edit your .mc file (example below):
    ...
    dnl * some options *
    define(`_SPAM_HATERS_', `hash /etc/mail/db/spam_haters')dnl
    define(`_HC_SWITCH_')dnl
    define(`_REGEX_LOCALNUMS_')dnl
    define(`_CHECK_HEADER_RECEIVED_')dnl
    define(`_CHECK_HEADER_FROM_')dnl
    define(`_CHECK_MESSAGE_ID_')dnl
    HACK(check_local)dnl
    ...
    
  8. Send bug reports and/or patches to Jan.Krueger+map@unix-ag.uni-hannover.de

How does it work?

Using HACK(check_local), all checks are done in the ruleset Local_check_rcpt. The rulesets check_mail and check_relay are overridden by local rulesets that simply return $#OK. Basic_check_mail is called from Local_check_rcpt. For further information, refer to the flow chart of Local_check_rcpt in the file FLOW.

How to debug

To debug, start sendmail with the options -d21.4 -bt. Define the macros {client_name} and {client_addr} and the MAIL FROM ($f) as shown below, then call the ruleset Local_check_rcpt with the recipient:
    /usr/lib/sendmail -d21.4 -bt
    ADDRESS TEST MODE (ruleset 3 NOT automatically invoked)
    Enter <ruleset> <address>
    > .D{client_name}spamhost.com
    > .D{client_addr}1.2.3.4
    > .Df<from@spamhost.com>
    > Local_check_rcpt <rcpt@to>
See the script 'tester' for an idea of how to debug. To use it, build your sendmail.cf with the option _CL_TESTER_, which includes some small rulesets for the tests.

Flow control options

_RBL_ALL_
Look up the RBL for every recipient, even for spam_friends and for recipients not listed in spam_haters. With _RBL_ALL_, an IP blacklisted in the RBL cannot be overridden by an OK in (relay_)access.db.

_REJECT_RELAY_FIRST_
Without this option, a MAIL FROM with RHS OK in access.db can override a blacklisted relay. Since the MAIL FROM can be forged, you can use this option to give a rejected relay priority over a whitelisted MAIL FROM. See also the flow chart (FLOW).
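
The precedence that _REJECT_RELAY_FIRST_ changes can be pictured as a small decision function. This is a Python toy model, not the m4 ruleset: the function name relay_verdict and its three boolean inputs are assumptions for illustration, and the real Local_check_rcpt (see FLOW) performs many more checks.

```python
def relay_verdict(relay_blacklisted, mail_from_ok, reject_relay_first):
    """Toy model of the relay vs. MAIL FROM precedence (hypothetical helper)."""
    if reject_relay_first and relay_blacklisted:
        # With _REJECT_RELAY_FIRST_: a blacklisted relay wins, so a forged
        # "OK" MAIL FROM cannot rescue the message.
        return "REJECT"
    if mail_from_ok:
        # Default behaviour: an RHS OK for the MAIL FROM overrides the relay check.
        return "OK"
    if relay_blacklisted:
        return "REJECT"
    # Neither rule decided; the remaining checks would continue.
    return "CONTINUE"
```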

Relaying options

_POPAUTH_
Look up the client IP in the "relay after popauth" database and allow relaying through the system if it is found. The default "hash /etc/mail/popauth" can be overridden with define(`_POPAUTH_', `dbtype /your/db/location')dnl. Further information and software for "relay after popauth" can be found at http://spam.abuse.net/tools/smPbS.html or http://www2.portal.ca/~cjs/computer/sendmail/poprelay.html or http://mail.cc.umanitoba.ca/drac/index.html.

_RELAY_MAIL_FROM_
Allow relaying based on the host part of the MAIL FROM given in the SMTP dialog. This option extends FEATURE(relay_local_from), but it should not be used, because the MAIL FROM can easily be forged. Use _POPAUTH_ instead. The host parts must be listed with RHS RELAY in access.db.

_RELAY_MAIL_FROM_DOMAIN_
Also accept domains (not only host parts) for _RELAY_MAIL_FROM_. Requires _RELAY_MAIL_FROM_.

_RELAY_MAIL_FROM_DB_
Use a separate database for _RELAY_MAIL_FROM_ instead of access.db. The default "hash /etc/mail/relay_mail_from" can be overridden with a define().

Database options

_SPAM_FRIENDS_
Recipients listed in the map /etc/mail/spam_friends get their mail unfiltered; mail for all other recipients on the host is filtered. This option is useful if you want to define some exceptions such as postmaster@..., etc.
Use define(`_SPAM_FRIENDS_', `dbtype /path/to/database') for other locations. The RHS of the map is ignored. The +option of the localpart is significant: while user@domain has its mail filtered, user+open@domain can be listed in spam_friends. You can also list a bare +option in the database as a general "magic key" that prevents mail from being filtered (possibly useful for mailing lists).
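
The lookup rules above (full address including +option, plus a bare "+option" magic key) can be sketched in Python. This is an illustration only: the function is_spam_friend and the in-memory set are hypothetical stand-ins for the sendmail map, and lowercasing keys assumes the map folds case, which is sendmail's default for hash maps without -f.

```python
def is_spam_friend(rcpt, friends):
    """Return True if rcpt should get unfiltered mail (hypothetical model)."""
    rcpt = rcpt.lower()
    if rcpt in friends:
        # Exact entry, e.g. "user+open@domain"; the +option is part of the key.
        return True
    local = rcpt.rpartition("@")[0]
    _, plus, option = local.partition("+")
    if plus and ("+" + option) in friends:
        # A bare "+option" entry acts as a general magic key.
        return True
    return False
```

With friends = {"postmaster@example.org", "user+open@example.org", "+list"}, user@example.org stays filtered while user+open@example.org and any address ending in +list pass unfiltered.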

_SPAM_HATERS_
Opposite of _SPAM_FRIENDS_: only recipients listed in the map spam_haters get their mail filtered; all other recipients on the host get unfiltered mail. A +option in the localpart can be used as a "magic key" and is discarded before the recipient is looked up.
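
For contrast with spam_friends, here is a Python sketch of the spam_haters lookup, where the +option is stripped before the lookup. The function wants_filtering and the set are hypothetical; only the "strip +option, then look up user@domain" step is taken from the description above, and the magic-key handling is not modelled.

```python
def wants_filtering(rcpt, haters):
    """Return True if rcpt is a spam hater (hypothetical model)."""
    local, _, domain = rcpt.lower().rpartition("@")
    user = local.split("+", 1)[0]   # discard the +option before the lookup
    return f"{user}@{domain}" in haters
```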

_HEADER_ACCESS_DB_
Use a separate database (same format as access.db) for header checking. The default file /etc/mail/header_access can be overridden with define(`_HEADER_ACCESS_DB_', `/path/to/header_access')dnl

_RELAY_ACCESS_DB_
Use a separate database for check_relay. This is useful for rejecting mail that comes directly from dial-up hosts. The default is /etc/mail/relay_access. (The relay access database knows the RHS values DISCARD, REJECT and OK, or an error message such as "550 no direct access from dialin".)

Envelope options

_X_SPAM_ENVELOPE_ (requires map_storage)
Add a header field X-Spam-Envelope instead of rejecting the message. This can be used for later processing (e.g. with procmail).

_REGEX_LOCALNUMS_
Check the MAIL FROM for digits-only localparts from domains listed in /etc/mail/no_num_domains. Use define(`_REGEX_LOCALNUMS_', `/path/to/file') for other locations.

_REGEX_LOCALNUM_START_
Check the MAIL FROM for localparts starting with a digit from domains listed in /etc/mail/no_num_start_domains.
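
The two localpart checks (_REGEX_LOCALNUMS_ and _REGEX_LOCALNUM_START_) can be illustrated with Python regular expressions. This is a sketch: the function localnum_reject, its return strings, and exact (not parent-domain) matching of the domain lists are assumptions, not taken from check_local.m4.

```python
import re

DIGITS_ONLY = re.compile(r"[0-9]+")   # used with fullmatch: whole localpart
DIGIT_START = re.compile(r"[0-9]")    # used with match: first character only


def localnum_reject(mail_from, no_num_domains, no_num_start_domains):
    """Return a reason string if the sender should be rejected, else None."""
    local, _, domain = mail_from.strip().strip("<>").rpartition("@")
    domain = domain.lower().rstrip(".")
    if domain in no_num_domains and DIGITS_ONLY.fullmatch(local):
        return "digits-only localpart"
    if domain in no_num_start_domains and DIGIT_START.match(local):
        return "localpart starts with digit"
    return None
```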

_CHECK_REGEX_
Check the MAIL FROM (canonified) against the given regular expression. E.g. define(`_CHECK_REGEX_', `^[^<]{10}[^<]+<domain\.com\.?>')
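
To see what the example expression catches, it can be tried out with Python's re module, whose syntax agrees with the POSIX extended syntax for this particular pattern. The pattern below is copied unchanged; the test strings assume that the text passed to the regex looks like localpart<domain.com.>, which is an assumption about the canonified form. Read this way, the pattern matches localparts of eleven or more characters from domain.com.

```python
import re

# The example pattern from the option description, unchanged.
PATTERN = re.compile(r"^[^<]{10}[^<]+<domain\.com\.?>")


def matches(canonified):
    """Return True if the (assumed) canonified sender matches the pattern."""
    return PATTERN.search(canonified) is not None
```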

_MD2NAME_
Rewrite the null sender MAIL FROM: <> (used by mailer daemons) to MAIL FROM: < $n @ $&{client_name} >.
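
The rewrite is a simple substitution, sketched here in Python. The function md2name is hypothetical; daemon_name plays the role of the macro $n (MAILER-DAEMON in the usual m4-generated configuration) and client_name the role of $&{client_name}.

```python
def md2name(mail_from, daemon_name, client_name):
    """Replace the null sender <> with <daemon_name@client_name>."""
    if mail_from.strip() == "<>":
        return f"<{daemon_name}@{client_name}>"
    return mail_from   # any other sender is left unchanged
```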

_CLIENT_MUST_RESOLVE_
If the client does not resolve in DNS or the entry is bogus, the mail is rejected. Temporary DNS failures cause temporary errors.
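
One common reading of "does not resolve or the entry is bogus" is a forward-confirmed reverse DNS check; the Python sketch below models that reading and is not taken from check_local.m4. The resolver callables and the TempDNSError exception are hypothetical and are injected so the logic can be tested without network access.

```python
class TempDNSError(Exception):
    """Stands in for a temporary resolver failure (hypothetical)."""


def client_resolve_verdict(ip, reverse, forward):
    """Forward-confirmed reverse DNS check, as an illustrative model.

    reverse(ip)   -> hostname or None (no PTR record)
    forward(name) -> list of IPv4 addresses for that name
    Either callable may raise TempDNSError on a temporary failure.
    """
    try:
        name = reverse(ip)
        if name is None:
            return "REJECT"      # client does not resolve
        addresses = forward(name)
    except TempDNSError:
        return "TEMPFAIL"        # temporary DNS failure -> temporary error
    if ip not in addresses:
        return "REJECT"          # PTR name does not point back: bogus entry
    return "OK"
```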

_MD5_MAGIC_KEY_ (requires map_storage)
Based on a secret compiled into map_storage, every user can have their own MD5 magic key. This option enables checking of that key in the ruleset.

_ORBS_
This option adds support for the Open Relay Blocking Service (http://www.dorkslayers.com/orbs/). FEATURE(rbl) is required for this option. If you want to use only ORBS and not the RBL, use FEATURE(rbl, `orbs.dorkslayers.com')dnl.

_DUL_
Look up the client IP in the Orca Dial-up User List (http://www.orca.bc.ca/dul/).

FEATURE(rbl)
is supported for all IP-lookups. (http://maps.vix.com/rbl/)
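
The RBL, ORBS and DUL lookups all follow the usual DNS blacklist convention: the client's IPv4 octets are reversed and the list's zone is appended, and an A record for that name means the IP is listed. The Python sketch below only builds the query name; the function dnsbl_query_name is hypothetical, and orbs.dorkslayers.com is the zone given for _ORBS_ above.

```python
import ipaddress


def dnsbl_query_name(client_ip, zone):
    """Build the DNS name queried for client_ip in a DNS blacklist zone."""
    ip = ipaddress.IPv4Address(client_ip)          # raises ValueError if not IPv4
    reversed_octets = ".".join(reversed(str(ip).split(".")))
    return f"{reversed_octets}.{zone.rstrip('.')}"
```

For example, dnsbl_query_name("1.2.3.4", "orbs.dorkslayers.com") gives 4.3.2.1.orbs.dorkslayers.com.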

confREJECT_MSG
is supported for most errors.

Using either _SPAM_FRIENDS_ (e.g. for postmaster) or _SPAM_HATERS_ is recommended.

Header options

_CHECK_HEADER_RECEIVED_ (requires headers.c.patch)
Parse the Received: lines of the header and check them against access.db.

_HC_SWITCH_ (recommended) (requires map_storage)
The spam_friends or spam_haters information is saved in map_storage. The check_header rulesets read it back and skip the header checks if the recipient is a spam_friend, the client is listed RELAY/OK in (relay_)access.db, or the MAIL FROM is listed RELAY/OK in access.db (see "Known problems").
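
The three conditions that switch header checking off can be written as one predicate. This Python sketch is only a model of the described logic: the function header_checks_enabled and its arguments (the access.db RHS values for the client and the MAIL FROM, or None) are hypothetical.

```python
def header_checks_enabled(rcpt_is_friend, client_rhs, mail_from_rhs):
    """Return False if any _HC_SWITCH_ condition turns header checks off."""
    whitelisted = {"RELAY", "OK"}
    if rcpt_is_friend:
        return False                 # recipient is a spam_friend
    if client_rhs in whitelisted:
        return False                 # client listed RELAY/OK in (relay_)access.db
    if mail_from_rhs in whitelisted:
        return False                 # MAIL FROM listed RELAY/OK in access.db
    return True
```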

_X_SPAM_HEADER_ (requires map_storage)
Instead of returning an error when check_header detects spam, add the header field 'X-Spam-Header: [content]' for later processing (e.g. with procmail).

_CHECK_HEADER_SUBJECT_ (requires headers.c.patch)
Match the Subject: against the given regular expression. Usage: define(`_CHECK_HEADER_SUBJECT_', `your regex here')dnl. It should only be used together with _X_SPAM_HEADER_.

_CHECK_HEADER_FROM_
Check the address given in the From:-line of the header against access.db. This option is required by the following options, because it activates the ruleset check_from.

_CHECK_HEADER_REPLY_TO_
Check the address given in the Reply-To:-line of the header against access.db. Uses the ruleset check_from.

_CHECK_HEADER_SENDER_
Check the address given in the Sender:-line of the header against access.db. Uses the ruleset check_from.

_CHECK_HEADER_TO_
Check the address given in the To:-line of the header against access.db. Uses the ruleset check_from.

_CHECK_HEADER_CC_
Check the address given in the Cc:-line of the header against access.db. Uses the ruleset check_from.
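
The From:, Reply-To:, Sender:, To: and Cc: checks all extract an address and look it up in access.db. The Python sketch below assumes the usual access.db order (full address first, then the domain and its parent domains) and uses the standard email.utils.parseaddr to drop display names; the helper names are hypothetical, and check_from may differ in details such as user@ entries.

```python
from email.utils import parseaddr


def access_lookup(address, access):
    """Look up address, then its domain and parent domains, in a dict."""
    address = address.strip().strip("<>").lower()
    if address in access:                  # 1. full address
        return access[address]
    labels = address.rpartition("@")[2].split(".")
    for i in range(len(labels)):           # 2. domain, then parent domains
        key = ".".join(labels[i:])
        if key in access:
            return access[key]
    return None


def check_header_address(header_value, access):
    """Extract the address from a header value and return its RHS or None."""
    _, address = parseaddr(header_value)   # drops display name and comments
    return access_lookup(address, access) if address else None
```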

_CHECK_HEADER_COMMENTS_
Check the address in 'Comments: Authenticated sender is <user@domain>' against the access.db.
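
Extracting the address from such a Comments: line is a single regular expression. In this Python sketch the function authenticated_sender and the case-insensitive match are assumptions; the address it returns would then go through the same access.db lookup as the other header addresses.

```python
import re

AUTH_SENDER = re.compile(r"Authenticated sender is <([^<>\s]+)>", re.IGNORECASE)


def authenticated_sender(comments_value):
    """Return the address from 'Authenticated sender is <...>', or None."""
    m = AUTH_SENDER.search(comments_value)
    return m.group(1) if m else None
```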

_CHECK_HEADER_CONTENT_
Reject messages with Content-Type: text/html.

_CHECK_HEADER_MESSAGE_ID_
Check the Message-Id (RFC 822 syntax) and look up its @domain in (header_)access.db.
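
A minimal Python sketch of the Message-Id check: accept only the <local@domain> form and return the domain part for the access.db lookup. The function message_id_domain is hypothetical, and the simplified pattern is narrower than the full RFC 822 msg-id grammar; how the hack formats the lookup key is not specified here.

```python
import re

# Simplified msg-id: "<" local "@" domain ">" with no spaces or extra brackets.
MSGID = re.compile(r"^\s*<([^<>@\s]+)@([^<>@\s]+)>\s*$")


def message_id_domain(value):
    """Return the lowercased domain of a Message-Id, or None if malformed."""
    m = MSGID.match(value)
    return m.group(2).lower() if m else None
```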

The parse_received pattern can parse the following formats (host = hostname or [ip]):

from host
from host (1.2.3.4)
from host (keyword host) (1.2.3.4)
from host ([1.2.3.4])
from host (user@host)
from host (hostname [1.2.3.4])
from host (hostname[1.2.3.4])
from host (keyword hostname [1.2.3.4])
from host (keyword host) (keyword hostname [1.2.3.4])
from host (keyword host) (keyword hostname[1.2.3.4])
from host (peer crosschecked as hostname[1.2.3.4])
from host ([1.2.3.4] (may be forged))
(from user@localhost) is not matched
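
The formats above can be parsed with a small Python sketch: match "from host", then walk the parenthesized comments that follow it (allowing one level of nesting, as in "(may be forged)") and take the IP from [brackets] or from a bare "(1.2.3.4)". This is an illustration, not the parse_received pattern in check_local.m4; the function names and the result dictionary are hypothetical.

```python
import re

IPV4 = r"\d{1,3}(?:\.\d{1,3}){3}"
HOSTNAME = r"[A-Za-z0-9][A-Za-z0-9.-]*"
FROM_HOST = re.compile(rf"^from\s+(\[{IPV4}\]|{HOSTNAME})", re.IGNORECASE)
BRACKET_IP = re.compile(rf"(?:({HOSTNAME})\s*)?\[({IPV4})\]")  # "name [ip]" or "[ip]"
BARE_IP = re.compile(rf"^\s*({IPV4})\s*$")                     # "(1.2.3.4)"


def leading_comments(text):
    """Return top-level (...) groups directly after the from-host part."""
    comments, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i + 1
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                comments.append(text[start:i])
        elif depth == 0 and not ch.isspace():
            break                 # e.g. "by ..." follows the from clause
    return comments


def parse_received(line):
    """Return {'helo', 'name', 'ip'} or None if the line does not start with 'from'."""
    stripped = line.strip()
    m = FROM_HOST.match(stripped)
    if not m:
        return None               # e.g. "(from user@localhost)"
    result = {"helo": m.group(1), "name": None, "ip": None}
    for comment in leading_comments(stripped[m.end():]):
        bracketed = BRACKET_IP.search(comment)
        if bracketed:
            result["ip"] = bracketed.group(2)
            if bracketed.group(1):
                result["name"] = bracketed.group(1)
            continue
        bare = BARE_IP.match(comment)
        if bare:
            result["ip"] = bare.group(1)
        # "(keyword host)" and "(user@host)" carry no IP and are ignored.
    return result
```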

Ruleset hooks for your extensions

_CHECK_RCPT_HOOK_
If defined, the ruleset check_rcpt_hook (or the ruleset name assigned to the option) is called at the beginning of Local_check_rcpt. This can be useful for extensions such as relay after POP. $#error rejects the message; $#OK also terminates the check_rcpt ruleset.

_CHECK_MAIL_HOOK_
If defined, the ruleset check_mail_hook (Scheck_mail_hook in sendmail.cf, or the name assigned) is called before Basic_check_mail. Returning <OK> or <RELAY> stops Local_check_rcpt and sets hc_switch OFF. $#error rejects the message or adds a header field (with _X_SPAM_ENVELOPE_).

Known problems

If more than one recipient is specified, the friend storage cannot work properly: if even one recipient wants the mail filtered, it is filtered for all of them. You can work around this problem using _X_SPAM_HEADER_.

Sometimes mail from mailing lists is rejected by the header checks, because such mail looks like a kind of "relayed spam". Automatically maintained mailing list servers may then erroneously unsubscribe the recipient. Use _HC_SWITCH_ and whitelist trusted mailing list servers or senders in (relay_)access.db.

The parse_received pattern is not universal. If you find a non-matching Received: line, or if you have a more universal pattern, please send an email to Jan.Krueger+map@stud.uni-hannover.de.

In error messages, both in the SMTP dialog and in syslog, the recipient is given first, even if the mail is rejected because of the relay or the sender. This causes odd dialogs like:

   MAIL FROM: <FREE@nowhere.mailer>
   250 <FREE@nowhere.mailer>... Sender ok
   RCPT TO: <Jan.Krueger@stud.uni-hannover.de>
   501 <Jan.Krueger@stud.uni-hannover.de>... Sender domain must exist

What is the headers.c.patch doing?

The patch modifies the file headers.c so that you can define a check ruleset for each header field, using the following syntax:
    HFname: $>check_ruleset
    HFname: $>+check_ruleset     (encapsulate the value in "")
    HFname: $>$|check_ruleset    (normal and encapsulated value separated by $|)

    Scheck_ruleset
    ...
In the encapsulated value, the comments in () are not discarded by prescan() in rscheck(). This is needed to parse the comments in Received: lines. The patch only affects the header-checking part of the sendmail source.
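
What the check ruleset receives in each of the three modes can be modelled in Python. This is a simplified sketch: real prescan() works on tokens, not strings, and the function names, the mode strings used as keys and the plain-text " $| " separator are assumptions. Parentheses inside quoted strings are not handled.

```python
def strip_comments(value):
    """Drop (...) comments, including nested ones, and normalise whitespace."""
    out, depth = [], 0
    for ch in value:
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
    return " ".join("".join(out).split())


def ruleset_input(value, mode):
    """Model the value passed to the check ruleset for each HFname syntax."""
    if mode == "$>":        # plain: comments are discarded
        return strip_comments(value)
    if mode == "$>+":       # encapsulated in "": comments are kept
        return f'"{value}"'
    if mode == "$>$|":      # both, separated by $|
        return f'{strip_comments(value)} $| "{value}"'
    raise ValueError(f"unknown mode: {mode}")
```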
Jan Krüger, Jan.Krueger+map@stud.uni-hannover.de