Mail Server Series — Part 6

SpamAssassin (SA) is a widely deployed open-source spam filtering engine.
In this mail server architecture, we deploy SpamAssassin 4.0 with MySQL-backed Bayes and TxRep, Amavis, and remote learning via Dovecot IMAPSieve, so that detection keeps improving as users train the filter.

This article covers:

Why you should avoid local Bayes files
Why SQL Bayes + TxRep is strongly recommended
How to run daily rule updates (sa-update & sa-compile)
How to implement fully automated spam/ham learning via IMAPSieve
Full SpamAssassin Docker deployment
MySQL schema adjustments required by SA 4.0
Amavis integration (Inbound + Outbound)

This is a hands-on, production-ready guide based entirely on your running deployment.

1. Overview of the SpamAssassin Architecture

Your architecture:

Dovecot IMAPSieve
     ↓ move message
sa-remote-learn (spamc)
     ↓ TCP(783)
SpamAssassin Docker
     ↓ SQL
MariaDB (Bayes + TxRep)

Key design points:

✔ Postfix does not call SpamAssassin directly
✔ Processing is delegated to Amavis and remote learning
✔ All Bayes/TxRep data stored in MySQL
✔ Learning is triggered by users through IMAPSieve; SA's own auto-learning is switched off
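
Because the learning path depends on spamc reaching spamd over TCP 783, a quick reachability check is useful before debugging anything else. The following is a minimal sketch, not part of the original deployment: spamd_alive and the SPAMC, SPAMD_HOST and SPAMD_PORT variables are illustrative names, and it assumes spamc prints the daemon's PONG reply for a keep-alive request.

```shell
#!/bin/sh
# Sketch: confirm that spamd answers on the learning path (spamc -> TCP 783).
# SPAMC, SPAMD_HOST, SPAMD_PORT and spamd_alive are illustrative names; the
# defaults mirror the diagram above.
SPAMC="${SPAMC:-spamc}"
SPAMD_HOST="${SPAMD_HOST:-spamassassin}"
SPAMD_PORT="${SPAMD_PORT:-783}"

spamd_alive() {
  # -K requests a keep-alive reply instead of scanning a message;
  # -x disables the safe fallback so connection errors are not hidden.
  reply=$("$SPAMC" -d "$SPAMD_HOST" -p "$SPAMD_PORT" -K -x </dev/null 2>/dev/null) || reply=""
  case "$reply" in
    *PONG*) echo "spamd reachable at $SPAMD_HOST:$SPAMD_PORT" ;;
    *)      echo "spamd NOT reachable at $SPAMD_HOST:$SPAMD_PORT" >&2; return 1 ;;
  esac
}
```

Calling spamd_alive from the Dovecot container tests the same network path the learning scripts use.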

2. Why SQL Bayes + TxRep Instead of Local Files

Traditional setups store:

Bayes in local DB files (by default under the spamd user's ~/.spamassassin/, which often sits below /var/lib/spamassassin/)
TxRep or legacy AWL in Berkeley DB

Major drawbacks:

✘ Not shared between containers (each instance keeps its own copy)
✘ Hard to share across users
✘ Backup is painful
✘ Lost when containers are rebuilt, unless the directory is on a volume

Your architecture uses:

✔ SQL Bayes

Centralized, persistent
Shared among all users
Survives container rebuilds

✔ SQL TxRep

Reputation scoring stored in MySQL.

Benefits:

✔ Training data accumulates in one place instead of being split across containers
✔ Cross-user intelligence
✔ Works perfectly in Docker environments

3. SpamAssassin Docker Deployment

Your Docker command:

docker run -d --name spamassassin \
  --network intranet-net \
  -e MYSQL_EXTRA_FILE=/run/secrets/sa_mysql.cnf \
  -e MYSQL_DB=sa40 \
  -e ALLOW_NETS='172.18.0.0/16' \
  -e ALLOW_USER_RULES=1 \
  -e SA_COMPILE_ON_START=1 \
  ...

Persistent volumes:

/etc/spamassassin        → config
/var/lib/spamassassin    → sa-update rule data and compiled rules (external volume; Bayes itself lives in MySQL)
/run/secrets             → MySQL password

You also copied the default SpamAssassin rules directory into a persistent location.
This ensures:

✔ rule updates survive container rebuilds
✔ local.cf customizations do not get overwritten
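
The command above points MYSQL_EXTRA_FILE at /run/secrets/sa_mysql.cnf. A minimal sketch of how such a file could be produced on the host follows. It assumes the image hands that file to the MySQL client as an extra defaults file (check your image's entrypoint to confirm); write_sa_mysql_cnf and its arguments are hypothetical names.

```shell
#!/bin/sh
# Sketch: write a MySQL client option file for the SpamAssassin container.
# Assumption: the image reads MYSQL_EXTRA_FILE as a mysql "defaults extra
# file". write_sa_mysql_cnf is a hypothetical helper, not part of the image.
write_sa_mysql_cnf() {
  target="$1"; user="$2"; password="$3"
  (
    # Owner-only permissions for a newly created file.
    umask 077
    printf '[client]\nuser=%s\npassword=%s\n' "$user" "$password" > "$target"
  ) && chmod 600 "$target"   # also tighten a file that already existed
}
```

For example, write_sa_mysql_cnf ./secrets/sa_mysql.cnf sa40 "$SA_DB_PASSWORD" creates the file, and the directory can then be mounted read-only at /run/secrets. Passwords containing characters such as # may need quoting inside the option file.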

4. local.cf – SQL Bayes + TxRep Configuration

Your final SA configuration:

# Bayes in MySQL; auto-learning stays off because training comes from IMAPSieve
use_bayes 1
bayes_auto_learn 0
bayes_store_module          Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn               DBI:mysql:database=sa40;host=maildb;port=3306
bayes_sql_username          sa40
bayes_sql_password          sa90452

# TxRep in the same database; the plugin is off by default, so enable it explicitly
loadplugin Mail::SpamAssassin::Plugin::TxRep
use_txrep 1
txrep_factory               Mail::SpamAssassin::SQLBasedAddrList
user_awl_dsn                DBI:mysql:database=sa40;host=maildb;port=3306
user_awl_sql_username       sa40
user_awl_sql_password       sa90452
user_awl_sql_table          txrep

Highlights:

✔ SQL Bayes enabled
✔ TxRep replaces legacy AWL
✔ Autolearning disabled (IMAPSieve performs learning)
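
With auto-learning off, Bayes only has the messages users feed it, and SpamAssassin does not apply Bayes scores until it has seen enough of both classes (200 spam and 200 ham by default, via bayes_min_spam_num and bayes_min_ham_num). The sketch below reads the output of sa-learn --dump magic and reports whether that point has been reached. bayes_ready and MIN are illustrative names.

```shell
#!/bin/sh
# Sketch: read `sa-learn --dump magic` output on stdin and report whether
# Bayes has enough training data to start scoring.
# bayes_ready and MIN are illustrative names; 200 is SpamAssassin's default
# for bayes_min_spam_num and bayes_min_ham_num.
bayes_ready() {
  awk -v min="${MIN:-200}" '
    # The message counts are in the third column of the "magic" lines.
    /non-token data: nspam$/ { nspam = $3 }
    /non-token data: nham$/  { nham  = $3 }
    END {
      printf "nspam=%d nham=%d\n", nspam, nham
      exit !(nspam >= min && nham >= min)   # exit 0 only when both suffice
    }'
}
```

Typical use is sa-learn --dump magic | bayes_ready, adding -u USER to sa-learn when Bayes data is kept per user.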

5. Fully Automated Learning via IMAPSieve (Remote Learn)

Learning is driven by what users do in their mail client:

Move to Junk → Learn spam

Move back to Inbox → Learn ham

Sieve rule for SPAM:

require ["vnd.dovecot.pipe", "copy", "imapsieve","environment", "variables"];

if environment :matches "imap.user" "*" {
  set "username" "${1}";
}

pipe :copy "sa-remote-learn-spam.sh" ["${username}"];

Shell script:

SA_USER="$1"    # IMAP username passed in by the Sieve script
/usr/bin/spamc -d spamassassin -p 783 -u "$SA_USER" -L spam

Advantages:

✔ End users train the filter simply by moving emails
✔ Works from any IMAP client (Roundcube, Outlook, mobile apps)
✔ Learning happens at the moment a message is moved, with no batch job
✔ No manual sa-learn required

This move-to-train pattern is a common approach in hosted and appliance mail products.
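
The spam and ham directions differ only in the learn type, so a single wrapper can serve both. This is a minimal sketch under a few assumptions: sa_remote_learn and the SPAMC, SPAMD_HOST and SPAMD_PORT variables are illustrative names, Dovecot supplies the moved message on stdin, and spamd was started with learning allowed (its --allow-tell option).

```shell
#!/bin/sh
# Sketch: one wrapper for both learning directions. Dovecot feeds the moved
# message on stdin and the Sieve script passes the IMAP user as an argument.
# sa_remote_learn, SPAMC, SPAMD_HOST and SPAMD_PORT are illustrative names.
SPAMC="${SPAMC:-/usr/bin/spamc}"
SPAMD_HOST="${SPAMD_HOST:-spamassassin}"
SPAMD_PORT="${SPAMD_PORT:-783}"

sa_remote_learn() {
  class="$1"; user="$2"
  case "$class" in
    spam|ham) ;;
    *) echo "usage: sa_remote_learn spam|ham USER" >&2; return 64 ;;
  esac
  [ -n "$user" ] || { echo "missing IMAP user" >&2; return 64; }
  # -L asks spamd to learn the message on stdin as spam or ham for USER.
  "$SPAMC" -d "$SPAMD_HOST" -p "$SPAMD_PORT" -u "$user" -L "$class"
}

# A ham script would then contain only:
#   sa_remote_learn ham "$1"
```

Rejecting anything other than spam or ham up front keeps a typo in a Sieve script from silently sending an unintended learn type to spamd.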

6. Daily Rule Updates – sa-update & sa-compile

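Your Amavis container schedules the daily update from entrypoint.sh. Written out as a command you can run directly, the job is:

sa-update && { [ -z "$do_compile" ] || sa-compile; } && pkill -HUP -f amavisd

The entrypoint's shorthand ${do_compile:+&& sa-compile} gives the same result only where the line is expanded into a generated script or cron entry; typed into a shell as-is, the && would reach sa-update as a literal argument. sa-update exits 0 only when it installed new rules, so the compile and reload steps run only when something changed.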

Benefits:

✔ Always up-to-date rules
✔ Compiled rules can speed up body-rule matching noticeably (a 30–50% gain is plausible, but measure on your own traffic)
✔ Amavis reload ensures immediate effect
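
sa-update uses its exit status to say what happened: 0 means new rules were installed, 1 means nothing new was available, and higher values indicate problems (3 means some channels updated while others failed). The sketch below makes that handling explicit. update_rules and the SA_UPDATE, SA_COMPILE and RELOAD variables are illustrative names; do_compile is the flag from the entrypoint line.

```shell
#!/bin/sh
# Sketch of the daily job with explicit sa-update exit-code handling.
# update_rules and the SA_UPDATE / SA_COMPILE / RELOAD variables are
# illustrative; do_compile is the flag used in the entrypoint line.
SA_UPDATE="${SA_UPDATE:-sa-update}"
SA_COMPILE="${SA_COMPILE:-sa-compile}"
RELOAD="${RELOAD:-pkill -HUP -f amavisd}"

update_rules() {
  status=0
  "$SA_UPDATE" || status=$?
  case "$status" in
    0|3) ;;                                   # new rules were installed
    1)   echo "no new rules"; return 0 ;;     # nothing to do, not an error
    *)   echo "sa-update failed (exit $status)" >&2; return "$status" ;;
  esac
  if [ -n "${do_compile:-}" ]; then
    "$SA_COMPILE" || return 1                 # keep running on the old rules
  fi
  # RELOAD is left unquoted on purpose so it splits into command + arguments.
  $RELOAD && echo "rules updated, amavisd reloaded"
}
```

Treating exit status 1 as success keeps cron from reporting an error on the many days when no new rules are published.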

7. Integrating SpamAssassin with Amavis

Two separate paths:

Inbound

Postfix → Amavis → SpamAssassin → clean/spam → Postfix → Dovecot

Outbound

User → Postfix submission (authenticated via Dovecot SASL) → Amavis (DKIM signing + spam scan) → Postfix → Internet

Your SA scoring:

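$sa_tag_level_deflt  = 2.0;    # add X-Spam-* info headers at or above this score
$sa_tag2_level_deflt = 6.2;    # mark the message as spam (X-Spam-Flag: YES)
$sa_kill_level_deflt = 6.9;    # apply $final_spam_destiny at or above this score
$final_spam_destiny  = D_PASS; # deliver anyway, with the spam headers in place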

D_PASS means:

✔ Spam is delivered
✔ Headers indicate SPAM
✔ Dovecot sorting handles final placement

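This is ideal while tuning the system.
Later you can switch to:

D_DISCARD
or D_REJECT (only in a before-queue setup; behind an after-queue content filter the sending client has already disconnected, so a reject turns into a bounce to a possibly forged sender)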
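
To make the three levels concrete, the following sketch maps a score onto them using the values from this configuration. classify_score is a hypothetical helper for illustration only, not an Amavis command; it assumes each level applies when the score is greater than or equal to it.

```shell
#!/bin/sh
# Sketch: how a score maps onto the three Amavis levels configured above.
# classify_score is a hypothetical helper, not an Amavis command.
classify_score() {
  awk -v s="$1" -v tag_level=2.0 -v tag2_level=6.2 -v kill_level=6.9 'BEGIN {
    if      (s + 0 >= kill_level) print "kill"     # final_spam_destiny applies
    else if (s + 0 >= tag2_level) print "spam"     # marked as spam
    else if (s + 0 >= tag_level)  print "tagged"   # informational headers only
    else                          print "clean"
  }'
}
```

With D_PASS, a "kill" result is still delivered; the difference only becomes visible once the destiny is changed to discard or reject.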

8. MySQL Schema Fixes Required by SA 4.0

SA 4.0's SQL Bayes store expects these bayes_vars columns, which a schema created for an older setup may lack:

oldest_token_age
newest_token_age (NOT NULL)

Your container automatically patches the schema:

ALTER TABLE bayes_vars ADD COLUMN oldest_token_age INT(11) ...
ALTER TABLE bayes_vars MODIFY COLUMN newest_token_age INT(11) NOT NULL DEFAULT 0;

✔ Prevents SA 4.0 runtime errors
✔ Fully automated
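
Before relying on the automatic patch, it can help to see which of the two columns named above are actually missing. This sketch only inspects the schema and changes nothing. missing_bayes_vars_columns and the MYSQL variable are illustrative names, and the default client command assumes credentials come from the option file under /run/secrets.

```shell
#!/bin/sh
# Sketch: list which of the two bayes_vars columns are absent.
# missing_bayes_vars_columns and MYSQL are illustrative names; the default
# assumes credentials are read from the option file under /run/secrets.
MYSQL="${MYSQL:-mysql --defaults-extra-file=/run/secrets/sa_mysql.cnf}"

missing_bayes_vars_columns() {
  db="${1:-sa40}"
  # -N drops the header row, -B gives plain one-value-per-line output.
  existing=$($MYSQL -N -B -e "SELECT COLUMN_NAME FROM information_schema.COLUMNS WHERE TABLE_SCHEMA='$db' AND TABLE_NAME='bayes_vars'") || return 2
  for col in oldest_token_age newest_token_age; do
    printf '%s\n' "$existing" | grep -qx "$col" || echo "$col"
  done
}
```

Empty output means both columns are present and no schema change is needed.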

9. Full SA Flow Diagram

Inbound mail:
Internet → Postfix → Amavis → SpamAssassin → Postfix → Dovecot → Mailbox

Outbound mail:
User → Postfix (submission, Dovecot SASL auth) → Amavis → Postfix → Internet

Learning:
User moves messages → Dovecot IMAPSieve → spamc remote learn → SpamAssassin → MariaDB (Bayes/TxRep)

10. Recommendations & Best Practices

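✔ 1. Run periodic Bayes expiration

sa-learn --force-expire

(sa-learn --sync only flushes the journal of the file-based store; it does not expire old tokens.)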

✔ 2. Backup Bayes SQL

Back up these tables (or dump the whole sa40 database, which also picks up bayes_expire and bayes_global_vars):

bayes_token  
bayes_seen  
bayes_vars  
txrep
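
A backup of these tables can be as small as one mysqldump call. The sketch below is one way to do it; backup_sa_tables and MYSQLDUMP are illustrative names, and the option file path is whatever holds your MySQL credentials.

```shell
#!/bin/sh
# Sketch: dump the Bayes and TxRep tables to a single SQL file.
# backup_sa_tables and MYSQLDUMP are illustrative names.
MYSQLDUMP="${MYSQLDUMP:-mysqldump}"

backup_sa_tables() {
  cnf="$1"; db="$2"; out="$3"
  # --defaults-extra-file has to be the first option on the command line.
  # --single-transaction takes a consistent snapshot of InnoDB tables
  # without blocking spamd or Amavis while the dump runs.
  "$MYSQLDUMP" --defaults-extra-file="$cnf" --single-transaction "$db" \
    bayes_token bayes_seen bayes_vars txrep > "$out"
}
```

For example, backup_sa_tables /run/secrets/sa_mysql.cnf sa40 /backup/sa40-bayes.sql could run from the same daily schedule as the rule update.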

✔ 3. Enable Razor/Pyzor + DNSBLs (optional)

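These network checks catch spam that content rules miss. How much they help depends on your mail mix, so compare detection before and after enabling them. SpamAssassin already runs DNSBL lookups by default unless network tests are turned off (for example via Amavis's $sa_local_tests_only).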

✔ 4. Enable Roundcube's markasjunk plugin (MarkAsJunk2 was merged into it in Roundcube 1.4)

Better UX for spam/ham actions.

Conclusion

This article covered the full integration of SpamAssassin 4.0 into your mail system:

SQL-based Bayes and TxRep
Automated spam/ham learning through IMAPSieve
Daily rule updates and optimized compiled rules
Full Docker deployment
Automatic schema fixes
Integration with Amavis and Dovecot

Your system now has a modular, containerized anti-spam stack whose training data and rules survive container rebuilds.