Mail Server Series — Part 6
SpamAssassin (SA) is a widely deployed open-source engine for email spam filtering.
In this mail server architecture, we deploy SpamAssassin 4.0 + MySQL (Bayes + TxRep) + Amavis + Remote Learning (via Dovecot IMAPSieve) to achieve enterprise-grade spam detection accuracy.
This article covers:
- Why you should avoid local Bayes files
- Why SQL Bayes + TxRep is strongly recommended
- How to run daily rule updates (sa-update & sa-compile)
- How to implement fully automated spam/ham learning via IMAPSieve
- Full SpamAssassin Docker deployment
- MySQL schema adjustments required by SA 4.0
- Amavis integration (Inbound + Outbound)
This is a hands-on, production-ready guide based entirely on your running deployment.
1. Overview of the SpamAssassin Architecture
Your architecture:
Dovecot IMAPSieve
↓ move message
sa-remote-learn (spamc)
↓ TCP(783)
SpamAssassin Docker
↓ SQL
MariaDB (Bayes + TxRep)
Key design points:
✔ Postfix does not call SpamAssassin directly
✔ Processing is delegated to Amavis and remote learning
✔ All Bayes/TxRep data stored in MySQL
✔ Learning is driven by user actions through IMAPSieve; SA's own autolearning is disabled
2. Why SQL Bayes + TxRep Instead of Local Files
Traditional setups store:
- Bayes in /var/lib/spamassassin/
- TxRep or legacy AWL in Berkeley DB
Major drawbacks:
✘ Not persistent: data is lost when containers are rebuilt
✘ Hard to share across users
✘ Backup is painful
Your architecture uses:
✔ SQL Bayes
- Centralized, persistent
- Shared among all users
- Survives container rebuilds
✔ SQL TxRep
Reputation scoring stored in MySQL.
Benefits:
✔ Better accuracy
✔ Cross-user intelligence
✔ Works perfectly in Docker environments
3. SpamAssassin Docker Deployment
Your Docker command:
docker run -d --name spamassassin \
  --network intranet-net \
  -e MYSQL_EXTRA_FILE=/run/secrets/sa_mysql.cnf \
  -e MYSQL_DB=sa40 \
  -e ALLOW_NETS='172.18.0.0/16' \
  -e ALLOW_USER_RULES=1 \
  -e SA_COMPILE_ON_START=1 \
  ...
Persistent volumes:
/etc/spamassassin → config
/var/lib/spamassassin → Bayes data (external volume)
/run/secrets → MySQL password
You also copied the default SpamAssassin rules directory into a persistent location.
This ensures:
✔ rule updates survive container rebuilds
✔ local.cf customizations do not get overwritten
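Once the container is up, it helps to confirm that spamd actually answers on the network before wiring in Dovecot. The following is a minimal sketch, not part of the original deployment: the function names and the SPAMD_HOST / SPAMD_PORT variables are illustrative, and the defaults simply mirror the host name and port used elsewhere in this article. It relies on the spamc options -K (keep-alive ping), -c (check only, exit code 1 for spam) and -x (disable safe fallback so that connection errors show up as error codes).

```shell
#!/bin/sh
# Illustrative defaults; adjust to your own network and container name.
SPAMD_HOST="${SPAMD_HOST:-spamassassin}"
SPAMD_PORT="${SPAMD_PORT:-783}"

# spamd_ping: expected to return 0 when spamd answers a keep-alive ping.
spamd_ping() {
    spamc -d "$SPAMD_HOST" -p "$SPAMD_PORT" -x -K >/dev/null 2>&1
}

# spamd_gtube_check: send a tiny message containing the GTUBE test string.
# Returns 0 only if spamd classified it as spam (spamc -c exits with 1).
spamd_gtube_check() {
    if printf 'Subject: GTUBE test\n\n%s\n' \
        'XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X' |
        spamc -d "$SPAMD_HOST" -p "$SPAMD_PORT" -x -c >/dev/null 2>&1
    then
        return 1            # exit 0: the message was not flagged
    else
        [ "$?" -eq 1 ]      # exit 1: flagged as spam; other codes are errors
    fi
}
```

A typical use would be `spamd_ping && spamd_gtube_check && echo "spamd OK"`, run from a container on the same Docker network.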
4. local.cf – SQL Bayes + TxRep Configuration
Your final SA configuration:
use_bayes 1
bayes_auto_learn 0
bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn DBI:mysql:database=sa40;host=maildb;port=3306
bayes_sql_username sa40
bayes_sql_password sa90452
loadplugin Mail::SpamAssassin::Plugin::TxRep
txrep_factory Mail::SpamAssassin::SQLBasedAddrList
user_awl_dsn DBI:mysql:database=sa40;host=maildb;port=3306
user_awl_sql_username sa40
user_awl_sql_password sa90452
user_awl_sql_table txrep
Highlights:
✔ SQL Bayes enabled
✔ TxRep replaces legacy AWL
✔ Autolearning disabled (IMAPSieve performs learning)
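After editing local.cf it is worth checking that the file parses and that the SQL Bayes store is reachable. The sketch below assumes it is run inside the SpamAssassin container as the user spamd runs as; the function name verify_sa_config is illustrative. It uses two standard commands: `spamassassin --lint` and `sa-learn --dump magic`.

```shell
#!/bin/sh
# verify_sa_config: lint the configuration, then dump the Bayes counters.
verify_sa_config() {
    # --lint parses local.cf and the rule files; non-zero means errors.
    spamassassin --lint || { echo "lint failed" >&2; return 1; }

    # --dump magic prints Bayes metadata such as nspam, nham and ntokens.
    # With bayes_store_module set to MySQL this has to reach the database.
    sa-learn --dump magic || { echo "cannot read Bayes store" >&2; return 1; }
}
```

If the dump shows nspam and nham counters that grow after users move mail between folders, the SQL backend and the learning path are both working.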
5. Fully Automated Learning via IMAPSieve (Remote Learn)
Learning is triggered by the user's own folder actions:
Move to Junk → Learn spam
Move back to Inbox → Learn ham
Sieve rule for SPAM:
require ["vnd.dovecot.pipe", "copy", "imapsieve", "environment", "variables"];
if environment :matches "imap.user" "*" {
  set "username" "${1}";
}
pipe :copy "sa-remote-learn-spam.sh" ["${username}"];
Shell script (core command; SA_USER holds the username that the Sieve rule passes as the first argument):
/usr/bin/spamc -d spamassassin -p 783 -u "$SA_USER" -L spam
Advantages:
✔ End users train the filter simply by moving emails
✔ Works from Roundcube, Outlook, mobile, IMAP clients
✔ Learning is fast and accurate
✔ No manual sa-learn required
This move-to-train approach is similar to what many commercial mail appliances offer.
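The learn script can be sketched a little more fully as follows. This is an assumption-laden outline rather than the deployed script: the function name sa_remote_learn is hypothetical, and it takes the username as an argument, which is how the Sieve rule hands it over. One function serves both directions, so the ham script (move back to Inbox) differs only in the class argument. Note that spamd accepts learning requests only when it is started with --allow-tell; whether your image enables that is something to verify.

```shell
#!/bin/sh
# sa_remote_learn CLASS USER
# Feeds the message on stdin to the remote spamd for learning.
# CLASS is "spam" or "ham"; USER is the mailbox owner passed by Sieve.
sa_remote_learn() {
    class="$1"
    user="$2"
    case "$class" in
        spam|ham) ;;
        *) echo "usage: sa_remote_learn spam|ham USER" >&2; return 64 ;;
    esac
    [ -n "$user" ] || { echo "missing user" >&2; return 64; }

    # Host and port match the spamc call shown in this section.
    spamc -d spamassassin -p 783 -u "$user" -L "$class"
}
```

A wrapper such as sa-remote-learn-spam.sh would then contain just `sa_remote_learn spam "$1"`, and a ham counterpart would call `sa_remote_learn ham "$1"`.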
6. Daily Rule Updates – sa-update & sa-compile
Your Amavis container handles daily updates using entrypoint.sh:
sa-update ${do_compile:+&& sa-compile} && pkill -HUP -f amavisd
Benefits:
✔ Always up-to-date rules
✔ Compiled rules can speed up scanning (30–50% is an estimate; the gain depends on workload)
✔ Amavis reload ensures immediate effect
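The `&&` chain works because sa-update exits with 0 only when new rules were downloaded and installed; exit code 1 means no fresh updates, and higher codes indicate lint or download problems. The sketch below spells that logic out explicitly. The function name update_rules is illustrative, and do_compile is the same switch used in the one-liner. It also sidesteps a subtlety: a shell does not treat an `&&` produced by parameter expansion as an operator unless the line is parsed again (for example when written to a crontab or passed to eval).

```shell
#!/bin/sh
# update_rules: run sa-update and act on its exit code.
# Set do_compile to any non-empty value to also run sa-compile.
update_rules() {
    rc=0
    sa-update || rc=$?
    case "$rc" in
        0)  # New rules were installed: optionally compile, then reload.
            if [ -n "${do_compile:-}" ]; then
                sa-compile || return 1
            fi
            pkill -HUP -f amavisd
            ;;
        1)  # No fresh updates: nothing to compile or reload.
            echo "rules already current"
            ;;
        *)  echo "sa-update failed with exit code $rc" >&2
            return "$rc"
            ;;
    esac
}
```

Skipping the reload when nothing changed avoids restarting Amavis workers every night for no reason.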
7. Integrating SpamAssassin with Amavis
Two separate paths:
Inbound
Postfix → Amavis → SpamAssassin → clean/spam → Postfix → Dovecot
Outbound
User → Dovecot SASL → Postfix submission → Amavis (DKIM + spam scan) → Internet
Your SA scoring:
$sa_tag_level_deflt = 2.0;
$sa_tag2_level_deflt = 6.2;
$sa_kill_level_deflt = 6.9;
$final_spam_destiny = D_PASS;
D_PASS means:
✔ Spam is delivered
✔ Headers indicate SPAM
✔ Dovecot sorting handles final placement
This is ideal while tuning the system.
Later you can switch to:
- D_REJECT
- or D_DISCARD
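To make the three levels concrete, here is a small sketch that maps a score to the band it falls into. The function name and the band labels are illustrative; the thresholds are the ones from the configuration above, and the comparison is "at or above", which is how Amavis applies its levels.

```shell
#!/bin/sh
# classify_score SCORE
# Prints the band a SpamAssassin score falls into:
#   untagged : below sa_tag_level_deflt, no spam headers added
#   tagged   : informational X-Spam headers added
#   spam     : at or above sa_tag2_level_deflt, marked as spam
#   kill     : at or above sa_kill_level_deflt, final_spam_destiny applies
classify_score() {
    awk -v s="$1" -v tag=2.0 -v tag2=6.2 -v kill_lvl=6.9 'BEGIN {
        s += 0                      # force numeric comparison
        if      (s >= kill_lvl) print "kill"
        else if (s >= tag2)     print "spam"
        else if (s >= tag)      print "tagged"
        else                    print "untagged"
    }'
}
```

For example, `classify_score 6.5` prints `spam` and `classify_score 7.4` prints `kill`. With D_PASS both are still delivered; the difference only becomes visible once the destiny is changed to D_REJECT or D_DISCARD.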
8. MySQL Schema Fixes Required by SA 4.0
SA 4.0 requires the following columns:
- oldest_token_age
- newest_token_age (NOT NULL)
Your container automatically patches the schema:
ALTER TABLE bayes_vars ADD COLUMN oldest_token_age INT(11) ...
ALTER TABLE bayes_vars MODIFY COLUMN newest_token_age INT(11) NOT NULL DEFAULT 0;
✔ Prevents SA 4.0 runtime errors
✔ Fully automated
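If you want to confirm that the patch was applied, or guard a manual migration, you can query information_schema for the column first. This is a sketch under assumptions: the function name column_exists is hypothetical, the credentials file is the one mounted under /run/secrets, and the host maildb and database sa40 are taken from the DSN in local.cf. The arguments are interpolated into SQL unescaped, so pass only trusted, literal names.

```shell
#!/bin/sh
# column_exists TABLE COLUMN
# Returns 0 if the column exists in database sa40, 1 if it does not,
# and 2 if the query itself failed.
column_exists() {
    count=$(mysql --defaults-extra-file=/run/secrets/sa_mysql.cnf \
        -h maildb -N -B -e \
        "SELECT COUNT(*) FROM information_schema.columns
         WHERE table_schema = 'sa40'
           AND table_name   = '$1'
           AND column_name  = '$2'") || return 2
    [ "$count" -eq 1 ]
}
```

For instance, `column_exists bayes_vars oldest_token_age || echo "schema patch still needed"`.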
9. Full SA Flow Diagram
Inbound mail:
Internet → Postfix → Amavis → SpamAssassin → Postfix → Dovecot → Mailbox
Outbound mail:
User → Dovecot → Postfix (submission) → Amavis → Internet
Learning:
User moves messages → Dovecot IMAPSieve → spamc remote learn → SpamAssassin → MariaDB (Bayes/TxRep)
10. Recommendations & Best Practices
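✔ 1. Run periodic Bayes expiration
sa-learn --force-expire
(sa-learn --sync only flushes the journal of file-based Bayes stores; it does not expire tokens.)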
✔ 2. Backup Bayes SQL
Backup tables:
bayes_token
bayes_seen
bayes_vars
txrep
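A backup of these four tables can be sketched with mysqldump as shown here. The function name backup_sa_tables and the output file naming are illustrative, and the credentials file, host and database name are assumed to match the rest of this article. --single-transaction gives a consistent snapshot only for InnoDB tables.

```shell
#!/bin/sh
# backup_sa_tables [DEST_DIR]
# Dumps the Bayes and TxRep tables to a dated, gzip-compressed SQL file
# and prints the path of the resulting archive.
backup_sa_tables() {
    dest="${1:-.}"
    dump="$dest/sa40-$(date +%Y%m%d).sql"

    # Dump to a plain file first so a mysqldump failure is not masked
    # by the exit status of a pipeline.
    mysqldump --defaults-extra-file=/run/secrets/sa_mysql.cnf -h maildb \
        --single-transaction sa40 \
        bayes_token bayes_seen bayes_vars txrep > "$dump" \
        || { rm -f "$dump"; return 1; }

    gzip -f "$dump" && echo "$dump.gz"
}
```

Running `backup_sa_tables /backup` from a daily cron job is usually enough, since Bayes and TxRep data can tolerate losing a day of training.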
✔ 3. Enable Razor/Pyzor + DNSBLs (optional)
Network checks like these can noticeably improve detection; the 20–40% figure is a rough estimate and depends on your mail mix.
✔ 4. Install Roundcube MarkAsJunk2 plugin
Better UX for spam/ham actions (its features are part of the bundled markasjunk plugin since Roundcube 1.4).
Conclusion
This article covered the full integration of SpamAssassin 4.0 into your mail system:
- SQL-based Bayes and TxRep
- Automated spam/ham learning through IMAPSieve
- Daily rule updates and optimized compiled rules
- Full Docker deployment
- Automatic schema fixes
- Integration with Amavis and Dovecot
Your system now provides an enterprise-grade anti-spam capability that is fully modular, containerized, and future-proof.