Nuface Blog

隨意隨手記 Casual Notes

SpamAssassin 4.0: SQL Bayes, TxRep, sa-update, Remote Learning & Full Docker Deployment Guide

Posted on 2025-11-20 (updated 2025-11-21) by Rico

Mail Server Series — Part 6

SpamAssassin (SA) is a widely used open-source spam-filtering engine.
In this mail server architecture, we deploy SpamAssassin 4.0 + MySQL (Bayes + TxRep) + Amavis + Remote Learning (via Dovecot IMAPSieve) to achieve enterprise-grade spam detection accuracy.

This article covers:

  • Why you should avoid local Bayes files
  • Why SQL Bayes + TxRep is strongly recommended
  • How to run daily rule updates (sa-update & sa-compile)
  • How to implement fully automated spam/ham learning via IMAPSieve
  • Full SpamAssassin Docker deployment
  • MySQL schema adjustments required by SA 4.0
  • Amavis integration (Inbound + Outbound)

This is a hands-on, production-ready guide based entirely on your running deployment.


1. Overview of the SpamAssassin Architecture

Your architecture:

Dovecot IMAPSieve
     ↓ move message
sa-remote-learn (spamc)
     ↓ TCP(783)
SpamAssassin Docker
     ↓ SQL
MariaDB (Bayes + TxRep)

Key design points:

✔ Postfix does not call SpamAssassin directly
✔ Processing is delegated to Amavis and remote learning
✔ All Bayes/TxRep data stored in MySQL
✔ Bayes learning is driven by user actions through IMAPSieve; SA's own auto-learn is turned off
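
To confirm that the spamc → TCP 783 → SpamAssassin leg of the diagram is reachable, a keep-alive check can be run from the Dovecot side. This is a minimal sketch, assuming spamc is installed there and the container resolves as spamassassin; the function name spamd_alive is illustrative and not part of the deployment.

```shell
# Hypothetical helper: returns 0 if spamd answers a keep-alive ping.
# Host and port defaults mirror the diagram (spamassassin, TCP 783).
spamd_alive() {
  host="${1:-spamassassin}"
  port="${2:-783}"
  # spamc -K sends a PING instead of a message; a healthy spamd replies with PONG.
  spamc -d "$host" -p "$port" -K 2>/dev/null | grep -q 'PONG'
}
```

Example use: spamd_alive spamassassin 783 && echo "spamd reachable". A failure here usually points at the Docker network or the allowed-networks setting rather than at Bayes or SQL.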


2. Why SQL Bayes + TxRep Instead of Local Files

Traditional setups store:

  • Bayes in /var/lib/spamassassin/
  • TxRep or legacy AWL in Berkeley DB

Major drawbacks in a containerized setup:

✘ Lost when containers are rebuilt, unless the path sits on a persistent volume
✘ Hard to share across users
✘ Backup is painful

Your architecture uses:

✔ SQL Bayes

  • Centralized, persistent
  • Shared among all users
  • Survives container rebuilds

✔ SQL TxRep

Reputation scoring stored in MySQL.

Benefits:

✔ Better accuracy
✔ Cross-user intelligence
✔ Works perfectly in Docker environments


3. SpamAssassin Docker Deployment

Your Docker command:

docker run -d --name spamassassin \
  --network intranet-net \
  -e MYSQL_EXTRA_FILE=/run/secrets/sa_mysql.cnf \
  -e MYSQL_DB=sa40 \
  -e ALLOW_NETS='172.18.0.0/16' \
  -e ALLOW_USER_RULES=1 \
  -e SA_COMPILE_ON_START=1 \
  ...

Persistent volumes:

/etc/spamassassin        → config
/var/lib/spamassassin    → SA state such as sa-update rules (external volume; Bayes tokens live in MySQL)
/run/secrets             → MySQL password

You also copied the default SpamAssassin rules directory into a persistent location.
This ensures:

✔ rule updates survive container rebuilds
✔ local.cf customizations do not get overwritten
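
The MYSQL_EXTRA_FILE variable points at a file under /run/secrets. A minimal sketch of how such a file could be created on the host follows. It assumes the image hands the file to the mysql client as a standard option file; the helper name write_mysql_cnf and the [client] layout are assumptions, not taken from the image.

```shell
# Hypothetical helper: write a MySQL client option file readable only by its owner.
# Assumption: the image uses MYSQL_EXTRA_FILE as a --defaults-extra-file option file.
write_mysql_cnf() {
  cnf_path="$1"; db_user="$2"; db_pass="$3"
  ( umask 077   # new files are created without group/other permissions
    printf '[client]\nuser=%s\npassword=%s\n' "$db_user" "$db_pass" > "$cnf_path"
  ) && chmod 600 "$cnf_path"   # also tighten the mode if the file already existed
}
```

Example use: write_mysql_cnf ./secrets/sa_mysql.cnf sa40 'your-password', then mount ./secrets at /run/secrets. Keeping the password in a mounted file keeps it out of docker inspect output and shell history.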


4. local.cf – SQL Bayes + TxRep Configuration

Your final SA configuration:

use_bayes 1
bayes_auto_learn 0
bayes_store_module          Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn               DBI:mysql:database=sa40;host=maildb;port=3306
bayes_sql_username          sa40
bayes_sql_password          sa90452

loadplugin Mail::SpamAssassin::Plugin::TxRep
txrep_factory               Mail::SpamAssassin::SQLBasedAddrList
user_awl_dsn                DBI:mysql:database=sa40;host=maildb;port=3306
user_awl_sql_username       sa40
user_awl_sql_password       sa90452
user_awl_sql_table          txrep

Highlights:

✔ SQL Bayes enabled
✔ TxRep replaces legacy AWL (TxRep is off by default, so use_txrep 1 must also be set in the configuration)
✔ Autolearning disabled (IMAPSieve performs learning)
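
Before restarting the container, it can help to check that the SQL directives from this section are actually present in local.cf. The helper below is a sketch; the name check_local_cf and the list of directives it looks for are illustrative choices.

```shell
# Hypothetical helper: report SQL Bayes/TxRep directives missing from a local.cf.
# Prints one "missing: <directive>" line per gap and returns 1 if any were found.
check_local_cf() {
  cf="$1"; missing=0
  for directive in bayes_store_module bayes_sql_dsn bayes_sql_username \
                   txrep_factory user_awl_dsn user_awl_sql_table; do
    # Match the directive at the start of a line, so commented-out copies do not count.
    if ! grep -Eq "^[[:space:]]*${directive}[[:space:]]+" "$cf"; then
      echo "missing: $directive"
      missing=1
    fi
  done
  return "$missing"
}
```

Example use: check_local_cf /etc/spamassassin/local.cf. This only checks that the directives exist; running spamassassin --lint inside the container validates the syntax itself.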


5. Fully Automated Learning via IMAPSieve (Remote Learn)

Learning is driven entirely by what users do in their mail client:

Move to Junk → Learn spam

Move back to Inbox → Learn ham

Sieve rule for SPAM:

require ["vnd.dovecot.pipe", "copy", "imapsieve","environment", "variables"];

if environment :matches "imap.user" "*" {
  set "username" "${1}";
}

pipe :copy "sa-remote-learn-spam.sh" ["${username}"];

Shell script:

SA_USER="$1"   # username passed in by the Sieve script
/usr/bin/spamc -d spamassassin -p 783 -u "$SA_USER" -L spam
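
One detail matters when wrapping this command: with -L, spamc exits with code 5 when the message was learned and 6 when it had already been learned, so a wrapper that passes the raw exit code back to Dovecot would report every successful learn as a failure. spamd also has to be started with --allow-tell for learn requests to be accepted. The function below is a sketch of a wrapper that handles both spam and ham; the name sa_remote_learn is hypothetical.

```shell
# Sketch of a learn wrapper; Dovecot pipes the message on stdin.
# Hypothetical usage: sa_remote_learn spam "user@example.com" < message.eml
sa_remote_learn() {
  class="$1"      # spam | ham
  sa_user="$2"    # mailbox owner, passed in by the Sieve script
  rc=0
  spamc -d spamassassin -p 783 -u "$sa_user" -L "$class" >/dev/null || rc=$?
  # Exit code 5 = learned, 6 = already learned; both count as success here.
  case "$rc" in
    0|5|6) return 0 ;;
    *)     echo "spamc learn failed (exit $rc)" >&2; return 1 ;;
  esac
}
```

A ham script would call the same function with ham as the first argument, triggered by a second Sieve rule on messages moved out of Junk.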

Advantages:

✔ End users train the filter simply by moving emails
✔ Works from Roundcube, Outlook, mobile, IMAP clients
✔ Learning is fast and accurate
✔ No manual sa-learn required

Commercial mail platforms commonly use a similar move-to-train pattern.


6. Daily Rule Updates – sa-update & sa-compile

Your Amavis container handles daily updates using entrypoint.sh:

sa-update ${do_compile:+&& sa-compile} && pkill -HUP -f amavisd

Benefits:

✔ Always up-to-date rules
✔ Compiled rules improve performance 30–50%
✔ Amavis reload ensures immediate effect
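
Typed directly into a shell, the expansion ${do_compile:+&& sa-compile} would be passed to sa-update as plain arguments rather than parsed as an operator, so the one-liner presumably works because entrypoint.sh expands it into a text line (for example a scheduled job) first. A standalone equivalent could look like the sketch below. It relies on the documented sa-update exit codes (0 = rules updated, 1 = no new rules, 2 or higher = a problem); the function name update_rules and the DO_COMPILE variable are stand-ins, not names from the deployment.

```shell
# Sketch: update rules, optionally compile, and reload Amavis only when rules changed.
# DO_COMPILE is a hypothetical stand-in for do_compile in entrypoint.sh.
update_rules() {
  rc=0
  sa-update || rc=$?
  case "$rc" in
    0) ;;                                  # new rules were installed, continue
    1) echo "no new rules"; return 0 ;;    # nothing changed, skip compile and reload
    *) echo "sa-update failed (exit $rc)" >&2; return "$rc" ;;
  esac
  if [ -n "${DO_COMPILE:-}" ]; then
    sa-compile || return $?
  fi
  # Same reload as the original one-liner.
  pkill -HUP -f amavisd
}
```

Treating exit code 1 as a normal outcome keeps a daily job from logging an error on every day without new rules.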


7. Integrating SpamAssassin with Amavis

Two separate paths:

Inbound

Postfix → Amavis → SpamAssassin → clean/spam → Postfix → Dovecot

Outbound

User → Postfix submission (authenticated via Dovecot SASL) → Amavis (DKIM + spam scan) → Postfix → Internet

Your SA scoring:

$sa_tag_level_deflt  = 2.0;
$sa_tag2_level_deflt = 6.2;
$sa_kill_level_deflt = 6.9;
$final_spam_destiny  = D_PASS;

D_PASS means:

✔ Spam is delivered
✔ Headers indicate SPAM
✔ Dovecot sorting handles final placement

This is ideal while tuning the system.
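Later you can switch to:

  • D_DISCARD (silently drop messages at or above the kill level)
  • or D_REJECT (refuse them; behind an after-queue filter such as this one, Postfix typically turns the refusal into a bounce, which can land on forged senders)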
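
To make the three thresholds concrete, the sketch below maps a score to the highest level it reaches. In Amavis, tag adds the spam info headers, tag2 marks the message as spam, and kill triggers $final_spam_destiny. The function name amavis_level is illustrative, and the numbers are simply the ones from this section.

```shell
# Illustrative only: map a SpamAssassin score to the Amavis level it reaches,
# using the thresholds shown in this section (2.0 / 6.2 / 6.9).
amavis_level() {
  awk -v s="$1" -v tag=2.0 -v tag2=6.2 -v kill=6.9 'BEGIN {
    if      (s + 0 >= kill) print "kill"   # final_spam_destiny applies
    else if (s + 0 >= tag2) print "tag2"   # message is marked as spam
    else if (s + 0 >= tag)  print "tag"    # spam info headers are added
    else                    print "none"
  }'
}
```

For example, amavis_level 6.5 prints tag2. The kill level sits only 0.7 above tag2 here, so the band where mail is marked as spam without triggering the final destiny is narrow.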

8. MySQL Schema Fixes Required by SA 4.0

SA 4.0 expects these columns in the bayes_vars table:

  • oldest_token_age
  • newest_token_age (NOT NULL)

Your container automatically patches the schema:

ALTER TABLE bayes_vars ADD COLUMN oldest_token_age INT(11) ...
ALTER TABLE bayes_vars MODIFY COLUMN newest_token_age INT(11) NOT NULL DEFAULT 0;

✔ Prevents SA 4.0 runtime errors
✔ Fully automated
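
To check by hand whether a database already has these columns, information_schema can be queried. The helper below is a sketch: the name column_exists is hypothetical, and it assumes the credentials file from the Docker section is usable as a mysql option file and that the database host is maildb, as in local.cf.

```shell
# Hypothetical check: succeeds if column $3 exists in table $2 of database $1.
# Assumes /run/secrets/sa_mysql.cnf is a valid mysql option file (an assumption).
column_exists() {
  db="$1"; table="$2"; column="$3"
  count="$(mysql --defaults-extra-file=/run/secrets/sa_mysql.cnf -h maildb -N -B -e \
    "SELECT COUNT(*) FROM information_schema.columns
     WHERE table_schema='$db' AND table_name='$table' AND column_name='$column'")" || return 2
  [ "$count" = "1" ]
}
```

Example use: column_exists sa40 bayes_vars oldest_token_age || echo "schema needs patching". A return code of 2 means the query itself failed, which is worth telling apart from a missing column.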


9. Full SA Flow Diagram

Inbound mail:
Internet → Postfix → Amavis → SpamAssassin → Postfix → Dovecot → Mailbox

Outbound mail:
User → Postfix (submission, authenticated via Dovecot SASL) → Amavis → Postfix → Internet

Learning:
User moves messages → Dovecot IMAPSieve → spamc remote learn → SpamAssassin → MariaDB (Bayes/TxRep)

10. Recommendations & Best Practices

✔ 1. Run periodic Bayes expiration

sa-learn --force-expire

✔ 2. Backup Bayes SQL

Backup tables:

bayes_token  
bayes_seen  
bayes_vars  
txrep
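
A dump of exactly these four tables could be scripted as follows. This is a sketch, assuming mysqldump is available where it runs, the database host is maildb as in local.cf, and the credentials file from the Docker section works as an option file; the function name backup_sa_tables and the output file name are illustrative.

```shell
# Sketch: dump the Bayes and TxRep tables to a dated, gzip-compressed file.
# Prints the path of the finished archive on success.
backup_sa_tables() {
  out_dir="${1:-.}"
  sql_file="$out_dir/sa40-bayes-$(date +%Y%m%d).sql"
  # --single-transaction takes a consistent InnoDB snapshot without locking the tables.
  mysqldump --defaults-extra-file=/run/secrets/sa_mysql.cnf -h maildb \
    --single-transaction sa40 bayes_token bayes_seen bayes_vars txrep \
    > "$sql_file" || { rm -f "$sql_file"; return 1; }
  gzip -f "$sql_file" && echo "$sql_file.gz"
}
```

Example use: backup_sa_tables /backup. Writing the dump to a plain file first, and compressing afterwards, means a failed mysqldump is detected instead of being hidden behind a successful gzip.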

✔ 3. Enable Razor/Pyzor + DNSBLs (optional)

This improves detection accuracy another 20–40%.

✔ 4. Install the Roundcube markasjunk plugin (MarkAsJunk2 was merged into it in Roundcube 1.4)

Better UX for spam/ham actions.


Conclusion

This article covered the full integration of SpamAssassin 4.0 into your mail system:

  • SQL-based Bayes and TxRep
  • Automated spam/ham learning through IMAPSieve
  • Daily rule updates and optimized compiled rules
  • Full Docker deployment
  • Automatic schema fixes
  • Integration with Amavis and Dovecot

Your system now provides an enterprise-grade anti-spam capability that is fully modular, containerized, and future-proof.
