High Availability, Scalability, and Long-Term Operations Guide

Posted on 2025-11-21 by Rico

Mail Server Series — Part 15

Across the previous 14 articles, we built a fully modular, container-based enterprise mail system using:

  • Postfix (SMTP)
  • Dovecot (IMAP/POP3, LMTP)
  • Amavis + SpamAssassin + ClamAV (Filtering & DKIM)
  • Roundcube (Webmail)
  • Piler (Email Archiving)
  • ManticoreSearch (Full-text indexing with Chinese support)

This final chapter focuses on how to operate this platform in the long term — ensuring it is:

  • Highly Available (HA)
  • Scalable
  • Maintainable
  • Disaster-ready
  • Suitable for multi-site and multi-country deployment

This article is essentially your operations blueprint for running this system in production for years.


1. High-Level Overview of Redundancy & Scalability

A mail platform involves many components:

  • SMTP inbound/outbound delivery
  • IMAP mailbox access
  • Spam/Virus scanning
  • DKIM/DMARC/SPF validation
  • Archiving and full-text search
  • Webmail
  • Management tools

Each component requires its own HA strategy.
Containers make this easier, but architecture design remains crucial.


2. High Availability for Postfix & Dovecot

2.1 Cluster Architecture

        [VIP / Load Balancer / DNS Round Robin]
                      |
     ┌────────────────┴────────────────┐
 [Mail Node 1]                    [Mail Node 2]
   Postfix                           Postfix
   Dovecot                           Dovecot
   Amavis                            Amavis
   SpamAssassin                      SpamAssassin
   ClamAV                            ClamAV

2.2 HA Strategies

✔ 1. Multiple MX Records

MX 10 mail1.it.demo.tw
MX 20 mail2.it.demo.tw

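Sending servers try the lowest preference value first, so mail1 receives mail under normal conditions. When mail1 is unreachable, they fall back to mail2 without any manual intervention.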
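
The fallback order can be previewed the way a sending server sees it. The sketch below assumes `dig` is installed; `mx_order` is a hypothetical helper name, and `<domain>` stands for whichever domain carries the MX records above.

```shell
#!/bin/sh
# mx_order: read "preference hostname" lines (the format printed by
# `dig +short MX <domain>`) on stdin and print the hostnames in the order
# a sending server would try them: lowest preference value first.
mx_order() {
    sort -n -k1,1 | awk '{ sub(/\.$/, "", $2); print $2 }'
}

# Example with live DNS (requires dig):
#   dig +short MX <domain> | mx_order
```

If mail1 is not the first line of the output, the preference values are probably swapped.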


✔ 2. Shared Mail Storage

Dovecot requires consistent mail storage:

/var/vmail/

Possible solutions:

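  • NFSv4 (stable, provided writes are fsynced)
  • CephFS (scales out well)
  • GlusterFS
  • ZFS replication (simplest to run, but asynchronous: the standby copy lags the primary by up to one replication interval)

Best option for mail servers:
Dovecot Native Replication (dsync), which keeps a full copy of every mailbox on both nodes and needs no shared filesystem. Check that your Dovecot release still includes it; it ships with 2.3.x but was removed from the 2.4 community edition.
https://doc.dovecot.org/replication/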


✔ 3. Postfix Queues Do NOT Need Synchronization

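Each node keeps its own queue, and nothing needs to be shared between them:

  • SMTP senders retry automatically
  • Multi-MX architecture handles failover naturally

Mail already queued on a failed node stays there until the node comes back, so keep each queue directory on persistent storage.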

✔ 4. Amavis, SpamAssassin, ClamAV Are Stateless

These services can run on each node without synchronization.


3. MariaDB Redundancy Strategies

MariaDB stores:

  • PostfixAdmin (domains/users/aliases)
  • SpamAssassin Bayes databases
  • Piler metadata

You can choose one of the following HA options:


3.1 Galera Cluster (Best for Production)

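Virtually synchronous multi-primary replication with 3 nodes (an odd number, so the cluster keeps quorum when one node is lost):

DB1 ─ DB2 ─ DB3

Pros:

  • Any node can accept reads and writes
  • No promotion step on failover: the two remaining nodes keep serving
  • A good fit for the read-heavy lookups Postfix and Dovecot make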
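
A node's view of the cluster can be checked from its `wsrep_*` status variables. The following is a minimal sketch, not a complete health check: `galera_ok` is a hypothetical helper, and the client invocation in the comment assumes credentials are supplied by your own configuration.

```shell
#!/bin/sh
# galera_ok MIN: read "name<TAB>value" status rows on stdin and succeed only
# if the node sees at least MIN members, belongs to the Primary component,
# and reports itself as Synced.
galera_ok() {
    awk -v want="$1" '
        $1 == "wsrep_cluster_size"        { size = $2 }
        $1 == "wsrep_cluster_status"      { status = $2 }
        $1 == "wsrep_local_state_comment" { state = $2 }
        END { exit !(size + 0 >= want + 0 && status == "Primary" && state == "Synced") }
    '
}

# Example against a live node (-N drops the header row, -B prints tab-separated rows):
#   mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'" | galera_ok 3 || echo "Galera degraded"
```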

3.2 Master → Slave Replication

Simpler:

Master → Slave

Writes go to the master only. If it fails, promote the slave manually and repoint the mail services at it.


3.3 Backup-Only Mode (Simplest)

Daily mysqldump:

postfix
sa40
piler

Not true HA, but a solid disaster-recovery baseline: at worst you lose the changes made since the last dump (up to 24 hours with a daily schedule).


4. HA for Piler + ManticoreSearch

Email archiving requires special handling.

Components:

  • /var/piler/store — Raw archived email files
  • MariaDB — Archive metadata
  • ManticoreSearch — Full-text index

4.1 Storage Redundancy

Options:

  • ZFS snapshots & replication
  • Rsync incremental sync
  • CephFS / GlusterFS (large enterprises)

4.2 ManticoreSearch Replication

Manticore supports native multi-master replication for real-time tables, built on the Galera library:

Manticore 1 ⇄ Manticore 2

If one node goes down, the other holds the same index and can keep serving searches.


5. HA for Roundcube, PostfixAdmin, Piler Web UI

These are stateless web services → ideal for load balancing.

Options:

  • Nginx
  • HAProxy
  • Apache mod_proxy_balancer
  • Cloudflare Load Balancing

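For session handling:

  • Local storage such as SQLite is acceptable only with a single web node
  • With several nodes behind a balancer, sessions must live in a shared store; Redis/Memcached is better for large deployments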

6. Multi-Site / Multi-Country Deployment

For companies with sites in several countries, for example:

  • Taiwan
  • Singapore
  • Vietnam
  • Malaysia
  • Thailand

You may adopt several patterns.


Option A: Centralized Global Mail Platform (recommended)

  • All MX records point to headquarters
  • Single archive and search system
  • Easiest to maintain
  • Strongest compliance governance

Option B: Distributed Mail Nodes (large enterprises)

  • Each country has its own SMTP/IMAP node
  • Central Piler for unified archiving
  • Localized delivery → lower latency

Option C: Hybrid

  • HQ as primary mail system
  • One additional node in Asia as backup MX
  • Provides strong failover across unstable regions

7. Monitoring Strategy (Mandatory for Production)

Use:

  • Prometheus + Grafana
  • Or Zabbix

Watch these KPIs:

SMTP

  • Queue depth
  • Reject ratio
  • Delivery latency

IMAP

  • Dovecot process health
  • Authentication failures

Spam/Virus

  • ClamAV update status
  • SpamAssassin hit ratio

Piler & Manticore

  • Index delay
  • Storage growth trends
  • Query failures
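
As one concrete example, the SMTP queue depth KPI can be read from the summary line that `postqueue -p` prints. This is a rough sketch: `queue_depth` is a hypothetical helper, and the alert threshold in the comment is an arbitrary placeholder to tune for your traffic.

```shell
#!/bin/sh
# queue_depth: read `postqueue -p` output on stdin and print the number of
# queued messages, taken from the summary line
# ("-- 12 Kbytes in 3 Requests." or "Mail queue is empty").
queue_depth() {
    awk '
        /^Mail queue is empty/          { n = 0 }
        /^-- .* in [0-9]+ Requests?\.$/ { n = $(NF - 1) }
        END { print n + 0 }
    '
}

# Example on a mail node (500 is a placeholder threshold):
#   depth=$(postqueue -p | queue_depth)
#   [ "$depth" -gt 500 ] && echo "WARN: $depth messages queued"
```

The printed number can be fed to Zabbix or a Prometheus textfile collector, whichever of the two stacks you run.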

8. Operational Automation

8.1 Automated Let’s Encrypt Renewal

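Run Certbot inside the Apache reverse proxy container, and after each renewal reload every service that uses the certificate (Apache itself, plus Postfix and Dovecot if they share it).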


8.2 SpamAssassin Auto-Update

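Cron job (sa-update exits non-zero when no new rules are available, so the rest of the chain runs only after a real update):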

sa-update && sa-compile && systemctl reload amavis
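
The one-liner cannot tell "no updates" apart from a failed update, because both stop the chain silently. A small wrapper can separate the two cases. This sketch relies on the exit statuses documented for sa-update (0 for installed updates, 1 for no fresh updates, higher values for lint or download errors); `sa_update_action` is a hypothetical helper name.

```shell
#!/bin/sh
# sa_update_action STATUS: map an sa-update exit status to a follow-up action.
#   0 -> new rules were installed, so recompile and reload
#   1 -> no fresh updates, nothing to do
#   * -> lint or download error, needs attention
sa_update_action() {
    case "$1" in
        0) echo reload ;;
        1) echo skip ;;
        *) echo error ;;
    esac
}

# Sketch of a cron wrapper built on it:
#   sa-update; action=$(sa_update_action $?)
#   case "$action" in
#       reload) sa-compile && systemctl reload amavis ;;
#       error)  echo "sa-update failed" >&2; exit 1 ;;
#   esac
```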

8.3 Automated DB Backups

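Nightly cron (the /backup/mysql destination is a placeholder; --single-transaction gives a consistent snapshot of InnoDB tables without locking them):

mysqldump --single-transaction postfix > /backup/mysql/postfix.sql
mysqldump --single-transaction piler   > /backup/mysql/piler.sql
mysqldump --single-transaction sa40    > /backup/mysql/sa40.sql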
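
For a cron job it helps to date the files and to avoid leaving a truncated dump behind when something fails. One possible shape is sketched below; `backup_dbs` and the destination directory are hypothetical, and credentials are assumed to come from the client's own configuration.

```shell
#!/bin/sh
# backup_dbs DIR DB...: dump each database to DIR/<db>-<YYYY-MM-DD>.sql.
# Each dump is written to a .tmp file first, so a failed run never leaves
# a truncated file under the final name.
backup_dbs() {
    dir=$1; shift
    stamp=$(date +%Y-%m-%d)
    mkdir -p "$dir" || return 1
    for db in "$@"; do
        out="$dir/$db-$stamp.sql"
        if mysqldump --single-transaction "$db" > "$out.tmp"; then
            mv "$out.tmp" "$out"
        else
            rm -f "$out.tmp"
            echo "dump of $db failed" >&2
            return 1
        fi
    done
}

# Nightly cron entry point (the path is a placeholder):
#   backup_dbs /backup/mysql postfix piler sa40
```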

8.4 Piler Store Synchronization

rsync -a /var/piler/store remote:/backup/piler/

9. Troubleshooting SOP

9.1 Dovecot Login Fails

  • Check logs
  • Check firewall
  • Verify SQL credentials
  • Check disk space
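
The disk space check is easy to script. A minimal sketch, assuming the mail store is mounted at /var/vmail as elsewhere in this series; `disk_pct` is a hypothetical helper and the 90% limit is a placeholder.

```shell
#!/bin/sh
# disk_pct: read `df -P <path>` output on stdin and print the usage
# percentage of the filesystem as a bare number (the Capacity column).
disk_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Example on a mail node (90 is a placeholder limit):
#   pct=$(df -P /var/vmail | disk_pct)
#   [ "$pct" -ge 90 ] && echo "WARN: /var/vmail is ${pct}% full"
```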

9.2 Postfix Can’t Receive Mail

Check:

  • MX records
  • Firewall rules
  • Amavis (listening on port 10024, re-injecting into Postfix on port 10025)
  • SQL domain/user settings

9.3 Gmail / Outlook Reject Outbound Mail

Verify:

  • DKIM
  • SPF
  • DMARC
  • Reverse DNS
  • IP reputation (Spamhaus, Barracuda, etc.)
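
The SPF and DMARC items can be verified straight from DNS. The sketch below assumes `dig` is available; `has_record` is a hypothetical helper and example.com stands in for your mail domain.

```shell
#!/bin/sh
# has_record PREFIX: read TXT records (one per line, as printed by
# `dig +short TXT <name>`) on stdin and succeed if any record starts
# with PREFIX, e.g. "v=spf1" or "v=DMARC1".
has_record() {
    tr -d '"' | grep -q "^$1"
}

# Example with live DNS:
#   dig +short TXT example.com        | has_record v=spf1   || echo "no SPF record"
#   dig +short TXT _dmarc.example.com | has_record v=DMARC1 || echo "no DMARC record"
```

This confirms only that the records exist. Whether they are correct for your sending IPs still has to be checked against the rejection message returned by Gmail or Outlook.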

9.4 Piler Cannot Search Chinese

Verify:

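  • morphology='icu_chinese', together with a charset_table that includes CJK characters
  • Or, if you use N-gram indexing instead: ngram_chars covering CJK and ngram_len=1 (Manticore documents only 0 and 1 as valid values)
  • Correct Manticore schema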

10. Disaster Recovery Guide

If the entire server is lost, you only need:

/opt/docker/mail/*             ← All config
/var/vmail                     ← All mailboxes
/var/piler/store               ← Archived emails
MariaDB SQL dumps              ← postfix + piler + sa40

Recovery steps:

  1. Install Docker + Compose on a new machine
  2. Restore directories
  3. Restore DB dumps
  4. docker compose up -d

Once the containers are healthy and DNS points at the new host, your entire mail system is restored.
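
Before starting the containers it is worth confirming that the restore in steps 2 and 3 actually landed. A minimal pre-flight sketch follows; `missing_paths` is a hypothetical helper, and its ROOT argument exists only so the check can be tried against a scratch directory.

```shell
#!/bin/sh
# missing_paths ROOT PATH...: print every PATH that is missing, or is an
# empty directory, under ROOT. Use ROOT="" on the real host.
missing_paths() {
    root=$1; shift
    for p in "$@"; do
        if [ ! -d "$root$p" ] || [ -z "$(ls -A "$root$p" 2>/dev/null)" ]; then
            echo "$p"
        fi
    done
}

# Before step 4 on the new machine:
#   missing=$(missing_paths "" /opt/docker/mail /var/vmail /var/piler/store)
#   [ -z "$missing" ] && docker compose up -d
```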


Conclusion — From “Mail Server” to “Enterprise Mail Platform”

After 14 technical chapters and this final HA/Operations guide, your system now has:

  • Full enterprise-level functionality
  • High-availability capabilities
  • Multi-site scaling potential
  • Complete filtering and DKIM/DMARC compliance
  • Secure and searchable archiving system
  • Production-grade monitoring and DR strategy

This is no longer just a “self-hosted mail server.”
It is a complete enterprise communications platform, fully tailored to your organization.
