Mail Server Series — Part 16

With the architecture, deployment, filtering pipeline, archiving system, full-text search, high availability, and operational procedures of the mail platform covered, this chapter introduces the final but critical piece:

How to build an enterprise-grade monitoring & alerting system for your self-hosted mail infrastructure.

The reliability of a mail system depends not only on its architecture, but also on:

Whether issues can be detected immediately
Whether the environment’s health can be quantified
Whether risks can be predicted (disk full, queue buildup, CPU overuse)
Whether you can avoid the classic problem:
“Users complain they can’t receive emails… and only then do you realize something is wrong.”

This article provides a complete DevOps-focused guide to building:

✔ Full-stack monitoring
✔ Real-time alerting
✔ Log aggregation & tracing
✔ Operational dashboards
✔ Deep observability for Docker-based mail systems

1. What Should You Monitor in a Mail Platform? (Complete Checklist)

A modern mail stack includes:

Postfix (SMTP)
Dovecot (IMAP/POP3/LMTP)
Amavis / ClamAV / SpamAssassin
MariaDB / Galera
Roundcube Webmail
Piler (archive system)
ManticoreSearch (full-text search)
Apache Reverse Proxy
Docker host, containers, network, storage

Monitoring should be divided into six major categories:

① Postfix (SMTP) Monitoring

Metric	Description
mail queue size	Queue spikes indicate blockage
defer / bounce rate	DNS issues, blacklists, or remote failures
SMTP delivery latency	Delays in outbound flow
inbound/outbound TPS	Load forecasting
reject rate	Spam attack or config error
TLS usage rate	Security posture
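
As a rough illustration of how the queue-size metric could be collected, the sketch below counts queued messages per queue from the JSON output of `postqueue -j` (available in Postfix 3.1 and later, one JSON object per line). The function names are hypothetical, and in a Docker setup the command would have to run inside the Postfix container.

```python
import json
import subprocess
from collections import Counter


def summarize_queue(json_lines):
    """Count queued messages per queue name from `postqueue -j` lines."""
    counts = Counter()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        counts[entry.get("queue_name", "unknown")] += 1
    return dict(counts)


def read_live_queue():
    """Run postqueue on the mail host and summarize its JSON output."""
    result = subprocess.run(
        ["postqueue", "-j"], capture_output=True, text=True, check=True
    )
    return summarize_queue(result.stdout.splitlines())
```

A growing "deferred" count alongside a stable "active" count usually points at remote delivery problems rather than local load.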

② Dovecot (IMAP/POP3/LMTP) Monitoring

Metric	Description
login success/fail count	Detect brute-force attacks
IMAP/LMTP connections	Detect exhaustion
I/O latency	Indicates disk bottlenecks
mailbox locking issues	Storage or FS issues
auth response time	LDAP / MariaDB problems
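
One way to derive the login success/fail counts is to parse the Dovecot login log. The sketch below assumes the common log wording ("auth failed" on failed attempts, "Login:" on success, and an `rip=` field with the remote IP); the exact format varies between Dovecot versions and logging setups, so treat the patterns as assumptions to verify against your own logs.

```python
import re
from collections import Counter

# Remote IP field as written by the Dovecot login processes (assumed format).
RIP = re.compile(r"\brip=([0-9A-Fa-f:.]+)")


def count_logins(lines):
    """Return (successes, failures) as Counters keyed by remote IP."""
    ok, failed = Counter(), Counter()
    for line in lines:
        if "-login:" not in line:
            continue  # not an imap-login / pop3-login line
        match = RIP.search(line)
        if not match:
            continue
        ip = match.group(1)
        if "auth failed" in line:
            failed[ip] += 1
        elif "-login: Login:" in line:
            ok[ip] += 1
    return ok, failed
```

Sorting `failed.most_common()` gives a quick view of which addresses are responsible for most failed attempts.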

③ Amavis / ClamAV / SpamAssassin Monitoring

Metric	Description
ClamAV signature update status	Must stay fresh
spam hit rate	Sudden drop = SA malfunction
Amavis queue	Amavis blocking causes total mail freeze
CPU/RAM	SA may consume high CPU at peak

④ MariaDB / Galera Monitoring

Metric	Description
replication delay (wsrep receive queue)	Affects Roundcube & Dovecot auth
node health / flow-control	Stability of cluster
slow queries	Impacts all components
connection count	Detect leaks
DB size	Archive DB grows continuously
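
For the node health and flow-control rows, a check could be built on the `wsrep_*` status variables that Galera exposes through `SHOW GLOBAL STATUS`. The sketch below takes those values as a plain dictionary (fetching them needs a database driver, which is left out); the expected cluster size and the flow-control limit are illustrative assumptions.

```python
def galera_problems(status, expected_size=3, max_paused=0.1):
    """Return a list of human-readable problems found in wsrep status values.

    status: dict of variable name -> string value from SHOW GLOBAL STATUS.
    expected_size and max_paused are assumed thresholds, not recommendations.
    """
    problems = []
    if status.get("wsrep_cluster_status") != "Primary":
        problems.append("node is not part of the Primary component")
    if status.get("wsrep_ready") != "ON":
        problems.append("node is not ready to accept queries")
    if status.get("wsrep_local_state_comment") != "Synced":
        problems.append("node is not in Synced state")
    if int(status.get("wsrep_cluster_size", 0)) < expected_size:
        problems.append("cluster has fewer nodes than expected")
    if float(status.get("wsrep_flow_control_paused", 0)) > max_paused:
        problems.append("replication paused by flow control too often")
    return problems
```

An empty list means the node passed every check; anything else can be forwarded as the alert description.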

⑤ Piler + Manticore Monitoring

Metric	Description
search latency	User search experience
RT index delay	Whether indexes are up-to-date
piler queue backlog	Write operations stuck
archive store size	Long-term data accumulation
indexing errors	Schema/config inconsistencies

⑥ Host & Docker Monitoring

Metric	Description
CPU / RAM / Load	Prevent OOM kill
Disk I/O	Affects IMAP & indexing
Network latency	SMTP/IMAP/TLS issues
container health	Restart loops, unhealthy state
filesystem capacity	Disk full → mail system collapse

2. Recommended Full Monitoring Architecture

A robust monitoring stack should look like this:

┌───────────────────────────────┐
│       Grafana Dashboard       │  ← Visualization Layer
└───────────────┬───────────────┘
                │
        Prometheus Server
                │
┌───────────────┴────────────────────────────────────────┐
│ Exporters:                                             │
│   Postfix Exporter    Node Exporter                    │
│   Dovecot Exporter    Blackbox Exporter                │
│   MariaDB Exporter    Docker Exporter                  │
│   ClamAV Exporter     Custom Piler/Manticore Exporter  │
└────────────────────────────────────────────────────────┘
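
In this layout Prometheus scrapes every exporter and records an `up` series per target, so a quick health check is to ask Prometheus which targets are down. The sketch below uses the Prometheus HTTP API endpoint `/api/v1/query`; the base URL `http://prometheus:9090` is an assumption about how the server is reachable in your network.

```python
import json
import urllib.parse
import urllib.request


def down_targets(api_response):
    """Return sorted (job, instance) pairs whose `up` sample is 0."""
    if api_response.get("status") != "success":
        raise ValueError("Prometheus query failed")
    down = []
    for series in api_response["data"]["result"]:
        _timestamp, value = series["value"]  # value arrives as a string
        if float(value) == 0.0:
            labels = series["metric"]
            down.append((labels.get("job", "?"), labels.get("instance", "?")))
    return sorted(down)


def query_up(base_url="http://prometheus:9090"):
    """Fetch the instant vector for `up` (base_url is an assumed address)."""
    query = urllib.parse.urlencode({"query": "up"})
    with urllib.request.urlopen(f"{base_url}/api/v1/query?{query}", timeout=5) as resp:
        return json.load(resp)
```

Calling `down_targets(query_up())` from a cron job is a crude but useful cross-check that the monitoring stack itself is scraping everything it should.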

3. Required Exporters (Recommended List)

3.1 Postfix Exporter

Monitors:

Queue size
Rejects/bounces
Delivery latency
TLS negotiation stats

Recommended:
kumina/postfix_exporter (or a maintained fork of it)

3.2 Dovecot Exporter

Monitors:

Login fail rate
IMAP/LMTP connection count
Auth latency
Mailbox access patterns

3.3 ClamAV Exporter

Tracks:

signature update time
scan results
daemon uptime

3.4 MariaDB Exporter

Official exporter:

prometheus/mysqld_exporter (Docker image: prom/mysqld-exporter)

3.5 Node Exporter

Must-have for host-level metrics: CPU, memory, disk, filesystem, and network.

3.6 Blackbox Exporter

Probe:

SMTP STARTTLS
SMTP AUTH
IMAP STARTTLS
HTTPS (webmail/piler)
Certificate expiration

3.7 Docker Exporter

Monitors (cAdvisor is a common choice for per-container metrics):

restarted containers
unhealthy state
CPU/memory of containers

3.8 Custom Exporter for Piler & Manticore

Recommended metrics:

search latency
RT index lag
archive write delay
store usage growth
manticore query errors
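
A custom exporter only needs to serve these values in the Prometheus text exposition format on an HTTP endpoint. The sketch below shows that part with the standard library; the metric names, the port, and the `collect` callback (which would query Piler and Manticore) are hypothetical and left for you to implement.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(samples):
    """Render (name, help_text, value) tuples as Prometheus gauge metrics."""
    lines = []
    for name, help_text, value in samples:
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {float(value)}")
    return "\n".join(lines) + "\n"


def make_handler(collect):
    """Build a request handler that serves collect() output on /metrics."""
    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            body = render_metrics(collect()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
    return MetricsHandler


def serve(collect, port=9877):
    """Block forever serving metrics; the port number is an arbitrary choice."""
    HTTPServer(("0.0.0.0", port), make_handler(collect)).serve_forever()
```

For real use, the official `prometheus_client` library handles counters, histograms, and labels; the hand-rolled version here is only meant to show how little the wire format requires.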

4. Grafana Dashboards (Suggested Layout)

Dashboard A — Mail System Overview

inbound/outbound TPS
queue depth
SMTP TLS usage
login fail trends
DB latency
piler indexing delay
manticore query time

Suited to management reporting and daily checks.

Dashboard B — Postfix Deep Monitoring

per-minute SMTP throughput
reject count by rule
per-domain statistics
spam attack visualization
TLS handshake errors

Dashboard C — Dovecot Overview

login fail/success ratio
authentication latency
LMTP failures
I/O bottleneck
IMAP folder access heatmap

Dashboard D — Archive (Piler + Manticore)

indexing rate
search latency distribution
store size trends
RT index memory usage
fragmentation warning

Dashboard E — Host & Docker Monitoring

CPU / load
memory pressure
disk I/O
container health
network usage

5. Alerting Rules (Enterprise-Grade)

The following rules are chosen to limit false alarms without missing real incidents:

Postfix Alerts

Queue > 500 for over 10 minutes

Possible causes:

DNS outage
Amavis bottleneck
remote delivery failures
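
The "for over 10 minutes" part is what keeps short spikes from paging anyone; in Prometheus this is what the `for:` clause of an alerting rule does. The sketch below imitates that behavior in plain Python to make the logic explicit. The class name is hypothetical, and the thresholds are the ones from the rule above.

```python
class SustainedThreshold:
    """Fire only when a value stays above a threshold for a minimum duration."""

    def __init__(self, threshold, hold_seconds):
        self.threshold = threshold
        self.hold_seconds = hold_seconds
        self._breach_started = None  # time the current breach began

    def observe(self, value, now):
        """Record one sample; return True if the alert should be firing."""
        if value <= self.threshold:
            self._breach_started = None  # condition cleared, reset the timer
            return False
        if self._breach_started is None:
            self._breach_started = now
        return now - self._breach_started >= self.hold_seconds


# Queue > 500 for 10 minutes (600 seconds)
queue_rule = SustainedThreshold(threshold=500, hold_seconds=600)
```

Feeding it one queue-size sample per scrape interval gives the same pending-then-firing progression that Prometheus shows for a rule with `for: 10m`.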

Dovecot Alerts

Login failure rate > 30%

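A sustained rate this high usually indicates a brute-force or credential-stuffing attack, although a misconfigured client retrying a stale password can produce the same pattern.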
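
A percentage-based rule needs a minimum volume, otherwise two failed logins at 3 a.m. count as a 100% failure rate. The sketch below computes the ratio from two readings of cumulative success and failure counters; the `min_attempts` value of 50 is an assumption to tune for your traffic.

```python
def login_failure_ratio(prev, curr, min_attempts=50):
    """Return the failure ratio between two counter readings, or None.

    prev and curr are (success_total, failure_total) tuples of cumulative
    counters. None means "not enough data": a counter reset or too few
    attempts in the interval to make the ratio meaningful.
    """
    delta_ok = curr[0] - prev[0]
    delta_fail = curr[1] - prev[1]
    if delta_ok < 0 or delta_fail < 0:
        return None  # counters went backwards, e.g. after a restart
    attempts = delta_ok + delta_fail
    if attempts < min_attempts:
        return None
    return delta_fail / attempts
```

The alert condition then becomes `ratio is not None and ratio > 0.30`.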

ClamAV Alerts

Signature older than 24 hours
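
One simple proxy for signature freshness is the modification time of the database files that freshclam writes. The sketch below assumes the databases live in `/var/lib/clamav` (a common default, but check your image) and uses file mtime, which reflects the last download rather than the signature build date.

```python
import os
import time


def signature_age_hours(db_dir="/var/lib/clamav", now=None):
    """Return hours since the newest .cvd/.cld file in db_dir was modified."""
    now = time.time() if now is None else now
    newest = None
    for name in os.listdir(db_dir):
        if name.endswith((".cvd", ".cld")):
            mtime = os.path.getmtime(os.path.join(db_dir, name))
            newest = mtime if newest is None else max(newest, mtime)
    if newest is None:
        raise FileNotFoundError(f"no ClamAV database files found in {db_dir}")
    return (now - newest) / 3600.0
```

An alert would fire when the returned value exceeds 24.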

MariaDB Alerts

Query latency > 200 ms

Affects:

SMTP authentication
Dovecot auth
Roundcube
Piler

Storage Alerts

Disk usage > 85%

Especially:

/var/vmail
/var/piler/store
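
A fixed 85% threshold tells you the disk is nearly full; a growth estimate tells you how long you have left. The sketch below shows both using only the standard library. The forecast is a deliberately naive straight line between the oldest and newest sample, in the same spirit as Prometheus's `predict_linear`, and the function names are hypothetical.

```python
import shutil


def usage_percent(path):
    """Return used space on the filesystem holding `path`, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def hours_until_full(samples, capacity_bytes):
    """Estimate hours until the disk is full, or None if usage is not growing.

    samples: list of (unix_timestamp, used_bytes), oldest first.
    """
    (t0, used0), (t1, used1) = samples[0], samples[-1]
    if t1 <= t0 or used1 <= used0:
        return None  # no time elapsed or no growth
    bytes_per_second = (used1 - used0) / (t1 - t0)
    return (capacity_bytes - used1) / bytes_per_second / 3600.0
```

Note that `shutil.disk_usage` can report a slightly different percentage than `df`, because `df` accounts for blocks reserved for root.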

Docker Alerts

container restart loops
“unhealthy” state
OOM kills
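
All three conditions are visible in the JSON that `docker inspect` prints for a container (`State.Health.Status`, `State.OOMKilled`, and `RestartCount`). The sketch below parses that output; the restart limit of 3 is an assumed value, and a raw `RestartCount` is only a rough stand-in for detecting a loop, since it never decreases.

```python
import json


def container_problems(inspect_output, max_restarts=3):
    """Return (container_name, problem) pairs from `docker inspect` JSON text."""
    problems = []
    for container in json.loads(inspect_output):
        name = container.get("Name", "?").lstrip("/")
        state = container.get("State", {})
        # "Health" is absent when the image defines no HEALTHCHECK.
        health = (state.get("Health") or {}).get("Status")
        if health == "unhealthy":
            problems.append((name, "unhealthy"))
        if state.get("OOMKilled"):
            problems.append((name, "oom-killed"))
        if container.get("RestartCount", 0) > max_restarts:
            problems.append((name, "restart-loop"))
    return problems
```

The input can be produced with `docker inspect $(docker ps -aq)` on the Docker host.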

Manticore Alerts

search latency > 500 ms
index not updating
RT index overflow

6. External Probing (Blackbox Monitoring)

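External probing matters in production because it tests the service the way a remote client sees it.

Probe the following targets. The smtp_starttls, smtp_auth, and imap_starttls prefixes name Blackbox Exporter modules rather than real URL schemes: the exporter takes a module name plus a host:port (or URL) target. smtp_starttls and imap_starttls appear in the exporter's example configuration; smtp_auth would be a custom tcp module.

smtp_starttls://mail.it.demo.tw:25
smtp_auth://mail.it.demo.tw:587
imap_starttls://mail.it.demo.tw:143
https://webmail.it.demo.tw
https://archive.it.demo.tw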

You will immediately know if:

TLS handshake fails
cert is expired
mail service unreachable
reverse proxy broken
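
To make concrete what such a probe checks, here is a minimal sketch of an SMTP STARTTLS probe written with Python's `smtplib` and `ssl` modules. It is an illustration of the idea, not how Blackbox Exporter is implemented; the function names are hypothetical. A failed handshake or an invalid certificate surfaces as an exception, and a successful run returns the days left before the certificate expires.

```python
import smtplib
import ssl
import time


def days_until_expiry(not_after, now):
    """Convert a certificate 'notAfter' string into days remaining at `now`."""
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400.0


def probe_smtp_starttls(host, port=25, timeout=10):
    """Connect, upgrade to TLS, and return days until the cert expires.

    Raises smtplib.SMTPException if STARTTLS is not offered and
    ssl.SSLError if the handshake or certificate verification fails.
    """
    context = ssl.create_default_context()  # verifies chain and hostname
    with smtplib.SMTP(host, port, timeout=timeout) as smtp:
        smtp.ehlo()
        smtp.starttls(context=context)
        cert = smtp.sock.getpeercert()
    return days_until_expiry(cert["notAfter"], time.time())
```

For example, `probe_smtp_starttls("mail.it.demo.tw")` would exercise the first target in the list above; alerting when the result drops below a couple of weeks leaves time to renew.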

7. Centralized Alert Delivery

Recommended channels:

Microsoft Teams
Slack
Telegram Bot
Email (secondary only)

Alertmanager can route alerts to all of these through its receiver configuration, either with a built-in integration or via a generic webhook.
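
When a channel is reached through the generic webhook receiver, Alertmanager POSTs a JSON document containing an `alerts` list, each entry with `status`, `labels`, and `annotations`. The sketch below turns such a payload into a short text message and posts it to a chat webhook that accepts `{"text": ...}` (Slack incoming webhooks do). The webhook URL is a placeholder you must supply, and the `severity` label and `summary` annotation are conventions your alert rules would need to set.

```python
import json
import urllib.request


def format_alerts(payload):
    """Build one line per alert from an Alertmanager webhook payload."""
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        status = alert.get("status", "unknown").upper()
        name = labels.get("alertname", "unnamed")
        severity = labels.get("severity", "none")
        summary = annotations.get("summary", "")
        lines.append(f"[{status}] {name} ({severity}): {summary}".rstrip())
    return "\n".join(lines)


def post_text(webhook_url, text):
    """POST {"text": ...} as JSON to a chat webhook (URL is user-supplied)."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Keeping email as a secondary channel makes sense here for an obvious reason: if the mail platform is the thing that is down, alerts sent by email may never arrive.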

8. Deployment Recommendations for Your Environment

Considering your environment:

Docker-based multi-container stack
postfix + dovecot + amavis
piler + manticore
MariaDB
Apache reverse proxy
strict firewall rules
DOCKER-USER custom chains

I recommend adding:

On Docker host

node_exporter
docker_exporter

Within the mail stack

postfix_exporter
dovecot_exporter
clamav_exporter
mysqld_exporter
blackbox_exporter

Central

prometheus
grafana
alertmanager

Conclusion — A Mail System Without Monitoring Is Not Production-Ready

Building the system is only the beginning.
True operational excellence comes from:

detecting issues early
getting instant alerts
seeing trends
identifying attacks
preventing downtime

With the stack described in this chapter in place, your mail platform has production-grade observability.