Mail Server Series — Part 16
After completing the architecture, deployment, filtering pipeline, archiving system, full-text search, high availability, and operational procedures of the entire mail platform, this chapter introduces the final—but critical—piece:
How to build an enterprise-grade monitoring & alerting system for your self-hosted mail infrastructure.
The reliability of a mail system depends not only on its architecture, but also on:
- Whether issues can be detected immediately
- Whether the environment’s health can be quantified
- Whether risks can be predicted (disk full, queue buildup, CPU overuse)
- Whether you can avoid the classic problem:
“Users complain that they can’t receive email, and only then do you realize something is wrong.”
This article provides a complete DevOps-focused guide to building:
✔ Full-stack monitoring
✔ Real-time alerting
✔ Log aggregation & tracing
✔ Operational dashboards
✔ Deep observability for Docker-based mail systems
1. What Should You Monitor in a Mail Platform? (Complete Checklist)
A modern mail stack includes:
- Postfix (SMTP)
- Dovecot (IMAP/POP3/LMTP)
- Amavis / ClamAV / SpamAssassin
- MariaDB / Galera
- Roundcube Webmail
- Piler (archive system)
- ManticoreSearch (full-text search)
- Apache Reverse Proxy
- Docker host, containers, network, storage
Monitoring should be divided into six major categories:
① Postfix (SMTP) Monitoring
| Metric | Description |
|---|---|
| mail queue size | Queue spikes indicate blockage |
| defer / bounce rate | DNS issues, blacklists, or remote failures |
| SMTP delivery latency | Delays in outbound flow |
| inbound/outbound TPS | Load forecasting |
| reject rate | Spam attack or config error |
| TLS usage rate | Security posture |
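As a rough illustration of the queue-size metric, the sketch below counts queued messages per queue name. It assumes Postfix 3.1 or later, where `postqueue -j` prints one JSON object per message; the function name and the sample data are made up for this example.

```python
import json
from collections import Counter


def count_by_queue(postqueue_json: str) -> Counter:
    """Count queued messages per Postfix queue name (one JSON object per line)."""
    counts = Counter()
    for line in postqueue_json.splitlines():
        if not line.strip():
            continue  # skip blank lines
        entry = json.loads(line)
        counts[entry.get("queue_name", "unknown")] += 1
    return counts


# Illustrative sample only, not captured from a real server.
SAMPLE_OUTPUT = "\n".join([
    json.dumps({"queue_name": "deferred", "queue_id": "4A1B2C", "sender": "a@example.org"}),
    json.dumps({"queue_name": "deferred", "queue_id": "4A1B2D", "sender": "b@example.org"}),
    json.dumps({"queue_name": "active", "queue_id": "4A1B2E", "sender": "c@example.org"}),
])

if __name__ == "__main__":
    print(dict(count_by_queue(SAMPLE_OUTPUT)))
```

On a live host the input would typically come from `subprocess.run(["postqueue", "-j"], capture_output=True, text=True).stdout`. A dedicated exporter is usually the better long-term choice.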
② Dovecot (IMAP/POP3/LMTP) Monitoring
| Metric | Description |
|---|---|
| login success/fail count | Detect brute-force attacks |
| IMAP/LMTP connections | Detect exhaustion |
| I/O latency | Indicates disk bottlenecks |
| mailbox locking issues | Storage or FS issues |
| auth response time | LDAP / MariaDB problems |
③ Amavis / ClamAV / SpamAssassin Monitoring
| Metric | Description |
|---|---|
| ClamAV signature update status | Must stay fresh |
| spam hit rate | Sudden drop = SA malfunction |
| Amavis queue | A stalled Amavis stops all mail flow |
| CPU/RAM | SA may consume high CPU at peak |
④ MariaDB / Galera Monitoring
| Metric | Description |
|---|---|
| replication delay | Affects Roundcube & Dovecot auth |
| node health / flow-control | Stability of cluster |
| slow queries | Impacts all components |
| connection count | Detect leaks |
| DB size | Archive DB grows continuously |
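For the node health and flow-control row, a minimal sketch of a health check over Galera status variables might look like this. The variable names are standard `wsrep_*` status variables. The function name, the 0.1 flow-control threshold, and the sample dictionaries are assumptions for illustration.

```python
def galera_problems(status, max_paused=0.1):
    """Return a list of human-readable problems found in wsrep status values.

    `status` maps variable names to string values, e.g. as collected from
    SHOW GLOBAL STATUS LIKE 'wsrep_%'.
    """
    problems = []
    if status.get("wsrep_cluster_status") != "Primary":
        problems.append("node is not part of the primary component")
    if status.get("wsrep_local_state_comment") != "Synced":
        problems.append("node is not synced")
    if status.get("wsrep_ready") != "ON":
        problems.append("node is not ready to accept queries")
    # Fraction of time replication was paused by flow control (0.0 to 1.0).
    if float(status.get("wsrep_flow_control_paused", 0)) > max_paused:
        problems.append("flow control is pausing replication too often")
    return problems


# Hypothetical sample values for demonstration.
HEALTHY = {
    "wsrep_cluster_status": "Primary",
    "wsrep_local_state_comment": "Synced",
    "wsrep_ready": "ON",
    "wsrep_flow_control_paused": "0.01",
}

if __name__ == "__main__":
    print(galera_problems(HEALTHY))
```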
⑤ Piler + Manticore Monitoring
| Metric | Description |
|---|---|
| search latency | User search experience |
| RT index delay | Whether indexes are up-to-date |
| piler queue backlog | Write operations stuck |
| archive store size | Long-term data accumulation |
| indexing errors | Schema/config inconsistencies |
⑥ Host & Docker Monitoring
| Metric | Description |
|---|---|
| CPU / RAM / Load | Prevent OOM kill |
| Disk I/O | Affects IMAP & indexing |
| Network latency | SMTP/IMAP/TLS issues |
| container health | Restart loops, unhealthy state |
| filesystem capacity | Disk full → mail system collapse |
2. Recommended Full Monitoring Architecture
A robust monitoring stack should look like this:
┌───────────────────────────────┐
│       Grafana Dashboard       │  ← Visualization Layer
└───────────────┬───────────────┘
                │
        Prometheus Server
                │
┌───────────────┴────────────────────────────────────────┐
│ Exporters:                                             │
│   Postfix Exporter     Node Exporter                   │
│   Dovecot Exporter     Blackbox Exporter               │
│   MariaDB Exporter     Docker Exporter                 │
│   ClamAV Exporter      Custom Piler/Manticore Exporter │
└────────────────────────────────────────────────────────┘
3. Required Exporters (Recommended List)
3.1 Postfix Exporter
Monitors:
- Queue size
- Rejects/bounces
- Delivery latency
- TLS negotiation stats
Recommended:
kumina/postfix_exporter (or a maintained fork of it)
3.2 Dovecot Exporter
Monitors:
- Login fail rate
- IMAP/LMTP connection count
- Auth latency
- Mailbox access patterns
3.3 ClamAV Exporter
Tracks:
- signature update time
- scan results
- daemon uptime
3.4 MariaDB Exporter
Official exporter:
prometheus/mysqld_exporter (Docker image: prom/mysqld-exporter)
3.5 Node Exporter
Must-have for hardware monitoring.
3.6 Blackbox Exporter
Probe:
- SMTP STARTTLS
- SMTP AUTH
- IMAP STARTTLS
- HTTPS (webmail/piler)
- Certificate expiration
3.7 Docker Exporter
Monitors:
- restarted containers
- unhealthy state
- CPU/memory of containers
3.8 Custom Exporter for Piler & Manticore
Recommended metrics:
- search latency
- RT index lag
- archive write delay
- store usage growth
- manticore query errors
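A custom exporter ultimately has to emit the Prometheus text exposition format. The sketch below renders a few gauges in that format using only the standard library. The metric names and values are hypothetical, and collecting the real numbers from Piler and Manticore is left out.

```python
def render_gauges(metrics):
    """Render gauges in the Prometheus text exposition format.

    `metrics` maps a metric name to a (help text, value) tuple.
    """
    lines = []
    for name, (help_text, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric names and values, for illustration only.
EXAMPLE_METRICS = {
    "manticore_search_latency_seconds": ("Latency of a canary search query", 0.042),
    "piler_archive_queue_files": ("Files waiting to be archived", 12),
}

if __name__ == "__main__":
    print(render_gauges(EXAMPLE_METRICS), end="")
```

Serving this string over HTTP on a `/metrics` path is enough for Prometheus to scrape it. For anything beyond a prototype, the official `prometheus_client` library is probably a better base.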
4. Grafana Dashboards (Suggested Layout)
Dashboard A — Mail System Overview
- inbound/outbound TPS
- queue depth
- SMTP TLS usage
- login fail trends
- DB latency
- piler indexing delay
- manticore query time
Perfect for management and daily monitoring.
Dashboard B — Postfix Deep Monitoring
- per-minute SMTP throughput
- reject count by rule
- per-domain statistics
- spam attack visualization
- TLS handshake errors
Dashboard C — Dovecot Overview
- login fail/success ratio
- authentication latency
- LMTP failures
- I/O bottleneck
- IMAP folder access heatmap
Dashboard D — Archive (Piler + Manticore)
- indexing rate
- search latency distribution
- store size trends
- RT index memory usage
- fragmentation warning
Dashboard E — Host & Docker Monitoring
- CPU / load
- memory pressure
- disk I/O
- container health
- network usage
5. Alerting Rules (Enterprise-Grade)
To limit false alarms without missing real incidents, start from the following rules:
Postfix Alerts
Queue > 500 for over 10 minutes
Possible causes:
- DNS outage
- Amavis bottleneck
- remote delivery failures
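The "for over 10 minutes" part matters as much as the threshold, because it filters out short spikes. The sketch below shows that logic in isolation, similar in spirit to the `for:` clause of a Prometheus alerting rule. The class and function names and the sample data are invented for this example.

```python
class SustainedCondition:
    """Fire only after a condition has held continuously for `hold_seconds`."""

    def __init__(self, hold_seconds):
        self.hold_seconds = hold_seconds
        self._since = None  # timestamp at which the current breach started

    def update(self, breached, now):
        if not breached:
            self._since = None  # any recovery resets the timer
            return False
        if self._since is None:
            self._since = now
        return now - self._since >= self.hold_seconds


def queue_alert_states(samples, threshold=500, hold_seconds=600):
    """Evaluate (timestamp, queue_size) samples and return the alert state per sample."""
    condition = SustainedCondition(hold_seconds)
    return [condition.update(size > threshold, ts) for ts, size in samples]


# Hypothetical samples taken every 5 minutes: (seconds, queue size).
SAMPLES = [(0, 600), (300, 700), (600, 650), (900, 100), (1200, 800)]

if __name__ == "__main__":
    print(queue_alert_states(SAMPLES))
```

In this sample the alert fires only at the third reading, once the queue has stayed above 500 for a full 10 minutes, and it clears as soon as the queue drains.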
Dovecot Alerts
Login failure rate > 30%
A sustained rate this high usually points to a brute-force attack.
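A ratio alone can misfire when traffic is low, since one failed login out of two attempts is already 50%. A minimal sketch of the rule with a minimum-volume guard follows. The guard and its default of 20 attempts are assumptions, not part of the rule stated above.

```python
def login_failure_rate(failed, succeeded):
    """Return the share of failed logins in the window (0.0 when there was no traffic)."""
    total = failed + succeeded
    return failed / total if total else 0.0


def brute_force_suspected(failed, succeeded, threshold=0.30, min_attempts=20):
    """Apply the 30% rule, but only when the window has enough attempts to be meaningful."""
    if failed + succeeded < min_attempts:
        return False
    return login_failure_rate(failed, succeeded) > threshold


if __name__ == "__main__":
    print(brute_force_suspected(failed=40, succeeded=60))
    print(brute_force_suspected(failed=1, succeeded=1))
```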
ClamAV Alerts
Signature older than 24 hours
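One simple way to approximate signature freshness is the modification time of the signature database file. The sketch below assumes that approach. The path in the usage note is a common default but may differ in your container image, and file age is only a proxy for the signature version.

```python
import os
import time


def signature_age_hours(path, now=None):
    """Return the age of a signature file in hours, based on its modification time."""
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) / 3600


def signatures_stale(path, max_age_hours=24.0, now=None):
    """Return True when the file is older than the allowed age."""
    return signature_age_hours(path, now) > max_age_hours
```

A typical call would be `signatures_stale("/var/lib/clamav/daily.cld")`, assuming that is where freshclam stores the daily database in your setup.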
MariaDB Alerts
Query latency > 200 ms
Affects:
- SMTP authentication
- Dovecot auth
- Roundcube
- Piler
Storage Alerts
Disk usage > 85%
Especially:
/var/vmail
/var/piler/store
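The disk rule can be sketched with the standard library alone. The helper names are hypothetical. Note that this computes used/total, which can differ slightly from the percentage `df` reports on filesystems with reserved blocks.

```python
import shutil


def usage_percent(path):
    """Return used space on the filesystem holding `path`, as a percentage of total."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def paths_over_threshold(paths, threshold=85.0):
    """Return {path: percent} for every existing path above the threshold."""
    over = {}
    for path in paths:
        try:
            percent = usage_percent(path)
        except FileNotFoundError:
            continue  # path does not exist on this host; skipped in this sketch
        if percent > threshold:
            over[path] = percent
    return over


if __name__ == "__main__":
    print(paths_over_threshold(["/var/vmail", "/var/piler/store"]))
```

In practice node_exporter already exposes filesystem metrics, so a script like this is mainly useful as a quick check or a fallback.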
Docker Alerts
- container restart loops
- “unhealthy” state
- memory OOM kills
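These three conditions map to fields in `docker inspect` output: `State.Health.Status`, `State.OOMKilled`, `State.Restarting`, and the top-level `RestartCount`. A minimal sketch of a check over one parsed inspect object follows. The function name, the restart threshold of 3, and the sample data are assumptions.

```python
def container_problems(inspect, max_restarts=3):
    """Return problems found in one parsed `docker inspect` object."""
    state = inspect.get("State", {})
    health = (state.get("Health") or {}).get("Status")  # absent without a healthcheck
    problems = []
    if health == "unhealthy":
        problems.append("healthcheck reports unhealthy")
    if state.get("OOMKilled"):
        problems.append("container was killed by the OOM killer")
    if state.get("Restarting"):
        problems.append("container is currently restarting")
    # RestartCount is cumulative, so this is only a crude restart-loop signal.
    if inspect.get("RestartCount", 0) > max_restarts:
        problems.append("restart count exceeds limit")
    return problems


# Hypothetical sample, trimmed to the fields used above.
SAMPLE_INSPECT = {
    "RestartCount": 7,
    "State": {"Restarting": False, "OOMKilled": True, "Health": {"Status": "unhealthy"}},
}

if __name__ == "__main__":
    print(container_problems(SAMPLE_INSPECT))
```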
Manticore Alerts
- search latency > 500 ms
- index not updating
- RT index overflow
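For the latency rule, alerting on a single slow query tends to be noisy. One option, sketched below, is to compare a high percentile of recent samples against the 500 ms threshold. Using the 95th percentile is an assumption here, and the nearest-rank method is just one of several percentile definitions.

```python
import math


def percentile(samples, pct):
    """Return the nearest-rank percentile of a non-empty collection of numbers."""
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("at least one sample is required")
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def search_latency_alert(samples_ms, threshold_ms=500.0, pct=95.0):
    """Return True when the chosen percentile of search latency exceeds the threshold."""
    return percentile(samples_ms, pct) > threshold_ms


if __name__ == "__main__":
    # Hypothetical latencies in milliseconds: one outlier among 20 samples.
    print(search_latency_alert([120] * 19 + [900]))
```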
6. External Probing (Blackbox Monitoring)
External probing is essential in production because it tests the services the way a client sees them.
Probe the following:
smtp_starttls://mail.it.demo.tw:25
smtp_auth://mail.it.demo.tw:587
imap_starttls://mail.it.demo.tw:143
https://webmail.it.demo.tw
https://archive.it.demo.tw
You will know immediately if:
- the TLS handshake fails
- a certificate has expired
- the mail service is unreachable
- the reverse proxy is broken
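Certificate expiry is easy to check by hand as well. The sketch below fetches the `notAfter` field from an HTTPS endpoint and converts it to days remaining, using only the standard library. The function names are hypothetical, and it covers implicit TLS only, not STARTTLS on ports 25, 587, or 143.

```python
import socket
import ssl
import time


def fetch_not_after(host, port=443, timeout=5.0):
    """Connect with TLS and return the certificate's notAfter string."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now=None):
    """Convert a notAfter string such as 'Jan  5 09:34:43 2018 GMT' to days remaining."""
    now = time.time() if now is None else now
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400
```

For example, `days_until_expiry(fetch_not_after("webmail.it.demo.tw"))` would give the remaining lifetime of the webmail certificate. A negative result means it has already expired, although in that case the default context will normally refuse the handshake first.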
7. Centralized Alert Delivery
Recommended channels:
- Microsoft Teams
- Slack
- Telegram Bot
- Email (secondary only)
Alertmanager can integrate all of these easily.
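If a channel is not covered by a built-in receiver, Alertmanager's generic webhook receiver posts a JSON document containing an `alerts` list. The sketch below turns such a payload into short chat-friendly lines. The `severity` label and `summary` annotation are common conventions rather than required fields, and the sample payload is invented.

```python
def summarize_alerts(payload):
    """Return one short text line per alert in an Alertmanager webhook payload."""
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        status = alert.get("status", "unknown").upper()
        name = labels.get("alertname", "unnamed")
        severity = labels.get("severity", "none")
        summary = annotations.get("summary", "")
        lines.append(f"[{status}] {name} ({severity}): {summary}")
    return lines


# Hypothetical payload, trimmed to the fields used above.
SAMPLE_PAYLOAD = {
    "status": "firing",
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "PostfixQueueHigh", "severity": "critical"},
            "annotations": {"summary": "Queue above 500 for 10 minutes"},
        }
    ],
}

if __name__ == "__main__":
    print("\n".join(summarize_alerts(SAMPLE_PAYLOAD)))
```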
8. Deployment Recommendations for Your Environment
Considering your environment:
- Docker-based multi-container stack
- postfix + dovecot + amavis
- piler + manticore
- MariaDB
- Apache reverse proxy
- strict firewall rules
- DOCKER-USER custom chains
I recommend adding:
On Docker host
- node_exporter
- docker_exporter
Within the mail stack
- postfix_exporter
- dovecot_exporter
- clamav_exporter
- mysqld_exporter
- blackbox_exporter
Central
- prometheus
- grafana
- alertmanager
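To connect the exporters to the central Prometheus, one option is file-based service discovery, where Prometheus reads a JSON list of target groups. The sketch below generates such a file. The hostnames and the `service` label are hypothetical, and the ports are the usual defaults for these exporters, which you should verify against your own containers.

```python
import json

# Hypothetical hostnames; ports are common exporter defaults and may differ.
EXPORTER_TARGETS = {
    "node": "dockerhost.example.internal:9100",
    "mysqld": "mail-db.example.internal:9104",
    "postfix": "mail-smtp.example.internal:9154",
}


def file_sd_config(exporters):
    """Build Prometheus file_sd JSON: a list of target groups with labels."""
    groups = [
        {"targets": [target], "labels": {"service": name}}
        for name, target in sorted(exporters.items())
    ]
    return json.dumps(groups, indent=2)


if __name__ == "__main__":
    print(file_sd_config(EXPORTER_TARGETS))
```

The blackbox exporter is left out on purpose, because its probe targets are normally passed through relabeling rather than scraped directly.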
Conclusion — A Mail System Without Monitoring Is Not Production-Ready
Building the system is only the beginning.
True operational excellence comes from:
- detecting issues early
- getting instant alerts
- seeing trends
- identifying attacks
- preventing downtime
With this chapter, your mail platform now has full production-grade observability.