Ceph Cluster High Availability and Multi-Site Replication Strategies

Posted on 2025-11-01 by Rico

🔰 Introduction

In modern enterprise infrastructure, achieving high availability (HA) and multi-site disaster recovery (DR) for storage systems is a critical requirement.

With its distributed design and self-healing replication model, Ceph provides built-in fault tolerance,
automatic recovery, and the ability to replicate data across multiple data centers, all without service interruption.

This article explains:
1๏ธโƒฃ Cephโ€™s native high-availability mechanisms
2๏ธโƒฃ Replication vs. Erasure Coding strategies
3๏ธโƒฃ Multi-site replication and mirroring design
4๏ธโƒฃ Practical HA + DR implementation in Proxmox clusters


🧩 1. Ceph High Availability Architecture

1๏ธโƒฃ Distributed Consistency with CRUSH

Ceph uses the CRUSH (Controlled Replication Under Scalable Hashing) algorithm to distribute objects across many OSDs (Object Storage Daemons)
while maintaining data redundancy and placement consistency.

Client
  │
  └──> CRUSH Map → Distributes data to OSD1 / OSD2 / OSD3

Because data placement is computed by CRUSH rather than stored on a central metadata server, there is no single point of failure:
if a node goes offline, Ceph automatically rebuilds the lost replicas on the remaining OSDs without disrupting operations.
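
To see how CRUSH actually places data on a live cluster, the standard ceph CLI can be queried directly (a minimal sketch; the pool and object names in angle brackets are placeholders):

# Show the OSD / host / rack hierarchy that CRUSH maps data onto
ceph osd tree

# List the CRUSH rules that govern replica placement
ceph osd crush rule ls

# Ask CRUSH which OSDs a given object in a given pool maps to
ceph osd map <pool-name> <object-name>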


2๏ธโƒฃ Key HA Components

Component                   | Role
MON (Monitor)               | Maintains cluster maps and quorum; at least 3 nodes recommended.
OSD (Object Storage Daemon) | Manages physical disks and handles data replication.
MGR (Manager)               | Provides cluster metrics, dashboards, and Prometheus integration.
CephFS / RBD Clients        | Automatically re-route I/O when OSDs or nodes fail.

✅ Ceph's HA capabilities are natively integrated; no external load balancers or clustering tools are required.
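
A quick way to verify that these components are healthy is the built-in status tooling (a minimal sketch using standard ceph commands; output format varies by release):

# Overall cluster health, including MON quorum and OSD up/in counts
ceph -s

# Show which monitors currently form the quorum
ceph quorum_status --format json-pretty

# Show the active MGR and any standbys
ceph mgr stat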


โš™๏ธ 2. Data Redundancy and Fault Tolerance

1๏ธโƒฃ Replication

Replication is the most common fault-tolerance method in Ceph:
each object is written to multiple OSDs, so data remains available even if a disk or node fails.

Mode       | Fault Tolerance                                   | Storage Efficiency
3 Replicas | Tolerates up to 2 OSD failures without data loss  | 33 %
2 Replicas | Tolerates only 1 OSD failure (riskier)            | 50 %

💡 A 3-replica model is recommended for production clusters to balance reliability and recovery time.
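
As an illustration, a 3-replica pool could be created as follows (a sketch; the pool name vm-pool and the PG count of 128 are assumptions to adapt to your cluster):

# Create a replicated pool with 128 placement groups
ceph osd pool create vm-pool 128 128 replicated

# Keep 3 copies of every object, and stay writable as long as 2 remain
ceph osd pool set vm-pool size 3
ceph osd pool set vm-pool min_size 2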


2๏ธโƒฃ Erasure Coding (EC)

Erasure Coding splits data into multiple fragments plus parity blocks,
allowing data reconstruction while using less storage capacity.

Example: EC 4 + 2
→ 4 data fragments + 2 parity fragments
→ tolerates any 2 OSD failures
→ storage efficiency ≈ 66 %

Mode                | Advantages               | Trade-offs
Erasure Coding (EC) | Efficient, space-saving  | Higher latency, limited snapshot support

EC is ideal for backup and cold-data storage,
while replication remains best for VMs, databases, and real-time workloads.
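
A sketch of the EC 4 + 2 layout described above (profile and pool names are placeholders; allow_ec_overwrites is only required when RBD or CephFS writes to the EC pool directly):

# Define a 4-data + 2-parity erasure-code profile with host-level failure domain
ceph osd erasure-code-profile set ec-4-2 k=4 m=2 crush-failure-domain=host

# Create an erasure-coded pool that uses the profile
ceph osd pool create ec-pool 128 128 erasure ec-4-2

# Permit partial overwrites so RBD/CephFS can write to the pool (Luminous and later)
ceph osd pool set ec-pool allow_ec_overwrites true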


โ˜๏ธ 3. Multi-Site Replication and Disaster Recovery

1๏ธโƒฃ RBD Mirror (Block-Level Replication)

Ceph natively supports RBD mirroring, allowing asynchronous block-level replication between two clusters.

Cluster A (Primary)
     │
     │  RBD Mirror (Async)
     ▼
Cluster B (Secondary)

Key Features

  • Supports one-way or bidirectional replication
  • Snapshot-based and incremental sync
  • Manual or automatic failover

Perfect for Proxmox VM disk replication across data centers.
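
RBD mirroring can run in journal-based or snapshot-based mode; a minimal sketch of enabling each (the pool vm-pool and image vm-100-disk-0 are the same illustrative names used later in this article):

# Journal-based mode: mirror every image in the pool
rbd mirror pool enable vm-pool pool

# Snapshot-based mode: enable per image after setting the pool to image mode
rbd mirror pool enable vm-pool image
rbd mirror image enable vm-pool/vm-100-disk-0 snapshot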


2๏ธโƒฃ CephFS Mirror (File-Level Replication)

Since Ceph Pacific (16.x), CephFS supports snapshot-based directory replication between clusters.

ceph mgr module enable mirroring
ceph fs snapshot mirror enable cephfs
ceph fs snapshot mirror peer_add cephfs <client>@<remote-cluster>
ceph fs snapshot mirror add cephfs <directory-path>

Use cases:

  • PBS (Proxmox Backup Server) data directories
  • AI / ML training datasets
  • Departmental file repositories
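
The actual replication is performed by the cephfs-mirror daemon, so one must be running in the source cluster; a hedged sketch for cephadm-managed clusters (assumes the orchestrator is in use and the filesystem is named cephfs):

# Deploy a cephfs-mirror daemon through the orchestrator
ceph orch apply cephfs-mirror

# List configured peers and check the mirror daemon's view of the filesystem
ceph fs snapshot mirror peer_list cephfs
ceph fs snapshot mirror daemon status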

3๏ธโƒฃ RGW Multi-Site (Object-Level Replication)

For S3-compatible object storage, Ceph RGW provides multi-zone and multi-region replication.

Mode         | Description
Multi-Zone   | Multiple RGW instances within one cluster share data.
Multi-Region | Cross-cluster replication (active-active or active-passive).

Region A   ←→  Region B
RGW Zone A ←→ RGW Zone B

RGW Multi-Site is widely used for geo-replication and global business continuity.
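
Under the hood, a multi-site deployment is organized into a realm, zonegroups, and zones. A heavily abbreviated sketch of the master-side setup follows; the realm, zonegroup, and zone names plus the endpoint URL are placeholders, and the system-user keys and secondary-zone steps are omitted:

# On the primary cluster: create the realm and the master zonegroup
radosgw-admin realm create --rgw-realm=corp --default
radosgw-admin zonegroup create --rgw-zonegroup=global --endpoints=https://rgw-a.example.com --master --default

# Create the master zone and commit the period so the configuration takes effect
radosgw-admin zone create --rgw-zonegroup=global --rgw-zone=zone-a --endpoints=https://rgw-a.example.com --master --default
radosgw-admin period update --commit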


🧠 4. Practical HA + DR Design for Proxmox + Ceph

Architecture Example

          ┌──────────────────────────────┐
          │     Proxmox Cluster A        │
          │  VM Storage → RBD (Ceph A)   │
          └──────────────────────────────┘
                      │
          RBD Mirror (Asynchronous Replication)
                      │
          ┌──────────────────────────────┐
          │     Proxmox Cluster B        │
          │  DR Storage → RBD (Ceph B)   │
          └──────────────────────────────┘

Configuration Example

1๏ธโƒฃ Build two independent Ceph clusters.
2๏ธโƒฃ Enable mirroring on Cluster A:

rbd mirror pool enable vm-pool pool

3๏ธโƒฃ Register the peer on Cluster B:

rbd mirror pool peer add vm-pool client.admin@remote

4๏ธโƒฃ Promote the image during failover:

rbd mirror image promote vm-pool/vm-100-disk-0
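
Replication only flows while an rbd-mirror daemon is running against the destination cluster. After failover, the state can be verified and later reversed with the standard rbd tooling (a sketch reusing the same pool and image names):

# Check replication health from either cluster
rbd mirror pool status vm-pool --verbose
rbd mirror image status vm-pool/vm-100-disk-0

# During failback, demote the old primary copy and resynchronize it
rbd mirror image demote vm-pool/vm-100-disk-0
rbd mirror image resync vm-pool/vm-100-disk-0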

⚡ 5. Performance and Network Considerations

Factor                | Recommendation
Replication Frequency | Snapshot-based incremental sync every 5–15 minutes
Network Bandwidth     | ≥ 10 GbE dedicated link (VPN or MPLS for WAN)
Latency Tolerance     | 50–200 ms RTT (async mirror)
Failover Policy       | Manual or automated promotion
Monitoring            | Ceph Dashboard + Prometheus + Alertmanager integration
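
For snapshot-based RBD mirroring, the replication frequency in the table maps to a mirror snapshot schedule (a sketch; the 15-minute interval and pool name are assumptions):

# Take a mirror snapshot of every mirrored image in the pool every 15 minutes
rbd mirror snapshot schedule add --pool vm-pool 15m

# Review the schedules currently in effect
rbd mirror snapshot schedule ls --pool vm-pool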

🔒 6. Governance and Reliability Best Practices

  • Deploy ≥ 3 MONs to maintain quorum and prevent split-brain.
  • Use CRUSH map rules to distribute replicas across racks or sites (see the sketch after this list).
  • Use the Ceph Dashboard's mirroring views for replication health monitoring.
  • Integrate with Proxmox Backup Server (PBS) for multi-site backup sync.
  • Schedule regular failover/failback drills to verify readiness.
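
As a sketch of the CRUSH rule recommendation above (rule name, CRUSH root, and pool name are assumptions):

# Create a replicated rule whose failure domain is the rack
ceph osd crush rule create-replicated rack-spread default rack

# Point an existing pool at the rule so its replicas land in different racks
ceph osd pool set vm-pool crush_rule rack-spread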

✅ Conclusion

With its inherently distributed design, Ceph empowers enterprises to build
a highly available and geo-resilient storage backbone without relying on costly proprietary solutions.

By combining:

  • Replication / Erasure Coding
  • RBD Mirror / CephFS Mirror
  • RGW Multi-Site
  • Proxmox + PBS integration

organizations can achieve:

๐ŸŒ Self-healing, cross-site-synchronized, continuously available storage infrastructure

💬 Coming next:
"Ceph Dashboard and Automated Monitoring Integration (Prometheus + Alertmanager)":
how to build a unified observability platform with real-time visibility and proactive alerts.
