📰 Introduction
After setting up and running Proxmox VE, the next step for IT teams is to enable automatic failover and centralized management across multiple servers.
This is achieved through Proxmox Cluster and its built-in High Availability (HA) framework.
Unlike VMware's vCenter, DRS, or SRM (which require paid licensing), Proxmox provides these capabilities natively as open source, using Corosync for cluster communication and pmxcfs for configuration synchronization.
In this article, we'll explore:
1️⃣ How Proxmox clustering works
2️⃣ How to build and verify a cluster
3️⃣ How to configure and test automatic HA failover
🧩 1. Understanding Proxmox Cluster Architecture
🧱 Concept Overview
A Proxmox Cluster is a group of nodes (servers) managed under a single configuration domain.
It allows all nodes to share:
- VM and container definitions
- Storage configurations (ZFS / NFS / Ceph)
- User accounts and permissions
- HA and scheduling information
Each node runs the pmxcfs (Proxmox Cluster File System) service, which keeps configuration synchronized across the cluster through Corosync messaging.
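You can see pmxcfs in action by listing /etc/pve, which presents the same database-backed view on every node (file list abridged; contents vary by setup):

ls /etc/pve
# corosync.conf  datacenter.cfg  storage.cfg  user.cfg  ha/  nodes/  priv/  ...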
🧩 Cluster Architecture Diagram
        ┌───────────────────────────┐
        │      Proxmox Cluster      │
        │ (pmxcfs + Corosync Layer) │
        └───────────────────────────┘
             │          │          │
        ┌────────┐ ┌────────┐ ┌────────┐
        │ Node01 │ │ Node02 │ │ Node03 │
        └────────┘ └────────┘ └────────┘
             │          │          │
     Shared Storage (NFS / Ceph / ZFS Replication)
⚙️ Key Components
| Component | Role | Description |
|---|---|---|
| Corosync | Communication layer | Provides real-time messaging and quorum control between nodes. |
| pmxcfs | Cluster file system | Synchronizes configuration files across nodes. |
| pve-cluster | Management layer | Handles cluster control commands. |
| HA Manager (pve-ha-manager) | Control layer | Monitors VM states and performs automatic failover. |
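Each of these components runs as a systemd unit, so a quick health check on any node looks like this:

systemctl status corosync pve-cluster pve-ha-crm pve-ha-lrm --no-pager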
⚙️ 2. Building a Cluster
Example Environment
| Node | IP Address | Role |
|---|---|---|
| pve-node01 | 192.168.10.11 | Cluster creator |
| pve-node02 | 192.168.10.12 | Member node |
| pve-node03 | 192.168.10.13 | Member node |
All nodes:
- Are running Proxmox VE 9.0.10
- Use static IPs in the same network
- Have synchronized time via NTP
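Before creating the cluster, a short pre-flight check on each node can save debugging later (addresses taken from the table above):

hostnamectl                       # unique hostname per node
timedatectl | grep synchronized   # confirm NTP time sync
ping -c 3 192.168.10.11           # confirm reachability between nodes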
Step 1 – Create the Cluster (on Node01)
pvecm create mycluster
This initializes a new cluster called mycluster and generates /etc/pve/corosync.conf.
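For reference, an abridged, illustrative corosync.conf right after creation looks roughly like this (exact values differ per setup):

totem {
  cluster_name: mycluster
  config_version: 1
  ip_version: ipv4-6
  secauth: on
  version: 2
}
nodelist {
  node {
    name: pve-node01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.10.11
  }
}
quorum {
  provider: corosync_votequorum
}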
Step 2 – Join Additional Nodes
On Node02 and Node03, run:
pvecm add 192.168.10.11
Enter the root password of the first node when prompted.
The nodes will automatically synchronize and join the cluster.
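If you run Corosync on a dedicated network (recommended in the best practices below), the link address can be passed explicitly when joining; the 10.10.10.x address here is illustrative:

pvecm add 192.168.10.11 --link0 10.10.10.12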
Step 3 – Verify Cluster Status
pvecm status
Expected output (abridged):
Cluster information
-------------------
Name:             mycluster
Config Version:   3
Transport:        knet

Votequorum information
----------------------
Expected votes:   3
Total votes:      3
Quorum:           2
Flags:            Quorate
💡 Quorum ensures that more than half of the nodes are online before cluster actions (like HA recovery) occur, preventing split-brain issues.
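The majority rule is simple arithmetic: with N votes, floor(N/2) + 1 are required, so a 3-node cluster stays quorate with 2 nodes online. If quorum is ever lost during planned maintenance, the expected vote count can be lowered temporarily (use with care, since this weakens split-brain protection):

pvecm expected 2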
🧠 3. Understanding and Configuring High Availability (HA)
What Is HA in Proxmox?
Proxmox HA ensures that guests (VMs or containers) are automatically restarted on another node if the node they run on fails.
How it works:
- The pve-ha-manager stack continuously monitors all HA-enabled guests.
- If Corosync reports a node as failed, the cluster resource manager (pve-ha-crm) waits for the node to be fenced, then reassigns its HA resources to healthy nodes, where the local resource manager (pve-ha-lrm) restarts them.
- HA configuration is shared and synchronized within the cluster, like everything else in pmxcfs (an illustrative entry is sketched below).
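An illustrative /etc/pve/ha/resources.cfg entry for a single HA-managed VM (values are examples):

vm: 101
    group default
    state started
    max_restart 1
    max_relocate 1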
Step 1 – Enable HA Services
Run on all nodes (on a standard installation these services are usually enabled already, so this doubles as a sanity check):
systemctl enable pve-ha-lrm pve-ha-crm --now
Step 2 – Add VM to HA Group
In the Proxmox web GUI:
- Go to Datacenter → HA → Resources → Add and select the VM
- Optionally assign an HA group (e.g., "default"), created under Datacenter → HA → Groups
- Save the configuration
Or via CLI:
ha-manager add vm:101
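The group, requested state, and retry limits can also be set in the same call, and the result verified afterwards:

ha-manager add vm:101 --group default --state started --max_restart 1
ha-manager config    # lists all HA-managed resources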
Step 3 – Test HA Failover
Simulate a node failure on the node currently running the VM:
systemctl stop pve-cluster corosync
(or power off the node)
Note: a node with active HA resources self-fences (reboots via its watchdog) once it loses cluster communication. After fencing completes, typically within a minute or two rather than seconds, another node automatically starts the VM:
ha-manager status
Example output (abridged):
service vm:101 (pve-node02, started)
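Once the failed node rejoins the cluster, the VM can be moved back through the HA stack itself (node names from the example environment above):

ha-manager migrate vm:101 pve-node01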
🗄️ 4. Shared Storage for HA
To support live migration and HA failover, every node must be able to access the same VM disks, either through shared storage or through replicated local storage.
| Storage Type | Description | Best For |
|---|---|---|
| NFS / iSCSI | Centralized storage via NAS or SAN. | Small to medium clusters |
| ZFS Replication | Periodic replication of datasets between nodes. | Low-cost redundancy |
| Ceph Cluster | Fully distributed storage integrated with Proxmox. | Enterprise-scale clusters |
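As an illustration, adding a shared NFS storage so every node sees the same VM disks (server address, storage name, and export path are hypothetical):

pvesm add nfs shared-vms --server 192.168.10.50 --export /export/vms --content images,rootdir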
💡 Ceph is the ideal choice for scalable, fault-tolerant storage: it's deeply integrated into Proxmox VE and supports data replication and self-healing.
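A minimal Ceph bootstrap on such a cluster might look like this sketch (device and pool names are illustrative; run the install step on every node):

pveceph install                          # on each node
pveceph init --network 192.168.10.0/24   # once, on the first node
pveceph mon create                       # on at least three nodes
pveceph osd create /dev/sdb              # on each node with a spare disk
pveceph pool create vm-pool              # replicated pool for VM disks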
⚙️ 5. Best Practices for HA Environments
| Aspect | Recommendation |
|---|---|
| Cluster Size | Minimum 3 nodes for stable quorum. |
| Corosync Network | Use a dedicated VLAN or 10 GbE link for heartbeat traffic. |
| Power Fencing | Use IPMI-based fencing for enterprise reliability. |
| Backup Strategy | Even with HA, use Proxmox Backup Server (PBS) for snapshot and offsite backups. |
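For the backup recommendation, a one-off snapshot-mode backup to a PBS-backed storage (the storage name pbs01 is hypothetical) is as simple as:

vzdump 101 --storage pbs01 --mode snapshot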
✅ Conclusion
The Cluster and High Availability (HA) features in Proxmox VE are among its most powerful capabilities.
By leveraging Corosync, pmxcfs, and the built-in HA Manager, you can achieve an enterprise-grade virtualization environment with redundancy, live migration, and automatic failover, without the high licensing costs of proprietary platforms.
💬 In the next article, we'll explore
"Proxmox Backup Server and Remote Replication Strategy",
showing how to build a complete enterprise data protection solution.