📰 Introduction
After setting up and running Proxmox VE, the next step for IT teams is to enable automatic failover and centralized management across multiple servers.
This is achieved through Proxmox Cluster and its built-in High Availability (HA) framework.
Unlike VMware's vCenter, DRS, or SRM (which require paid licensing), Proxmox provides these capabilities natively as open source, using Corosync for cluster communication and pmxcfs for configuration synchronization.
In this article, we'll explore:
1️⃣ How Proxmox clustering works
2️⃣ How to build and verify a cluster
3️⃣ How to configure and test automatic HA failover
🧩 1. Understanding Proxmox Cluster Architecture
🧱 Concept Overview
A Proxmox Cluster is a group of nodes (servers) managed under a single configuration domain.
It allows all nodes to share:
- VM and container definitions
- Storage configurations (ZFS / NFS / Ceph)
- User accounts and permissions
- HA and scheduling information
Each node runs the pmxcfs (Proxmox Cluster File System) service, which keeps configuration synchronized across the cluster through Corosync messaging.
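You can see pmxcfs in action by listing /etc/pve, which presents the same database-backed view on every node (file list abridged; contents vary by setup):

ls /etc/pve
# corosync.conf  datacenter.cfg  storage.cfg  user.cfg  ha/  nodes/  priv/  ...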
🧩 Cluster Architecture Diagram
        ┌───────────────────────────┐
        │      Proxmox Cluster      │
        │ (pmxcfs + Corosync Layer) │
        └───────────────────────────┘
             │          │          │
        ┌────────┐ ┌────────┐ ┌────────┐
        │ Node01 │ │ Node02 │ │ Node03 │
        └────────┘ └────────┘ └────────┘
             │          │          │
     Shared Storage (NFS / Ceph / ZFS Replication)
⚙️ Key Components
| Component | Role | Description |
|---|---|---|
| Corosync | Communication layer | Provides real-time messaging and quorum control between nodes. |
| pmxcfs | Cluster file system | Synchronizes configuration files across nodes. |
| pve-cluster | Management layer | Handles cluster control commands. |
| HA Manager (pve-ha-manager) | Control layer | Monitors VM states and performs automatic failover. |
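Each of these components runs as a systemd unit, so a quick health check on any node looks like this:

systemctl status corosync pve-cluster pve-ha-crm pve-ha-lrm --no-pager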
⚙️ 2. Building a Cluster
Example Environment
| Node | IP Address | Role |
|---|---|---|
| pve-node01 | 192.168.10.11 | Cluster creator |
| pve-node02 | 192.168.10.12 | Member node |
| pve-node03 | 192.168.10.13 | Member node |
All nodes:
- Are running Proxmox VE 9.0.10
- Use static IPs in the same network
- Have synchronized time via NTP
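Before creating the cluster, a short pre-flight check on each node can save debugging later (addresses taken from the table above):

hostnamectl                       # unique hostname per node
timedatectl | grep synchronized   # confirm NTP time sync
ping -c 3 192.168.10.11           # confirm reachability between nodes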
Step 1 – Create the Cluster (on Node01)
pvecm create mycluster
This initializes a new cluster called mycluster and generates /etc/pve/corosync.conf.
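For reference, an abridged, illustrative corosync.conf right after creation looks roughly like this (exact values differ per setup):

totem {
  cluster_name: mycluster
  config_version: 1
  ip_version: ipv4-6
  secauth: on
  version: 2
}
nodelist {
  node {
    name: pve-node01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.10.11
  }
}
quorum {
  provider: corosync_votequorum
}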
Step 2 – Join Additional Nodes
On Node02 and Node03, run:
pvecm add 192.168.10.11
Enter the root password of the first node when prompted.
The nodes will automatically synchronize and join the cluster.
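If you run Corosync on a dedicated network (recommended in the best practices below), the link address can be passed explicitly when joining; the 10.10.10.x address here is illustrative:

pvecm add 192.168.10.11 --link0 10.10.10.12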
Step 3 – Verify Cluster Status
pvecm status
Expected output (abridged):
Cluster information
-------------------
Name:             mycluster
Config Version:   3
Transport:        knet

Votequorum information
----------------------
Expected votes:   3
Total votes:      3
Quorum:           2
Flags:            Quorate
💡 Quorum ensures that more than half of the nodes are online before cluster actions (like HA recovery) occur, preventing split-brain issues.
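The majority rule is simple arithmetic: with N votes, floor(N/2) + 1 are required, so a 3-node cluster stays quorate with 2 nodes online. If quorum is ever lost during planned maintenance, the expected vote count can be lowered temporarily (use with care, since this weakens split-brain protection):

pvecm expected 2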
🧠 3. Understanding and Configuring High Availability (HA)
What Is HA in Proxmox?
Proxmox HA ensures that guests (VMs or containers) are automatically restarted on another node if the node they run on fails.
How it works:
- The pve-ha-manager stack continuously monitors all HA-enabled guests.
- If Corosync reports a node as failed, the cluster resource manager (pve-ha-crm) waits for the node to be fenced, then reassigns its HA resources to healthy nodes, where the local resource manager (pve-ha-lrm) restarts them.
- HA configuration is shared and synchronized within the cluster, like everything else in pmxcfs (an illustrative entry is sketched below).
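An illustrative /etc/pve/ha/resources.cfg entry for a single HA-managed VM (values are examples):

vm: 101
    group default
    state started
    max_restart 1
    max_relocate 1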
Step 1 – Enable HA Services
Run on all nodes (on a standard installation these services are usually enabled already, so this doubles as a sanity check):
systemctl enable pve-ha-lrm pve-ha-crm --now
Step 2 – Add VM to HA Group
In the Proxmox web GUI:
- Go to Datacenter → HA → Resources → Add and select the VM
- Optionally assign an HA group (e.g., "default"), created under Datacenter → HA → Groups
- Save the configuration
Or via CLI:
ha-manager add vm:101
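The group, requested state, and retry limits can also be set in the same call, and the result verified afterwards:

ha-manager add vm:101 --group default --state started --max_restart 1
ha-manager config    # lists all HA-managed resources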
Step 3 – Test HA Failover
Simulate a node failure on the node currently running the VM:
systemctl stop pve-cluster corosync
(or power off the node)
Note: a node with active HA resources self-fences (reboots via its watchdog) once it loses cluster communication. After fencing completes, typically within a minute or two rather than seconds, another node automatically starts the VM:
ha-manager status
Example output (abridged):
service vm:101 (pve-node02, started)
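Once the failed node rejoins the cluster, the VM can be moved back through the HA stack itself (node names from the example environment above):

ha-manager migrate vm:101 pve-node01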
🗄️ 4. Shared Storage for HA
To support live migration and HA failover, every node must be able to access the same VM disks, either through shared storage or through replicated local storage.
| Storage Type | Description | Best For |
|---|---|---|
| NFS / iSCSI | Centralized storage via NAS or SAN. | Small to medium clusters |
| ZFS Replication | Periodic replication of datasets between nodes. | Low-cost redundancy |
| Ceph Cluster | Fully distributed storage integrated with Proxmox. | Enterprise-scale clusters |
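As an illustration, adding a shared NFS storage so every node sees the same VM disks (server address, storage name, and export path are hypothetical):

pvesm add nfs shared-vms --server 192.168.10.50 --export /export/vms --content images,rootdir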
💡 Ceph is the ideal choice for scalable, fault-tolerant storage: it's deeply integrated into Proxmox VE and supports data replication and self-healing.
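A minimal Ceph bootstrap on such a cluster might look like this sketch (device and pool names are illustrative; run the install step on every node):

pveceph install                          # on each node
pveceph init --network 192.168.10.0/24   # once, on the first node
pveceph mon create                       # on at least three nodes
pveceph osd create /dev/sdb              # on each node with a spare disk
pveceph pool create vm-pool              # replicated pool for VM disks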
⚙️ 5. Best Practices for HA Environments
| Aspect | Recommendation |
|---|---|
| Cluster Size | Minimum 3 nodes for stable quorum. |
| Corosync Network | Use a dedicated VLAN or 10 GbE link for heartbeat traffic. |
| Power Fencing | Use IPMI-based fencing for enterprise reliability. |
| Backup Strategy | Even with HA, use Proxmox Backup Server (PBS) for snapshot and offsite backups. |
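For the backup recommendation, a one-off snapshot-mode backup to a PBS-backed storage (the storage name pbs01 is hypothetical) is as simple as:

vzdump 101 --storage pbs01 --mode snapshot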
✅ Conclusion
The Cluster and High Availability (HA) features in Proxmox VE are among its most powerful capabilities.
By leveraging Corosync, pmxcfs, and the built-in HA Manager, you can achieve an enterprise-grade virtualization environment with redundancy, live migration, and automatic failover, without the high licensing costs of proprietary platforms.
💬 In the next article, we'll explore
"Proxmox Backup Server and Remote Replication Strategy",
showing how to build a complete enterprise data protection solution.