Building a Proxmox Cluster and High Availability (HA) Architecture

Posted on 2025-10-31 by Rico

🔰 Introduction

After setting up and running Proxmox VE, the next step for IT teams is to enable automatic failover and centralized management across multiple servers.
This is achieved through Proxmox Cluster and its built-in High Availability (HA) framework.

Unlike VMware's vCenter, DRS, or SRM (which require paid licensing), Proxmox provides these capabilities natively as open-source software, using Corosync for cluster communication and pmxcfs for configuration synchronization.

In this article, we'll explore:
1️⃣ How Proxmox clustering works
2️⃣ How to build and verify a cluster
3️⃣ How to configure and test automatic HA failover


🧩 1. Understanding Proxmox Cluster Architecture

🧱 Concept Overview

A Proxmox Cluster is a group of nodes (servers) managed under a single configuration domain.
It allows all nodes to share:

  • VM and container definitions
  • Storage configurations (ZFS / NFS / Ceph)
  • User accounts and permissions
  • HA and scheduling information

Each node runs a pmxcfs (Proxmox Cluster File System) service that keeps configurations synchronized through Corosync messaging.
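
A quick way to see pmxcfs in action on a clustered node: /etc/pve is a FUSE mount backed by pmxcfs, so anything written there is replicated to every node. For example:

mount | grep /etc/pve        # should show the pmxcfs FUSE mount
ls /etc/pve                  # shared files such as corosync.conf, storage.cfg, and the nodes/ directory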


🧩 Cluster Architecture Diagram

          ┌──────────────────────────┐
          │      Proxmox Cluster     │
          │ (pmxcfs + Corosync Layer)│
          └──────────────────────────┘
               │          │          │
          ┌────────┐ ┌────────┐ ┌────────┐
          │ Node01 │ │ Node02 │ │ Node03 │
          └────────┘ └────────┘ └────────┘
               │          │          │
           Shared Storage (NFS / Ceph / ZFS Replication)

โš™๏ธ Key Components

Component                   | Role                | Description
----------------------------|---------------------|----------------------------------------------------------------
Corosync                    | Communication layer | Provides real-time messaging and quorum control between nodes.
pmxcfs                      | Cluster file system | Synchronizes configuration files across nodes.
pve-cluster                 | Management layer    | Handles cluster control commands.
HA Manager (pve-ha-manager) | Control layer       | Monitors VM states and performs automatic failover.
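
Each of these components runs as a systemd service, so once a node is part of a cluster you can check them directly; for example:

systemctl is-active pve-cluster corosync pve-ha-crm pve-ha-lrm   # all four should report "active"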

โš™๏ธ 2. Building a Cluster

Example Environment

Node       | IP Address    | Role
-----------|---------------|----------------
pve-node01 | 192.168.10.11 | Cluster creator
pve-node02 | 192.168.10.12 | Member node
pve-node03 | 192.168.10.13 | Member node

All nodes:

  • Are running Proxmox VE 9.0.10
  • Use static IPs in the same network
  • Have synchronized time via NTP
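
Before creating the cluster, it is worth confirming these prerequisites from the shell on every node; a minimal check, assuming chrony is the time-sync daemon (the Proxmox VE default):

pveversion                            # every node should run the same Proxmox VE release
hostname --ip-address                 # should resolve to the node's static IP
timedatectl | grep -i synchronized    # expect "System clock synchronized: yes"
chronyc tracking                      # NTP offset details (if chrony is in use)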

Step 1 – Create the Cluster (on Node01)

pvecm create mycluster

This initializes a new cluster called mycluster and generates /etc/pve/corosync.conf.
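
If you plan to run cluster traffic over a dedicated network (recommended later in this article), you can pin the Corosync link at creation time. A sketch, assuming a hypothetical 192.168.20.0/24 heartbeat subnet:

pvecm create mycluster --link0 192.168.20.11   # bind Corosync link 0 to the dedicated heartbeat network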


Step 2 – Join Additional Nodes

On Node02 and Node03, run:

pvecm add 192.168.10.11

Enter the root password of the first node when prompted.
The nodes will automatically synchronize and join the cluster.
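
If the cluster was created with a dedicated Corosync link, pass the joining node's address on that network as well, then confirm membership from any node. A sketch using the same hypothetical heartbeat subnet as above:

pvecm add 192.168.10.11 --link0 192.168.20.12   # join over the dedicated Corosync link
pvecm nodes                                     # lists node IDs, votes, and names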


Step 3 – Verify Cluster Status

pvecm status

Expected output:

Cluster information
-------------------
Name:             mycluster
Config Version:   3
Nodes:            3
Quorum:           3
Votequorum:       2

💡 Quorum ensures that more than half of the nodes are online before cluster actions (like HA migration) occur, preventing split-brain issues.
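
Quorum can also be inspected through Corosync's own tooling; for example:

corosync-quorumtool -s    # shows expected votes, total votes, and whether the cluster is quorate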


🧠 3. Understanding and Configuring High Availability (HA)

What Is HA in Proxmox?

Proxmox HA ensures that VMs and containers are automatically restarted on another node when the node hosting them fails.

How it works:

  • The pve-ha-manager continuously monitors all HA-enabled VMs.
  • If Corosync reports a node failure, the failed node is fenced (by default via its watchdog) and the pve-ha-crm process reassigns the affected VMs, which are then restarted on another node.
  • HA configurations are shared and synchronized within the cluster.

Step 1 – Enable HA Services

Run on all nodes:

systemctl enable pve-ha-lrm pve-ha-crm --now

Step 2 – Add VM to HA Group

In the Proxmox Web GUI:

  1. Go to Datacenter → HA → Resources → Add and select the VM (or open the VM and use More → Manage HA)
  2. Choose or create an HA group (e.g., "default")
  3. Save the configuration

Or via CLI:

ha-manager add vm:101
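
ha-manager also lets you create groups and tune recovery behavior per resource; a sketch using its groupadd/add/set subcommands (the group name "default" and the limits below are just examples):

ha-manager groupadd default --nodes "pve-node01,pve-node02,pve-node03"   # create the HA group if it does not exist
ha-manager add vm:101 --group default --max_restart 2 --max_relocate 1   # restart/relocate limits before giving up
ha-manager set vm:101 --state started                                    # desired state the HA stack should enforce
ha-manager config                                                        # review the resulting HA resource configuration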

Step 3 – Test HA Failover

Simulate a node failure:

systemctl stop pve-cluster corosync

(or power off the node). Note that stopping these services on an HA-enabled node makes it lose quorum, so its watchdog will expire and the node will self-fence by rebooting.

After the failed node has been fenced (allow roughly one to two minutes due to the watchdog timeout, rather than seconds), another node should automatically start the VM:

ha-manager status

Example output:

vm:101 state = started node=pve-node02
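
To follow the failover as it happens, watch the HA manager and its logs from a surviving node; for example:

watch -n 2 ha-manager status                 # refresh the HA view every two seconds
journalctl -u pve-ha-crm -u pve-ha-lrm -f    # follow CRM/LRM decisions and fencing events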

🗄️ 4. Shared Storage for HA

To support live migration and HA failover, all nodes must have access to shared storage.

Storage Type    | Description                                        | Best For
----------------|----------------------------------------------------|---------------------------
NFS / iSCSI     | Centralized storage via NAS or SAN.                | Small to medium clusters
ZFS Replication | Periodic replication of datasets between nodes.    | Low-cost redundancy
Ceph Cluster    | Fully distributed storage integrated with Proxmox. | Enterprise-scale clusters

💡 Ceph is the ideal choice for scalable, fault-tolerant storage — it's deeply integrated into Proxmox VE and supports data replication and self-healing.
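
As a concrete example, shared storage is added once and becomes visible cluster-wide, because storage.cfg lives in /etc/pve; a sketch for an NFS export (the server address and export path are hypothetical):

pvesm add nfs shared-nfs --server 192.168.10.50 --export /export/proxmox --content images,rootdir
pvesm status                                  # the new storage should appear on every node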


โš™๏ธ 5. Best Practices for HA Environments

Aspect           | Recommendation
-----------------|---------------------------------------------------------------------------------
Cluster Size     | Minimum 3 nodes for stable quorum.
Corosync Network | Use a dedicated VLAN or 10 GbE link for heartbeat traffic.
Power Fencing    | Use IPMI-based fencing for enterprise reliability.
Backup Strategy  | Even with HA, use Proxmox Backup Server (PBS) for snapshot and offsite backups.
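
For the dedicated Corosync network, knet supports redundant links; a sketch of what one node entry in /etc/pve/corosync.conf can look like with a second ring (the addresses are hypothetical, and config_version in the totem section must be incremented whenever the file is edited):

nodelist {
  node {
    name: pve-node01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.20.11   # dedicated heartbeat VLAN
    ring1_addr: 192.168.10.11   # fallback over the management network
  }
  # remaining nodes follow the same pattern
}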

✅ Conclusion

The Cluster and High Availability (HA) features in Proxmox VE are among its most powerful capabilities. By leveraging Corosync, pmxcfs, and the built-in HA Manager, you can achieve an enterprise-grade virtualization environment — with redundancy, live migration, and automatic failover — without the high licensing costs of proprietary platforms.

💬 In the next article, we'll explore "Proxmox Backup Server and Remote Replication Strategy", showing how to build a complete enterprise data protection solution.
