Nuface Blog

Casual Notes

Proxmox Automated Disaster Recovery and Cloud Orchestration Implementation

Posted on 2025-10-31 by Rico

🔰 Introduction

Traditional disaster recovery (DR) procedures often rely on manual intervention:
administrators receive alerts, log into systems, locate backups, and manually trigger restores.

While this approach works in theory, during real-world incidents such as data center outages or ransomware attacks,
manual recovery is slow, inconsistent, and error-prone.

Modern enterprises are shifting to Automated Disaster Recovery (ADR),
where systems detect, respond, and recover automatically based on defined events and policies.

This article covers:
1️⃣ The architecture and workflow of ADR
2️⃣ Integrating Proxmox VE, PBS, and APIs with Ansible and Terraform
3️⃣ Real-world automation examples for recovery orchestration


🧩 1. Automated Disaster Recovery (ADR) Architecture Overview

Architecture Diagram

 ┌──────────────────────────────────────────────────────┐
 │                   Monitoring Layer                   │
 │ Prometheus → Grafana → Alertmanager → Webhook/Slack  │
 └────────────────┬─────────────────────────────────────┘
                  │  Event Trigger
 ┌────────────────┴────────────────┐
 │       Orchestration Layer       │
 │ Ansible / Terraform / N8N / AWX │
 └────────────────┬────────────────┘
                  │  Automated Execution
 ┌────────────────┴────────────────┐
 │         Execution Layer         │
 │  Proxmox VE + PBS + API + Ceph  │
 └─────────────────────────────────┘

🧠 2. ADR Workflow Concept

| Stage | Trigger Source | Action | Tool |
|---|---|---|---|
| 1️⃣ Event Detection | Prometheus / Grafana | Detect node or PBS outage | Alertmanager |
| 2️⃣ Notification | Webhook / Slack / Email | Notify administrators & automation system | Alertmanager / Webhook |
| 3️⃣ Orchestration | Trigger Playbook / Script | Execute recovery workflow | Ansible / Terraform |
| 4️⃣ Data Recovery | Restore from remote PBS / Cloud backup | Recreate VMs, networks, and services | Proxmox API + PBS |
| 5️⃣ Verification | Validate recovery status | Confirm and report completion | API + Grafana |

⚙️ 3. Proxmox API + Ansible Integration Example

Proxmox’s RESTful API exposes nearly all operations,
making it ideal for integration with Ansible to automate DR processes.

1️⃣ Ansible Inventory and Variables

/etc/ansible/hosts

[pve_cluster]
pve-node01 ansible_host=10.0.0.11
pve-node02 ansible_host=10.0.0.12

Variable definitions:

proxmox_api_url: "https://10.0.0.11:8006/api2/json"
proxmox_user: "root@pam"
proxmox_token_id: "dr-automation"
proxmox_token_secret: "xxxxxxx"
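
As a rough illustration of how these variables fit together, the sketch below assembles the API token header in Python. It assumes the `PVEAPIToken=USER@REALM!TOKENID=SECRET` header format used by Proxmox VE API tokens. The function name `build_auth_header` and the `PVE_TOKEN_SECRET` environment variable are hypothetical and exist only for this example.

```python
import os


def build_auth_header(user: str, token_id: str, token_secret: str) -> dict:
    """Build the Authorization header for a Proxmox VE API token.

    Assumed format: PVEAPIToken=<user>@<realm>!<token id>=<secret>
    """
    if "@" not in user:
        raise ValueError("user must include the realm, e.g. root@pam")
    return {"Authorization": f"PVEAPIToken={user}!{token_id}={token_secret}"}


# Hypothetical environment variable, so the secret is not hard-coded in a file.
secret = os.environ.get("PVE_TOKEN_SECRET", "xxxxxxx")
headers = build_auth_header("root@pam", "dr-automation", secret)
print(sorted(headers))  # prints the header names only, never the secret
```

Reading the secret from the environment (or from Ansible Vault on the Ansible side) keeps the token out of version control. That is a design suggestion rather than something the setup above requires.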

2️⃣ Automated Recovery Playbook

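---
- name: Proxmox Automated VM Recovery
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Restore VM from remote PBS
      ansible.builtin.uri:
        url: "{{ proxmox_api_url }}/nodes/pve-node02/qemu"
        method: POST
        headers:
          # API token format: PVEAPIToken=<user>@<realm>!<token id>=<secret>
          Authorization: "PVEAPIToken={{ proxmox_user }}!{{ proxmox_token_id }}={{ proxmox_token_secret }}"
        body_format: json
        body:
          vmid: 301
          # The restore source is passed as "archive". This value is a placeholder:
          # replace it with the backup volume ID reported by your PBS storage.
          archive: "pbs:remote-pbs/vm-301"
          unique: 1
          pool: "production"
      register: restore_result

    - name: Print restore job status
      ansible.builtin.debug:
        var: restore_result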

This playbook can be triggered by Alertmanager, AWX, or a webhook
to restore a VM at a remote site automatically, without manual action.
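
The restore request only starts a background task, so an orchestrator usually needs to wait for it to finish. The sketch below shows one way to poll for completion in Python. It assumes the POST returns a task ID (UPID) and that task state is available at `/nodes/{node}/tasks/{upid}/status`. The function names, timeout, and polling interval are illustrative choices, not part of the playbook above.

```python
import json
import time
import urllib.parse
import urllib.request


def fetch_task_status(api_url: str, node: str, upid: str, auth_header: str) -> dict:
    """GET the status of one task and return the "data" object of the response."""
    # The UPID contains ':' characters, so it is percent-encoded for the URL path.
    url = f"{api_url}/nodes/{node}/tasks/{urllib.parse.quote(upid, safe='')}/status"
    request = urllib.request.Request(url, headers={"Authorization": auth_header})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["data"]


def wait_for_task(get_status, timeout_s=3600.0, interval_s=5.0, sleep=time.sleep) -> bool:
    """Poll get_status() until the task stops; True means it exited with "OK"."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status.get("status") == "stopped":
            return status.get("exitstatus") == "OK"
        if time.monotonic() >= deadline:
            raise TimeoutError("restore task did not finish in time")
        sleep(interval_s)
```

A caller might combine the two as `wait_for_task(lambda: fetch_task_status(api_url, "pve-node02", upid, auth_header))`. Note that `urllib` verifies TLS certificates by default, so a node with a self-signed certificate needs its CA added to the trust store first.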


☁️ 4. Terraform Integration for Automated Rebuild

Terraform can automate infrastructure provisioning at remote or cloud DR sites.

Terraform Example

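provider "proxmox" {
  pm_api_url          = "https://10.0.0.11:8006/api2/json"
  # Assuming the Telmate/proxmox provider (these argument names match it):
  # the token ID takes the full user@realm!token-name form, so pm_user is not needed.
  pm_api_token_id     = "root@pam!dr-automation"
  pm_api_token_secret = "xxxxxxx"
}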

resource "proxmox_vm_qemu" "dr_vm" {
  name        = "dr-webserver"
  target_node = "pve-node02"
  clone       = "ubuntu-template"
  cores       = 4
  memory      = 8192
  disk {
    size    = "40G"
    storage = "local-lvm"
  }
  network {
    bridge = "vmbr0"
  }
}

Execute the automated deployment:

terraform init
terraform apply -auto-approve

This process can automatically provision standby infrastructure
after PBS synchronization completes.


🧮 5. Alertmanager + N8N Workflow Example

Workflow Diagram

[Prometheus]
   ↓
[Grafana Alert]
   ↓
[Alertmanager]
   ↓  (Webhook Trigger)
[N8N / Ansible Playbook]
   ↓
[Proxmox API → PBS → Restore VM]
   ↓
[Slack Notification / Report Delivery]

Example Webhook Payload

When Prometheus detects a node failure, Alertmanager sends a webhook payload such as the following (abridged to the relevant fields):

{
  "receiver": "proxmox-dr",
  "status": "firing",
  "alerts": [
    {
      "labels": {
        "alertname": "PVE_Node_Down",
        "instance": "pve-node01",
        "severity": "critical"
      },
      "annotations": {
        "description": "Proxmox node pve-node01 is unreachable"
      }
    }
  ]
}

N8N or Ansible parses this payload and automatically executes the DR workflow.
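
To make the parsing step concrete, here is a minimal Python sketch of what a webhook receiver might do with this payload. The function name `nodes_to_recover` and the rule "act only on firing, critical `PVE_Node_Down` alerts" are assumptions made for illustration. A real workflow would apply its own policy.

```python
from typing import List


def nodes_to_recover(payload: dict, alertname: str = "PVE_Node_Down") -> List[str]:
    """Return the instances named by firing, critical alerts with the given name."""
    if payload.get("status") != "firing":
        return []  # a resolved notification must not trigger a restore
    nodes: List[str] = []
    for alert in payload.get("alerts", []):
        # Individual alerts may carry their own status; treat a missing one as firing.
        if alert.get("status", "firing") != "firing":
            continue
        labels = alert.get("labels", {})
        if labels.get("alertname") != alertname or labels.get("severity") != "critical":
            continue
        instance = labels.get("instance")
        if instance and instance not in nodes:
            nodes.append(instance)  # de-duplicate while keeping order
    return nodes
```

The returned node names can then be passed to the recovery playbook as extra variables. Filtering on the top-level `status` matters because Alertmanager can also notify when an alert resolves, depending on the receiver configuration.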


🧰 6. Automated Validation and Reporting

After a recovery job completes, automatically verify VM status:

pvesh get /nodes/pve-node02/qemu/301/status/current --output-format json

If the JSON output contains:

"status":"running"

the restore was successful ✅

Then report back via API or notification:

✅ VM 301 successfully restored from remote PBS and started.
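
The check and the report can be tied together in a few lines. The following Python sketch assumes it runs on a Proxmox node where `pvesh` is available and that `--output-format json` is passed so the output can be parsed. The function names and the wording of the failure message are hypothetical.

```python
import json
import subprocess


def current_status_json(node: str, vmid: int) -> str:
    """Run pvesh on a Proxmox node and return the raw JSON text it prints."""
    path = f"/nodes/{node}/qemu/{vmid}/status/current"
    result = subprocess.run(
        ["pvesh", "get", path, "--output-format", "json"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def restore_report(vmid: int, status_json: str) -> str:
    """Turn the status JSON into a one-line message for Slack or e-mail."""
    status = json.loads(status_json).get("status")
    if status == "running":
        return f"✅ VM {vmid} successfully restored from remote PBS and started."
    # Hypothetical wording for the non-running case.
    return f"VM {vmid} restore needs attention: status is {status!r}."
```

For example, `restore_report(301, current_status_json("pve-node02", 301))` produces the message to send. Keeping the parsing separate from the `pvesh` call makes the reporting logic easy to test without a cluster.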

🧠 7. Multi-Region Cloud Orchestration

Extend your DR automation beyond on-premises:
deploy across multiple regions and clouds for full hybrid orchestration.

| Cloud / Region | Automated Task | Tool |
|---|---|---|
| AWS | Launch temporary EC2 nodes and attach PBS S3 backups | Terraform / AWS CLI |
| Azure | Activate Blob snapshot as backup source | Azure Functions |
| GCP | Use Cloud Run or Cloud Scheduler to trigger DR workflows | N8N / API |
| On-Prem | Automatically restart or rebuild Proxmox nodes | Ansible / API |

✅ Conclusion

Through the combination of Proxmox VE + PBS + API + Automation Tools,
enterprises can establish a fully autonomous disaster recovery system capable of:

  • Real-time failure detection
  • Automated VM and data recovery
  • Cross-site replication
  • Cloud-based orchestration

This framework not only reduces human error and response time
but also elevates the resilience and continuity of enterprise IT operations.

💬 In the next and final article of the Proxmox Enterprise Series,
“Proxmox Enterprise Governance Framework and Best Practices,”
we’ll consolidate virtualization, backup, security, and cloud strategies
into a complete enterprise-grade open virtualization governance blueprint.
