Introduction
As enterprise IT infrastructure spreads across multiple sites and hybrid cloud environments,
keeping data consistent between locations and recovering quickly after a disaster
have become mission-critical requirements.
Proxmox Backup Server (PBS) is more than a local backup solution:
it can serve as the foundation for multi-site data resilience.
With its built-in Sync Jobs, remote datastores, and REST API,
PBS provides the building blocks for a scalable, automated Disaster Recovery (DR) framework
that supports business continuity and long-term data reliability.
1. Typical Multi-Site Architecture
Architecture Overview
┌───────────────────────────────┐
│ Site A (Primary Site)         │
│ Proxmox VE + PBS + ZFS/NVMe   │
└───────────────┬───────────────┘
                │ Incremental Sync
                │
┌───────────────┴───────────────┐
│ Site B (DR / Secondary)       │
│ PBS + Ceph Object Storage     │
└───────────────┬───────────────┘
                │
 Optional Cloud Tier Replication
                │
┌───────────────┴───────────────┐
│ Site C (Cloud Archive)        │
│ S3 / Wasabi / B2 / GCP / Azure│
└───────────────────────────────┘
Three-Tier Model
1. Primary Site (A): Production workloads and local PBS backups
2. DR Site (B): Receives incremental PBS replication and supports failover
3. Cloud Tier (C): Long-term archiving and offsite retention
2. Core Mechanism: PBS Sync Job
At the heart of PBS multi-site synchronization lies the Sync Job,
which performs incremental data replication between independent PBS servers.
Key Features
- Transfers only new data chunks (incremental)
- Supports compression and encrypted transport
- Allows automatic scheduling and retry policies
- Integrates with Verify Jobs for integrity validation
Example Sync Job Configuration
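# Run on the DR PBS: a sync job pulls from a named remote by default.
# Assumes a remote called primary-pbs (hypothetical name) was registered
# beforehand with `proxmox-backup-manager remote create`.
proxmox-backup-manager sync-job create sync-to-dr \
  --remote primary-pbs \
  --remote-store local-pbs \
  --store pbs-remote \
  --schedule "daily"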
With a daily schedule, the DR site's copies lag the primary by at most about one day, and only new chunks cross the link between sites.
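Sync jobs refer to the peer server through a named remote, so that remote has to exist on the PBS host that runs the job. The following is a minimal sketch of registering one; the remote name, host address, sync user, and fingerprint are hypothetical placeholders, and `PBS_MANAGER` is a hook introduced only for this example so the command can be previewed with `echo`.

```shell
#!/bin/sh
# Sketch: register a peer PBS as a named remote so sync jobs can reference it.
# PBS_MANAGER is a hook introduced for this sketch; it defaults to the real CLI
# and can be set to "echo" to preview the arguments without running anything.

# register_remote NAME HOST AUTH_ID FINGERPRINT PASSWORD
register_remote() {
  "${PBS_MANAGER:-proxmox-backup-manager}" remote create "$1" \
    --host "$2" \
    --auth-id "$3" \
    --fingerprint "$4" \
    --password "$5"
}

# Preview with placeholder values (all hypothetical):
#   PBS_MANAGER=echo register_remote primary-pbs 192.168.10.10 sync@pbs "<fingerprint>" "<password>"
```

A dedicated sync user with read-only access to the source datastore is usually preferable to an administrative account, since the DR site only needs to read backups.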
3. Cross-Site Verification and Data Consistency
To ensure remote data integrity,
PBS can combine Sync Jobs and Verify Jobs to execute a complete "Replicate → Verify → Report" cycle.
Automated Workflow
1. Daily Incremental Sync: Transfers new data chunks
2. Weekly Verification: Recomputes checksums and detects corruption
3. Automated Reporting: Sends results via API or email
Verify Job Example
proxmox-backup-manager verify pbs-remote
If verification fails, PBS marks the affected snapshots as failed so that they can be identified and re-synced from the source site.
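The Replicate → Verify → Report cycle described above can be sketched as a small wrapper that runs the steps in order and stops at the first failure. This is an illustration rather than a PBS feature: the function name and the summary format are made up for this example, and it assumes each command blocks until its task finishes and returns a non-zero exit code on failure.

```shell
#!/bin/sh
# Sketch of a Replicate -> Verify -> Report cycle: run the steps in order,
# stop at the first failure, and print a one-line summary for the report step.

# run_cycle SYNC_CMD VERIFY_CMD
# Each argument is a command line (a single string) executed with `sh -c`.
run_cycle() {
  if ! sh -c "$1"; then
    echo "cycle=failed step=sync"
    return 1
  fi
  if ! sh -c "$2"; then
    echo "cycle=failed step=verify"
    return 2
  fi
  echo "cycle=ok"
}

# Example wiring (job ID and datastore name follow the examples in this article):
#   run_cycle "proxmox-backup-manager sync-job run sync-to-dr" \
#             "proxmox-backup-manager verify pbs-remote"
```

The summary line can then be mailed or posted to a webhook; skipping verification after a failed sync avoids reporting a clean verify on stale data.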
4. Integrating with Cloud Object Storage
To achieve true offsite durability,
PBS can be extended with rclone or s3cmd to replicate data to public or private cloud storage platforms
such as AWS S3, Wasabi, Backblaze B2, or Google Cloud Storage.
Example: PBS → Ceph RGW → S3 Cloud Tier
rclone sync /mnt/datastore/pbs-remote ceph-s3:cloud-archive
Benefits:
- Three-layer protection (Local → DR → Cloud)
- Automated archival and versioning
- Integrated retention and lifecycle management
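Because `rclone sync` makes the destination match the source, a preview run is a sensible first step before touching the archive. The sketch below wraps the command from the example in a helper with a preview mode; the helper name and the `RCLONE` hook are assumptions made for this illustration, and it presumes the `ceph-s3` rclone remote is already configured and that no backup, sync, or garbage-collection task is writing to the datastore while it runs.

```shell
#!/bin/sh
# Sketch: copy a PBS datastore directory to an S3-compatible rclone remote.
# RCLONE is a hook introduced for this sketch (defaults to the real binary).

# archive_datastore SRC_DIR DEST [preview|apply]
# "preview" (the default) adds rclone's --dry-run flag, so nothing is
# written or deleted on the destination.
archive_datastore() {
  mode="${3:-preview}"
  case "$mode" in
    preview) "${RCLONE:-rclone}" sync --dry-run "$1" "$2" ;;
    apply)   "${RCLONE:-rclone}" sync "$1" "$2" ;;
    *)       echo "unknown mode: $mode" >&2; return 64 ;;
  esac
}

# Example:
#   archive_datastore /mnt/datastore/pbs-remote ceph-s3:cloud-archive          # preview
#   archive_datastore /mnt/datastore/pbs-remote ceph-s3:cloud-archive apply    # real run
```

Note that `rclone sync` deletes destination files that no longer exist in the source, so chunks removed by pruning and garbage collection also disappear from the archive; `rclone copy` never deletes and may suit long-term retention better.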
5. Automated Failover and Restore Design
A. Automated Failover (Trigger-Based)
By integrating monitoring tools like Prometheus and Alertmanager,
you can automatically trigger DR operations when the primary PBS becomes unavailable:
- Switch Proxmox VE cluster backup target to DR PBS
- Enable write access on the DR datastore
- Notify administrators via webhook or email
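A trigger of this kind should not fire on a single missed probe. One common approach, sketched below, is to count consecutive failed health checks and start the DR actions only once a threshold is reached. The host name, port 8007 probe, and threshold of 3 are assumptions for illustration; in a Prometheus setup the same logic would normally live in an alert rule with a `for:` duration.

```shell
#!/bin/sh
# Sketch: decide when to fail over, based on consecutive failed health probes.
# Host name and threshold are assumptions for illustration.

# probe_pbs HOST: succeeds if the PBS web/API port answers over HTTPS within 5 s.
# -k skips certificate verification; replace it with --cacert in production.
probe_pbs() {
  curl -sk --max-time 5 -o /dev/null "https://$1:8007/"
}

# next_failures PREVIOUS_COUNT PROBE_EXIT_CODE: prints the updated failure streak.
next_failures() {
  if [ "$2" -eq 0 ]; then
    echo 0
  else
    echo $(( $1 + 1 ))
  fi
}

# should_failover COUNT THRESHOLD: succeeds once the streak reaches the threshold.
should_failover() {
  [ "$1" -ge "$2" ]
}

# Example loop body (the caller keeps the counter, e.g. in a state file):
#   rc=0; probe_pbs primary-pbs.example || rc=$?
#   count=$(next_failures "$count" "$rc")
#   should_failover "$count" 3 && echo "trigger DR runbook"
```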
B. Automated Restore via PBS API
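PBS provides a REST API for inspecting datastores and snapshots programmatically.
The restore itself is performed with proxmox-backup-client or, for VMs and containers, from Proxmox VE;
the API call below lists the snapshots on the DR datastore that are available for recovery:
curl -H "Authorization: PBSAPIToken=USER@REALM!TOKENID:SECRET" \
https://dr-pbs:8007/api2/json/admin/datastore/pbs-remote/snapshots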
You can integrate this with Ansible, N8N, or Python automation scripts
to fully automate recovery or re-attach VM disks after a disaster.
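For scripted recovery, restores can be driven from the command line with `proxmox-backup-client restore`. The sketch below wraps that call; the repository string, snapshot timestamp, archive name, and target path are placeholders, and `PBS_CLIENT` is a hook introduced only for this example. For unattended use, the client is expected to read credentials from the `PBS_PASSWORD` environment variable and the server fingerprint from `PBS_FINGERPRINT`.

```shell
#!/bin/sh
# Sketch: restore one archive from a snapshot on the DR datastore.
# PBS_CLIENT is a hook introduced for this sketch (defaults to the real client).
# Repository string, snapshot timestamp, and archive name are placeholders.

# restore_archive REPOSITORY SNAPSHOT ARCHIVE TARGET
#   REPOSITORY  e.g. "restore@pbs@dr-pbs:pbs-remote"   (user@realm@host:datastore)
#   SNAPSHOT    e.g. "vm/101/2025-01-01T02:00:00Z"     (type/id/time)
#   ARCHIVE     e.g. "drive-scsi0.img"
#   TARGET      file or directory to restore into
restore_archive() {
  "${PBS_CLIENT:-proxmox-backup-client}" restore "$2" "$3" "$4" --repository "$1"
}

# Preview the command without contacting the server:
#   PBS_CLIENT=echo restore_archive "restore@pbs@dr-pbs:pbs-remote" \
#     "vm/101/2025-01-01T02:00:00Z" drive-scsi0.img /mnt/vmrestore/vm-101-disk-0.raw
```

An Ansible task or N8N workflow can call such a wrapper once per disk, then attach the restored images to a new VM.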
6. Task Scheduling and Automation
Use systemd timers or cron jobs to orchestrate recurring maintenance tasks.
| Task | Frequency | Description |
|---|---|---|
| Sync Job | Daily | Incremental replication |
| Verify Job | Weekly | Data integrity check |
| Prune Job | Monthly | Cleanup of expired snapshots |
| Report Job | Weekly | Generate and email backup summary |
Example Automation Script
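#!/bin/bash
set -euo pipefail  # stop at the first failing step
proxmox-backup-manager sync-job run sync-to-dr
proxmox-backup-manager verify pbs-remote
# Assumes a prune job with the hypothetical ID prune-dr has been configured
proxmox-backup-manager prune-job run prune-dr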
7. Monitoring and Observability
1. Built-in Logging
- Task logs are stored under /var/log/proxmox-backup/
- The PBS Web UI displays Sync and Verify progress in real time
2. External Monitoring Integration
Integrate Prometheus + Grafana + Wazuh for unified observability:
- Prometheus: collects PBS metrics (task duration, throughput, errors) through an exporter
- Grafana: visualizes multi-site sync status dashboards
- Wazuh: provides audit logs, login tracking, and data change alerts
With this setup, PBS becomes not just a "backup server,"
but an observable and intelligent data protection platform.
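One lightweight way to feed job results into Prometheus is the node_exporter textfile collector, which reads `*.prom` files from a configured directory. The sketch below writes the outcome of a sync run in that format; the metric names and the helper functions are assumptions made for this example, not metrics that PBS itself exports.

```shell
#!/bin/sh
# Sketch: expose the outcome of a sync run to Prometheus through the
# node_exporter textfile collector. Metric names are assumptions made
# for this example.

# format_sync_metrics JOB_ID EXIT_CODE UNIX_TIME
# Prints two metrics in the Prometheus text exposition format.
format_sync_metrics() {
  if [ "$2" -eq 0 ]; then ok=1; else ok=0; fi
  printf 'pbs_sync_last_run_success{job="%s"} %s\n' "$1" "$ok"
  printf 'pbs_sync_last_run_timestamp_seconds{job="%s"} %s\n' "$1" "$3"
}

# write_sync_metrics FILE JOB_ID EXIT_CODE
# Writes to a temporary file first and then renames it, so the collector
# never reads a half-written file.
write_sync_metrics() {
  format_sync_metrics "$2" "$3" "$(date +%s)" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Example, after a sync run whose exit code is in $rc:
#   write_sync_metrics /var/lib/node_exporter/textfile/pbs_sync.prom sync-to-dr "$rc"
```

A Grafana panel or alert rule can then watch for `pbs_sync_last_run_success == 0` or for a timestamp that is older than the expected schedule interval.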
8. Real-World Deployment Scenarios
| Enterprise Setup | Implementation | Redundancy Level |
|---|---|---|
| Taipei HQ + Shanghai Branch | Site A: ZFS + PBS; Site B: PBS + Ceph | Dual-site replication |
| Taiwan HQ + Vietnam Plant + GCP Cloud | A→B replication, B→C cloud archival | Three-tier protection |
| On-Prem + DR IDC | Sync + Verify + Ansible-based failover | Automated DR |
| SME Single-Site | PBS + rclone to S3 | Cloud redundancy |
Conclusion
Proxmox Backup Server (PBS) has evolved far beyond a traditional local backup tool.
It now serves as a central component in enterprise-grade multi-site DR and automation frameworks.
Through
- incremental Sync Jobs for efficient cross-site replication,
- scheduled Verify Jobs, with re-sync of damaged snapshots, for data integrity,
- API-driven automation for fast restores, and
- cloud-tier archival integration,
organizations can achieve a modern data strategy characterized by:
Multi-Site Redundancy · Automated Recovery · Verified Integrity · Cloud Governance
Coming next:
"Automated Backup Orchestration with PBS + N8N + Ansible",
demonstrating how to integrate Proxmox Backup Server APIs
for fully automated synchronization, verification, restoration, and reporting workflows.