VM HA
Overview
HA Policy: A mechanism that keeps your business running stably when virtual machines stop unexpectedly or enter an error state because of failures in their associated compute, network, or storage resources. With this feature enabled, you can customize VM HA policies to ensure business continuity and stability.
- Virtual Machine High Availability: Sets whether virtual machines automatically restart after they shut down, whether the shutdown is planned or unexpected. If the HA policy is not enabled on the platform, VM HA takes effect only after the HA policy is enabled.
  - If the high availability switch is turned off: Virtual machines will not automatically restart when they are shut down.
  - If the high availability switch is turned on:
    - Virtual machines automatically restart after a planned shutdown or an unexpected shutdown of their own.
    - If related compute, storage, or network resources fail, the virtual machine migrates to another host and is started there by HA, according to the custom fault migration policy.
- Virtual Machine High Availability Fault Migration Policy: Sets whether a virtual machine migrates to another host and starts there when its related compute, storage, or network resources fail.
  Fault migration policies support detecting the status of the following resources:
  - Management Network Connection Status:
    - Detects the network connection status between the host where the virtual machine is located and the management node.
    - A failure of the management node itself or an interruption of the management network results in a management network connection status failure.
  - Storage Network Connection Status:
    - Detects the network connection status between the virtual machine and the data storage where its system disk is located.
    - A failure of the data storage where the virtual machine's system disk is located, or an interruption of the storage network, results in a storage network connection status failure for the virtual machine.
  - Business Network Card Status:
    - If the host's business network card associated with the distributed switch used by the business virtual machine fails, or if the switch port directly connected to that network card fails, the result is a business network card failure for the virtual machine.
Based on resource status detection, ZStack Cube Virtualization Edition provides four typical fault migration scenarios for easy configuration:

| Typical Scenario | Management Network Connection Status | Storage Network Connection Status | Business NIC Status | Migrate on Failure? |
|---|---|---|---|---|
| Scenario A | Normal | Normal | Failure | Migrate / Do Not Migrate |
| Scenario B | Normal | Failure | Normal | Migrate / Do Not Migrate |
| Scenario C | Normal | Failure | Failure | Migrate / Do Not Migrate |
| Scenario D | Failure | Normal | Normal | Do Not Migrate |
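
The mapping from the three detected statuses to the four typical scenarios can be expressed as a small lookup. The following is only an illustrative sketch: the function name `classify_scenario` and its boolean parameters are hypothetical and are not part of any ZStack API. Status combinations outside the four typical scenarios return `None`.

```python
from typing import Optional


def classify_scenario(mgmt_ok: bool, storage_ok: bool, nic_ok: bool) -> Optional[str]:
    """Return 'A', 'B', 'C', or 'D' for the four typical scenarios, else None.

    True means the status is Normal, False means Failure.
    """
    scenarios = {
        # (management network, storage network, business NIC)
        (True, True, False): "A",
        (True, False, True): "B",
        (True, False, False): "C",
        (False, True, True): "D",
    }
    return scenarios.get((mgmt_ok, storage_ok, nic_ok))


# Example: only the business NIC has failed.
print(classify_scenario(mgmt_ok=True, storage_ok=True, nic_ok=False))
```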
Use Cases
- Host Business Network Card Failure Scenario: You want all associated virtual machines to migrate to another host when the host's business network card fails, ensuring high availability for your business.
  - For example: You deploy business virtual machines to host MySQL database services, and these virtual machines must not experience extended downtime. Turn on the high availability switch for these virtual machines and set a business network card status failure to trigger migration. Provided the platform has sufficient host resources, when the business network card of the host running a business virtual machine fails, the virtual machine migrates to another host and starts running there, without affecting business operations.
- Virtual Machine Unexpected Shutdown Scenario: You want virtual machines to be restarted automatically by HA when they shut down unexpectedly.
  - For example: You deploy business virtual machines to run critical company services and want to avoid situations where a host power loss or a virtual machine overload shuts down the virtual machine and the service cannot recover on its own. Turn on the high availability switch for these virtual machines. When a virtual machine shuts down, the high availability mechanism immediately restarts it, ensuring business continuity.
Functionality Principles
- The system polls the running status of virtual machines. If a virtual machine shuts down because of its own abnormal condition or a planned shutdown, the system checks whether the high availability switch is turned on. If the switch is on, the virtual machine is restarted on the current host or on another host.
- The system polls the status of the host where the virtual machine is located. If the management network connection status, the storage network connection status, or the business network card status is abnormal, the system checks the virtual machine fault migration policy and the virtual machine high availability mode. If both the corresponding fault migration switch and the virtual machine high availability switch are turned on, the virtual machine migrates to another host and starts running there.
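
The two polling checks described above can be sketched as a single decision function. This is a minimal illustration, not the platform's implementation: the names `Action`, `HostStatus`, `FaultPolicy`, and `decide` are hypothetical, the order of the checks is an assumption, and the platform-level HA Policy switch and storage-type-specific behavior are not modeled.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"
    RESTART = "restart"
    MIGRATE = "migrate"


@dataclass
class HostStatus:
    management_ok: bool = True
    storage_ok: bool = True
    business_nic_ok: bool = True


@dataclass
class FaultPolicy:
    # One switch per configurable fault type (hypothetical field names).
    migrate_on_storage_failure: bool = False
    migrate_on_nic_failure: bool = False


def decide(vm_running: bool, ha_enabled: bool,
           host: HostStatus, policy: FaultPolicy) -> Action:
    # With the VM's HA switch off, nothing happens automatically.
    if not ha_enabled:
        return Action.NONE

    # Host-level check: migrate only if the switch for a failed resource is on.
    # Assumption: a management network failure never triggers migration.
    if host.management_ok:
        storage_trigger = not host.storage_ok and policy.migrate_on_storage_failure
        nic_trigger = not host.business_nic_ok and policy.migrate_on_nic_failure
        if storage_trigger or nic_trigger:
            return Action.MIGRATE

    # VM-level check: a stopped VM with HA turned on is restarted.
    if not vm_running:
        return Action.RESTART
    return Action.NONE
```

For example, `decide(True, True, HostStatus(business_nic_ok=False), FaultPolicy(migrate_on_nic_failure=True))` returns `Action.MIGRATE`, while the same call with `ha_enabled=False` returns `Action.NONE`.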
Benefits of the Functionality
- Comprehensive & Powerful: Covers all mainstream high availability scenarios, including various fault scenarios and shutdown scenarios. Ensures the stability and continuity of users' critical business through high availability mechanisms.
- Flexible & Visual: Provides an intuitive, simple scenario configuration table and one-click configuration of fault migration policies. Combining global and virtual machine-level high availability settings greatly increases the flexibility of high availability configuration.
HA Policy Basic Operations
Enable HA Policy
High availability policies in ZStack Cube Virtualization Edition are enabled by default. If they have been disabled, go to the HA Policy page and turn on the switch at the top of the page to enable them.
Set VM Failover Strategy
| Typical Scenario | Management Network Connection Status | Storage Network Connection Status | Business NIC Status | Migrate on Failure? | Migration Explanation |
|---|---|---|---|---|---|
| Scenario A | Normal | Normal | Failure | Migrate / Do Not Migrate | Can be set to migrate or not to migrate. |
| Scenario B | Normal | Failure | Normal | Migrate / Do Not Migrate | Can be set to migrate or not to migrate. However, in a SAN storage environment, a storage network connection status failure still triggers automatic migration even if this scenario is set to not migrate. |
| Scenario C | Normal | Failure | Failure | Migrate / Do Not Migrate | When both the storage network connection status and the business NIC status fail, the migration strategy follows the strategies set for the individual failures. For example, turning on migration for Scenario A also turns it on for Scenario C. |
| Scenario D | Failure | Normal | Normal | Do Not Migrate | When the management network connection status fails, setting a fault migration strategy is not supported. |
Note: Storage network connection status detection supports only shared storage and does not currently support local storage.
Set Host Error Detection
| Host Error Detection Item | Description |
|---|---|
| Host Self-Inspection Interval | The interval at which a host inspects its own status. Default: 5. Unit: second. |
| Maximum Host Self-Inspection Attempts | The maximum number of times a host attempts to inspect its own status. If the self-inspection fails this many times, the host is determined to have a network error. Default: 6. |
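
As a rough worked example, the two settings above together bound how long a host may take to conclude that it has a network error. The sketch below assumes that the failed self-inspections are consecutive and spaced exactly one interval apart, and it ignores the time each inspection itself takes. The function name is hypothetical.

```python
def detection_window_seconds(interval_s: int = 5, max_attempts: int = 6) -> int:
    """Approximate time until a host is judged to have a network error.

    Assumes every self-inspection fails and inspections run back to back
    at the configured interval (an idealization, not measured behavior).
    """
    if interval_s <= 0 or max_attempts <= 0:
        raise ValueError("interval and attempts must be positive")
    return interval_s * max_attempts


# With the documented defaults (5 seconds, 6 attempts):
print(detection_window_seconds())  # 30
```

Under these assumptions, shortening the interval or lowering the attempt count detects failures sooner, at the cost of reacting to brief network glitches.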
Set Advanced Settings
| Category | Name | Description |
|---|---|---|
| Virtual Machine | HA VM State Update Speed | The speed at which the state of NeverStop virtual machines is updated on the UI. Default: 1. Valid values: -1 to 5. A higher value indicates a lower update speed. A lower update speed makes the system ignore many outdated notifications, which decreases the system workload. If set to -1, the NeverStop VM states on the UI are not updated automatically. |
| Virtual Machine | Maximum Interval for VM Attempt to HA Start | The maximum interval for the system to finish the GC (garbage collection) job and attempt to restart a NeverStop VM according to the HA policy after the virtual machine is stopped unexpectedly. Default: 300. Unit: second. |
| Virtual Machine | VM Retry HA Start Interval | The interval for a NeverStop VM to retry an HA start after the previous HA start attempt fails. Default: 60. Unit: second. |
| Virtual Machine | HA VM State Scanning Interval | The interval at which the status of a NeverStop VM is scanned after it fails to HA start. Default: 60. Unit: second. |
| Host | Timeout Period for Host Connecting to Data Storage | The time allowed for a host to connect to a data storage. If the host fails to connect within this period, the connection attempt is considered timed out. Default: 5. Unit: second. |
| Host | Abnormal Host Status Update Interval | The interval at which the system checks and updates the status of abnormal hosts. Default: 5. Unit: second. |
| Host | Minimum Connection Attempts Required to Determine Host is Disconnected | The number of times the system attempts to connect to a host. If the system still cannot connect to the host after this number of attempts, the host is determined to be disconnected. Default: 12. |
| Host | Ping Response Time to Determine Host Connection is Established Successfully | The time the system waits for the host's response after pinging the host. A response received within this period indicates that the system has established a successful connection with the host. Default: 5. Unit: second. |
| Host | Minimum Connection Success Rate to Determine Host is Re-Connected | The minimum ratio of successful connections to total connection attempts required to determine that a disconnected host is re-connected. Default: 50. Unit: %. |
| Host | Minimum Successful Connections Required to Determine Host is Re-Connected | The minimum number of successful connections that the system must establish with a disconnected host before the host is determined to be re-connected. Default: 5. |
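
For reference, the documented defaults and the one documented value range can be collected in a small settings object. This is only a sketch for reasoning about the values: the class and field names are hypothetical and do not correspond to actual ZStack configuration keys, and the 0 to 100 bound on the success rate is an assumption based on its unit (%).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HaAdvancedSettings:
    # Virtual machine settings (defaults taken from the table above)
    vm_state_update_speed: int = 1          # valid values: -1 to 5
    max_ha_start_interval_s: int = 300
    retry_ha_start_interval_s: int = 60
    vm_state_scan_interval_s: int = 60
    # Host settings (defaults taken from the table above)
    storage_connect_timeout_s: int = 5
    abnormal_host_update_interval_s: int = 5
    disconnect_attempts: int = 12
    ping_response_time_s: int = 5
    reconnect_success_rate_pct: int = 50
    reconnect_min_successes: int = 5

    def __post_init__(self) -> None:
        # Documented range for the UI state update speed.
        if not -1 <= self.vm_state_update_speed <= 5:
            raise ValueError("vm_state_update_speed must be between -1 and 5")
        # Assumed range: a percentage must lie between 0 and 100.
        if not 0 <= self.reconnect_success_rate_pct <= 100:
            raise ValueError("reconnect_success_rate_pct must be between 0 and 100")


settings = HaAdvancedSettings(retry_ha_start_interval_s=30)
print(settings.retry_ha_start_interval_s, settings.disconnect_attempts)  # 30 12
```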
View High Availability Logs
- Supports selecting a time period to view high availability logs for virtual machines during the selected period. Available time periods include: last 7 days, last month. By default, the latest 7 days of logs are displayed.
- Supports custom time periods to view high availability logs for virtual machines during the set period.
- Supports searching for high availability logs for virtual machines by entering the virtual machine name or owner.
- Supports filtering high availability logs for virtual machines by task result. Task results include: success, failure.
- Supports sorting high availability logs for virtual machines by start/completion time.
- Supports exporting high availability logs for virtual machines in CSV format.
- Supports adjusting the number of completed high availability logs for virtual machines displayed per page. Selectable values are: 10, 20, 50, 100, and pagination is supported.
Disable HA Policy
If you wish to globally disable the high availability feature for virtual machines, you can do so on the HA Policy page by clicking the Disable action.
Note: After disabling the high availability policy, virtual machines will not automatically restart upon shutdown, which may cause service interruptions. Proceed with caution.
Implement HA Policy in Business Practices
Assume you have deployed four business virtual machines on Host A to support MySQL database services. To ensure high availability, if Host A's service network adapter fails, all four virtual machines should be migrated to another host. In this scenario, set the high availability mode for these virtual machines to NeverStop, configure the policy to trigger a migration when a service network adapter failure occurs, and ensure there are sufficient resources available on other hosts within the platform.
- Enable the high availability policy: In the ZStack Cube Virtualization Edition platform, go to the HA Policy page and turn on the switch at the top of the page.
- Turn on the high availability switch for virtual machines: You can set this up in two ways. The virtual machine level takes precedence over the cluster level.
  - When creating a new virtual machine, turn on the HA switch.
  - On the page of the cluster where the virtual machine resides, turn on the VM HA switch. Any new virtual machine created in this cluster then has the high availability switch turned on by default.
- Configure the VM fault migration policy: In the VM failover strategy settings, turn on the Fail Over switch under Scenario A. When this switch is enabled, the Fail Over switch under Scenario C turns on automatically as well.
After you configure the high availability policy for the virtual machines, if Host A's service network adapter fails, the four virtual machines on that host immediately migrate to another host (Host B, for example) and start there. You can search for the related migration records in the high availability logs.
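
The precedence rule from the steps above (virtual machine level over cluster level) can be sketched as follows. The function name and the use of `None` to mean "not set at the virtual machine level" are assumptions made for illustration only.

```python
from typing import Optional


def effective_ha_switch(vm_level: Optional[bool], cluster_default: bool) -> bool:
    """Resolve the HA switch for one virtual machine.

    vm_level is None when nothing was chosen for the VM itself, in which
    case the cluster-level default applies. An explicit VM-level value wins.
    """
    return cluster_default if vm_level is None else vm_level


# A VM created without an explicit choice inherits the cluster default.
print(effective_ha_switch(None, cluster_default=True))   # True
# An explicit VM-level setting overrides the cluster default.
print(effective_ha_switch(False, cluster_default=True))  # False
```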
