Host Failure Types
The master host of a VMware vSphere® High Availability cluster is responsible for detecting the failure of
subordinate hosts. Depending on the type of failure detected, the virtual machines running on the hosts
might need to be failed over.
In a vSphere HA cluster, three types of host failure are detected:
- Failure. A host stops functioning.
- Isolation. A host becomes network isolated.
- Partition. A host loses network connectivity with the master host.
The master host monitors the liveness of the subordinate hosts in the cluster. This communication
happens through the exchange of network heartbeats every second. When the master host stops
receiving these heartbeats from a subordinate host, it checks for host liveness before declaring the host
failed. The liveness check that the master host performs is to determine whether the subordinate host is
exchanging heartbeats with one of the datastores. See Datastore Heartbeating. Also, the master host
checks whether the host responds to ICMP pings sent to its management IP addresses.
If a master host cannot communicate directly with the agent on a subordinate host, the subordinate host
does not respond to ICMP pings, and the agent is not issuing heartbeats, the host is viewed as failed. The
host's virtual machines are restarted on alternate hosts. If such a subordinate host is still exchanging
heartbeats with a datastore, the master host assumes that the subordinate host is in a network partition or
is network isolated, so it continues to monitor the host and its virtual machines. See Network Partitions.
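The decision the master host makes about a subordinate host can be summarized as a small illustrative sketch. This is not vSphere code: the names HostState, Observations, and classify_host are hypothetical, and the UNDETERMINED outcome is an assumption added for the one combination the text above does not describe (no heartbeats of either kind, but pings still answered).

```python
from dataclasses import dataclass
from enum import Enum


class HostState(Enum):
    LIVE = "live"
    FAILED = "failed"
    PARTITIONED_OR_ISOLATED = "partitioned_or_isolated"
    UNDETERMINED = "undetermined"  # assumption: case not covered by the text


@dataclass
class Observations:
    """What the master host currently sees for one subordinate host."""
    network_heartbeat: bool    # heartbeats received over the management network
    datastore_heartbeat: bool  # host is exchanging heartbeats with a datastore
    ping_response: bool        # host answers ICMP pings on a management IP


def classify_host(obs: Observations) -> HostState:
    # Network heartbeats are still arriving: no liveness check is needed.
    if obs.network_heartbeat:
        return HostState.LIVE
    # Network heartbeats stopped, but the host still heartbeats to a datastore:
    # the master assumes a partition or isolation and keeps monitoring.
    if obs.datastore_heartbeat:
        return HostState.PARTITIONED_OR_ISOLATED
    # No heartbeats of either kind and no ping response: the host is failed,
    # and its virtual machines are restarted on alternate hosts.
    if not obs.ping_response:
        return HostState.FAILED
    # Pings answered but no heartbeats: not described in the text above.
    return HostState.UNDETERMINED
```

For example, classify_host(Observations(False, True, False)) yields HostState.PARTITIONED_OR_ISOLATED, which matches the case where the master continues to monitor the host rather than restarting its virtual machines.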
Host network isolation occurs when a host is still running, but it can no longer observe traffic from
vSphere HA agents on the management network. If a host stops observing this traffic, it attempts to ping
the cluster isolation addresses. If this pinging also fails, the host declares that it is isolated from the
network.
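The self-check a host performs can be sketched as follows. This is a simplified illustration, not the vSphere HA agent: the function name and parameters are hypothetical, and it assumes that "this pinging also fails" means that none of the isolation addresses responded.

```python
from typing import Iterable


def declares_isolation(observes_ha_traffic: bool,
                       isolation_ping_results: Iterable[bool]) -> bool:
    """Return True if the host should declare itself network isolated.

    observes_ha_traffic: whether the host still sees traffic from vSphere HA
        agents on the management network.
    isolation_ping_results: one boolean per cluster isolation address,
        True if that address answered the ping.
    """
    # While HA agent traffic is visible, the host never pings the
    # isolation addresses and is not isolated.
    if observes_ha_traffic:
        return False
    # Assumption: isolation is declared only if every ping failed.
    return not any(isolation_ping_results)
```

Note that under this assumption an empty list of isolation addresses also results in an isolation declaration, because there is no address that could respond.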
The master host monitors the virtual machines that are running on an isolated host. If the master host
observes that the VMs power off, and the master host is responsible for the VMs, it restarts them.
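As a rough sketch of the restart rule just described, the master host restarts only those virtual machines that it sees powered off and for which it is responsible. The VmStatus type and the vms_to_restart function are hypothetical names used for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class VmStatus:
    name: str
    powered_on: bool             # power state as observed by the master host
    master_is_responsible: bool  # whether this master host owns the VM


def vms_to_restart(vms: Iterable[VmStatus]) -> List[str]:
    """Names of VMs on an isolated host that the master should restart."""
    return [
        vm.name
        for vm in vms
        if not vm.powered_on and vm.master_is_responsible
    ]
```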
Note If you ensure that the network infrastructure is sufficiently redundant and that at least one network
path is always available, host network isolation is less likely to occur.
Proactive HA Failures
A Proactive HA failure occurs when a host component fails, which results in a loss of redundancy or a
noncatastrophic failure. However, the functional behavior of the VMs residing on the host is not yet
affected. For example, if a power supply on the host fails, but other power supplies are available, that is a
Proactive HA failure.
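The power supply example can be expressed as a simple check. This is only an illustration of the loss-of-redundancy case: the function name and its parameters are hypothetical, and it does not model other noncatastrophic failures or how vSphere actually receives hardware health information.

```python
def is_redundancy_loss(total_units: int, failed_units: int) -> bool:
    """Return True if some, but not all, redundant units have failed.

    Mirrors the example above: one power supply fails while others remain
    available, so the host keeps running with reduced redundancy.
    """
    if total_units < 0 or not 0 <= failed_units <= total_units:
        raise ValueError("failed_units must be between 0 and total_units")
    # At least one unit failed, and at least one unit is still available.
    return 0 < failed_units < total_units
```

With two power supplies and one failed, the check returns True. With both failed it returns False, because the host is no longer merely degraded.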
vSphere Availability
VMware, Inc.