6.0.1
Virtual Machine Restart Notifications
vSphere HA generates a cluster event when a failover operation is in progress for virtual machines in the
cluster. The event also displays a configuration issue in the Cluster Summary tab, which reports the number
of virtual machines that are being restarted. These VMs fall into four categories:
- VMs being placed: vSphere HA is in the process of trying to restart these VMs.
- VMs awaiting a retry: a previous restart attempt failed, and vSphere HA is waiting for a timeout to
  expire before trying again.
- VMs requiring additional resources: insufficient resources are available to restart these VMs. vSphere
  HA retries when more resources become available, for example when a host comes back online.
- Inaccessible Virtual SAN VMs: vSphere HA cannot restart these Virtual SAN VMs because they are not
  accessible. It retries when there is a change in accessibility.
These virtual machine counts are dynamically updated whenever a change is observed in the number of
VMs for which a restart operation is underway. The configuration issue is cleared when vSphere HA has
restarted all VMs or has given up trying.
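The relationship between the four counts and the configuration issue can be sketched as a small model. This is an illustration only, not a vSphere API: the class name RestartCounts and its field and method names are hypothetical, and the sketch assumes that a VM leaves all four categories once it has been restarted or vSphere HA has given up on it.

```python
from dataclasses import dataclass


@dataclass
class RestartCounts:
    """Hypothetical model of the four restart categories shown in the Cluster Summary tab."""
    being_placed: int = 0          # vSphere HA is trying to restart these VMs
    awaiting_retry: int = 0        # a previous attempt failed; waiting for a timeout
    requiring_resources: int = 0   # waiting for more resources to become available
    inaccessible_vsan: int = 0     # Virtual SAN VMs that are not accessible

    def total(self) -> int:
        """Number of VMs for which a restart operation is still underway."""
        return (self.being_placed + self.awaiting_retry
                + self.requiring_resources + self.inaccessible_vsan)

    def issue_active(self) -> bool:
        """Assumption: the issue stays raised until no VM remains in any category."""
        return self.total() > 0


counts = RestartCounts(being_placed=2, awaiting_retry=1)
print(counts.total(), counts.issue_active())
```

When every count returns to zero, issue_active() returns False, which corresponds to the configuration issue being cleared.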
In vSphere 5.5 or earlier, a per-VM event is triggered for an unsuccessful attempt to restart the virtual
machine. This event is disabled by default in vSphere 6.x and can be enabled by setting the vSphere HA
advanced option das.config.fdm.reportfailoverfailevent to 1.
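As a rough sketch of how this option behaves in vSphere 6.x, the helper below treats a missing option as disabled and a value of 1 as enabled. The function name and the use of a plain dictionary for advanced options are assumptions made for illustration. Only the option key comes from the text above, and the handling of values other than 1 is a guess.

```python
# Hypothetical helper; not part of any VMware SDK.
OPTION_KEY = "das.config.fdm.reportfailoverfailevent"


def per_vm_failover_event_enabled(advanced_options):
    """Return True if the per-VM restart-failure event is enabled.

    advanced_options is assumed to be a dict of vSphere HA advanced
    option names to values (strings or integers).
    """
    # Absent option: the event is disabled, matching the vSphere 6.x default.
    value = advanced_options.get(OPTION_KEY, 0)
    return str(value).strip() == "1"


print(per_vm_failover_event_enabled({}))
print(per_vm_failover_event_enabled({OPTION_KEY: "1"}))
```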
VM and Application Monitoring
VM Monitoring restarts individual virtual machines if their VMware Tools heartbeats are not received
within a set time. Similarly, Application Monitoring can restart a virtual machine if the heartbeats for an
application it is running are not received. You can enable these features and configure the sensitivity with
which vSphere HA monitors non-responsiveness.
When you enable VM Monitoring, the VM Monitoring service (using VMware Tools) evaluates whether
each virtual machine in the cluster is running by checking for regular heartbeats and I/O activity from the
VMware Tools process running inside the guest. If no heartbeats or I/O activity are received, this is most
likely because the guest operating system has failed or VMware Tools is not being allocated any time to
complete tasks. In such a case, the VM Monitoring service determines that the virtual machine has failed
and the virtual machine is rebooted to restore service.
Occasionally, virtual machines or applications that are still functioning properly stop sending heartbeats. To
avoid unnecessary resets, the VM Monitoring service also monitors a virtual machine's I/O activity. If no
heartbeats are received within the failure interval, the service checks whether any disk or network activity
has occurred for the virtual machine during the I/O stats interval (a cluster-level attribute), which by
default covers the previous two minutes (120 seconds). If no activity occurred, the virtual machine is reset.
The default value can be changed using the advanced option das.iostatsinterval.
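The two-stage check described above can be sketched as a single decision function. This is a simplified model, not the VM Monitoring implementation: the function and parameter names are hypothetical, the 30-second failure interval in the usage lines is an illustrative value, and the handling of exact boundary values is an assumption. Only the 120-second default for the I/O stats interval comes from the text.

```python
def should_reset_vm(seconds_since_last_heartbeat,
                    seconds_since_last_io,
                    failure_interval,
                    io_stats_interval=120):
    """Decide whether a VM would be reset under the two-stage check.

    Stage 1: heartbeats must have been missing for at least the failure interval.
    Stage 2: no disk or network I/O may have occurred within the I/O stats interval.
    """
    if seconds_since_last_heartbeat < failure_interval:
        return False  # heartbeats are still arriving in time
    # Heartbeats are missing; recent I/O activity suppresses the reset.
    return seconds_since_last_io >= io_stats_interval


# Illustrative values only: a 30-second failure interval is assumed here.
print(should_reset_vm(45, 20, failure_interval=30))    # heartbeats missing, recent I/O
print(should_reset_vm(45, 300, failure_interval=30))   # heartbeats missing, no recent I/O
```

Raising io_stats_interval (the role played by das.iostatsinterval) widens the window in which I/O activity counts as a sign of life, so fewer VMs with missing heartbeats are reset.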
To enable Application Monitoring, you must first obtain the appropriate SDK (or be using an application
that supports VMware Application Monitoring) and use it to set up customized heartbeats for the
applications you want to monitor. After you have done this, Application Monitoring works much the same
way that VM Monitoring does. If the heartbeats for an application are not received for a specified time, its
virtual machine is restarted.
You can configure the level of monitoring sensitivity. Highly sensitive monitoring results in a more rapid
conclusion that a failure has occurred. While unlikely, highly sensitive monitoring might lead to falsely
identifying failures when the virtual machine or application in question is actually still working, but
heartbeats have not been received due to factors such as resource constraints. Low sensitivity monitoring
results in longer interruptions in service between actual failures and virtual machines being reset. Select an
option that is an effective compromise for your needs.
vSphere Availability
VMware, Inc.