6.0.1

vSphere Availability

Virtual Machine Restart Notifications

vSphere HA generates a cluster event when a failover operation is in progress for virtual machines in the cluster. The event also displays a configuration issue in the Cluster Summary tab, which reports the number of virtual machines that are being restarted. These VMs fall into four categories.

VMs being placed: vSphere HA is trying to restart these VMs.

VMs awaiting a retry: a previous restart attempt failed, and vSphere HA is waiting for a timeout to expire before trying again.

VMs requiring additional resources: insufficient resources are available to restart these VMs. vSphere HA retries when more resources become available, for example, when a host comes back online.

Inaccessible Virtual SAN VMs: vSphere HA cannot restart these Virtual SAN VMs because they are not accessible. It retries when there is a change in accessibility.

These virtual machine counts are dynamically updated whenever a change is observed in the number of VMs for which a restart operation is underway. The configuration issue is cleared when vSphere HA has restarted all VMs or has given up trying.
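The four counts and the rule for clearing the configuration issue can be pictured with a small model. This is only an illustrative sketch: the class and field names are hypothetical and do not correspond to any vSphere API, and it assumes that VMs which vSphere HA has restarted or given up on simply drop out of every category.

```python
from dataclasses import dataclass


@dataclass
class RestartCounts:
    """Hypothetical container for the four per-category VM counts."""
    being_placed: int = 0
    awaiting_retry: int = 0
    requiring_resources: int = 0
    inaccessible_vsan: int = 0

    def total(self) -> int:
        # Number of VMs for which a restart operation is still underway.
        return (self.being_placed + self.awaiting_retry
                + self.requiring_resources + self.inaccessible_vsan)

    def issue_active(self) -> bool:
        # Assumption: the configuration issue stays visible while any
        # VM remains in one of the four categories.
        return self.total() > 0


counts = RestartCounts(being_placed=2, awaiting_retry=1)
print(counts.total(), counts.issue_active())
```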

In vSphere 5.5 and earlier, a per-VM event is triggered for each unsuccessful attempt to restart a virtual machine. This event is disabled by default in vSphere 6.x and can be enabled by setting the vSphere HA advanced option das.config.fdm.reportfailoverfailevent to 1.
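As a rough illustration, the advanced option can be thought of as one key/value pair in the cluster's set of vSphere HA advanced options. The helper below is a hypothetical sketch that only manipulates a plain dictionary; how the pair is actually submitted to the cluster is outside its scope. Removing the key to return to the default behavior is an assumption of this sketch, not something the text states.

```python
# Option name as given in the text; the value "1" enables the per-VM event.
REPORT_FAILOVER_FAIL_EVENT = "das.config.fdm.reportfailoverfailevent"


def with_per_vm_failover_events(advanced_options, enabled=True):
    """Return a copy of the options with the per-VM event switched on or off.

    `advanced_options` is a plain dict of option name -> string value
    (a hypothetical stand-in for the cluster's advanced option list).
    """
    updated = dict(advanced_options)
    if enabled:
        updated[REPORT_FAILOVER_FAIL_EVENT] = "1"
    else:
        # Assumption: dropping the key restores the default (disabled).
        updated.pop(REPORT_FAILOVER_FAIL_EVENT, None)
    return updated


options = with_per_vm_failover_events({"das.iostatsinterval": "120"})
print(options)
```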

VM and Application Monitoring

VM Monitoring restarts individual virtual machines if their VMware Tools heartbeats are not received within a set time. Similarly, Application Monitoring can restart a virtual machine if the heartbeats for an application it is running are not received. You can enable these features and configure the sensitivity with which vSphere HA monitors non-responsiveness.

When you enable VM Monitoring, the VM Monitoring service (using VMware Tools) evaluates whether each virtual machine in the cluster is running by checking for regular heartbeats and I/O activity from the VMware Tools process running inside the guest. If no heartbeats or I/O activity are received, the most likely cause is that the guest operating system has failed or that VMware Tools is not being allocated any time to complete tasks. In such a case, the VM Monitoring service determines that the virtual machine has failed, and the virtual machine is rebooted to restore service.

Occasionally, virtual machines or applications that are still functioning properly stop sending heartbeats. To avoid unnecessary resets, the VM Monitoring service also monitors a virtual machine's I/O activity. If no heartbeats are received within the failure interval, the I/O stats interval (a cluster-level attribute) is checked. The I/O stats interval determines whether any disk or network activity has occurred for the virtual machine during the previous two minutes (120 seconds). If not, the virtual machine is reset. This default value (120 seconds) can be changed using the advanced option das.iostatsinterval.
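The two-stage check described above can be sketched as a small decision function. This is a simplified model under stated assumptions, not the actual vSphere HA implementation: the function name and parameters are hypothetical, the failure interval of 30 seconds in the usage lines is an illustrative value, and treating a value exactly equal to an interval as "still within it" is a choice made for the sketch.

```python
DEFAULT_IO_STATS_INTERVAL = 120  # seconds; default of das.iostatsinterval per the text


def should_reset(seconds_since_heartbeat,
                 seconds_since_io,
                 failure_interval,
                 io_stats_interval=DEFAULT_IO_STATS_INTERVAL):
    """Return True if the (hypothetical) monitor would reset the VM."""
    # Stage 1: a heartbeat arrived within the failure interval, so the
    # VM is considered healthy and I/O is not consulted.
    if seconds_since_heartbeat <= failure_interval:
        return False
    # Stage 2: heartbeats are missing; recent disk or network activity
    # within the I/O stats interval still prevents a reset.
    if seconds_since_io <= io_stats_interval:
        return False
    # No heartbeats and no recent I/O: treat the VM as failed.
    return True


print(should_reset(45, 20, failure_interval=30))   # heartbeats missing, I/O recent
print(should_reset(45, 300, failure_interval=30))  # heartbeats missing, no I/O
```

Lowering `io_stats_interval` in this model makes the second stage stricter, which is the effect the text attributes to changing das.iostatsinterval.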

To enable Application Monitoring, you must first obtain the appropriate SDK (or use an application that supports VMware Application Monitoring) and use it to set up customized heartbeats for the applications you want to monitor. After you have done this, Application Monitoring works much the same way that VM Monitoring does. If the heartbeats for an application are not received for a specified time, its virtual machine is restarted.
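The heartbeat-and-timeout pattern behind Application Monitoring can be illustrated generically. The class below is a hypothetical stand-in and does not use or reproduce the VMware SDK: the application calls `beat()` periodically, and a monitor checks `timed_out()` to decide whether the specified time has elapsed without a heartbeat. The clock is injectable so the behavior can be exercised without waiting.

```python
import time


class AppHeartbeatTracker:
    """Hypothetical tracker for application heartbeats (not the VMware SDK)."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock
        # Treat creation time as the first heartbeat.
        self._last_beat = clock()

    def beat(self):
        # Called by the application to signal that it is still working.
        self._last_beat = self._clock()

    def timed_out(self):
        # True once more than timeout_s seconds have passed since the last beat.
        return self._clock() - self._last_beat > self._timeout_s


# Usage with a fake clock so the example runs instantly.
now = [0.0]
tracker = AppHeartbeatTracker(timeout_s=30, clock=lambda: now[0])
now[0] = 20.0
tracker.beat()
now[0] = 51.0
print(tracker.timed_out())
```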

You can configure the level of monitoring sensitivity. Highly sensitive monitoring results in a more rapid conclusion that a failure has occurred. Although unlikely, highly sensitive monitoring might falsely identify failures when the virtual machine or application in question is actually still working, but heartbeats have not been received due to factors such as resource constraints. Low-sensitivity monitoring results in longer interruptions in service between actual failures and virtual machines being reset. Select an option that is an effective compromise for your needs.
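The trade-off can be made concrete with a toy timeline. The sketch below is a simplified model with hypothetical names: it looks only at heartbeat gaps (the I/O check is ignored), and the intervals of 30 and 120 seconds are illustrative values standing in for a more sensitive and a less sensitive setting.

```python
def first_detection(heartbeat_times, failure_interval, observe_until):
    """Return the time a failure is first declared, or None if it never is.

    heartbeat_times: list of seconds at which heartbeats arrived
    (monitoring is assumed to start at t=0).
    """
    last = 0.0
    # The infinite sentinel represents "no further heartbeat ever arrives".
    for t in sorted(heartbeat_times) + [float("inf")]:
        deadline = last + failure_interval
        if t > deadline:
            # The gap exceeded the failure interval; the failure is declared
            # at the deadline, provided that falls inside the observed window.
            return deadline if deadline <= observe_until else None
        last = t
    return None


stall = [10, 20, 30, 75, 85, 95]  # VM stalls for 45 s, then recovers
crash = [10, 20, 30]              # VM really fails after t=30

print(first_detection(stall, 30, 100), first_detection(stall, 120, 100))
print(first_detection(crash, 30, 300), first_detection(crash, 120, 300))
```

In this model the shorter interval flags the temporary stall as a failure even though the VM recovers, while the longer interval ignores the stall but takes 90 seconds longer to react to the real failure.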
