VMware vFabric Data Director Administrator and User Guide

Table 14-2. Aurora_mon Parameters (Continued)

Parameter Description

app_stop_cmd (required)

Command (any program, script, or executable file) used to stop the application, typically during system shutdown or when restarting applications. The stop command is successful if it exits with a zero exit code. The command must shut down the application cleanly (remove all processes, files, locks, and so on) so that a subsequent start command runs without problems. If the command does not complete within 300 seconds, it is forcibly terminated.

If required, use the -o and -e options to have the aurora_mon daemon capture stdout and stderr; otherwise, the output is redirected to /dev/null. To run the command as a specific user, use an su -c wrapper or set the setuid bit on the application.

If you do not require a stop command, use a command that exits with a zero exit code, for example /bin/true. This applies, for example, when the application only monitors the amount of disk space on a mount point, so there is no application to stop.
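The stop-command contract described above (success on a zero exit code, forced termination after 300 seconds) can be illustrated with a short Python sketch. This is not the aurora_mon implementation; the helper name run_stop_cmd and the constant STOP_TIMEOUT_SECONDS are hypothetical and exist only for this example.

```python
import subprocess
import sys

STOP_TIMEOUT_SECONDS = 300  # time limit stated for app_stop_cmd


def run_stop_cmd(argv, timeout=STOP_TIMEOUT_SECONDS):
    """Return True if the command exits with code 0 within the timeout.

    Hypothetical helper. Output is discarded, which mirrors the default
    redirection to /dev/null when output capture is not requested.
    """
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return False
    return result.returncode == 0


# Usage: a stand-in stop command that exits cleanly, similar to /bin/true.
stopped = run_stop_cmd([sys.executable, "-c", "raise SystemExit(0)"])
```

A command that exits with a nonzero code, or that runs past the timeout, is reported as a failed stop in this sketch.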

heartbeat_check_cmd (required)

Command (any program, script, or executable file) that checks whether the application is alive. The ping is successful (the application is considered alive) if the command exits with a zero exit code.

If required, use the -o and -e options to have the aurora_mon daemon capture stdout and stderr; otherwise, the output is redirected to /dev/null. To run the command as a specific user, use an su -c wrapper or set the setuid bit on the application.
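As an illustration of a heartbeat check, the following Python sketch follows the disk-space scenario mentioned for app_stop_cmd: it reports the check as passed (exit code 0) while enough space is free on a mount point. The function names and the threshold value are assumptions made for this example and are not part of Data Director.

```python
import shutil


def disk_space_ok(mount_point, min_free_fraction):
    """Return True if the free fraction on mount_point meets the threshold."""
    usage = shutil.disk_usage(mount_point)
    if usage.total == 0:
        # Treat a file system that reports no capacity as a failed check.
        return False
    return usage.free / usage.total >= min_free_fraction


def heartbeat_exit_code(mount_point="/", min_free_fraction=0.10):
    """Map the check result to an exit code: 0 = alive, 1 = failed.

    The default threshold of 10 percent free space is hypothetical.
    """
    return 0 if disk_space_ok(mount_point, min_free_fraction) else 1
```

A wrapper script would pass the returned value to sys.exit so that the monitoring daemon sees it as the exit code of the command.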

heartbeat_period (optional, defaults to 30)

The time in seconds between heartbeat pings (heartbeat_check_cmd is issued every heartbeat_period seconds). The value can be between 1 and 600 seconds, and defaults to 30 seconds if not specified. A new heartbeat command is not issued until the previous command finishes.

heartbeat_ignore_fail_count (optional, defaults to 0)

Specifies the number of consecutive heartbeat_check_cmd failures to ignore before the application is considered to have failed. For example, if heartbeat_ignore_fail_count is 3, the application is considered to have failed when heartbeat_check_cmd fails a fourth consecutive time. The first three failures are ignored. This reduces the possibility of a false positive caused by intermittent application problems or transient network problems that make heartbeat_check_cmd fail.
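The counting rule for heartbeat_ignore_fail_count can be sketched in Python as follows. This is a minimal illustration; the function name application_failed is hypothetical, and the sketch assumes that a successful heartbeat resets the count, as the word "consecutive" implies.

```python
def application_failed(results, ignore_fail_count=0):
    """Decide whether a sequence of heartbeat results indicates failure.

    results: iterable of booleans in time order, True = heartbeat succeeded.
    Returns True as soon as the number of consecutive failures exceeds
    ignore_fail_count.
    """
    consecutive_failures = 0
    for ok in results:
        if ok:
            consecutive_failures = 0  # a success resets the streak
        else:
            consecutive_failures += 1
            if consecutive_failures > ignore_fail_count:
                return True
    return False
```

With ignore_fail_count set to 3, three failed checks in a row are tolerated and the fourth marks the application as failed. With the default of 0, the first failed check does so.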

app_restart_retry_count (optional, defaults to 3); app_restart_retry_freq (optional, defaults to 10 minutes)

The number of times aurora_mon attempts to restart an application after a failure, and the period of time that elapses before aurora_mon attempts the restart again. For example, if app_restart_retry_count is 3 and app_restart_retry_freq is 10 minutes, aurora_mon makes three attempts to restart the application and waits 10 minutes between attempts.
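The interaction of app_restart_retry_count and app_restart_retry_freq can be sketched as a simple retry loop. This Python example is an illustration only: the function name and its parameters are hypothetical, and it assumes that the wait applies between attempts and that retries stop after the first successful restart, which the table does not state explicitly.

```python
import time


def restart_with_retries(restart, retry_count=3, retry_freq_seconds=600,
                         sleep=time.sleep):
    """Call restart() up to retry_count times, waiting between attempts.

    restart: callable that returns True if the application restarted.
    sleep: injectable wait function, so the loop can be tested without
           actually pausing.
    Returns True on the first successful attempt, False if all attempts fail.
    """
    for attempt in range(1, retry_count + 1):
        if restart():
            return True
        if attempt < retry_count:
            sleep(retry_freq_seconds)  # wait before the next attempt
    return False
```

With the defaults (3 attempts, 600 seconds), a restart that keeps failing is tried three times with two 10-minute waits in between.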

heartbeat_fail_action (optional, defaults to RESTART_APP)

The action taken when an application is considered to have failed (after more than heartbeat_ignore_fail_count consecutive heartbeat_check_cmd failures). The following values are acceptable:

JUST_ALERT. Send an alert only.

RESTART_APP. Restart the application (attempt app_restart_retry_count times, waiting app_restart_retry_freq between attempts).

RESTART_VM. Restart the virtual machine by stopping the virtual machine's application monitoring SDK heartbeat to the underlying VMware HA service. The HA virtual machine properties of the cluster determine the virtual machine restart interval and counts.

RESTART_APP_THEN_VM. Attempt to restart the application app_restart_retry_count times. If these attempts fail to restart the application, the virtual machine is reset using the Guest and HA Application Monitoring SDK.
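The four heartbeat_fail_action values can be summarized as a small decision function. The Python sketch below is hypothetical: the enum, the function name, and the step labels are invented for this example, the alert-only value is assumed to be spelled JUST_ALERT, and the sketch does not model whether alerts are also sent for the other actions.

```python
from enum import Enum


class FailAction(Enum):
    JUST_ALERT = "JUST_ALERT"  # spelling with underscore is an assumption
    RESTART_APP = "RESTART_APP"
    RESTART_VM = "RESTART_VM"
    RESTART_APP_THEN_VM = "RESTART_APP_THEN_VM"


def plan_steps(action=FailAction.RESTART_APP, app_restart_succeeds=True):
    """Return the ordered steps taken for a failed application.

    app_restart_succeeds: whether the application restart attempts
    (app_restart_retry_count tries) end in success.
    """
    if action is FailAction.JUST_ALERT:
        return ["alert"]
    if action is FailAction.RESTART_APP:
        return ["restart_app"]
    if action is FailAction.RESTART_VM:
        return ["restart_vm"]
    # RESTART_APP_THEN_VM: fall back to the VM only if the app restart fails.
    steps = ["restart_app"]
    if not app_restart_succeeds:
        steps.append("restart_vm")
    return steps
```

The default argument mirrors the documented default of RESTART_APP.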

Chapter 14 Monitoring the Data Director Environment

VMware, Inc. 181