3.5.1 Matrix Server Administration Guide

Chapter 17: Advanced Monitor Topics
Copyright © 1999-2007 PolyServe, Inc. All rights reserved.
When the monitor executes the testpid script, it will first determine
whether the /var/run/application/pid file exists. If the file does not exist, the
script exits with a non-zero exit status, which the monitor interprets as a
failure.
If the file does exist, the script reads the pid from the file into the variable
pid. The kill command then determines whether a process with that pid is
running. The exit status of the kill command is the exit status of the script.
If the kill command finds that the process is running, it will exit with status 0,
and the script will exit with status 0. The monitor will interpret the 0 exit
status as “success” and will signal to the matrix that the application is up.
If the kill command finds that the process is not running, it will exit with a
non-zero status, and the script will exit with that same status. The
monitor will interpret that exit status as “failure” and will signal to the
matrix that the application is down. Matrix Server will then take the
action configured for the service monitor, which is typically to fail over
the virtual host associated with the monitor.
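The logic described above can be sketched as a small shell function. This is a reconstruction from the description, not the guide's own testpid listing. It assumes the liveness check is done with kill -0 (signal 0 delivers nothing and only reports whether the process can be signalled). The PIDFILE variable and its override are hypothetical, added so the sketch can be tried against a scratch file.

```shell
#!/bin/sh
# Sketch only: reconstructs the probe logic described in the text.
# PIDFILE defaults to the path named in the text; the override is an
# assumption made for experimentation.
PIDFILE="${PIDFILE:-/var/run/application/pid}"

testpid() {
    # No pid file: return non-zero, which the monitor treats as failure.
    [ -f "$PIDFILE" ] || return 1

    # Read the recorded pid into the variable pid.
    pid=$(cat "$PIDFILE")

    # Assumption: signal 0 is used, so kill sends nothing and simply
    # reports whether the process exists. Its exit status becomes the
    # function's return status.
    kill -0 "$pid" 2>/dev/null
}

# As a standalone probe script, the last line would be a bare call to
# testpid, so that its status becomes the script's exit status.
```

Because the kill command is the last command in the function, its status passes through unchanged, which matches the behavior described above: 0 when the process is running and non-zero otherwise.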
When you create the custom service or device monitor for the probe
script, you can set both the frequency at which the probe script should be
executed and the timeout period, which is the maximum amount of time
that the monitor_agent daemon will wait for the probe to complete.
You can create more elaborate probe scripts as necessary. The key points
are to check whether the service or device is up and then to return a
corresponding exit status. The service or device monitor uses only the
exit status to determine whether the probe succeeded or failed, with 0
indicating success and any other value indicating failure.
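As one possible shape for a more elaborate probe, the sketch below runs several checks in turn and reduces them to a single exit status. The function name run_checks and the idea of passing each check as a command string are assumptions for illustration. They are not part of Matrix Server.

```shell
#!/bin/sh
# Sketch only: run_checks is a hypothetical helper, not a Matrix Server
# command. Each argument is a command line to run as one check.
run_checks() {
    for check in "$@"; do
        # Output is discarded because only the exit status matters to
        # the monitor. The first failing check ends the probe with 1.
        sh -c "$check" >/dev/null 2>&1 || return 1
    done
    # Every check passed (or none were given): report success.
    return 0
}
```

A probe built this way might end with a line such as run_checks "test -f /var/run/application/pid" "kill -0 \$(cat /var/run/application/pid)", so that the script exits 0 only when every check succeeds and non-zero as soon as one fails.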
Recovery Scripts
A recovery script runs after a monitor probe fails. The script attempts to
restore the service and prevent failover of the virtual host(s) associated
with the monitor.
Recovery scripts are useful if there is an automatic way to recover from a
common failure mode for an application. For example, if you are
monitoring an application called myservice that is normally started at
boot time, but which is buggy and crashes occasionally, you could use a