3.1.2 Matrix Server Administration Guide
Chapter 19: Other Matrix Maintenance 253
Copyright © 1999-2006 PolyServe, Inc. All rights reserved.
2. If you want the virtual host to remain on the backup network interface
after the original server is returned to operation, make that network
interface the primary network interface. (Choose the virtual host from
the Virtual Hosts window, right-click, and select Properties.)
3. Perform the necessary maintenance on the original server and then
reenable it.
Detection of Down Servers
The ClusterPulse daemon uses heartbeats to determine whether a server
is up. At a specific interval, ClusterPulse sends a heartbeat message to
each server. This is called a “heartbeat event.” Each server is then
required to send a response back to ClusterPulse.
The suspect interval specifies how many heartbeat events can pass
without ClusterPulse receiving a response from a server. If the server
does not respond within this interval, ClusterPulse determines that the
server is down.
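The counting logic described above can be sketched as follows. This is only an illustrative model, not ClusterPulse code: the names ServerState and record_heartbeat_event are hypothetical, and the sketch assumes that missed heartbeat events are counted consecutively, that any response resets the count, and that a server is marked down once the count reaches the suspect interval.

```python
from dataclasses import dataclass


@dataclass
class ServerState:
    """Hypothetical per-server bookkeeping for the sketch."""
    missed: int = 0      # heartbeat events in a row with no response
    down: bool = False   # whether the server is currently considered down


def record_heartbeat_event(state: ServerState, responded: bool,
                           suspect_interval: int = 34) -> ServerState:
    """Update the state after one heartbeat event.

    Assumption: a response clears the missed count; the real daemon
    may behave differently.
    """
    if responded:
        state.missed = 0
        state.down = False
    else:
        state.missed += 1
        # Assumption: the server is down once the number of missed
        # events reaches the suspect interval.
        if state.missed >= suspect_interval:
            state.down = True
    return state


if __name__ == "__main__":
    state = ServerState()
    for _ in range(34):
        record_heartbeat_event(state, responded=False)
    print(state.down)
```

With the default of 34, the model keeps a silent server in the "up" state for 33 missed events and marks it down on the 34th, which illustrates why a larger value delays detection and a smaller value makes false positives more likely.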
The default value for the suspect interval is 34. If your matrix is focused
on SAN activities and servers are being reported as down during periods of
high load average or high disk utilization, it may be useful to increase
this value. If your matrix is focused on services, such as HTTP or FTP, you
may want to decrease the value so that a down server is detected sooner.
NOTE: Changing the suspect interval affects failure detection. If the
suspect interval is too high, ClusterPulse may not immediately
detect that a server is down. If the interval is too low and a server
does not have enough time to respond to a heartbeat, ClusterPulse
may incorrectly determine that the server is down.
To change the suspect interval, add a “clusterpulse_start_options” line
such as the following to the /etc/opt/polyserve/mxinit.conf file, where n is
the new value.
clusterpulse_start_options = { "-nodaemon", "-suspect n" };
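As an illustration of substituting a concrete value for n, the following sketch builds the line for a chosen suspect interval. The helper name suspect_option_line is hypothetical, the value 50 is only an example, and the check that n is a positive integer is an assumption; this guide does not state the valid range.

```python
def suspect_option_line(n: int) -> str:
    """Return the mxinit.conf line for a suspect interval of n.

    Assumption: n must be a positive integer.
    """
    if n < 1:
        raise ValueError("suspect interval must be a positive integer")
    # Doubled braces produce literal { and } in the f-string.
    return f'clusterpulse_start_options = {{ "-nodaemon", "-suspect {n}" }};'


if __name__ == "__main__":
    print(suspect_option_line(50))
```

The printed line has the same form as the example above, with n replaced by 50.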