3.1.2 Matrix Server Administration Guide

Chapter 19: Other Matrix Maintenance

2. If you want the virtual host to remain on the backup network interface
after the original server is returned to operation, make that network
interface the primary network interface. (In the Virtual Hosts window,
right-click the virtual host and select Properties.)

3. Perform the necessary maintenance on the original server and then
re-enable it.

Detection of Down Servers

The ClusterPulse daemon uses heartbeats to determine whether a server
is up. At a regular interval, ClusterPulse sends a heartbeat message to
each server; each such send is called a “heartbeat event.” Each server
must then send a response back to ClusterPulse.

The suspect interval specifies the number of heartbeat events that can
pass without ClusterPulse receiving a response from a server. If the
server does not respond within this interval, ClusterPulse determines
that the server is down.
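
The counting rule described above can be sketched in a few lines of
Python. This is an illustration only, not ClusterPulse's actual
implementation: the class name SuspectCounter and its method are
hypothetical, and the sketch assumes that missed heartbeat events are
counted consecutively and that any response resets the count, which the
guide does not state explicitly.

```python
class SuspectCounter:
    """Tracks unanswered heartbeat events for one server (illustrative only)."""

    def __init__(self, suspect_interval):
        if suspect_interval < 1:
            raise ValueError("suspect_interval must be at least 1")
        self.suspect_interval = suspect_interval
        self.missed = 0      # heartbeat events in a row with no response
        self.down = False    # current verdict for this server

    def heartbeat_event(self, responded):
        """Record one heartbeat event; return True if the server is considered down."""
        if responded:
            # Assumption: a response clears the count and the down verdict.
            self.missed = 0
            self.down = False
        else:
            self.missed += 1
            if self.missed >= self.suspect_interval:
                self.down = True
        return self.down


# Example: with a suspect interval of 3, two missed events are tolerated,
# but three missed events in a row mark the server as down.
counter = SuspectCounter(suspect_interval=3)
history = [True, False, False, True, False, False, False]
states = [counter.heartbeat_event(r) for r in history]
print(states)
```

In this sketch only the final event, the third miss in a row, changes
the verdict to down.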

The default value for the suspect interval is 34. If your matrix is focused
on SAN activities and servers are being reported as down during periods of
high load average or high disk utilization, it may be useful to increase
this value. If your matrix is focused on services such as HTTP or FTP, you
may want to decrease the value.

NOTE: Changing the suspect interval can affect failure detection. If the
      suspect interval is too high, ClusterPulse may not immediately
      detect that a server is down. If the interval is too low and a
      server does not have enough time to respond to a heartbeat,
      ClusterPulse may incorrectly determine that the server is down.
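
The trade-off in this note can be made concrete with a small simulation.
The sketch below is hypothetical and does not reflect ClusterPulse
internals: the function name and the two response histories are made up
for illustration, and it assumes that missed heartbeat events are
counted consecutively.

```python
def events_until_declared_down(responses, suspect_interval):
    """Return the 1-based heartbeat event at which the server is
    declared down, or None if it is never declared down."""
    missed = 0
    for event, responded in enumerate(responses, start=1):
        # Assumption: any response resets the count of missed events.
        missed = 0 if responded else missed + 1
        if missed >= suspect_interval:
            return event
    return None


# A busy but healthy server: misses 4 events in a row, then answers again.
busy = [True] + [False] * 4 + [True] * 5
# A server that really fails after the first heartbeat event.
dead = [True] + [False] * 40

for n in (3, 34):
    print(n, events_until_declared_down(busy, n),
          events_until_declared_down(dead, n))
# prints:
# 3 4 4
# 34 None 35
```

With the low value (3), the busy server is wrongly declared down at
event 4. With the default value (34), the busy server is never declared
down, but the real failure is not detected until event 35.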

To change the suspect interval, add a “clusterpulse_start_options” line
such as the following to the /etc/opt/polyserve/mxinit.conf file, where n
is the new value:

clusterpulse_start_options = { "-nodaemon", "-suspect n" };
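
As a convenience, the line above can be generated for a concrete value
with a short helper. The function name suspect_option_line is
hypothetical, the value 50 is purely illustrative and not a
recommendation, and the check that n is a positive integer is an
assumption; the guide does not state the valid range.

```python
def suspect_option_line(n):
    """Format the mxinit.conf line shown above for a given suspect interval."""
    # Assumption: the suspect interval must be a positive integer.
    if isinstance(n, bool) or not isinstance(n, int) or n < 1:
        raise ValueError("suspect interval must be a positive integer")
    return 'clusterpulse_start_options = { "-nodaemon", "-suspect %d" };' % n


line = suspect_option_line(50)
print(line)
# prints: clusterpulse_start_options = { "-nodaemon", "-suspect 50" };
```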