Managing Serviceguard Sixteenth Edition, March 2009

To prevent one node from inadvertently accessing disks being used by the application
on another node, HA software uses an exclusive access mechanism to enforce access
by only one node at a time. This exclusive access applies to a volume group as a whole.
Use Multiple Destinations for SNA Applications
SNA is point-to-point link-oriented; that is, the services cannot simply be moved to
another system, since that system has a different point-to-point link which originates
in the mainframe. Therefore, backup links in a node and/or backup links in other nodes
should be configured so that SNA does not become a single point of failure. Note that
only one configuration for an SNA link can be active at a time. Therefore, backup links
that are used for other purposes should be reconfigured for the primary mission-critical
purpose upon failover.
Avoid File Locking
In an NFS environment, applications should avoid file-locking mechanisms when the
file to be locked is on an NFS server. More generally, an application should avoid file
locking on both local and remote systems. If local file locking is employed and the
system fails, the system acting as the backup system will not have any knowledge of
the locks maintained by the failed system. This may or may not cause problems when
the application restarts.
Remote file locking is the worse of the two situations, since the system doing the locking
may be the system that fails. Then, the lock might never be released, and other parts
of the application will be unable to access that data. In an NFS environment, file locking
can cause long delays in case of NFS client system failure and might even delay the
failover itself.
Restoring Client Connections
How does a client reconnect to the server after a failure?
It is important to write client applications to specifically differentiate between the loss
of a connection to the server and other application-oriented errors that might be
returned. The application should take special action in case of connection loss.
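As a minimal sketch of this distinction, a client might classify failures as shown below. The names ApplicationError and classify_failure are hypothetical and are not part of Serviceguard or any particular client library. The sketch assumes that transport-level failures surface as Python's built-in connection and timeout exceptions.

```python
import socket


class ApplicationError(Exception):
    """Hypothetical error raised when the server rejects a request."""


def classify_failure(exc):
    """Return 'connection_lost' for transport failures, else 'application_error'."""
    # ConnectionError covers reset, refused, aborted and broken-pipe conditions.
    # EOFError is included on the assumption that the client library raises it
    # when the peer closes the stream unexpectedly.
    transport_errors = (ConnectionError, TimeoutError, socket.timeout, EOFError)
    if isinstance(exc, transport_errors):
        return "connection_lost"
    return "application_error"
```

A client would start its reconnection logic only for the "connection_lost" case and would report every other error to the user in the usual way.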
One question to consider is how a client knows after a failure when to reconnect to the
newly started server. In the typical scenario, the user must simply restart the
session or log in again. However, this method is not automated. For example, a
well-tuned hardware and application system may fail over in 5 minutes. But if users,
after experiencing no response during the failure, give up after 2 minutes and go for
coffee and don't come back for 28 minutes, the perceived downtime is actually 30
minutes, not 5. Factors to consider are the number of reconnection attempts to make,
the frequency of reconnection attempts, and whether or not to notify the user of
connection loss.
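The three factors above can be sketched as parameters of a simple retry loop. This is an illustration only, not a Serviceguard interface: the function reconnect, its parameter names, and its default values are all assumptions, and connect stands for whatever call the client uses to open a session.

```python
import time


def reconnect(connect, max_attempts=5, interval_seconds=10.0,
              notify=None, sleep=time.sleep):
    """Call connect() until it succeeds or max_attempts is reached.

    connect          -- hypothetical callable that opens a session and returns it
    max_attempts     -- number of reconnection attempts to make
    interval_seconds -- wait between attempts (the reconnection frequency)
    notify           -- optional callback used to tell the user about the loss
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except OSError as exc:  # ConnectionError is a subclass of OSError
            if notify is not None:
                notify(attempt, max_attempts, exc)
            if attempt < max_attempts:
                sleep(interval_seconds)
    raise ConnectionError(
        "server unreachable after %d attempts" % max_attempts)
```

The product of max_attempts and interval_seconds should comfortably exceed the expected failover time. With a 5-minute failover, for instance, 40 attempts at 10-second intervals would keep the client trying for a little over 6 minutes.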