Managing Serviceguard Sixteenth Edition, March 2009

To prevent one node from inadvertently accessing disks being used by the application

on another node, HA software uses an exclusive access mechanism to enforce access

by only one node at a time. This exclusive access applies to a volume group as a whole.

Use Multiple Destinations for SNA Applications

SNA is point-to-point link-oriented; that is, the services cannot simply be moved to

another system, since that system has a different point-to-point link which originates

in the mainframe. Therefore, backup links in a node and/or backup links in other nodes

should be configured so that SNA does not become a single point of failure. Note that

only one configuration for an SNA link can be active at a time. Therefore, backup links

that are used for other purposes should be reconfigured for the primary mission-critical

purpose upon failover.

Avoid File Locking

In an NFS environment, applications should avoid file-locking mechanisms when the

file to be locked is on an NFS server. More generally, an application should avoid file

locking on both local and remote systems. If local file locking is employed and the

system fails, the system acting as the backup system will not have any knowledge of

the locks maintained by the failed system. This may or may not cause problems when

the application restarts.

Remote file locking is the worse of the two situations, since the system doing the locking

may be the system that fails. Then, the lock might never be released, and other parts

of the application will be unable to access that data. In an NFS environment, file locking

can cause long delays in case of NFS client system failure and might even delay the

failover itself.

Restoring Client Connections

How does a client reconnect to the server after a failure?

It is important to write client applications to specifically differentiate between the loss

of a connection to the server and other application-oriented errors that might be

returned. The application should take special action in case of connection loss.
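
As a minimal sketch of this distinction, a Python client might classify failures as shown below. The names ApplicationError and classify_failure are hypothetical and are not part of Serviceguard or of any particular client library; the sketch assumes that connection loss surfaces as one of Python's standard connection or timeout exceptions.

```python
import socket


class ApplicationError(Exception):
    """Hypothetical error type: the server answered, but rejected the request."""


def classify_failure(exc):
    """Return 'connection_lost' or 'application_error' for a caught exception.

    Assumption: a lost connection shows up as a ConnectionError subclass
    (reset, refused, broken pipe), a socket timeout, or an unexpected EOF.
    Anything else is treated as an application-oriented error.
    """
    if isinstance(exc, (ConnectionError, socket.timeout, EOFError)):
        return "connection_lost"
    return "application_error"
```

A client could call classify_failure in its exception handler and start its reconnection logic only for the "connection_lost" case, reporting other errors to the user as usual.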

One question to consider is how a client knows after a failure when to reconnect to the

newly started server. The typical scenario is that the client must simply restart its

session, or log in again. However, this method is not very automated. For example, a

well-tuned hardware and application system may fail over in 5 minutes. But if users,

after experiencing no response during the failure, give up after 2 minutes and go for

coffee and don't come back for 28 minutes, the perceived downtime is actually 30

minutes, not 5. Factors to consider are the number of reconnection attempts to make,

the frequency of reconnection attempts, and whether or not to notify the user of

connection loss.
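
The three factors above can be sketched as parameters of a simple retry loop. This is an illustration only, not a Serviceguard interface: the function reconnect, its parameter names, and the default values are assumptions, and connect stands for whatever call the client application uses to open a session.

```python
import time


def reconnect(connect, max_attempts=10, interval_seconds=30.0,
              notify=None, sleep=time.sleep):
    """Try to re-establish a session after connection loss.

    connect          -- hypothetical callable that opens a session and
                        raises OSError (or a subclass) on failure
    max_attempts     -- number of reconnection attempts to make
    interval_seconds -- delay between attempts (their frequency)
    notify           -- optional callable used to tell the user what happened
    sleep            -- injectable for testing; defaults to time.sleep
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except OSError as exc:
            if notify is not None:
                notify(f"Connection lost (attempt {attempt} of "
                       f"{max_attempts}): {exc}")
            # Wait before the next attempt, but not after the last one.
            if attempt < max_attempts:
                sleep(interval_seconds)
    raise ConnectionError(f"server unreachable after {max_attempts} attempts")
```

With the default values, the client keeps trying for roughly five minutes, which matches the failover time in the example above; suitable values depend on how long failover actually takes at a given site.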
