Managing Serviceguard A.11.20, March 2013

/usr/sbin/route delete net default 128.17.17.1 1 source 128.17.17.17
Once the per-interface default route(s) have been added, netstat -rn shows output similar to
the following, where 128.17.17.17 is the package relocatable address and 128.17.17.19
is the physical address on the same subnet:
Destination      Gateway          Flags  Refs  Interface  Pmtu
127.0.0.1        127.0.0.1        UH     0     lo0        32808
128.17.17.19     128.17.17.19     UH     0     lan5       32808
128.17.17.17     128.17.17.17     UH     0     lan5:1     32808
192.168.69.82    192.168.69.82    UH     0     lan2       32808
128.17.17.0      128.17.17.19     U      3     lan5       1500
128.17.17.0      128.17.17.17     U      3     lan5:1     1500
192.168.69.0     192.168.69.82    U      2     lan2       1500
127.0.0.0        127.0.0.1        U      0     lo0        32808
default          128.17.17.1      UG     0     lan5:1     1500
default          128.17.17.1      UG     0     lan5       1500
NOTE: If your package has more than one relocatable address on a physical interface, you must
add a route statement for each relocatable address during package start up, and delete each of
these routes during package halt.
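The one-route-per-address rule in the note above can be sketched as a small helper that builds the command lines. This is only an illustration under stated assumptions: the add form is assumed to mirror the delete command shown earlier with the same gateway and hop count, the second address 128.17.17.18 is hypothetical, and the helper name route_commands is invented for this sketch. It only builds strings; it does not run them.

```python
ROUTE = "/usr/sbin/route"

def route_commands(action, gateway, relocatable_addrs):
    """Build one per-interface default-route command per relocatable address.

    action is "add" (package start) or "delete" (package halt).
    The command layout copies the delete example in the text; the add
    form is an assumption that it takes the same arguments.
    """
    if action not in ("add", "delete"):
        raise ValueError("action must be 'add' or 'delete'")
    return [
        f"{ROUTE} {action} net default {gateway} 1 source {addr}"
        for addr in relocatable_addrs
    ]

# 128.17.17.17 is from the text; 128.17.17.18 is a hypothetical second
# relocatable address on the same physical interface.
addrs = ["128.17.17.17", "128.17.17.18"]

for cmd in route_commands("add", "128.17.17.1", addrs):     # at package start
    print(cmd)
for cmd in route_commands("delete", "128.17.17.1", addrs):  # at package halt
    print(cmd)
```

Generating both lists from the same address list keeps the start and halt steps symmetric, so no route is left behind when the package halts.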
For more information about configuring modular packages, see Chapter 6 (page 232); for legacy
packages, see “Configuring a Legacy Package” (page 307).
IMPORTANT: If you use a Quorum Server, make sure that you list all IP addresses or hostnames
by which the nodes communicate with the Quorum Server in the authorization file
/etc/cmcluster/qs_authfile.
For more information about the Quorum Server, see the latest version of the HP Serviceguard
Quorum Server Release Notes at http://www.hp.com/go/hpux-serviceguard-docs -> HP
Serviceguard Quorum Server Software.
Restoring Client Connections
How does a client reconnect to the server after a failure?
It is important to write client applications to specifically differentiate between the loss of a connection
to the server and other application-oriented errors that might be returned. The application should
take special action in case of connection loss.
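As a minimal sketch of that distinction, a client can group the exceptions that indicate a lost connection separately from errors reported by the application itself. The names ApplicationError, CONNECTION_LOSS, and handle are hypothetical and chosen for this example; which exceptions count as connection loss depends on the client's transport.

```python
import socket

class ApplicationError(Exception):
    """Hypothetical error returned by the server application itself."""

# Exceptions treated here as "the connection to the server is gone".
# ConnectionError covers reset, refused, aborted, and broken-pipe errors.
CONNECTION_LOSS = (ConnectionError, TimeoutError, socket.timeout)

def handle(request):
    """Run request() and say which recovery path the client should take."""
    try:
        return ("ok", request())
    except CONNECTION_LOSS as exc:
        # Special action: the server may be failing over, so reconnect.
        return ("reconnect", exc)
    except ApplicationError as exc:
        # Ordinary application-level failure: report it, do not reconnect.
        return ("report", exc)
```

Keeping the two paths apart means a rejected transaction is not mistaken for a server failure, and a server failure is not presented to the user as an application error.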
One question to consider is how a client knows, after a failure, when to reconnect to the newly
started server. Typically, the user must simply restart the session or log in again.
However, this method is not automated. For example, a well-tuned hardware and application
system may fail over in 5 minutes. But if users, after experiencing no response during the failure,
give up after 2 minutes, go for coffee, and do not come back for 28 minutes, the perceived
downtime is 30 minutes, not 5. Factors to consider are the number of reconnection attempts
to make, the frequency of reconnection attempts, and whether or not to notify the user of connection
loss.
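The three factors just listed map naturally onto the parameters of a retry loop. The sketch below is illustrative only: the function name reconnect, its parameters, and the default values are assumptions made for this example (12 attempts at 30-second intervals spans a little more than the 5-minute failover used above), and connect stands for whatever call the client uses to open its session.

```python
import time

def reconnect(connect, max_attempts=12, interval=30.0, notify=None,
              sleep=time.sleep):
    """Retry connect() until it succeeds or the attempts run out.

    max_attempts -- number of reconnection attempts to make
    interval     -- seconds to wait between attempts (their frequency)
    notify       -- optional callback(attempt, max_attempts, exc) used to
                    tell the user about the connection loss
    sleep        -- injectable for testing; defaults to time.sleep
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except OSError as exc:  # includes ConnectionError and timeouts
            if notify is not None:
                notify(attempt, max_attempts, exc)
            if attempt < max_attempts:
                sleep(interval)
    raise ConnectionError(
        f"server unreachable after {max_attempts} attempts"
    )
```

A client would pass its own session-opening function as connect and, if the user should be informed, a notify callback that updates a status line. Choosing the attempt count and interval so that they outlast the expected failover time keeps the perceived downtime close to the actual downtime.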
There are a number of strategies to use for client reconnection:
• Design clients that continue to try to reconnect to their failed server.
  Put the work into the client application rather than relying on the user to reconnect. If the server
  is back up and running in 5 minutes, and the client is continually retrying, then after 5 minutes,
  the client application will reestablish the link with the server and either restart or continue the
  transaction. No intervention from the user is required.
• Design clients to reconnect to a different server.
  If you have a server design that includes multiple active servers, the client could connect to
  the second server, and the user would experience only a brief delay.
  The problem with this design is knowing when the client should switch to the second server.
  How long does a client retry to the first server before giving up and going to the second server?
  There are no definitive answers for this. The answer depends on the design of the server
356 Designing Highly Available Cluster Applications