Managing Serviceguard Eighteenth Edition, September 2010

coffee and don't come back for 28 minutes, the perceived downtime is actually 30

minutes, not 5. Factors to consider are the number of reconnection attempts to make,

the frequency of reconnection attempts, and whether or not to notify the user of

connection loss.

There are a number of strategies to use for client reconnection:

• Design clients which continue to try to reconnect to their failed server.

Put the work into the client application rather than relying on the user to reconnect.

If the server is back up and running in 5 minutes, and the client is continually

retrying, then after 5 minutes, the client application will reestablish the link with

the server and either restart or continue the transaction. No intervention from the

user is required.

• Design clients to reconnect to a different server.

If you have a server design which includes multiple active servers, the client could

connect to the second server, and the user would only experience a brief delay.

The problem with this design is knowing when the client should switch to the

second server. How long does a client retry to the first server before giving up and

going to the second server? There are no definitive answers for this. The answer

depends on the design of the server application. If the application can be restarted

on the same node after a failure (see “Handling Application Failures” following),

the retry to the current server should continue for the amount of time it takes to

restart the server locally. This will keep the client from having to switch to the

second server in the event of an application failure.

• Use a transaction processing monitor or message queueing software to increase

robustness.

Use transaction processing monitors such as Tuxedo or DCE/Encina, which provide

an interface between the server and the client. Transaction processing monitors

(TPMs) can be useful in creating a more highly available application. Transactions

can be queued such that the client does not detect a server failure. Many TPMs

provide for the optional automatic rerouting to alternate servers or for the automatic

retry of a transaction. TPMs also provide for ensuring the reliable completion of

transactions, although they are not the only mechanism for doing this. After the

server is back online, the transaction monitor reconnects to the new server and

continues routing transactions to it.

• Queue up requests.

As an alternative to using a TPM, queue up requests when the server is unavailable.

Rather than notifying the user when a server is unavailable, the user request is

queued up and transmitted later when the server becomes available again. Message

queueing software ensures that messages of any kind, not necessarily just

transactions, are delivered and acknowledged.

Message queueing is useful only when the user does not need or expect a response

that the request has been completed (i.e., the application is not interactive).
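
The first two reconnection strategies above can be combined in a single client-side loop. The following Python sketch is illustrative only and is not part of Serviceguard: the function name connect_with_failover, the connect callable, and the default timing values are hypothetical. It assumes that connect raises ConnectionError when a server cannot be reached. The client keeps retrying the current server for about as long as a local restart is assumed to take, and moves to the next server only after that window has passed.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def connect_with_failover(
    servers: Sequence[str],
    connect: Callable[[str], T],
    local_restart_window: float = 300.0,  # assumed time for a local restart, in seconds
    retry_interval: float = 5.0,          # assumed pause between attempts, in seconds
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
    notify: Optional[Callable[[str], None]] = None,
) -> T:
    """Try each server in order, retrying each one for local_restart_window seconds."""
    last_error: Optional[Exception] = None
    for server in servers:
        deadline = clock() + local_restart_window
        while True:
            try:
                return connect(server)
            except ConnectionError as exc:
                last_error = exc
                if notify is not None:
                    # Optional hook: tell the user about the connection loss.
                    notify(f"lost connection to {server}; retrying")
            if clock() + retry_interval > deadline:
                break  # window exhausted: give up on this server
            sleep(retry_interval)
    raise ConnectionError("no server reachable") from last_error
```

The sleep and clock parameters exist only so that the loop can be exercised without real delays. The value of local_restart_window should be chosen from the measured restart time of the server application; the default shown here is a placeholder.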
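
The request queueing approach can be sketched in the same way. The RequestQueue class below is a hypothetical illustration, not an interface of any message queueing product. It assumes a send callable that returns True once the server acknowledges a request and raises ConnectionError while the server is unreachable. Requests are held in order and are removed from the queue only after they have been acknowledged.

```python
from collections import deque
from typing import Callable, Deque


class RequestQueue:
    """Hold requests while the server is down and deliver them in order later."""

    def __init__(self, send: Callable[[str], bool]) -> None:
        # send() is a hypothetical transport: it returns True once the server
        # acknowledges the request and raises ConnectionError if unreachable.
        self._send = send
        self._pending: Deque[str] = deque()

    def submit(self, request: str) -> None:
        """Accept a request without telling the user whether the server is up."""
        self._pending.append(request)
        self.flush()

    def flush(self) -> int:
        """Send queued requests oldest first; return how many were acknowledged."""
        delivered = 0
        while self._pending:
            try:
                acknowledged = self._send(self._pending[0])
            except ConnectionError:
                break  # server still unavailable: keep everything queued
            if not acknowledged:
                break  # no acknowledgement: retry this request on the next flush
            self._pending.popleft()  # remove only after the acknowledgement
            delivered += 1
        return delivered

    def __len__(self) -> int:
        return len(self._pending)
```

This sketch keeps the queue in memory, so queued requests are lost if the client itself exits. A client would call flush again, for example on a timer, once the server is expected to be available.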

438 Designing Highly Available Cluster Applications