Optimizing Failover Time in a Serviceguard Environment, June 2007

cluster); cmhaltnode (to halt just this node); or the shutdown(1M) command (which will perform a cmhaltnode before rebooting the node).

Applications

There is no single solution for optimizing the many different applications. However, here are some general tips.

Failed-over applications may spend a long time recovering data. See if you can reduce this time.

Consider contacting the application vendor and the systems integrator for specific tuning tips.

If a database management system is used, consider implementing Oracle RAC. RAC significantly reduces resource recovery time after the failure of a RAC instance, so it helps reduce application recovery time. To use Oracle RAC in a Serviceguard environment, purchase Serviceguard Extension for RAC.

Serviceguard Extension for Faster Failover

Further optimization of Serviceguard failover time is possible with Serviceguard Extension for Faster Failover (SGeFF). This auxiliary product for Serviceguard on HP-UX is made for customers who need faster failover and can configure a specific type of cluster.

The Serviceguard component of failover changes when Serviceguard Extension for Faster Failover is enabled, as shown in Figure 5.

Figure 5: Steps in a failover with Serviceguard Extension for Faster Failover

[Diagram labels: Node failure detection; Lock acquisition; Quiescence; Cluster component recovery; Cluster re-formation; Serviceguard component of failover time]

Note: Diagram is not to scale.

SGeFF reduces the Serviceguard component of failover time by:

• Eliminating the time ordinarily reserved for election

• Shortening the time for cluster lock acquisition

• Eliminating the time ordinarily required for local switching (see “Heartbeat Subnet” on page 10)

SGeFF cannot change the failover time of application-dependent components such as databases and applications.
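The split between what SGeFF can and cannot shorten can be illustrated with a small model. The Python sketch below is illustrative only: every duration is a hypothetical placeholder rather than a measured or documented value, the class and function names are invented for this example, the lock_speedup factor is an assumption, and treating the steps as simply additive is a simplification.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FailoverTimes:
    """Seconds spent in each step. All values used below are hypothetical."""
    node_failure_detection: float
    election: float
    lock_acquisition: float
    local_switching: float
    quiescence: float
    cluster_component_recovery: float
    application_recovery: float  # application-dependent; outside Serviceguard

    def serviceguard_component(self) -> float:
        # Sum of the steps attributed to Serviceguard itself.
        return (self.node_failure_detection + self.election
                + self.lock_acquisition + self.local_switching
                + self.quiescence + self.cluster_component_recovery)

    def total(self) -> float:
        return self.serviceguard_component() + self.application_recovery


def with_sgeff(t: FailoverTimes, lock_speedup: float = 0.5) -> FailoverTimes:
    """Apply the three reductions listed above.

    lock_speedup is an assumed factor, not a documented figure.
    """
    return replace(
        t,
        election=0.0,          # election time eliminated
        local_switching=0.0,   # local switching time eliminated
        lock_acquisition=t.lock_acquisition * lock_speedup,  # shortened
    )


# Hypothetical baseline, chosen only to make the arithmetic easy to follow.
baseline = FailoverTimes(
    node_failure_detection=4.0, election=2.0, lock_acquisition=2.0,
    local_switching=2.0, quiescence=1.0, cluster_component_recovery=1.0,
    application_recovery=30.0,
)
faster = with_sgeff(baseline)

print(f"Serviceguard component: {baseline.serviceguard_component():.1f}s"
      f" -> {faster.serviceguard_component():.1f}s")
print(f"Application recovery:   {baseline.application_recovery:.1f}s"
      f" -> {faster.application_recovery:.1f}s")
print(f"Total failover time:    {baseline.total():.1f}s -> {faster.total():.1f}s")
```

In this model the application recovery term is untouched by with_sgeff, so when that term dominates the total, application tuning is the lever that matters most.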