Optimizing Failover Time in a Serviceguard Environment, June 2007

cluster); cmhaltnode (to halt just this node); or the shutdown(1M) command (which will perform a
cmhaltnode before rebooting the node).
Applications
There is no single way to optimize the many different applications that can run in a cluster. However,
here are some general tips.
Failed-over applications may spend a long time recovering data. See if you can reduce this time.
Consider contacting the application vendor and the systems integrator for specific tuning tips.
If a database management system is used, consider implementing Oracle RAC. RAC significantly
reduces the time needed to recover resources after a RAC instance fails, which helps reduce application
recovery time. Using Oracle RAC in a Serviceguard environment requires purchasing Serviceguard Extension for RAC.
Serviceguard Extension for Faster Failover
Further optimization of Serviceguard failover time is possible with Serviceguard Extension for Faster
Failover (SGeFF). This auxiliary product for Serviceguard on HP-UX is intended for customers who need
faster failover and can configure the specific type of cluster that it supports.
The Serviceguard component of failover time changes when Serviceguard Extension for Faster Failover is
enabled, as shown in Figure 5.
[Figure 5: Steps in a failover with Serviceguard Extension for Faster Failover. Labeled stages: node failure detection; cluster re-formation (lock acquisition, quiescence); cluster component recovery. Together these make up the Serviceguard component of failover time. Note: Diagram is not to scale.]
SGeFF reduces the Serviceguard component of failover time by:
Eliminating the time ordinarily reserved for election
Shortening the time for cluster lock acquisition
Eliminating the time ordinarily required for local switching (see “Heartbeat Subnet” on page 10)
SGeFF cannot change the application-dependent components of failover time, such as database and
application recovery.
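The effect of SGeFF described above can be illustrated with a small additive model. This is a minimal sketch and not HP's timing model. It treats each phase as a separate duration and assumes that the time reserved for local switching adds on top of the other Serviceguard phases, which is a simplification. The class and function names (FailoverPhases, serviceguard_component, with_sgeff) are hypothetical. The durations used below are arbitrary placeholders, not measurements. The model applies the three SGeFF changes listed in the text: election time becomes zero, lock acquisition is shortened, and the local-switching allowance becomes zero. Application-dependent recovery is left unchanged.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FailoverPhases:
    """Hypothetical per-phase durations in seconds (illustrative model only)."""
    detection: float           # node failure detection
    election: float            # time reserved for election during re-formation
    lock_acquisition: float    # cluster lock acquisition
    quiescence: float          # quiescence before recovery begins
    component_recovery: float  # cluster component recovery
    local_switching: float     # heartbeat local-switching allowance (assumed additive)
    application: float         # application-dependent recovery (databases, applications)


def serviceguard_component(p: FailoverPhases) -> float:
    """Serviceguard's share of failover time: every phase except application recovery."""
    return (p.detection + p.election + p.lock_acquisition + p.quiescence
            + p.component_recovery + p.local_switching)


def total_failover(p: FailoverPhases) -> float:
    """Serviceguard component plus the application-dependent component."""
    return serviceguard_component(p) + p.application


def with_sgeff(p: FailoverPhases, faster_lock_acquisition: float) -> FailoverPhases:
    """Apply the three SGeFF changes described in the text.

    faster_lock_acquisition is a caller-supplied (hypothetical) duration; SGeFF
    shortens lock acquisition, so it must not exceed the original value.
    Application recovery is deliberately left unchanged.
    """
    if faster_lock_acquisition > p.lock_acquisition:
        raise ValueError("SGeFF shortens lock acquisition; it should not lengthen it")
    return replace(p, election=0.0, local_switching=0.0,
                   lock_acquisition=faster_lock_acquisition)


if __name__ == "__main__":
    # Placeholder numbers only, chosen to make the arithmetic easy to follow.
    baseline = FailoverPhases(detection=5.0, election=2.0, lock_acquisition=4.0,
                              quiescence=3.0, component_recovery=1.0,
                              local_switching=2.0, application=30.0)
    fast = with_sgeff(baseline, faster_lock_acquisition=1.0)
    print(serviceguard_component(baseline), serviceguard_component(fast))  # 17.0 10.0
    print(total_failover(baseline), total_failover(fast))                  # 47.0 40.0
```

Even in this toy model, total failover time cannot fall below the application-dependent term. That is why the text recommends tuning application recovery separately.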