Implement high-availability solutions with HP Instant Capacity - easily and effectively

Split groups and failback

In the case of a split group, both Group Managers have active status and each controls a subset of the managed group members, depending on the individual member status at the time of the failover. Control operations can be carried out on both active Group Managers, each communicating with the members that it (and only it) controls. Groups and members can be added or removed on both Group Managers (subject to the set of members each can command), and sharing rights can be added on both. In some cases, this can be valuable; for instance, when two data centers each remain functional but some intervening network link has been broken. Each isolated set of systems can proceed with independent disaster recovery operations within its own group subset.

At some point, communication is restored and the split groups must be rejoined. This is accomplished by issuing another icapmanage -Q command. It can be issued on either active Group Manager to confirm that Group Manager as the active Group Manager and demote the other to standby status. However, doing this loses all database changes made on the demoted Group Manager during the time that the group was split. This includes the addition or removal of group members or whole groups and the application of codewords on the demoted Group Manager and the group members it controlled. There is no method to merge the two databases.
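The consequence of rejoining can be illustrated with a small toy model. This is only a sketch: the GroupManager class, its members set, and the confirm_active function are hypothetical names invented for illustration and are not part of any GiCAP interface. The model also assumes that the demoted Group Manager ends up with a copy of the surviving database, which simplifies the behavior described above.

```python
from dataclasses import dataclass, field


@dataclass
class GroupManager:
    """Toy stand-in for a Group Manager and its database (hypothetical)."""
    name: str
    status: str = "standby"                      # "active" or "standby"
    members: set = field(default_factory=set)    # simplified "database"


def confirm_active(winner: GroupManager, other: GroupManager) -> set:
    """Model the rejoin: winner stays active, other is demoted.

    Returns the entries that existed only in the demoted database. They are
    discarded because the two databases cannot be merged. Removals made only
    on the demoted manager are undone in the same way.
    """
    discarded = other.members - winner.members
    winner.status = "active"
    other.status = "standby"
    # Assumption of this model: the standby now mirrors the surviving database.
    other.members = set(winner.members)
    return discarded


if __name__ == "__main__":
    # Both managers are active while the group is split (hypothetical names).
    gm_a = GroupManager("gm-a", "active", {"par1", "par2", "par4"})
    gm_b = GroupManager("gm-b", "active", {"par1", "par5"})
    print("discarded on rejoin:", confirm_active(gm_a, gm_b))
```

In this model, anything recorded only on the demoted side during the split has to be reapplied by hand on the surviving Group Manager.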

Recovering from a split group situation requires deciding which of the two active Group Manager databases is to become the only valid description of the groups controlled by the active Group Manager. Once this decision is made, the Group Manager with that target database must be made the sole active Group Manager by issuing the icapmanage -Q command on the target system at a time when both Group Managers are accessible and can exchange information. After this has been done, the other Group Manager (now demoted to standby status) can take control of the groups and members, if this is desired, with a second icapmanage -Q command.
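The ordering of the recovery steps can be sketched as a small planning helper. It only builds a list of (host, command) pairs and does not execute anything. The function name, the host names, and the both_reachable flag are assumptions made for illustration; the only command used is icapmanage -Q, as named in the text.

```python
def recovery_plan(target_host, both_reachable, hand_control_to=None):
    """Return the ordered (host, argv) steps for rejoining a split group.

    target_host     -- Group Manager whose database is to survive (hypothetical name)
    both_reachable  -- True only if both Group Managers can exchange information
    hand_control_to -- optional: the demoted Group Manager that should take
                       control afterwards with a second icapmanage -Q
    """
    if not both_reachable:
        # The text requires both Group Managers to be accessible first.
        raise ValueError("both Group Managers must be accessible before rejoining")

    # Step 1: confirm the target as the sole active Group Manager.
    steps = [(target_host, ["icapmanage", "-Q"])]

    # Step 2 (optional): let the former standby take control again.
    if hand_control_to is not None:
        steps.append((hand_control_to, ["icapmanage", "-Q"]))
    return steps


if __name__ == "__main__":
    for host, argv in recovery_plan("gm-a", True, hand_control_to="gm-b"):
        print(f"on {host}: {' '.join(argv)}")
```

Keeping the plan separate from execution makes it easy to review the order of operations before any command is run on a Group Manager.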

HP Serviceguard considerations

To automate the failover process, commands can be incorporated into Serviceguard package control scripts. As described previously, there are two possible types of failover that can be automated:

• Resource shifting from one group member to another (using core usage rights seizure)

• Group control shifting to a standby Group Manager (using a take control command)

For simplicity, these are referred to as “member failover” and “Group Manager failover,” respectively. Examples of each are presented in later sections.

Performance implications

As with the previous Serviceguard solutions, application startup takes longer than with a typical Serviceguard package control script that does not invoke GiCAP commands. When using GiCAP, the time required to activate a core in a GiCAP group can range from seconds to minutes, depending on the size of the group, the hardware involved, and the network. A core usage rights seizure takes less time but generally falls in the same range.

In particular, end-to-end time for member failover in a Serviceguard/GiCAP environment consists of:

• Serviceguard failover time (no change; typically 28–45 seconds depending on cluster configuration and Serviceguard version)

• Usage rights movement and activation (from seconds to a maximum of 10 minutes)

• Application startup and recovery time (no change; typically in minutes)
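As a rough worked example, the three components above can be added to bound the end-to-end member failover time. The Serviceguard range (28–45 seconds) and the 10-minute ceiling for usage rights movement come from the list above. The application startup range and the lower bound for usage rights movement are not stated, so they are caller-supplied assumptions here, and the function name is hypothetical.

```python
# Figures taken from the list above.
SERVICEGUARD_FAILOVER_S = (28, 45)   # typical range, in seconds
USAGE_RIGHTS_MAX_S = 10 * 60         # stated maximum, in seconds


def member_failover_bounds(app_startup_s, usage_rights_min_s):
    """Return (best_case, worst_case) end-to-end time in seconds.

    app_startup_s      -- (low, high) application startup and recovery time;
                          site-specific, supplied by the caller
    usage_rights_min_s -- assumed lower bound for moving and activating usage
                          rights ("seconds" in the text; the exact value is
                          an assumption)
    """
    app_lo, app_hi = app_startup_s
    if not 0 <= usage_rights_min_s <= USAGE_RIGHTS_MAX_S:
        raise ValueError("usage_rights_min_s must be between 0 and 600 seconds")
    best = SERVICEGUARD_FAILOVER_S[0] + usage_rights_min_s + app_lo
    worst = SERVICEGUARD_FAILOVER_S[1] + USAGE_RIGHTS_MAX_S + app_hi
    return best, worst


if __name__ == "__main__":
    # Hypothetical application that needs 2 to 5 minutes to start and recover.
    best, worst = member_failover_bounds((120, 300), usage_rights_min_s=5)
    print(f"best case:  {best} s")    # 28 + 5 + 120 = 153
    print(f"worst case: {worst} s")   # 45 + 600 + 300 = 945
```

With these illustrative inputs, the usage rights step dominates the worst case, which is why the size of the group and the network matter for planning.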

The time needed to move usage rights is based on the time to locate the appropriate Group Manager (if a standby Group Manager is defined), the time to locate available usage rights, and the time to perform the icapmodify operations.

The time required by these operations depends on the number of:

• Blades

• nPars

• vPars

• Complexes

• Group Managers