Cost-Effective High-Availability Solutions with HP Instant Capacity on HP-UX

Split groups and failback

In a split group, both Group Managers have active status, and each controls a subset of the managed group members, determined by each member's status at the time of the failover. Control operations can be carried out on both active Group Managers, each communicating only with the members that it (and only it) controls. Groups and members can be added or removed on both Group Managers (subject to the set of members each can command), and sharing rights can be added on both. In some cases this is valuable; for example, when two data centers each remain functional but an intervening network link has been broken, each isolated set of systems can proceed with independent disaster recovery operations within its own group subset.

At some point, communication is restored and the split groups must be rejoined. This is accomplished by issuing another icapmanage -Q command. It can be issued on either active Group Manager to confirm that Group Manager as the active Group Manager and demote the other to standby status. However, doing this discards all database changes made on the demoted Group Manager while the group was split. This includes the addition or removal of group members or whole groups, and the application of codewords on the demoted Group Manager and the group members it controlled. There is no method to merge the two databases.
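
The consequence of rejoining can be illustrated with a toy model. The sketch below is not an HP API: the GroupManager class, the rejoin function, and the host names and change descriptions are all hypothetical, chosen only to show that confirming one Group Manager discards whatever was recorded on the other.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GroupManager:
    """Toy stand-in for a Group Manager and its database (hypothetical)."""
    name: str
    status: str = "active"
    # Database changes recorded while the group was split.
    changes: List[str] = field(default_factory=list)


def rejoin(confirmed: GroupManager, demoted: GroupManager) -> List[str]:
    """Model the effect of icapmanage -Q issued on `confirmed`.

    Returns the changes that are discarded, because the two
    databases cannot be merged.
    """
    lost = list(demoted.changes)
    demoted.changes.clear()
    demoted.status = "standby"
    confirmed.status = "active"
    return lost


# Hypothetical state of the two Group Managers during the split.
gm_a = GroupManager("gm-a", changes=["added member m3"])
gm_b = GroupManager("gm-b", changes=["applied codeword", "removed group g2"])

lost = rejoin(confirmed=gm_a, demoted=gm_b)
print(lost)          # changes made on gm-b that would have to be redone
print(gm_a.changes)  # changes on the confirmed Group Manager survive
```

Listing the changes made on each side before choosing which Group Manager to confirm makes it clear what would need to be reapplied afterward.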

Recovering from a split group therefore first requires deciding which of the two active Group Manager databases is to become the only valid description of the groups. Once this decision is made, the Group Manager holding that target database must be made the sole active Group Manager by issuing the icapmanage -Q command on that system, at a time when both Group Managers are accessible and can exchange information. After this has been done, the other Group Manager (now demoted to standby status) can take control of the groups and members, if desired, with a second icapmanage -Q command.
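
The ordering of the recovery steps can be sketched as a small planning helper. This is only an illustration: the function plan_rejoin and the host names are hypothetical, the only command taken from the text is icapmanage -Q, and the sketch assumes that the optional second command is issued on the other (demoted) Group Manager.

```python
from typing import List, Tuple

# The only command taken from the text; no other options are assumed.
TAKE_CONTROL = "icapmanage -Q"


def plan_rejoin(target_gm: str, other_gm: str, both_reachable: bool,
                hand_back: bool = False) -> List[Tuple[str, str]]:
    """Return the (host, command) steps for rejoining a split group.

    target_gm holds the database chosen as the only valid one.
    hand_back requests that other_gm take control afterward
    (assumed here to mean running the command on other_gm).
    """
    if not both_reachable:
        # The first command must be issued while both Group Managers
        # are accessible and can exchange information.
        raise RuntimeError("both Group Managers must be reachable")
    steps = [(target_gm, TAKE_CONTROL)]
    if hand_back:
        steps.append((other_gm, TAKE_CONTROL))
    return steps


# Hypothetical host names, for illustration only.
for host, command in plan_rejoin("gm-a", "gm-b", both_reachable=True,
                                 hand_back=True):
    print(f"on {host}: {command}")
```

The helper only builds the plan and runs nothing, so the sequence can be reviewed before any command is issued on a live Group Manager.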

HP Serviceguard considerations

To automate the failover process, commands can be incorporated into Serviceguard package control scripts. As described previously, there are two possible types of failover that can be automated:

• Resource shifting from one group member to another (using core usage rights seizure)

• Group control shifting to a standby Group Manager (using a take control command)

For simplicity, these are referred to as “member failover” and “Group Manager failover”, respectively. Examples of each are presented in later sections.

Performance implications

As with the previous Serviceguard solutions, application startup takes longer than with a typical Serviceguard package control script that does not invoke GiCAP commands. With GiCAP, the time required to activate a core in a GiCAP group can range from seconds to minutes, depending on the size of the group, the hardware involved, and the network. A core usage rights seizure typically takes less time than an activation, but falls within the same general range.

In particular, end-to-end time for member failover in a Serviceguard/GiCAP environment consists of:

• Serviceguard failover time (no change; typically 28–45 seconds depending on cluster configuration and Serviceguard version)

• Usage rights movement and activation (from seconds to a maximum of 10 minutes)

• Application startup and recovery time (no change; typically in minutes)
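
As a rough planning aid, the three components above can be added to obtain an upper estimate of end-to-end member failover time. The sketch below uses the upper ends of the figures quoted in the list; the application startup time is not specified in the text, so it is a parameter, and the 300-second value in the usage line is purely an assumption for illustration.

```python
# Upper ends of the ranges quoted in the list above, in seconds.
SERVICEGUARD_FAILOVER_MAX_S = 45      # typical upper figure, not a hard limit
USAGE_RIGHTS_MOVEMENT_MAX_S = 10 * 60  # stated maximum of 10 minutes


def upper_estimate_member_failover_s(app_startup_s: int) -> int:
    """Sum the three components of end-to-end member failover time.

    app_startup_s is application startup and recovery time; it is
    workload-specific and must be supplied by the reader.
    """
    return (SERVICEGUARD_FAILOVER_MAX_S
            + USAGE_RIGHTS_MOVEMENT_MAX_S
            + app_startup_s)


# Assumed application startup and recovery time of 5 minutes.
total_s = upper_estimate_member_failover_s(300)
print(total_s, "seconds =", total_s / 60, "minutes")  # 945 seconds = 15.75 minutes
```

Because the Serviceguard figure is a typical value rather than a guaranteed bound, the result is an estimate for sizing recovery-time objectives, not a worst-case guarantee.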