Cost-Effective High-Availability Solutions with HP Instant Capacity on HP-UX

Split groups and failback
In the case of a split group, both Group Managers have active status and each controls a subset of
the managed group members, depending on the individual member status at the time of the failover.
Control operations can be carried out on both active Group Managers, each communicating with the
members that it (and only it) controls. Groups and members can be added or removed on both Group
Managers (subject to the set of members each can command), and sharing rights can be added on
both. This can be valuable in some cases; for example, when two data centers both remain functional
but the network link between them has been broken. Each isolated set of systems can then proceed with
independent disaster recovery operations within its own group subset.
When communication is eventually restored, the split groups must be rejoined. This is done by issuing
another icapmanage -Q command, which can be run on either active Group Manager. It confirms that
Group Manager as the active Group Manager and demotes the other to standby status.
However, doing this loses all database changes made on the demoted Group Manager during the
time that the group was split. This includes the addition or removal of group members or whole
groups and the application of codewords on the demoted Group Manager and the group members it
controlled. There is no method to merge the two databases.
Recovering from a split group therefore first requires deciding which of the two active Group Manager
databases is to become the only valid description of the groups controlled by the active Group
Manager. Once this decision is made, issue the icapmanage -Q command on the Group Manager that
holds the chosen database, at a time when both Group Managers are accessible and can exchange
information. This makes it the sole active Group Manager. Afterward, if desired, the other Group
Manager (now demoted to standby status) can take control of the groups and members with a second
icapmanage -Q command.
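To make the "no merge" behavior concrete, the following is a toy Python model of split-group recovery. It is a sketch under simplifying assumptions: the GroupManager class, the take_control function, and the "east"/"west" managers are hypothetical illustrations, not part of any HP tool or API, and a real Group Manager database holds far more than group membership. The point it shows is that running icapmanage -Q on the chosen manager replaces, rather than merges, the demoted manager's database.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class GroupManager:
    """Hypothetical stand-in for a GiCAP Group Manager (toy model only)."""
    name: str
    status: str = "standby"  # "active" or "standby"
    # Toy database: group name -> set of member names.
    database: Dict[str, Set[str]] = field(default_factory=dict)


def take_control(target: GroupManager, other: GroupManager) -> None:
    """Model the effect of issuing icapmanage -Q on `target`.

    Assumes both managers can exchange information. The target's database
    becomes the only valid copy; anything recorded only on `other` during
    the split is discarded, because the two databases are never merged.
    """
    target.status = "active"
    other.status = "standby"
    # Replace the demoted manager's view wholesale (copying the sets).
    other.database = {g: set(m) for g, m in target.database.items()}


def demo():
    # Both managers became active during the split; "west" added a "dr" group.
    east = GroupManager("east", "active", {"prod": {"m1", "m2"}})
    west = GroupManager("west", "active", {"prod": {"m1", "m2"}, "dr": {"m3"}})
    take_control(east, west)  # "east" holds the database chosen as valid
    return east, west
```

After demo(), "west" is in standby and its "dr" group is gone. Calling take_control(west, east) afterward models the optional second icapmanage -Q: control moves to "west", but with the database that "east" established.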
HP Serviceguard considerations
To automate the failover process, commands can be incorporated into Serviceguard package control
scripts. As described previously, there are two possible types of failover that can be automated:
• Resource shifting from one group member to another (using core usage rights seizure)
• Group control shifting to a standby Group Manager (using a take control command)
For simplicity, these are referred to as “member failover” and “Group Manager failover”,
respectively. Examples of each will be presented in later sections.
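As an illustration of how a Group Manager failover step might be wrapped inside a package start sequence, here is a minimal Python sketch. Only the icapmanage -Q command comes from this paper; the helper names, the injectable runner, and the 600-second timeout (borrowed from the 10-minute maximum quoted for usage rights movement) are assumptions, and a real control script may need further icapmanage options documented in its man page. Member failover is not sketched because its seizure command is not given here.

```python
import subprocess
import time
from typing import Callable, Sequence

# From this paper: the take control command used for Group Manager failover.
TAKE_CONTROL_CMD = ["icapmanage", "-Q"]

# Hypothetical runner signature: (command, timeout_seconds) -> exit status.
Runner = Callable[[Sequence[str], float], int]


def run_subprocess(cmd: Sequence[str], timeout: float) -> int:
    """Run cmd and return its exit status; raises TimeoutExpired on timeout."""
    return subprocess.run(list(cmd), timeout=timeout).returncode


def group_manager_failover(runner: Runner = run_subprocess,
                           timeout: float = 600.0) -> float:
    """Take control on this standby Group Manager; return elapsed seconds.

    A non-zero exit status raises RuntimeError so that the failure is
    reported to the calling package script rather than silently ignored
    (an assumed design choice, not an HP requirement).
    """
    start = time.monotonic()
    status = runner(TAKE_CONTROL_CMD, timeout)
    elapsed = time.monotonic() - start
    if status != 0:
        raise RuntimeError(
            f"{' '.join(TAKE_CONTROL_CMD)} failed with status {status}")
    return elapsed
```

Injecting the runner keeps the step testable without HP-UX; recording the elapsed time helps measure the extra start-up cost of the GiCAP step.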
Performance implications
As with the previous Serviceguard solutions, application start-up time is longer than with a typical
Serviceguard package control script that does not invoke GiCAP commands. With GiCAP, the time
required to activate a core in a GiCAP group can range from seconds to minutes, depending on the
size of the group, the hardware involved, and the network. A core usage rights seizure takes less
time but generally falls within the same range.
In particular, the end-to-end time for member failover in a Serviceguard/GiCAP environment consists of:
• Serviceguard failover time (unchanged; typically 28–45 seconds, depending on cluster configuration
and Serviceguard version)
• Usage rights movement and activation (from a few seconds up to a maximum of 10 minutes)
• Application startup and recovery time (unchanged; typically measured in minutes)
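The three components simply add, so the end-to-end window can be bounded with a few lines of Python. This is a sketch: the Serviceguard and usage rights bounds come from the list above, the lower bound for usage rights movement is treated as zero for simplicity, and the application startup range is a hypothetical input because the paper gives no figure for it.

```python
from typing import Tuple

# Bounds in seconds, taken from the list above.
SG_FAILOVER = (28, 45)       # Serviceguard failover time
RIGHTS_MOVEMENT = (0, 600)   # "seconds" lower bound treated as ~0; max 10 minutes


def member_failover_window(app_startup: Tuple[int, int]) -> Tuple[int, int]:
    """Return (best, worst) end-to-end member failover time in seconds.

    app_startup is a caller-supplied (min, max) estimate for application
    startup and recovery; it is an assumption, not a value from the paper.
    """
    best = SG_FAILOVER[0] + RIGHTS_MOVEMENT[0] + app_startup[0]
    worst = SG_FAILOVER[1] + RIGHTS_MOVEMENT[1] + app_startup[1]
    return best, worst


# Hypothetical application that needs 2 to 5 minutes to start and recover.
print(member_failover_window((120, 300)))  # (148, 945)
```

In this example, the worst case is 945 seconds (about 16 minutes), of which up to 600 seconds is the GiCAP usage rights step, so that step dominates the upper bound.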