Cost-Effective High-Availability Solutions with HP Instant Capacity on HP-UX
The time needed to move usage rights is based on the time to locate the appropriate Group Manager
(if a standby Group Manager is defined), the time to locate available usage rights, and the time to
perform the icapmodify operations. The time required by these operations depends on the number of:
• nPartition cell boards
• nPartitions
• Virtual partitions
• Complexes
• Group Managers
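As a concrete illustration, moving usage rights between two partitions comes down to a pair of icapmodify calls: a deactivation on the lender followed by an activation on the borrower. The host names below are invented, and the script only echoes the commands (a dry run) rather than executing them on remote partitions:

```shell
#!/bin/sh
# Dry-run sketch of moving usage rights for two cores between partitions.
# Host names are invented; -d (deactivate) and -a (activate) are standard
# icapmodify options. Replace the echo wrapper with remsh/ssh on a real system.
LENDER=par1.example.com
BORROWER=par2.example.com
CORES=2

run() { echo "WOULD RUN: $*"; }

run remsh "$LENDER" icapmodify -d "$CORES"    # lender releases usage rights
run remsh "$BORROWER" icapmodify -a "$CORES"  # borrower activates cores
```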
End-to-end time for Group Manager failover consists of:
• Serviceguard failover time (unchanged from standard Serviceguard behavior; typically 28-45 seconds,
depending on cluster configuration and Serviceguard version)
• The time needed for a standby to take control (from seconds to a maximum of 15 minutes)
The time needed for a standby Group Manager to take control depends on the number of:
• Groups
• Members in the groups
• Partitions in the groups that are not contactable
Remember that during failover to a standby Group Manager, the members of the group can still
operate normally with the usage rights they have; they just cannot borrow or lend usage rights within
the group.
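As a rough back-of-the-envelope model (the bounds are the ranges quoted above; everything else is illustrative), the end-to-end failover time is simply the sum of the two components:

```python
# Bounds from the text: Serviceguard failover 28-45 s; standby takeover
# from roughly 1 s up to a maximum of 15 minutes (900 s).
sg_failover = (28, 45)       # seconds (best, worst)
standby_takeover = (1, 900)  # seconds (best, worst)

best = sg_failover[0] + standby_takeover[0]
worst = sg_failover[1] + standby_takeover[1]
print(f"end-to-end Group Manager failover: {best}-{worst} seconds")
# -> end-to-end Group Manager failover: 29-945 seconds
```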
Automation of member failover using core usage rights seizure
This is simpler to implement in an nPartition environment.
When virtual partitions are involved, there are additional complications: automation must handle cases
where only a subset of the virtual partitions in an nPartition has failed, and manual intervention is
likely needed to provide failback after any rights seizure operation.
Consider the case where a server goes down long enough for automatic failover to initiate a rights
seizure on the nPartition containing virtual partitions, but before the restore operation can be
performed, the failed server starts an automatic reboot. The reboot "commits" the rights seizure and
reduces the available usage rights, so the reboot will likely fail until the virtual partition
assignments can be adjusted. The application can continue running on the failover node, but the
server reboot is blocked until an administrator can boot to nPar mode and make the virtual partition
adjustments.
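One way such automation is commonly wired into Serviceguard is through customer-defined commands in the package control script. The sketch below is hypothetical: gicap_seize and gicap_restore are invented placeholder names standing in for whatever seizure and restore operations are issued to the Group Manager (the source does not show the actual commands), and the script only echoes its actions:

```shell
#!/bin/sh
# Hypothetical package-control-script hook; gicap_seize/gicap_restore are
# invented placeholders, not real GiCAP commands. Echo-only (dry run).
FAILED_MEMBER=db1
CORES=8

gicap_seize()   { echo "seize $2 usage rights from $1"; }
gicap_restore() { echo "restore $2 usage rights to $1"; }

# On package start on the adoptive node: seize rights from the failed member
# so enough cores can be activated to run the package.
gicap_seize "$FAILED_MEMBER" "$CORES"

# Failback after the failed member recovers is deliberately left manual (see
# text): an administrator verifies virtual partition assignments first.
# gicap_restore "$FAILED_MEMBER" "$CORES"
```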
Example: Automated (Serviceguard) member failover from a partial outage with nPartitions
The initial configuration, shown in Figure 15, includes a GiCAP group consisting of a single Group
Manager (no standby Group Manager) and two servers, each with two partitions. Partitions db1 and
db2 are also part of a Serviceguard cluster. db1 is defined to be the active node for the package,
while db2 is the adoptive (failover) node. The package requires eight active processor cores to run,
so db1 is configured with usage rights for eight cores. db2 has been configured with six inactive
cores. No temporary capacity is being used in this group.
There are 12 inactive cores (cores without usage rights) in this configuration, and 12 GiCAP sharing
rights are configured to create this group.
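The borrowing mechanics behind this example can be modeled in a few lines. This is a toy sketch: db1's counts follow the text, db2 is assumed (for illustration only) to start with zero active cores, and the model ignores temporary capacity and sharing-right accounting:

```python
class Member:
    """Toy GiCAP group member: counts of active and inactive cores."""
    def __init__(self, name, active, inactive):
        self.name, self.active, self.inactive = name, active, inactive

def move_rights(lender, borrower, n):
    """Move n usage rights: deactivate on the lender (icapmodify -d),
    then activate on the borrower (icapmodify -a)."""
    if lender.active < n or borrower.inactive < n:
        raise ValueError("not enough active cores to lend or inactive cores to fill")
    lender.active -= n; lender.inactive += n
    borrower.active += n; borrower.inactive -= n

db1 = Member("db1", active=8, inactive=0)   # runs the package (8 cores)
db2 = Member("db2", active=0, inactive=6)   # adoptive node, 6 inactive cores
move_rights(db1, db2, 6)                    # lend 6 rights to db2
print(db1.active, db2.active)               # -> 2 6
```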