Cost-Effective High-Availability Solutions with HP Instant Capacity on HP-UX
Recovery from a failure involving one or more nPartitions
Rights seizure from a server on which at least one other nPartition is still running is the simplest case.
Here, the Instant Capacity software treats the rights seizure much like a normal migration of usage
rights, because it can contact the still-running partition to commit the changes for the server. Unless an
explicit restore operation is performed, the failed nPartition, when rebooted, has only the minimum
number of core usage rights left after the rights seizure (one core usage right per configured cell, which
is the minimum needed to allow a reboot).
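The seizure arithmetic described above can be illustrated with a short Python model. It is a simplified sketch, not the Instant Capacity implementation: the `Partition` class and `seize_rights` function are hypothetical, and the model assumes that every right above the one-per-configured-cell minimum is moved to a single still-running partition.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    # Hypothetical, simplified view of an nPartition (not an iCAP data structure).
    name: str
    configured_cells: int
    core_usage_rights: int


def seize_rights(failed: Partition, survivor: Partition) -> int:
    """Move all but the minimum core usage rights from `failed` to `survivor`.

    The minimum left behind is one core usage right per configured cell,
    which is what the failed partition needs in order to reboot.
    Returns the number of rights moved.
    """
    minimum = failed.configured_cells
    movable = max(0, failed.core_usage_rights - minimum)
    failed.core_usage_rights -= movable
    survivor.core_usage_rights += movable
    return movable


if __name__ == "__main__":
    # Hypothetical numbers: a two-cell failed partition holding 8 rights.
    failed = Partition("failed", configured_cells=2, core_usage_rights=8)
    survivor = Partition("survivor", configured_cells=4, core_usage_rights=4)
    moved = seize_rights(failed, survivor)
    print(moved, failed.core_usage_rights, survivor.core_usage_rights)  # 6 2 10
```

Until an explicit restore is performed, the failed partition keeps only the per-cell minimum in this model.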
Rights seized in this situation do not expire. However, if the failed partition is not rebooted within
twelve to twenty-four hours of the rights seizure, the Instant Capacity software in the still-running
partitions detects that the iCAP daemon has not been running in the failed partition for a significant
period of time. The software then assumes that the failed partition might be booted to an operating
system that is unaware of iCAP and will use all cores on the partition. This is referred to as the
“assumed active” state and can result in temporary capacity charges or compliance exceptions. To avoid
this, make the cells inactive by removing them from the partition, by shutting down the partition from
within the OS using shutdown -R -H, or by using the MP RR command.
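The “assumed active” rule can be sketched as a small predicate. This is a hypothetical model, not iCAP code: the names `is_assumed_active` and `cores_counted_active` are illustrative, and because the text only gives a twelve to twenty-four hour window, the silence threshold is an adjustable parameter with an assumed default of twelve hours.

```python
from datetime import datetime, timedelta

# Assumption: the text gives only a "twelve to twenty-four hours" window, so the
# exact threshold is a parameter here rather than a documented iCAP value.
DEFAULT_SILENCE_THRESHOLD = timedelta(hours=12)


def is_assumed_active(last_daemon_contact: datetime, now: datetime,
                      cells_active: bool,
                      threshold: timedelta = DEFAULT_SILENCE_THRESHOLD) -> bool:
    """Hypothetical check for the "assumed active" state.

    A partition whose iCAP daemon has been silent longer than `threshold` is
    treated as possibly booted to an iCAP-unaware OS, unless its cells have
    been made inactive (removed, shut down with shutdown -R -H, or via MP RR).
    """
    return cells_active and (now - last_daemon_contact) > threshold


def cores_counted_active(configured_cores: int, intended_active: int,
                         assumed_active: bool) -> int:
    # An assumed-active partition is counted as using every configured core.
    return configured_cores if assumed_active else intended_active


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 8, 0)  # hypothetical time of last daemon contact
    later = t0 + timedelta(hours=13)
    flag = is_assumed_active(t0, later, cells_active=True)
    print(flag, cores_counted_active(16, 2, flag))  # True 16
```

Making the cells inactive is modeled by `cells_active=False`, which keeps the partition out of the assumed active state regardless of how long the daemon has been silent.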
However, even if the cells are made inactive, the Instant Capacity software reserves usage rights to
minimize the possibility that the complex will be taken out of compliance if the partition is booted with
all cores active. (While the partition is “recently active”, the reserved usage rights equal the partition’s
Intended Active value; after twelve to twenty-four hours, iCAP assumes that usage rights are reserved for
all configured cores regardless of Intended Active.) Special considerations apply to this “assumed
reserved” state: additional activations may not be allowed unless temporary capacity is authorized on
the activation. Use one of the following methods to allow continued resource migrations after the iCAP
daemon puts the complex in the “assumed reserved” state:
• Reboot the failed partition if possible.
• Delete the failed partition.
• Reduce the number of cores considered active at next boot by setting the UONB (“use on next boot”)
value to false for one or more of the cells in the failed partition, or by removing one or more cells
from the partition. Note that powering off the cells does not change the “assumed reserved” state.
• Authorize the use of temporary capacity on requested activations (using the -t option). This allows
the reserved usage rights across the complex to exceed the number of actual usage rights.
Temporary capacity is charged only if the number of active cores exceeds the number of actual
usage rights, so while the failed partition remains inactive and the active core count stays within the
actual usage rights, no temporary capacity is charged. Use this method with care: a reboot of the
failed partition activates at least one core per configured cell in the newly booted partition, which
could raise the number of active cores enough to incur temporary capacity charges.
• Contact HP for additional methods.
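The difference between reserved usage rights and charged temporary capacity can be made concrete with a hypothetical accounting model. The field and function names below are illustrative assumptions, not iCAP output or commands; `allow_temp_capacity` merely stands in for authorizing temporary capacity with the -t option. The demo assumes a complex with 16 usage rights and a failed partition of two cells and eight configured cores.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PartitionState:
    # Hypothetical, simplified fields; not taken from any iCAP command output.
    configured_cores: int
    intended_active: int
    recently_active: bool  # False once the 12-24 hour window has elapsed
    active_cores: int      # cores actually running now (0 while the partition is down)


def reserved_rights(p: PartitionState) -> int:
    # "Recently active": reserve the Intended Active value.
    # Otherwise ("assumed reserved"): reserve rights for every configured core.
    return p.intended_active if p.recently_active else p.configured_cores


def activation_allowed(parts: List[PartitionState], usage_rights: int,
                       extra_cores: int, allow_temp_capacity: bool) -> bool:
    """Would activating `extra_cores` keep reservations within the usage rights?

    `allow_temp_capacity` stands in for authorizing temporary capacity on the
    request (the -t option); it lets reservations exceed the actual rights.
    """
    reserved = sum(reserved_rights(p) for p in parts) + extra_cores
    return reserved <= usage_rights or allow_temp_capacity


def temp_capacity_charged(parts: List[PartitionState], usage_rights: int) -> bool:
    # Charges accrue only when cores actually active exceed the actual usage rights.
    return sum(p.active_cores for p in parts) > usage_rights


if __name__ == "__main__":
    running = PartitionState(16, 14, True, 14)
    failed = PartitionState(8, 2, False, 0)   # "assumed reserved": reserves 8
    parts = [running, failed]
    print(activation_allowed(parts, 16, 1, False))  # False: 14 + 8 + 1 > 16
    print(activation_allowed(parts, 16, 1, True))   # True: temporary capacity authorized
    running.intended_active = running.active_cores = 15
    print(temp_capacity_charged(parts, 16))         # False: 15 active <= 16
    failed.active_cores = 2                         # reboot: one core per cell, 2 cells
    print(temp_capacity_charged(parts, 16))         # True: 17 active > 16
```

The last two lines mirror the caution about the -t method: authorization alone costs nothing, but rebooting the failed partition can push the active core count past the actual usage rights.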
Example: Manual failover from a partial outage of nPartitions
Consider the configuration shown in Figure 7. The GiCAP group consists of an active Group Manager,
ap1, and two servers, each with two partitions. The primary application runs on nPartition db1 and
requires eight active processor cores, so db1 is configured with usage rights for eight cores. nPartition
db2 has been configured with six inactive cores. No temporary capacity is being used in this group.
There are twelve inactive cores (cores without usage rights) in this configuration. Twelve GiCAP
sharing rights are required to create this group.
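The count of inactive cores and sharing rights in this example can be checked with a small helper. This sketch assumes, as the example's own numbers suggest, one GiCAP sharing right per inactive core; the function names are hypothetical, and because the text does not give Figure 7's per-partition core counts, the layout in the demo is invented for illustration only.

```python
from typing import Iterable, Tuple


def inactive_cores(partitions: Iterable[Tuple[int, int]]) -> int:
    """Count cores without usage rights, given (total_cores, usage_rights) pairs."""
    return sum(max(0, cores - rights) for cores, rights in partitions)


def sharing_rights_required(partitions: Iterable[Tuple[int, int]]) -> int:
    # Assumption inferred from the example: one GiCAP sharing right per inactive core.
    return inactive_cores(partitions)


if __name__ == "__main__":
    # Hypothetical layout; Figure 7's per-partition core counts are not given in the text.
    layout = [(8, 8), (8, 2), (8, 4), (4, 2)]
    print(sharing_rights_required(layout))  # 12
```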