HP 3PAR Cluster Extension Software Administrator Guide for Windows (5697-1821, April 2012)

promote issue), replication I/O does not start from the new primary volumes to the new secondary volumes. On Windows, CLX continuously attempts to start the group during the monitoring interval of the CLX resource. On RHEL and SUSE, you must start the volume group manually to resume replication I/O between the primary and secondary RC volume groups.

Cluster Extension Autopass troubleshooting

Cluster Extension uses Autopass as a framework for licensing checks. Autopass provides a graphical user interface (GUI) and a command-line interface (CLI) for licensing-specific operations, and both are integrated into Cluster Extension. The GUI requires a compatible JRE version installed on the system. For the supported JRE version, see Cluster Extension SPOCK. If the GUI does not work because of JRE-related environmental issues, use the CLI to perform licensing-specific operations such as install and uninstall.

The FC link is down (RHCS)

In RHCS, the detection of a storage outage due to failure of all paths to the storage depends on the monitoring capability of resources configured in the RHCS service. For example, the LVM and filesystem resource agents distributed with RHCS can detect the loss of storage and take appropriate actions. The stop operation on a service might fail due to the inability to stop individual resources cleanly. This may be caused by the loss of paths to the storage. When the stop operation on a service fails, RHCS marks the service as failed and the service does not automatically fail over to another node.

To recover from this situation, use the following procedure:

1. Remove the node that lost access to the storage by shutting down the node.

2. Follow the steps required to bring up a service in a failed state, as documented in the RHCS administration guide. This process involves disabling the service, and then enabling it on the node where the service is allowed to come online.

3. Restart the node that was shut down.
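
Step 2 of the procedure above can be sketched in Python as follows. This is a minimal illustration, not a procedure from this guide: it assumes the standard RHCS clusvcadm utility (-d to disable a service, -e with -m to enable it on a named member), and the service name clx_service and member name node2 are placeholders. By default the helper only prints the commands (dry run) so that they can be reviewed against the RHCS administration guide before anything is executed.

```python
import subprocess
from typing import List


def recovery_commands(service: str, target_member: str) -> List[List[str]]:
    """Build the clusvcadm calls for step 2: disable, then enable on a chosen member."""
    return [
        ["clusvcadm", "-d", service],                       # disable the failed service
        ["clusvcadm", "-e", service, "-m", target_member],  # enable it on the surviving node
    ]


def recover_service(service: str, target_member: str, dry_run: bool = True) -> List[str]:
    """Print the commands in order; execute them only when dry_run is False."""
    issued = []
    for cmd in recovery_commands(service, target_member):
        line = " ".join(cmd)
        print(line)
        issued.append(line)
        if not dry_run:
            # check=True stops the sequence if a command returns a non-zero status
            subprocess.run(cmd, check=True)
    return issued


if __name__ == "__main__":
    # Placeholder names (assumptions), not values taken from this guide.
    recover_service("clx_service", "node2")
```

Run the helper only after the node that lost storage access has been shut down (step 1), and pass dry_run=False only on a cluster node where clusvcadm is available.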

NOTE: The time to detect a storage outage due to failure of all paths to storage depends on the setting for no_path_retry in the multipath software configuration. With a value of fail, I/O is not queued when all paths fail, and the failure is returned immediately. For information about the recommended value for your environment, see the DM-Multipath documentation.

Some resource agents, such as LVM, offer a mechanism called self_fence to take themselves out of a cluster through node reboot when an underlying logical volume can no longer be accessed. For supported options, see the RHCS documentation.
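
As a rough illustration of the no_path_retry note above, the following Python sketch estimates how long I/O stays queued after all paths fail. It assumes the usual DM-Multipath semantics (fail means no queueing, queue means queueing never stops, and a number means that many path-checker retries) and a polling_interval of 5 seconds. The function name is hypothetical and the result is only an approximation, so confirm the behavior against the DM-Multipath documentation for your release.

```python
from typing import Optional


def queue_time_seconds(no_path_retry: str, polling_interval: int = 5) -> Optional[float]:
    """Approximate how long I/O is queued once every path has failed.

    Returns 0.0 for "fail", None (unbounded) for "queue", and
    retries * polling_interval for a numeric setting.
    """
    value = no_path_retry.strip().lower()
    if value == "fail":
        return 0.0          # failure is returned immediately, nothing is queued
    if value == "queue":
        return None         # queueing continues until a path is restored
    retries = int(value)    # raises ValueError for anything that is not a number
    if retries < 0:
        raise ValueError("no_path_retry must not be negative")
    return float(retries * polling_interval)
```

For example, under these assumptions a setting of 12 with the default polling interval gives roughly 60 seconds of queueing before the failure reaches the cluster resource agents, whereas fail lets them react at once.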

A storage replication link is down (RHCS)

If an HP 3PAR Cluster Extension configuration uses Remote Copy volume groups with failsafemode enabled, the array disables access to the disk when it cannot replicate I/O to the remote array. In this situation, if a replication link is broken, the resource agents of configured resources, such as lvm or fs, may be able to detect the failure and take appropriate actions. The stop operation on a service might fail due to the inability to stop individual resources cleanly because the disk is no longer accessible for read/write operations. When the stop operation on a service fails, RHCS marks the service as failed and the service does not automatically fail over to another node.

To recover from this situation, use the following procedure:

1. Remove the node that lost access to the storage by shutting down the node.

2. Follow the steps required to bring up a service in a failed state, as documented in the RHCS administration guide. This process involves disabling the service, and then enabling it on the node where the service is allowed to come online.
