HP 3PAR StoreServ Cluster Extension Software Administrator Guide

upon the customer setting, Cluster Extension is used to prevent resource groups from going online
automatically under the wrong conditions.
Cluster Extension will return local, data center-wide or even cluster-wide errors to prevent accidental
access to the resource group’s virtual volumes. HP does not recommend restarting a failed resource
group without investigating the problem. A failed Cluster Extension resource indicates the need to
check the status of the Remote Copy volume group and its member virtual volumes and decide
whether it is safe to continue or not.
HP 3PAR StoreServ Cluster Extension services, resources, or resource groups return a data center
error and fail the resource if the Remote Copy volume group status indicates that the problem
experienced locally would not be solved on another system connected to the same HP 3PAR
StoreServ Storage.
Depending on the resource group and resource property values, the resource tries to start on
different nodes several times. If the remote data center is down, the resource group appears to
alternate between the surviving systems. This continues until the previously mentioned
resource and resource group property values are reached or you disable restarting of the
resource. The same can happen if the ApplicationStartup resource property is
set to FASTFAILBACK. If a 3PAR StoreServ Storage state is discovered that does not allow
bringing the resource group online on any system in the cluster, a cluster error is reported
and the resource fails on all systems. This can lead to the same behavior as described for
an HP 3PAR StoreServ Cluster Extension data center error.
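The alternating behavior described above can be illustrated with a small simulation. This is only a toy model under stated assumptions: the function name, the node names, and the single failover_threshold parameter are hypothetical stand-ins for the cluster's resource and resource group property values, and every online attempt is assumed to fail with a data center error. It does not call Cluster Extension or any cluster API.

```python
from itertools import cycle


def simulate_online_attempts(surviving_nodes, failover_threshold):
    """Return the nodes on which the resource group is tried, in order.

    Assumption: every attempt fails, so the group moves to the next
    surviving node until the hypothetical failover_threshold is reached.
    """
    attempts = []
    for node in cycle(surviving_nodes):
        if len(attempts) >= failover_threshold:
            break  # threshold reached: the cluster stops retrying
        attempts.append(node)
    return attempts


if __name__ == "__main__":
    # Two surviving systems in the local data center, four allowed attempts.
    print(simulate_online_attempts(["nodeA", "nodeB"], failover_threshold=4))
```

With two surviving nodes, the attempts bounce back and forth between them, which is what an administrator would observe in the cluster management console until the threshold is reached or restarting is disabled.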
Failing physical disk resources during online attempt of the resource group
When resource groups that use HP 3PAR StoreServ Cluster Extension to fail over a Remote Copy
volume group are brought online, physical disk resources may fail for the following reasons:
• The physical disk resource does not have a dependency configured on its HP 3PAR StoreServ
  Cluster Extension resources/packages. Review the setup steps for HP 3PAR StoreServ
  Cluster Extension resources.
• The Fibre Channel (FC) path or connectivity between the servers and the storage systems may be
  broken. Review the FC connectivity between the servers and the storage systems.
• If the storage array is brought back online after a shutdown caused by a
  data center disaster or an HP 3PAR OS upgrade, the Remote Copy volume
  group may go to the failsafe status as soon as the array is back online. The HP 3PAR OS
  marks the Remote Copy volume group as failsafe when the array
  comes online and the replication roles are primary on one side and primary-rev on the
  other side. In this state, the physical disk resource may fail to come online in the Microsoft
  Failover Cluster whenever the cluster application role tries to come online on a
  cluster host that is connected to the rebooted storage array.
One scenario that leads to the failsafe status is as follows.
The replication roles for a Remote Copy volume group are primary in one data center (primary)
and secondary in the other data center (secondary), and the corresponding application in the
Microsoft Failover Cluster is online in the primary data center. If a disaster such as a
power outage occurs in the primary data center, the application tries to fail over to the Failover
Cluster host in the secondary data center. The application comes online successfully on the
Failover Cluster host in the secondary data center if the CLX property UseNonCurrentDataOk
is set to Yes. Once the application comes online, the replication role in the secondary data center
changes from secondary to primary-rev.
Later, when the power outage in the primary data center is corrected and the storage array is
brought back online, the Remote Copy volume group goes to the failsafe status.
At this time, if you fail back the corresponding cluster application role from the secondary
data center to one of the hosts in the primary data center, the physical disk resources may
fail to come online even though the CLX resource has come online successfully.
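The sequence of role changes in this scenario can be traced with a short model. This is an illustrative sketch only: the class, its method names, and the site labels are hypothetical and are not part of the HP 3PAR OS or Cluster Extension interfaces. It also assumes that a failover during the outage is refused when UseNonCurrentDataOk is not Yes, which the text above does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class RemoteCopyGroup:
    """Toy model of one Remote Copy volume group across two data centers."""
    # Replication role at each site: "primary", "secondary", or "primary-rev".
    role_primary_site: str = "primary"
    role_secondary_site: str = "secondary"
    primary_array_online: bool = True

    def primary_site_outage(self):
        # Disaster (for example, a power outage) in the primary data center.
        self.primary_array_online = False

    def failover_during_outage(self, use_non_current_data_ok: bool) -> bool:
        """Try to bring the application online in the secondary data center."""
        if not use_non_current_data_ok:
            return False  # assumption: failover is refused in this model
        self.role_secondary_site = "primary-rev"
        return True

    def primary_array_restored(self):
        # The role recorded on the returning array is still "primary".
        self.primary_array_online = True

    def is_failsafe(self) -> bool:
        # Failsafe: array back online, roles primary on one side and
        # primary-rev on the other.
        roles = {self.role_primary_site, self.role_secondary_site}
        return self.primary_array_online and roles == {"primary", "primary-rev"}


if __name__ == "__main__":
    group = RemoteCopyGroup()
    group.primary_site_outage()
    group.failover_during_outage(use_non_current_data_ok=True)
    group.primary_array_restored()
    print(group.is_failsafe())
```

The model reports the failsafe condition only after the primary array returns, which matches the point at which a failback to the primary data center can leave physical disk resources unable to come online.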