Installation guide
Ensure that the resources required to run a given service are present on all nodes in the cluster
that may be required to run that service. For example, if your clustered service assumes a script
file in a specific location or a file system mounted at a specific mount point, then you must ensure
that those resources are available in the expected places on all nodes in the cluster.
Ensure that failover domains, service dependency, and service exclusivity are not configured in
such a way that they prevent services from migrating to the nodes you expect.
If the service in question is a virtual machine resource, check the documentation to ensure that all
of the required configuration has been completed.
Increase the resource group manager's logging, as described in Section 9.6, “Cluster Service Will
Not Start”, and then read the messages log to determine what is causing the service to fail
to migrate.
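The first check above amounts to a set difference per node. The following Python sketch assumes you have already collected the set of paths that exist on each node, by whatever means you prefer (for example, running the included `local_inventory` helper on each node and merging the results). The function names, paths, and node names are hypothetical.

```python
import os
from typing import Dict, Iterable, List, Set


def local_inventory(paths: Iterable[str]) -> Set[str]:
    """Return the subset of `paths` that exist on the node running this code."""
    return {p for p in paths if os.path.exists(p)}


def missing_resources(required: Set[str],
                      node_inventory: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each node to the sorted list of required paths it lacks.

    Nodes that have every required path are left out of the result.
    """
    report: Dict[str, List[str]] = {}
    for node, present in node_inventory.items():
        missing = sorted(required - present)
        if missing:
            report[node] = missing
    return report


if __name__ == "__main__":
    # Hypothetical service that needs a start script and a mounted file system.
    required = {"/usr/local/bin/myapp-start.sh", "/mnt/appdata"}
    inventory = {
        "node1": {"/usr/local/bin/myapp-start.sh", "/mnt/appdata"},
        "node2": {"/usr/local/bin/myapp-start.sh"},
        "node3": set(),
    }
    for node, missing in sorted(missing_resources(required, inventory).items()):
        print(f"{node}: missing {', '.join(missing)}")
    # node2: missing /mnt/appdata
    # node3: missing /mnt/appdata, /usr/local/bin/myapp-start.sh
```

Note that `os.path.exists` only tells you that a mount point directory exists, not that a file system is mounted on it; for mount points, `os.path.ismount` is the stricter test.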
9.8. Each Node in a Two-Node Cluster Reports Second Node Down
If your cluster is a two-node cluster and each node reports that it is up but that the other node is
down, this indicates that your cluster nodes are unable to communicate with each other via multicast
over the cluster heartbeat network. This is known as "split brain" or a "network partition." To address
this, check the conditions outlined in Section 9.2, “Cluster Does Not Form”.
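The symptom described above can be stated precisely: every node is responding, yet some node does not see another responding node. The following Python sketch checks that condition, assuming you have gathered each node's view of cluster membership yourself (for example, by reading `clustat` output on each node); the input format and function name are hypothetical.

```python
from typing import Dict, Set


def is_partitioned(views: Dict[str, Set[str]]) -> bool:
    """Return True if any responding node fails to see another responding node.

    `views` maps each node that answered to the set of nodes it reports as up
    (assumed to include itself). Nodes that are all alive but cannot see each
    other indicate a network partition.
    """
    responding = set(views)
    return any(not responding <= seen for seen in views.values())


if __name__ == "__main__":
    # The two-node symptom: each node sees only itself.
    split = {"node1": {"node1"}, "node2": {"node2"}}
    healthy = {"node1": {"node1", "node2"}, "node2": {"node1", "node2"}}
    print(is_partitioned(split))    # True
    print(is_partitioned(healthy))  # False
```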
9.9. Nodes are Fenced on LUN Path Failure
If a node or nodes in your cluster get fenced whenever you have a LUN path failure, this may be a
result of the use of a quorum disk over multipathed storage. If you are using a quorum disk, and your
quorum disk is over multipathed storage, ensure that all of the relevant timings are set correctly
to tolerate a path failure.
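How these timings interact depends on your release, so the following Python sketch only illustrates the arithmetic under stated assumptions: qdiskd declares a node dead after roughly `interval * tko` seconds (`interval` and `tko` being attributes of the `<quorumd>` element in cluster.conf), the CMAN totem `token` value is in milliseconds, and the default margin of 2 between the token timeout and the quorum disk timeout is a rule of thumb to confirm against the documentation for your release. The worst-case path failover time is a value you measure; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Timings:
    qdisk_interval_s: float  # <quorumd interval="...">, seconds per cycle
    qdisk_tko: int           # <quorumd tko="...">, missed cycles before eviction
    totem_token_ms: int      # <totem token="...">, milliseconds
    path_failover_s: float   # measured worst-case multipath failover time


def check_timings(t: Timings, token_margin: float = 2.0) -> List[str]:
    """Return human-readable problems; an empty list means no problem found.

    Assumed rules: the quorum disk timeout must exceed the path failover time,
    and the totem token must be at least `token_margin` times that timeout.
    """
    problems: List[str] = []
    qdisk_timeout_s = t.qdisk_interval_s * t.qdisk_tko
    if qdisk_timeout_s <= t.path_failover_s:
        problems.append(
            f"qdisk timeout {qdisk_timeout_s:g}s does not exceed "
            f"path failover time {t.path_failover_s:g}s")
    token_s = t.totem_token_ms / 1000
    if token_s < token_margin * qdisk_timeout_s:
        problems.append(
            f"totem token {token_s:g}s is less than {token_margin:g} x "
            f"qdisk timeout ({qdisk_timeout_s:g}s)")
    return problems


if __name__ == "__main__":
    # Hypothetical values: a 30 s path failover against a 10 s qdisk timeout.
    for problem in check_timings(Timings(1, 10, 10000, 30)):
        print(problem)
```

Treating the margin as a parameter keeps the assumed rule of thumb visible and easy to change once you have confirmed the figure for your release.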
9.10. Quorum Disk Does Not Appear as Cluster Member
If you have configured your system to use a quorum disk but the quorum disk does not appear as a
member of your cluster, check for the following conditions.
Ensure that you have set chkconfig on for the qdisk service.
Ensure that you have started the qdisk service.
Note that it may take several minutes for the quorum disk to register with the cluster. This is
normal and expected behavior.
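To confirm the first condition, you can inspect the output of `chkconfig --list` for the service. The Python sketch below parses one line of that output, in its usual `name 0:off 1:off 2:on ...` form, and checks it against runlevels 2 through 5, which the chkconfig man page gives as the levels that `chkconfig <name> on` affects by default. The service name `qdisk` is taken from the text above; adjust it if your release names the init script differently. The function names are hypothetical.

```python
import subprocess
from typing import Set, Tuple

# Runlevels that `chkconfig <name> on` affects by default.
DEFAULT_ON_LEVELS = {2, 3, 4, 5}


def parse_chkconfig_line(line: str) -> Tuple[str, Set[int]]:
    """Parse one `chkconfig --list` line into (service name, runlevels that are on)."""
    fields = line.split()
    name, on = fields[0], set()
    for field in fields[1:]:
        level, _, state = field.partition(":")
        if level.isdigit() and state == "on":
            on.add(int(level))
    return name, on


def is_enabled(line: str, levels: Set[int] = DEFAULT_ON_LEVELS) -> bool:
    """True if the service is on in every runlevel in `levels`."""
    _, on = parse_chkconfig_line(line)
    return levels <= on


def check_service(name: str = "qdisk") -> bool:
    """Run `chkconfig --list <name>` on this node and report whether it is enabled."""
    out = subprocess.run(["chkconfig", "--list", name],
                         capture_output=True, text=True, check=True).stdout
    return is_enabled(out.splitlines()[0])


if __name__ == "__main__":
    sample = "qdisk          \t0:off\t1:off\t2:on\t3:on\t4:on\t5:on\t6:off"
    print(parse_chkconfig_line(sample))  # ('qdisk', {2, 3, 4, 5})
    print(is_enabled(sample))            # True
```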
9.11. Unusual Failover Behavior
A common problem with cluster servers is unusual failover behavior. Services may stop when other
services start, or may refuse to start on failover. This can be caused by complex failover
configurations that combine failover domains, service dependency, and service exclusivity. Try
scaling back to a simpler service or failover domain configuration and see if the issue persists.
Avoid features like service exclusivity and dependency unless you fully understand how those
features may affect failover under all conditions.
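One way to start scaling back is to list which services actually use these features. The Python sketch below reads a cluster.conf document with the standard library's ElementTree and reports, for each `<service>` element, its `exclusive`, `depend`, and `domain` attributes. It assumes those rgmanager attribute names, so check them against the cluster.conf schema for your release; the sample configuration and function name are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def failover_features(cluster_conf_xml: str) -> Dict[str, List[str]]:
    """Map each service name to the failover-related features it uses.

    Services that use none of the checked attributes are left out.
    """
    root = ET.fromstring(cluster_conf_xml)
    report: Dict[str, List[str]] = {}
    for svc in root.iter("service"):
        features: List[str] = []
        if svc.get("exclusive", "0") not in ("0", ""):
            features.append("exclusive")
        if svc.get("depend"):
            features.append(f"depends on {svc.get('depend')}")
        if svc.get("domain"):
            features.append(f"failover domain {svc.get('domain')}")
        if features:
            report[svc.get("name", "?")] = features
    return report


if __name__ == "__main__":
    sample = """<cluster name="example" config_version="1">
      <rm>
        <failoverdomains>
          <failoverdomain name="fd1" ordered="1" restricted="1"/>
        </failoverdomains>
        <service name="db" domain="fd1" exclusive="1"/>
        <service name="web" depend="service:db"/>
        <service name="cron"/>
      </rm>
    </cluster>"""
    for name, features in failover_features(sample).items():
        print(f"{name}: {', '.join(features)}")
    # db: exclusive, failover domain fd1
    # web: depends on service:db
```

Services that show up in this report are the first candidates to simplify when failover behaves unexpectedly.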
9.12. Fencing Occurs at Random
If you find that a node is being fenced at random, check for the following conditions.
Chapter 9. Diagnosing and Correcting Problems in a Cluster