Veritas Storage Foundation 5.1 SP1 Cluster File System Installation Guide (5900-1510, April 2011)

Performance issues
Quick I/O
File system performance is adversely affected if a cluster file system is
mounted with the qio option enabled, but the file system is not used for Quick
I/O files. Because qio is enabled by default, if you do not intend to use a shared
file system for Quick I/O, explicitly specify the noqio option when mounting.
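The rule above can be sketched as a small helper that assembles the mount option string and always states the Quick I/O choice explicitly instead of relying on the default. This is an illustrative sketch only: the function name build_mount_options is hypothetical, and the use of a cluster option to request a shared mount is an assumption that does not appear in this section. Only the qio and noqio option names come from the text.

```python
def build_mount_options(use_quick_io, extra_options=()):
    """Return a comma-separated mount option string for a shared file system.

    Hypothetical helper: it makes the Quick I/O choice explicit so that
    the qio default is never relied on by accident.
    """
    # Assumption: "cluster" is the option that requests a shared mount.
    options = ["cluster"]
    # qio and noqio are the option names given in the text above.
    options.append("qio" if use_quick_io else "noqio")
    for opt in extra_options:
        # Skip options this helper already manages to avoid contradictions.
        if opt in ("cluster", "qio", "noqio"):
            continue
        options.append(opt)
    return ",".join(options)


if __name__ == "__main__":
    # A shared file system that will not hold Quick I/O files.
    print(build_mount_options(use_quick_io=False))
```

The resulting string would be passed to the mount command's -o argument. Check the mount manual page for your platform before using it, because the exact command syntax is not covered here.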
High availability issues
This section describes high availability issues.
Network partition and jeopardy
Network partition (or split brain) is a condition where a network failure can be
misinterpreted as a failure of one or more nodes in a cluster. If one system in the
cluster incorrectly assumes that another system failed, it may restart applications
already running on the other system, thereby corrupting data. CFS tries to prevent
this by having redundant heartbeat links.
At least one link must be active to maintain the integrity of the cluster. If all
the links go down, the node can no longer communicate with the other nodes in
the cluster. When contact with a node is lost, the cluster is in one of two
possible states: either the last network link is broken (a network partition
condition), or the last network link is intact but the node has crashed, in which
case there is no network partition. Because the two states cannot be told
apart, a kernel message is issued to indicate that a network partition may exist
and that data corruption is possible.
Jeopardy is a condition where a node in the cluster has a problem connecting to
other nodes. In this situation, the link or disk heartbeat may be down, so a jeopardy
warning may be displayed. Specifically, this message appears when a node has
only one remaining link to the cluster and that link is a network link. This is
considered a critical event because the node may lose its only remaining connection
to the network.
Warning: Do not remove the communication links while shared storage is still
connected.
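The conditions described in this section can be summarized in a short decision function. This is a sketch of the rules as stated in the text, not a description of how the heartbeat software is implemented. The function name, the link type strings, and the returned labels are all hypothetical.

```python
def classify_heartbeat_state(active_links):
    """Classify a node's link state from the list of links that are up.

    active_links holds one string per working link, either "network"
    or "disk" (hypothetical labels chosen for this sketch).
    """
    if len(active_links) == 0:
        # With no links left, a broken last link and a crashed peer look
        # the same, so only a *possible* partition can be reported.
        return "possible network partition"
    if len(active_links) == 1 and active_links[0] == "network":
        # Exactly one remaining link, and it is a network link.
        return "jeopardy"
    # The text does not describe a warning for any other combination.
    return "no condition flagged"


if __name__ == "__main__":
    print(classify_heartbeat_state(["network", "network"]))
    print(classify_heartbeat_state(["network"]))
    print(classify_heartbeat_state([]))
```

The empty-list case deliberately returns a possibility rather than a diagnosis, mirroring the kernel message, which can only say that a network partition may exist.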
Low memory
Under heavy load, the software that manages heartbeat communication links may
be unable to allocate kernel memory. If this occurs, the node halts to avoid any
chance of network partitioning. If this happens frequently, reduce the load on
the node.