VERITAS Storage Foundation 4.1 Cluster File System HP Serviceguard Storage Management Suite Extracts, December 2005
Appendix A, Troubleshooting and Recovery
Installation Issues
Command Failures
◆ If manual pages are not accessible with the man command, set the MANPATH
environment variable as described under “Setting PATH and MANPATH Environment
Variables” on page 15.
◆ The mount, fsck, and mkfs utilities reserve a shared volume and fail on volumes
that are in use. Be careful when accessing shared volumes with other utilities, such as
dd, because these commands can destroy data on the disk.
◆ Running some commands, such as fsadm -E /vol02, can generate the following
error message:
vxfs fsadm: ERROR: not primary in a cluster file system
This message means that the command can run only on the primary, that is, the system
that mounted the file system first.
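A minimal sketch for scripts that run primary-only commands such as fsadm, assuming the command is launched through Python's subprocess module and that the error text quoted above is written to standard error. The helper names `NotPrimaryError`, `is_not_primary_error`, and `run_primary_only` are hypothetical. The sketch turns the message into a distinct exception so the caller can rerun the command on the primary instead of treating it as a generic failure.

```python
import subprocess
from typing import Sequence

# Error text quoted in this section; matched as a substring of stderr
# (assumption: the message is printed on standard error).
NOT_PRIMARY_MSG = "not primary in a cluster file system"


class NotPrimaryError(RuntimeError):
    """Raised when a primary-only command is run on a non-primary node."""


def is_not_primary_error(stderr: str) -> bool:
    """Return True if the command output contains the 'not primary' error."""
    return NOT_PRIMARY_MSG in stderr


def run_primary_only(cmd: Sequence[str]) -> str:
    """Run cmd, for example ["fsadm", "-E", "/vol02"], and return its stdout.

    Raises NotPrimaryError if the command reports that this node is not the
    primary; raises subprocess.CalledProcessError for any other failure.
    """
    result = subprocess.run(list(cmd), capture_output=True, text=True)
    if is_not_primary_error(result.stderr):
        raise NotPrimaryError(
            f"{cmd[0]} must run on the primary "
            "(the system that mounted the file system first)"
        )
    result.check_returncode()  # surface any other non-zero exit status
    return result.stdout
```

Keeping the string match in its own function lets it be checked without a cluster, and keeps the dependency on the exact message wording in one place.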
Performance Issues
Quick I/O
File system performance is adversely affected if a cluster file system is mounted with
the qio option enabled and Quick I/O licensed, but the file system is not used for Quick
I/O files. Because qio is enabled by default, explicitly specify the noqio option when
mounting a shared file system that you do not intend to use for Quick I/O.
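A minimal sketch of this rule, assuming mount options are assembled as a comma-separated string in a Python deployment script. `cfs_mount_options` is a hypothetical helper; only the qio and noqio option names come from this section.

```python
from typing import Iterable


def cfs_mount_options(base_options: Iterable[str], use_quick_io: bool) -> str:
    """Build a comma-separated option string for a shared file system mount.

    Because qio is enabled by default, noqio is added explicitly unless the
    file system will actually hold Quick I/O files.
    """
    # Drop any caller-supplied qio/noqio so that use_quick_io alone decides.
    opts = [o for o in base_options if o not in ("qio", "noqio")]
    if not use_quick_io:
        # Opt out of the default qio behavior explicitly.
        opts.append("noqio")
    return ",".join(opts)
```

For example, `cfs_mount_options(["cluster"], use_quick_io=False)` returns `"cluster,noqio"`. The `cluster` option in that call is an assumption about how a shared mount is requested; check the mount manual page on your system for the exact options.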
High Availability Issues
Network Partition/Jeopardy
Network partition (or split brain) is a condition in which a network failure can be
misinterpreted as a failure of one or more nodes in a cluster. If one system in the cluster
incorrectly assumes that another system failed, it may restart applications already running
on the other system, thereby corrupting data. CFS tries to prevent this by using
redundant heartbeat links.
At least one link must be active to maintain the integrity of the cluster. If all the links go
down, then once the last network link is broken the node can no longer communicate with
the other nodes in the cluster. The cluster is then in one of two possible states: either the
last network link is broken (a network partition condition), or the last network link is
intact but the node has crashed, in which case there is no network partition. Because the
two states cannot be distinguished, a kernel message is issued to indicate that a network
partition may exist and that data corruption is possible.
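The reasoning above can be modelled in a short, purely illustrative sketch. It is not how CFS or its heartbeat layer is implemented. While at least one heartbeat link to a peer is up, the peer is reachable. With every link down, a broken network and a crashed peer look identical, so the code can only report that a partition may exist. `PeerState`, `peer_state`, `check_peer`, and the warning text are hypothetical.

```python
from enum import Enum
from typing import Sequence


class PeerState(Enum):
    CONNECTED = "connected"                    # at least one heartbeat link is up
    POSSIBLE_PARTITION = "possible partition"  # all links down: partition or crash


def peer_state(link_up: Sequence[bool]) -> PeerState:
    """Classify connectivity to one peer from the status of each heartbeat link."""
    if not link_up:
        raise ValueError("at least one heartbeat link must be configured")
    if any(link_up):
        return PeerState.CONNECTED
    # Every link is down: a broken network and a crashed peer are
    # indistinguishable from here, so only a *possible* partition is reported.
    return PeerState.POSSIBLE_PARTITION


def check_peer(name: str, link_up: Sequence[bool]) -> str:
    """Return a status line, or a warning when all links to the peer are down."""
    if peer_state(link_up) is PeerState.POSSIBLE_PARTITION:
        return (f"WARNING: lost all heartbeat links to {name}; "
                "network partition possible, data may be at risk")
    return f"{name}: {sum(link_up)} of {len(link_up)} links up"
```

The model also shows why redundant links matter. Losing one of two links still leaves the peer in the CONNECTED state. Only the loss of the last link produces the ambiguous state.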