
Overview of Cluster Volume Management
Tightly coupled cluster systems have become increasingly popular in enterprise-scale, mission-critical data processing. The primary advantage of clusters is protection against hardware failure. If the primary node fails or otherwise becomes unavailable, applications can continue to run by transferring execution to standby nodes in the cluster. This ability to provide continuous availability of service by switching to redundant hardware is commonly termed failover.
Another major advantage of clustered systems is their ability to reduce contention for system resources caused by activities such as backup, decision support, and report generation. Additional value can be derived from a cluster by performing such operations on lightly loaded nodes instead of on the heavily loaded nodes that answer requests for service. This ability to perform some operations on lightly loaded nodes is commonly termed load balancing.
To implement cluster functionality, VxVM works together with the cluster monitor daemon provided by the host operating system. The cluster monitor informs VxVM of changes in cluster membership. Each node starts up independently and runs its own cluster monitor, along with its own copies of the operating system and of VxVM with cluster support enabled. When a node joins a cluster, it gains access to shared disks; when a node leaves a cluster, it loses that access. A node joins a cluster when the cluster monitor is started on that node.
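As an informal illustration of this interaction, the following Python sketch models a cluster monitor that notifies a per-node volume manager of membership changes. The sketch is purely conceptual; the class and method names are hypothetical and are not part of any VxVM or cluster monitor programming interface.

    # Illustrative model only: these class and method names are
    # hypothetical and do not correspond to the VxVM or cluster
    # monitor programming interfaces.

    class VolumeManager:
        """Stands in for the per-node VxVM instance."""
        def __init__(self, node_name):
            self.node_name = node_name
            self.shared_disks_accessible = False

        def on_membership_change(self, members):
            # Called by the cluster monitor whenever membership changes.
            if self.node_name in members and not self.shared_disks_accessible:
                self.shared_disks_accessible = True
                print(f"{self.node_name}: joined cluster; shared disks accessible")
            elif self.node_name not in members and self.shared_disks_accessible:
                self.shared_disks_accessible = False
                print(f"{self.node_name}: left cluster; shared disk access revoked")

    class ClusterMonitor:
        """Stands in for the cluster monitor daemon on each node."""
        def __init__(self):
            self.members = set()
            self.listeners = []

        def register(self, volume_manager):
            self.listeners.append(volume_manager)

        def node_join(self, node_name):
            self.members.add(node_name)
            self._notify()

        def node_leave(self, node_name):
            self.members.discard(node_name)
            self._notify()

        def _notify(self):
            # Inform every registered volume manager of the new membership.
            for vm in self.listeners:
                vm.on_membership_change(set(self.members))

    # Starting the cluster monitor's view of a node is what joins it.
    monitor = ClusterMonitor()
    vm = VolumeManager("node1")
    monitor.register(vm)
    monitor.node_join("node1")   # node1 gains access to shared disks
    monitor.node_leave("node1")  # node1 loses access to shared disks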
The figure “Example of a 4-Node Cluster” on page 99 illustrates a simple cluster arrangement consisting of four nodes with similar or identical hardware characteristics (CPUs, RAM, and host adapters) and with identical software configurations (including the operating system). The nodes are fully connected by a private network, and they are also separately connected to shared external storage (either disk arrays or JBODs, “just a bunch of disks”) over Fibre Channel. Each node has two independent paths to these disks, which are configured in one or more cluster-shareable disk groups.
The private network allows the nodes to share information about system resources and about each other’s state. Using the private network, any node can recognize which other nodes are currently active, which are joining or leaving the cluster, and which have failed. The private network requires at least two communication channels to provide redundancy in case one channel fails. If only one channel were used, its failure would be indistinguishable from node failure, a condition known as network partitioning.
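This reasoning can be made concrete with a short, purely hypothetical Python sketch (the function and channel names are invented for illustration): a peer is presumed failed only when heartbeats stop on every channel, while a heartbeat on any surviving channel indicates that only a link has failed.

    # Hypothetical sketch: channel and function names are invented.
    # With redundant channels, a dead link and a dead node can be
    # told apart; with a single channel they cannot.

    def assess_peer(heartbeat_seen):
        """heartbeat_seen maps channel name -> True if a recent
        heartbeat arrived on that channel."""
        if any(heartbeat_seen.values()):
            down = [ch for ch, ok in heartbeat_seen.items() if not ok]
            return f"peer alive; failed channel(s): {down if down else 'none'}"
        # No channel carries heartbeats. Over a single channel this
        # state is ambiguous (link failure vs. node failure); over
        # redundant channels it is strong evidence of node failure.
        return "peer presumed failed: no heartbeat on any channel"

    print(assess_peer({"link0": True,  "link1": True}))   # healthy
    print(assess_peer({"link0": False, "link1": True}))   # link0 down, peer alive
    print(assess_peer({"link0": False, "link1": False}))  # node failure suspected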