Table Of Contents
- Executive Summary (updated May 2011)
- 1. Introduction
- 2. Dell NFS Storage Solution Technical Overview
- 3. NFS Storage Solution with High Availability
- 4. Evaluation
- 5. Performance Benchmark Results (updated May 2011)
- 6. Comparison of the NSS Solution Offerings
- 7. Conclusion
- 8. References
- Appendix A: NSS-HA Recipe (updated May 2011)
- A.1. Pre-install preparation
- A.2. Server side hardware set-up
- A.3. Initial software configuration on each PowerEdge R710
- A.4. Performance tuning on the server
- A.5. Storage hardware set-up
- A.6. Storage Configuration
- A.7. NSS HA Cluster setup
- A.8. Quick test of HA set-up
- A.9. Useful commands and references
- A.10. Performance tuning on clients (updated May 2011)
- A.11. Example scripts and configuration files
- Appendix B: Medium to Large Configuration Upgrade
- Appendix C: Benchmarks and Test Tools
Dell HPC NFS Storage Solution - High Availability Configurations
service impacting the availability of the entire system. In order to achieve better system
availability, the NSS-HA solution extends NSS using the Red Hat Cluster Suite (RHCS).
RHCS-based clustering includes a high availability feature. A cluster service is configured such that
a failure of any cluster member or cluster component does not interrupt the service provided by
the cluster. NSS-HA uses this high availability feature to construct an “HA cluster”. The HA
cluster consists of two PowerEdge R710 servers, and an HA service is defined that runs on one of
the servers. To ensure data integrity, the HA service must run on only one cluster server at any
time. If a failure occurs, the HA service fails over to the other PowerEdge R710 server, keeping
the process as transparent as possible to the clients of the cluster.
In the HA context, the word “cluster” refers to the pair of PowerEdge R710 servers and the RHCS
software. This is distinct from the HPC cluster or the compute cluster which will be referred to as
the clients. The word “server” refers to the PowerEdge R710 servers that make up the HA cluster.
An HA service is defined by a group of one or more cluster resources. All resources must be
available for the cluster service to be up and running. When the service migrates from one server
to another, all the resources migrate. Once defined and configured, the cluster resources are
controlled solely by the HA cluster service and should not be manipulated outside of the HA cluster
constructs.
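The resource-group rule above can be illustrated with a minimal Python sketch. It is a conceptual model only, not RHCS code: the class names `Resource` and `HAService`, the resource names, and the server names are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """One cluster resource (hypothetical model, not an RHCS type)."""
    name: str
    available: bool = True


@dataclass
class HAService:
    """A service defined as a group of resources owned by one server."""
    resources: list[Resource]
    owner: str

    def is_up(self) -> bool:
        # The service is up only if every resource in the group is available.
        return all(r.available for r in self.resources)

    def migrate(self, target: str) -> None:
        # Resources never move individually: changing the owner of the
        # service moves the whole group to the target server.
        self.owner = target


# Illustrative names only; the real resources are listed in the text.
service = HAService(
    resources=[Resource("lvm"), Resource("filesystem"), Resource("service_ip")],
    owner="server1",
)
```

In this model, marking any single resource unavailable makes `service.is_up()` return `False`, which mirrors the requirement that all resources be available for the service to run.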
In NSS-HA, the HA service comprises the following resources:
1) LVM – The LVM specifies the logical volume managed by the HA service. The virtual disks
created on the storage arrays are grouped into a Linux logical volume. In NSS-HA, this LVM is
configured with HA to ensure only one server has access to the LVM at a time.
2) File system – The LVM is formatted as an XFS file system. User data resides on this XFS file
system. The HA service controls the mount options, mounting and unmounting of this file
system.
3) NFS export – The NFS export resource ensures that NFS daemons are running. The XFS file
system is exported to the clients over NFS. The HA service controls the NFS export options and
client access.
4) Service IP – An IP address is associated with the HA service. In NSS-HA, clients access and
mount the file system over NFS using this service IP. This IP is associated with the public
network interface on the server currently running the cluster service.
5) Link monitoring – Link monitoring checks the status of the public network interface to which
the service IP address is bound. A failed link causes the cluster service to fail over to the
other server. This is an important component of the HA cluster, since a failure on the public
interface prevents the clients from accessing the file system.
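As an illustration of how link monitoring could drive a failover of the resources listed above, the following Python sketch models the behavior conceptually. It is not the RHCS implementation. The class `HACluster`, the server names, and the start order in `START_ORDER` are assumptions: the order is inferred from the dependencies between the resources and is not stated in this section. The sketch also does not model the health of the peer server.

```python
# Resource names follow the list above. The start order is an assumption
# based on their dependencies (volume, then file system, then export,
# then service IP); the stop order is simply its reverse.
START_ORDER = ["lvm", "xfs_filesystem", "nfs_export", "service_ip"]


class HACluster:
    """Conceptual two-server model; hypothetical, not an RHCS API."""

    def __init__(self, servers):
        self.servers = list(servers)      # the two cluster servers
        self.owner = self.servers[0]      # server currently running the service
        # State of the public network interface on each server.
        self.link_up = {s: True for s in self.servers}
        self.log = []                     # record of stop/start actions

    def _stop(self, server):
        for res in reversed(START_ORDER):
            self.log.append(("stop", server, res))

    def _start(self, server):
        for res in START_ORDER:
            self.log.append(("start", server, res))

    def check_link(self):
        """Link monitoring: fail over if the owner's public link is down."""
        if self.link_up[self.owner]:
            return self.owner
        other = next(s for s in self.servers if s != self.owner)
        # Release every resource on the old owner before the peer starts
        # them, so the service never runs on both servers at once.
        self._stop(self.owner)
        self._start(other)
        self.owner = other
        return self.owner
```

Setting `link_up` to `False` for the current owner and calling `check_link()` moves all four resources to the other server in one step, with the service IP released first and brought up last.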
The cluster service can migrate or fail over between the two servers in the HA cluster. However, at any
given time, only one server owns the HA service. Before a server (named “active”) takes ownership
of the HA service, it must determine that the second server (named “passive”) is not running the
service. This ensures that data is protected and that the two servers never
write to the storage at the same time. The “active” server will start the HA service only if it