
Dell HPC NFS Storage Solution High Availability (NSS-HA) Configurations with Dell PowerEdge 12th
Generation Servers
In a healthy cluster, any failure event should be noted by the Red Hat cluster
management daemon and acted upon within minutes. Note that this is the failover time on the NFS
servers; the impact to the clients could be longer.
Multiple SAS link failures - simulated by disconnecting all SAS links between one Dell PowerEdge
R620 server and the Dell PowerVault MD3200 storage.
When all SAS links on the active server fail, the multipath daemon on the active server retries the
path to the storage based on the parameters configured in the multipath.conf file; by default,
this retry period is 150 seconds. After the retries time out, the HA service attempts to fail over
to the passive server.
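In DM-Multipath, the effective retry window is the product of the no_path_retry count and the polling_interval (in seconds). A minimal multipath.conf sketch that would yield the 150-second window described above; the specific values shown are illustrative, not taken from the solution's actual configuration:

```
defaults {
    # Path checker interval in seconds (illustrative value)
    polling_interval  5
    # Number of polling cycles to keep retrying before the path is
    # declared failed: 30 retries x 5 s = 150 s
    no_path_retry     30
}
```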
If the cluster service is unable to cleanly stop the LVM and the file system because of the broken
path, a watchdog script reboots the active server after five minutes. At this point the passive
server fences the active server, restarts the HA service, and restores the data path to the clients.
Failover can therefore take anywhere from three to eight minutes: a clean failover completes in
roughly three minutes, while the worst case incurs both the 150-second multipath timeout and the
five-minute watchdog reboot.
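On Red Hat Cluster Suite nodes of this era, failover progress can be observed and exercised with the rgmanager tools; the service and node names below are hypothetical:

```
# Watch cluster and service state every 2 seconds during a failover
clustat -i 2

# Manually relocate the HA service to the other node for testing
clusvcadm -r NSS_HA_service -m passive-node
```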
Impact on clients
Clients mount the NFS file system exported by the server using the HA service IP. This IP is
associated with either an IPoIB or a 10 Gigabit Ethernet network interface on the NFS server. To
measure any impact on the client, the dd utility and the iozone benchmark were used to read and
write large files between the client and the file system. Component failures were introduced on
the server while the client was actively reading data from and writing data to the file system.
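As an illustration, a client mount over the floating HA service IP and a simple dd write workload look like the following; the IP address, export path, and mount point are assumptions, not values from this document:

```
# Mount the NFS export via the HA service IP (address and paths are
# hypothetical)
mount -t nfs -o vers=3,tcp 10.10.10.200:/mnt/nss_share /mnt/nss

# Large sequential write from the client while failures are injected
# on the server
dd if=/dev/zero of=/mnt/nss/testfile bs=1M count=10240
```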
In all scenarios, the client processes were observed to complete their read and write operations
successfully. As expected, a client process took longer to complete if it was actively accessing
data during a failover event. During the failover period, while the data share was temporarily
unavailable, the client process was observed to be in an uninterruptible sleep state.
Depending on the characteristics of the client process, it can be expected to abort or sleep while
the NFS share is temporarily unavailable during the failover process. Any data that has already
been written to the file system will be available.
For read and write operations during the failover case, data correctness was successfully verified
using the checkstream utility.
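The checkstream approach writes data as a known, regenerable pattern so that corruption can be detected after the fact by regenerating the pattern and comparing. A minimal self-contained sketch of the same idea (this is not the checkstream utility itself, and the file names are illustrative):

```shell
# Scratch directory standing in for the NFS mount
tmpdir=$(mktemp -d)

# Writer: stream a known, regenerable pattern to a file, as the client
# write workload would during the failover test
seq 1 100000 > "$tmpdir/stream.dat"

# Verifier: regenerate the expected pattern and compare byte-for-byte,
# which is the essence of the post-failover data-correctness check
if seq 1 100000 | cmp -s - "$tmpdir/stream.dat"; then
    echo "stream verified: no corruption"
else
    echo "stream verification FAILED"
fi

rm -rf "$tmpdir"
```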
5. NSS-HA Performance with Dell PowerEdge 12th generation servers
This section presents the results of the performance related tests conducted on the current NSS-HA
solution. All performance tests were performed in a failure free scenario to measure the maximum
capability of the solution. The tests focused on three types of IO patterns: large sequential reads and
writes, small random reads and writes, and three metadata operations (file create, stat, and remove).
A 288TB configuration was benchmarked with IPoIB network connectivity. The 64-node compute cluster
described in the Test bed section was used to generate IO load on the NSS-HA solution. Each test was
run over a range of client counts to test the scalability of the solution.
The iozone and mdtest utilities were used in this study. Iozone was used for the sequential and
random tests. For sequential tests, a request size of 1024KB was used. The total amount of data
transferred was 256GB to ensure that the NFS server cache was saturated. Random tests used a 4KB
request size and each client read and wrote a 2GB file. Metadata tests were performed using the