
Dell HPC NFS Storage Solution High Availability (NSS-HA) Configurations with Dell PowerEdge 12th
Generation Servers
In a healthy cluster, any failure event should be noted by the Red Hat cluster
management daemon and acted upon within minutes. Note that this is the failover time on the NFS
servers; the impact to the clients could be longer.
Multiple SAS link failures - simulated by disconnecting all SAS links between one Dell PowerEdge
R620 server and the Dell PowerVault MD3200 storage.
When all SAS links on the active server fail, the multipath daemon on the active server retries the
path to the storage based on the parameters configured in the multipath.conf file; by default,
this retry period is 150 seconds. After the retries time out, the HA service attempts to fail over
to the passive server.
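In DM-Multipath, the effective retry window is the product of the no_path_retry count and the polling_interval (in seconds). A minimal multipath.conf sketch that would yield the 150-second window described above; the specific values shown are illustrative, not taken from the solution's actual configuration:

```
defaults {
    # Path checker interval in seconds (illustrative value)
    polling_interval  5
    # Number of polling cycles to keep retrying before the path is
    # declared failed: 30 retries x 5 s = 150 s
    no_path_retry     30
}
```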
If the cluster service is unable to cleanly stop the LVM and the file system because of the broken
path, a watchdog script reboots the active server after five minutes. At this point the passive
server fences the active server, restarts the HA service, and restores the data path to the clients.
Failover can therefore take anywhere from three to eight minutes: a clean failover completes in
roughly three minutes, while the worst case incurs both the 150-second multipath timeout and the
five-minute watchdog reboot.
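On Red Hat Cluster Suite nodes of this era, failover progress can be observed and exercised with the rgmanager tools; the service and node names below are hypothetical:

```
# Watch cluster and service state every 2 seconds during a failover
clustat -i 2

# Manually relocate the HA service to the other node for testing
clusvcadm -r NSS_HA_service -m passive-node
```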
Impact on clients
Clients mount the NFS file system exported by the server using the HA service IP. This IP is
associated with either an IPoIB or a 10 Gigabit Ethernet network interface on the NFS server. To
measure any impact on the client, the dd utility and the iozone benchmark were used to read and
write large files between the client and the file system. Component failures were introduced on
the server while the client was actively reading data from and writing data to the file system.
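As an illustration, a client mount over the floating HA service IP and a simple dd write workload look like the following; the IP address, export path, and mount point are assumptions, not values from this document:

```
# Mount the NFS export via the HA service IP (address and paths are
# hypothetical)
mount -t nfs -o vers=3,tcp 10.10.10.200:/mnt/nss_share /mnt/nss

# Large sequential write from the client while failures are injected
# on the server
dd if=/dev/zero of=/mnt/nss/testfile bs=1M count=10240
```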
In all scenarios, the client processes were observed to complete their read and write operations
successfully. As expected, a client process took longer to complete if it was actively accessing
data during a failover event. During the failover period, while the data share was temporarily
unavailable, the client process was observed to be in an uninterruptible sleep state.
Depending on the characteristics of the client process, it can be expected to abort or sleep while
the NFS share is temporarily unavailable during the failover process. Any data that has already
been written to the file system will be available.
For read and write operations during the failover case, data correctness was successfully verified
using the checkstream utility.
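The checkstream approach writes data as a known, regenerable pattern so that corruption can be detected after the fact by regenerating the pattern and comparing. A minimal self-contained sketch of the same idea (this is not the checkstream utility itself, and the file names are illustrative):

```shell
# Scratch directory standing in for the NFS mount
tmpdir=$(mktemp -d)

# Writer: stream a known, regenerable pattern to a file, as the client
# write workload would during the failover test
seq 1 100000 > "$tmpdir/stream.dat"

# Verifier: regenerate the expected pattern and compare byte-for-byte,
# which is the essence of the post-failover data-correctness check
if seq 1 100000 | cmp -s - "$tmpdir/stream.dat"; then
    echo "stream verified: no corruption"
else
    echo "stream verification FAILED"
fi

rm -rf "$tmpdir"
```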
5. NSS-HA Performance with Dell PowerEdge 12th generation servers
This section presents the results of the performance related tests conducted on the current NSS-HA
solution. All performance tests were performed in a failure free scenario to measure the maximum
capability of the solution. The tests focused on three types of IO patterns: large sequential reads and
writes, small random reads and writes, and three metadata operations (file create, stat, and remove).
A 288TB configuration was benchmarked with IPoIB network connectivity. The 64-node compute cluster
described in the Test bed section was used to generate IO load on the NSS-HA solution. Each test was
run over a range of client counts to test the scalability of the solution.
The iozone and mdtest utilities were used in this study. Iozone was used for the sequential and
random tests. For sequential tests, a request size of 1024KB was used. The total amount of data
transferred was 256GB to ensure that the NFS server cache was saturated. Random tests used a 4KB
request size and each client read and wrote a 2GB file. Metadata tests were performed using the