Dell HPC Lustre Storage solution with Intel Omni-Path
4. Performance Evaluation and Analysis
The performance studies presented in this paper profile the capabilities of the Dell HPC Lustre Storage
with Intel EE for Lustre software in a 240-drive configuration. The configuration has 240 4TB disk
drives (960TB of raw space). The goal is to quantify the capabilities of the solution, identify points of
peak performance, and determine the most appropriate methods for scaling. The client test bed used to
provide I/O workload to the solution is the Dell Zenith system, a Top500-class system based on the
Intel Scalable Systems Framework, configured as described in Table 1.
A number of performance studies were executed, stressing the configuration with different types of
workloads to determine the limits of performance and to characterize how well that performance is
sustained. Intel Omni-Path was used for these studies because its high bandwidth and low latency
allow the Dell HPC Lustre Storage solution to reach its maximum performance without the network
becoming a bottleneck.
We generally try to maintain a “standard and consistent” testing environment and methodology. In
some areas we purposely optimize server or storage configurations, and we may also take measures to
limit caching effects, in order to better illustrate their impact on performance. This paper details the
specifics of such configurations.
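As one illustration, a common way to limit client-side caching between runs on Lustre clients is to flush the Linux page cache and clear the Lustre client lock cache before each test. The commands below are a sketch of that approach and are not necessarily the exact procedure used in this study.

    # Flush dirty pages to storage, then drop the page cache, dentries, and inodes
    sync
    echo 3 > /proc/sys/vm/drop_caches
    # Clear the Lustre client's cached locks (and the cached data they cover)
    lctl set_param ldlm.namespaces.*.lru_size=clear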
Table 1: Test Client Cluster Details

Component                        Description
Compute Nodes:                   Dell PowerEdge R630, 64 nodes
Node BIOS:                       2.0.2
Processors:                      Two Intel Xeon™ E5-2697 v4 @ 2.3GHz
Memory:                          128GB DDR4 2400MHz
Interconnect:                    Intel Omni-Path HFI
Lustre:                          Lustre 2.7.15.3
OS:                              Red Hat Enterprise Linux 7.2 (3.10.0-327.el7.x86_64)
Intel HFI firmware and driver:   10.1.1.06
Performance analysis focused on three key performance markers:
- Throughput: data transferred sequentially, in GB/s
- I/O operations per second (IOPS)
- Metadata operations per second (OP/s)
The goal is a broad but accurate review of the capabilities of the Dell HPC Lustre Storage with Intel EE
for Lustre. We selected two benchmarks to accomplish our goal: IOzone and MDtest.
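As a preview of the metadata workload, the sketch below shows a representative MDtest invocation for measuring metadata OP/s. The task count, file count, and mount point are illustrative assumptions for this example; the exact commands used in this study are listed in Appendix A.

    # Hypothetical example: 64 MPI tasks, each creating, stat-ing, and removing
    # 1000 files in its own unique directory (-u) under the Lustre mount point
    mpirun -np 64 mdtest -d /mnt/lustre/mdtest -i 3 -n 1000 -F -u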
We used an N-to-N load for testing, where every thread of the benchmark (N clients) writes to a
different file (N files) on the storage system. IOzone can be configured to use the N-to-N file-access
method, and for this study we used IOzone for the N-to-N access method workloads. See Appendix A
for examples of the commands used to run these benchmarks.
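For illustration, a minimal sketch of an IOzone sequential-write run in distributed N-to-N mode follows. The record size, file size, thread count, and paths are assumptions for this example rather than the exact parameters used; see Appendix A for the actual command lines.

    # Hypothetical example: 64 threads, one per line in ./clientlist, each
    # writing an 8GB file with a 1MB record size. -c and -e include close()
    # and flush times in the results, -w keeps files for a later read pass,
    # and -+n disables retests.
    export RSH=ssh
    iozone -i 0 -c -e -w -r 1m -s 8g -t 64 -+n -+m ./clientlist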
Each set of tests was executed on a range of clients to test the scalability of the solution. The number
of simultaneous physical clients involved in each test varied from a single client to 64 clients. Up to 64
threads, the thread count corresponds to the number of physical compute nodes, with one thread per
node. Total thread counts above 64 were simulated by increasing the number of threads per client
across all clients.
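To make the thread-to-client mapping concrete, the sketch below shows a hypothetical client-list file of the kind IOzone consumes through its -+m option. Each line launches one thread on the named host, so listing a host more than once runs multiple threads on that client; the hostnames and paths are illustrative.

    # Format per line: hostname  working-directory  path-to-iozone
    node001 /mnt/lustre/iozone-test /usr/bin/iozone
    node002 /mnt/lustre/iozone-test /usr/bin/iozone
    # Repeating a host simulates more than one thread per client:
    node001 /mnt/lustre/iozone-test /usr/bin/iozone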