Dell HPC Lustre Storage solution with Intel Omni-Path
We found that single-client performance was consistent at 1GB/s for writes and 1.3GB/s for reads.
Write and read performance rise sharply as the number of process threads increases up to 24, then
level out through 256 threads with occasional dips. This is partially a result of the increasing number
of OSTs utilized as the thread count grows (up to the 24 OSTs in our system).
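As an illustrative aside (the mount point and directory names below are placeholders, not the paths used on the test system), the standard Lustre lfs utility can be used to confirm how benchmark files map onto OSTs during a run:

    # Stripe each benchmark file across a single OST; with one file per
    # thread, more threads spread the load across more of the 24 OSTs.
    lfs setstripe -c 1 /mnt/lustre/seqbench
    # Report which OST index backs a particular benchmark file.
    lfs getstripe /mnt/lustre/seqbench/iozone.0
    # Summarize capacity and usage per OST to verify the load is spread evenly.
    lfs df -h /mnt/lustre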
To maintain this high throughput for an even greater number of files, increasing the number of OSTs
is likely to help. The storage array performance was also reviewed with the Performance Monitor tool
provided by the Dell PowerVault Modular Disk Storage Manager to independently confirm the
throughput values reported by the benchmarking tools.
There are various OS-, Intel Omni-Path IFS- and Lustre-level tuning parameters that can be used to
optimize the Lustre storage servers for specific workloads. We cover the details of the tuning
parameters configured on the test system below.
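As a rough sketch of what such tuning looks like in practice (the parameter values shown are illustrative assumptions, not the settings applied to the test system), most Lustre tunables are set with lctl set_param, while OS-level settings such as the block-device I/O scheduler are adjusted through sysfs:

    # Client-side Lustre tunables (illustrative values only):
    lctl set_param osc.*.max_rpcs_in_flight=16     # concurrent RPCs per OST
    lctl set_param osc.*.max_dirty_mb=1024         # dirty client cache per OST
    lctl set_param llite.*.max_read_ahead_mb=1024  # client read-ahead window
    # OS-level example on the storage servers: select the deadline I/O
    # scheduler for the block devices backing the OSTs (device name is a placeholder).
    echo deadline > /sys/block/sdX/queue/scheduler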
4.2 Random Reads and Writes
The IOzone benchmark was used to gather random read and write metrics. The file size selected for
this testing was such that the aggregate size from all threads was consistently 1TB. That is, random
reads and writes have an aggregate size of 1TB divided equally among the number of threads within
that test. The IOzone host file is arranged to distribute the workload evenly across the compute nodes.
The storage is addressed as a single volume with a stripe count of 1 and stripe size of 4MB. A 4KB
request size is used because it aligns with Lustre’s 4KB file system block size and is representative of
small block accesses for a random workload. Performance is measured in I/O operations per second
(IOPS).
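A representative IOzone command line for such a run is sketched below; the options are standard IOzone flags, but the host file name, per-thread file size (1TB divided by the thread count) and overall invocation are assumptions rather than the exact commands used to produce Figure 12.

    # Apply the layout described above to the benchmark directory
    # (path is a placeholder): stripe count 1, stripe size 4MB.
    lfs setstripe -c 1 -S 4m /mnt/lustre/randbench
    # Random read/write test in cluster mode, shown here for 256 threads.
    # -i 2 : random read/write test      -r 4k : 4KB request size
    # -s 4g: per-thread file size (1TB / 256 threads)
    # -t   : number of threads           -O    : report results in IOPS
    # -I   : use O_DIRECT to bypass the client page cache
    # -w   : keep the test files (assumed created by a prior write pass)
    # -+m  : host file spreading threads evenly across the compute nodes
    iozone -i 2 -w -r 4k -s 4g -t 256 -O -I -+m ./hostfile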
Figure 12 shows that random writes peak at just shy of 15.6K IOPS and level off after 48 threads,
while random reads climb steadily as the thread count increases. The IOPS of random reads increase
rapidly from 4 to 32 threads and again from 72 to 256 threads. Because writes require a file lock per
OST accessed, saturation is expected. Reads take advantage of Lustre’s ability to grant overlapping
read extent locks for part or all of a file.