
Performance characterization
Figure 17 N to N Random Performance
From the results we can observe that write performance starts at a high value of 29.1K IOPS and rises steadily up to 64 threads, where it reaches a plateau at around 40K IOPS. Read performance, on the other hand, starts at 1.4K IOPS and increases almost linearly with the number of threads used (keep in mind that the thread count doubles at each data point), reaching a maximum of 25.6K IOPS at 64 threads, where it appears close to a plateau. Using more threads would require more than the 16 compute nodes to avoid resource starvation, which would lower the apparent performance even though the arrays could in fact sustain it.
Metadata performance with MDtest using empty files
Metadata performance was measured with MDtest version 3.3.0, using OpenMPI v4.0.1 to run the benchmark over the 16 compute nodes. The tests executed varied from a single thread up to 512 threads. The benchmark was used for files only (no directory metadata), measuring the number of creates, stats, reads, and removes the solution can handle, and the results were contrasted with those of the Large size solution.
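For reference, a files-only, empty-file MDtest run of the kind described exercises only metadata operations. The sketch below is not the command used in this study; the target directory and per-thread file count are assumed values, and only standard mdtest options are shown.

    # Sketch only (not the study's exact command): files-only, empty-file run,
    # so only the create/stat/read/remove metadata phases are exercised.
    #   -F        operate on files only (no directory metadata)
    #   -n 4096   files per thread (assumed value)
    #   -e 0 -w 0 read and write zero bytes, i.e. empty files
    #   -i 1      one iteration
    #   -d ...    working directory on the parallel file system (assumed path)
    mdtest -F -n 4096 -e 0 -w 0 -i 1 -d /mmfs1/perftest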
To properly evaluate the solution in comparison to other Dell EMC HPC storage solutions and the previous blog results, the optional High Demand Metadata Module was used, but with a single ME4024 array, even though the large configuration tested in this work is designated to have two ME4024s.
This High Demand Metadata Module can support up to four ME4024 arrays, and it is suggested to increase the number of ME4024 arrays to four before adding another metadata module. Additional ME4024 arrays are expected to increase metadata performance linearly with each additional array, except perhaps for Stat operations (and Reads for empty files): since those numbers are already very high, at some point the CPUs will become the bottleneck and performance will not continue to increase linearly.
The following command was used to execute the benchmark, where Threads was the variable with the number of threads used (1 to 512, incremented in powers of two), and my_hosts.$Threads is the corresponding file that allocated each thread to a different node, using round robin to spread them homogeneously across the 16 compute nodes. As with the random IO benchmark, the maximum number of threads was limited to 512, since there are not enough cores for 1,024 threads and context switching would affect the results, reporting a number lower than the real performance of the solution.
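The command itself is not reproduced in this extract. The sketch below shows one way the hostfiles and the launch could be structured, consistent with the description above; the node names, the mpirun options shown, and the mdtest argument list (represented here by $MDTEST_ARGS) are assumptions, not the exact values used in the study.

    #!/bin/bash
    # Sketch only: build round-robin hostfiles and launch mdtest for each
    # thread count, assuming 16 compute nodes named node01..node16 and
    # OpenMPI 4.0.1 on the default PATH.
    NODES=$(printf "node%02d\n" $(seq 1 16))

    for Threads in 1 2 4 8 16 32 64 128 256 512; do
        # my_hosts.$Threads lists one node per MPI rank, cycling through the
        # 16 nodes so consecutive ranks land on different nodes (round robin).
        : > my_hosts.$Threads
        i=0
        while [ $i -lt $Threads ]; do
            for n in $NODES; do
                [ $i -ge $Threads ] && break
                echo $n >> my_hosts.$Threads
                i=$((i + 1))
            done
        done

        # $MDTEST_ARGS stands in for the mdtest flag set (see the files-only
        # sketch earlier in this section).
        mpirun -np $Threads --hostfile my_hosts.$Threads mdtest $MDTEST_ARGS
    done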