
Performance characterization
Figure 17 N to N Random Performance
From the results we can observe that write performance starts at a high value of 29.1K IOPS and rises steadily up to 64 threads, where it reaches a plateau at around 40K IOPS. Read performance, on the other hand, starts at 1.4K IOPS and increases almost linearly with the number of threads used (keep in mind that the thread count doubles at each data point), reaching a maximum of 25.6K IOPS at 64 threads, where it appears close to a plateau. Using more threads would require more than the 16 compute nodes to avoid resource starvation, which would lower the apparent performance even though the arrays could in fact sustain it.
Metadata performance with MDtest using empty files
Metadata performance was measured with MDtest version 3.3.0, using OpenMPI v4.0.1 to run the benchmark over the 16 compute nodes. The tests executed varied from a single thread up to 512 threads. The benchmark was used for files only (no directory metadata), measuring the number of creates, stats, reads, and removes the solution can handle, and the results were contrasted with those of the Large size solution.
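For reference, a files-only, empty-file MDtest run of the kind described exercises only metadata operations. The sketch below is not the command used in this study; the target directory and per-thread file count are assumed values, and only standard mdtest options are shown.

    # Sketch only (not the study's exact command): files-only, empty-file run,
    # so only the create/stat/read/remove metadata phases are exercised.
    #   -F        operate on files only (no directory metadata)
    #   -n 4096   files per thread (assumed value)
    #   -e 0 -w 0 read and write zero bytes, i.e. empty files
    #   -i 1      one iteration
    #   -d ...    working directory on the parallel file system (assumed path)
    mdtest -F -n 4096 -e 0 -w 0 -i 1 -d /mmfs1/perftest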
To properly evaluate the solution in comparison to other Dell EMC HPC storage solutions and the previous blog results, the optional High Demand Metadata Module was used, but with a single ME4024 array, even though the large configuration tested in this work is designated to have two ME4024s.
This High Demand Metadata Module can support up to four ME4024 arrays, and it is suggested to increase the number of ME4024 arrays to four before adding another metadata module. Additional ME4024 arrays are expected to increase metadata performance linearly with each additional array, except perhaps for Stat operations (and Reads for empty files): since those numbers are already very high, at some point the CPUs will become the bottleneck and performance will not continue to increase linearly.
The following command was used to execute the benchmark, where Threads was the variable with the number of threads used (1 to 512, incremented in powers of two), and my_hosts.$Threads is the corresponding file that allocated each thread to a different node, using round robin to spread them homogeneously across the 16 compute nodes. As with the random IO benchmark, the maximum number of threads was limited to 512, since there are not enough cores for 1,024 threads and context switching would affect the results, reporting a number lower than the real performance of the solution.
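The command itself is not reproduced in this extract. The sketch below shows one way the hostfiles and the launch could be structured, consistent with the description above; the node names, the mpirun options shown, and the mdtest argument list (represented here by $MDTEST_ARGS) are assumptions, not the exact values used in the study.

    #!/bin/bash
    # Sketch only: build round-robin hostfiles and launch mdtest for each
    # thread count, assuming 16 compute nodes named node01..node16 and
    # OpenMPI 4.0.1 on the default PATH.
    NODES=$(printf "node%02d\n" $(seq 1 16))

    for Threads in 1 2 4 8 16 32 64 128 256 512; do
        # my_hosts.$Threads lists one node per MPI rank, cycling through the
        # 16 nodes so consecutive ranks land on different nodes (round robin).
        : > my_hosts.$Threads
        i=0
        while [ $i -lt $Threads ]; do
            for n in $NODES; do
                [ $i -ge $Threads ] && break
                echo $n >> my_hosts.$Threads
                i=$((i + 1))
            done
        done

        # $MDTEST_ARGS stands in for the mdtest flag set (see the files-only
        # sketch earlier in this section).
        mpirun -np $Threads --hostfile my_hosts.$Threads mdtest $MDTEST_ARGS
    done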