
Performance characterization
Caching effects were minimized by setting the GPFS page pool tunable to 16 GiB and using files larger than twice that size. It is important to note that in GPFS this tunable sets the maximum amount of memory used for caching data, regardless of how much RAM is installed and free. Also note that, while in previous Dell EMC HPC solutions the block size for large sequential transfers was 1 MiB, GPFS was formatted with 8 MiB blocks, so that value was used in the benchmark for optimal performance. That block size may seem too large and likely to waste space, but GPFS uses subblock allocation to prevent that situation. In the current configuration, each block was subdivided into 256 subblocks of 32 KiB each.
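As a reference, the page pool setting and the file system block and subblock sizes can be inspected with standard GPFS (Spectrum Scale) administration commands. The sketch below illustrates this; the file system name gpfs01 is a placeholder, and the commands are shown as a typical example rather than the exact procedure captured for this solution.
# Set the GPFS page pool to 16 GiB and apply the change immediately on all nodes
mmchconfig pagepool=16G -i
# Confirm the current page pool value
mmlsconfig pagepool
# Show the file system block size (-B) and subblock size (-f); gpfs01 is a placeholder name
mmlsfs gpfs01 -B -f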
The following commands were used to execute the benchmark for writes and reads, where $Threads was the variable holding the number of threads used (1 to 1024, incremented in powers of two), and threadlist was the file that allocated each thread to a different node, using round robin to spread them homogeneously across the 16 compute nodes.
./iozone -i0 -c -e -w -r 8M -s 128G -t $Threads -+n -+m ./threadlist
./iozone -i1 -c -e -w -r 8M -s 128G -t $Threads -+n -+m ./threadlist
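The threadlist file follows IOzone's -+m client-file format, in which each line typically specifies a client host name, the working directory on that host, and the path to the IOzone executable. The snippet below is one possible way to generate such a file with round-robin assignment across the 16 compute nodes; the host names (node001 to node016) and paths are assumptions for illustration, not values taken from this solution.
# Generate ./threadlist with $Threads entries, assigning threads to the
# 16 compute nodes in round-robin order (host names and paths are examples)
rm -f ./threadlist
for i in $(seq 0 $(($Threads - 1))); do
  node=$(printf "node%03d" $(( (i % 16) + 1 )))
  echo "$node /mnt/pixstor/iozone_test /usr/bin/iozone" >> ./threadlist
done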
Figure 15 N to N Sequential Performance
From the results we can observe that performance rises quickly with the number of clients used and then reaches a plateau that remains stable up to the maximum number of threads that IOzone allows; large-file sequential performance is therefore stable even for 1024 concurrent clients. Notice that both read and write performance benefited from doubling the number of drives. Maximum read performance was limited by the bandwidth of the two IB EDR links used on the storage nodes starting at 8 threads, so the ME4 arrays may have some additional performance available. Similarly, notice that the maximum write performance increased from 16.7 GB/s to 20.4 GB/s (at 64 and 128 threads), which is closer to the ME4 arrays' maximum specification (22 GB/s).