
Performance characterization
PixStor Solution – NVMe Tier
This benchmarking was performed on four R640 NVMe nodes, each with eight Intel P4610 NVMe SSDs, arranged as eight NVMe-over-Fabrics RAID 10 volumes using NVMesh, as previously described in this document. Those RAID 10 volumes were used as block devices to create NSDs for data only, so the optional HDMD module (two R740 servers, but using a single ME4024 array) was used to store all the metadata.
While this configuration works well, as the results show, the NSDs can also store both data and metadata, creating a self-contained tier 0 where the metadata would also benefit from the speed of the NVMe devices. Furthermore, if such a need arises, the NVMe NSDs could be used for metadata only, to create an extreme-demand metadata module.
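As an illustration of how that data and metadata placement is controlled, the sketch below creates a stanza file where the NVMe NSDs use usage=dataOnly (the configuration benchmarked here) and the ME4024 NSD uses usage=metadataOnly; setting usage=dataAndMetadata on the NVMe NSDs instead would yield the self-contained tier 0 described above. The device paths, NSD names, server names, and pool name are assumptions for illustration only, not values from this solution.
# Hypothetical NSD stanza file; the usage attribute selects data, metadata, or both.
cat > nsd_stanzas.txt << 'EOF'
%nsd: device=/dev/nvmesh/raid10_vol01
  nsd=nvme_nsd_01
  servers=nvme-node-01,nvme-node-02
  usage=dataOnly
  failureGroup=1
  pool=nvmedata
%nsd: device=/dev/mapper/me4024_md_vol01
  nsd=md_nsd_01
  servers=hdmd-node-01,hdmd-node-02
  usage=metadataOnly
  failureGroup=10
  pool=system
EOF
# Create the NSDs; the usage and pool attributes take effect when the same stanza
# file is later passed to mmcrfs or mmadddisk.
mmcrnsd -F nsd_stanzas.txt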
The software versions used during the NVMe characterization are listed in Table 8.
Table 8   Software component versions during characterization

  Solution Component        Version at Characterization
  Operating System          CentOS 7.7
  Kernel version            3.10.0-1062.12.1.el7.x86_64
  PixStor Software          5.1.3.1
  Spectrum Scale (GPFS)     5.0.4-3
  NVMesh                    2.0.1
  OFED Version              Mellanox OFED-5.0-2.1.8.0
Sequential IOzone Performance N clients to N files
Sequential N clients to N files performance was measured with IOzone version 3.487. The tests executed ranged from a single thread up to 1024 threads, in increments of powers of two.
Caching effects were minimized by setting the GPFS page pool tunable to 16 GiB and using files larger than twice that size. It is important to note that for GPFS that tunable sets the maximum amount of memory used for caching data, regardless of the amount of RAM installed and free. Also note that, while in previous Dell EMC HPC solutions the block size for large sequential transfers was 1 MiB, GPFS was formatted with 8 MiB blocks, and therefore that value was used in the benchmark for optimal performance. That block size may look too large and appear to waste too much space, but GPFS uses subblock allocation to prevent that situation. In the current configuration, each block was subdivided into 256 subblocks of 32 KiB each.
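For reference, the commands below sketch how those values could be set and verified with standard Spectrum Scale commands; the file system name gpfs01 is a hypothetical example.
# Set the maximum memory GPFS uses to cache data (the page pool) to 16 GiB.
mmchconfig pagepool=16G
# Verify the current page pool setting.
mmlsconfig pagepool
# Verify the 8 MiB block size and the 32 KiB subblock size of the file system
# ("gpfs01" is a hypothetical file system name).
mmlsfs gpfs01 -B -f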
The following commands were used to execute the benchmark for writes and reads, where $Threads was the variable with the number of threads used (1 to 1024, incremented in powers of two), and threadlist was the file that assigned each thread to a different node, using round robin to spread them homogeneously across the 16 compute nodes.
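As an illustration, the snippet below sketches one way such a threadlist file could be generated for IOzone's -+m option; the host names node01 through node16, the working directory, and the iozone path are assumptions, not values from this solution.
# Hypothetical sketch: build the IOzone -+m client file, assigning $Threads threads
# to 16 compute nodes in round-robin order. Each line has the form:
#   <client hostname> <working directory> <path to iozone executable>
Threads=64                                    # example thread count
rm -f ./threadlist
for i in $(seq 0 $(( Threads - 1 ))); do
    node=$(printf "node%02d" $(( i % 16 + 1 )))
    echo "$node /mnt/pixstor/iozone /usr/local/bin/iozone" >> ./threadlist
done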
To avoid any possible data caching effects from the clients, the total data size of the files was twice the total amount of RAM in the clients used. That is, since each client has 128 GiB of RAM, for thread counts of 16 or more the file size was 4096 GiB divided by the number of threads (the variable $Size below was used to manage that value). For cases with fewer than 16 threads (which implies that each thread was running on a different client), the file size was fixed at twice the amount of memory per client, or 256 GiB.
./iozone -i0 -c -e -w -r 8M -s ${Size}G -t $Threads -+n -+m ./threadlist
./iozone -i1 -c -e -w -r 8M -s ${Size}G -t $Threads -+n -+m ./threadlist
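For completeness, the sketch below shows one way $Size and the thread sweep could be scripted around those two commands; it is an assumed harness for illustration, not the exact script used for this characterization.
# Hypothetical wrapper: sweep thread counts in powers of two and compute $Size so the
# aggregate data is 4096 GiB, twice the 2048 GiB of total RAM across the 16 clients.
for Threads in 1 2 4 8 16 32 64 128 256 512 1024; do
    if [ $Threads -ge 16 ]; then
        Size=$(( 4096 / Threads ))   # aggregate fixed at 4096 GiB
    else
        Size=256                     # fewer than 16 threads: twice the 128 GiB RAM per client
    fi
    ./iozone -i0 -c -e -w -r 8M -s ${Size}G -t $Threads -+n -+m ./threadlist   # sequential writes
    ./iozone -i1 -c -e -w -r 8M -s ${Size}G -t $Threads -+n -+m ./threadlist   # sequential reads
done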