
Performance characterization
PixStor Solution – NVMe Tier
This benchmarking was performed on four R640 NVMe nodes, each with eight Intel P4610 NVMe SSDs, arranged as eight NVMe-over-Fabrics RAID 10 volumes using NVMesh, as previously described in this document. Those RAID 10 volumes were used as block devices to create NSDs for data only, so the optional HDMD module (two R740 servers, but using a single ME4024 array) was used to store all the metadata.
While this configuration works well, as the results show, the NSDs can also store both data and metadata, creating a self-contained tier 0 where the metadata would also benefit from the speed of the NVMe devices. Furthermore, if such a need arises, the NVMe NSDs could be used for metadata only, to create an extreme-demand metadata module.
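As an illustration of how that data and metadata placement is controlled, the sketch below creates a stanza file where the NVMe NSDs use usage=dataOnly (the configuration benchmarked here) and the ME4024 NSD uses usage=metadataOnly; setting usage=dataAndMetadata on the NVMe NSDs instead would yield the self-contained tier 0 described above. The device paths, NSD names, server names, and pool name are assumptions for illustration only, not values from this solution.
# Hypothetical NSD stanza file; the usage attribute selects data, metadata, or both.
cat > nsd_stanzas.txt << 'EOF'
%nsd: device=/dev/nvmesh/raid10_vol01
  nsd=nvme_nsd_01
  servers=nvme-node-01,nvme-node-02
  usage=dataOnly
  failureGroup=1
  pool=nvmedata
%nsd: device=/dev/mapper/me4024_md_vol01
  nsd=md_nsd_01
  servers=hdmd-node-01,hdmd-node-02
  usage=metadataOnly
  failureGroup=10
  pool=system
EOF
# Create the NSDs; the usage and pool attributes take effect when the same stanza
# file is later passed to mmcrfs or mmadddisk.
mmcrnsd -F nsd_stanzas.txt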
The software versions used during the NVMe characterization are listed in Table 8.
Table 8   Software component versions during characterization

  Solution Component        Version at Characterization
  Operating System          CentOS 7.7
  Kernel version            3.10.0-1062.12.1.el7.x86_64
  PixStor Software          5.1.3.1
  Spectrum Scale (GPFS)     5.0.4-3
  NVMesh                    2.0.1
  OFED Version              Mellanox OFED-5.0-2.1.8.0
Sequential IOzone Performance N clients to N files
Sequential N clients to N files performance was measured with IOzone version 3.487. The tests executed ranged from a single thread up to 1024 threads, in increments of powers of two.
Caching effects were minimized by setting the GPFS page pool tunable to 16 GiB and using files larger than twice that size. It is important to note that for GPFS that tunable sets the maximum amount of memory used for caching data, regardless of the amount of RAM installed and free. Also note that, while in previous Dell EMC HPC solutions the block size for large sequential transfers was 1 MiB, GPFS was formatted with 8 MiB blocks, and therefore that value was used in the benchmark for optimal performance. That block size may look too large and appear to waste too much space, but GPFS uses subblock allocation to prevent that situation. In the current configuration, each block was subdivided into 256 subblocks of 32 KiB each.
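For reference, the commands below sketch how those values could be set and verified with standard Spectrum Scale commands; the file system name gpfs01 is a hypothetical example.
# Set the maximum memory GPFS uses to cache data (the page pool) to 16 GiB.
mmchconfig pagepool=16G
# Verify the current page pool setting.
mmlsconfig pagepool
# Verify the 8 MiB block size and the 32 KiB subblock size of the file system
# ("gpfs01" is a hypothetical file system name).
mmlsfs gpfs01 -B -f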
The following commands were used to execute the benchmark for writes and reads, where $Threads was the variable with the number of threads used (1 to 1024, incremented in powers of two), and threadlist was the file that assigned each thread to a different node, using round robin to spread them homogeneously across the 16 compute nodes.
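As an illustration, the snippet below sketches one way such a threadlist file could be generated for IOzone's -+m option; the host names node01 through node16, the working directory, and the iozone path are assumptions, not values from this solution.
# Hypothetical sketch: build the IOzone -+m client file, assigning $Threads threads
# to 16 compute nodes in round-robin order. Each line has the form:
#   <client hostname> <working directory> <path to iozone executable>
Threads=64                                    # example thread count
rm -f ./threadlist
for i in $(seq 0 $(( Threads - 1 ))); do
    node=$(printf "node%02d" $(( i % 16 + 1 )))
    echo "$node /mnt/pixstor/iozone /usr/local/bin/iozone" >> ./threadlist
done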
To avoid any possible data caching effects from the clients, the total data size of the files was twice the total amount of RAM in the clients used. That is, since each client has 128 GiB of RAM, for thread counts of 16 or more the file size was 4096 GiB divided by the number of threads (the variable $Size below was used to manage that value). For cases with fewer than 16 threads (which implies that each thread was running on a different client), the file size was fixed at twice the amount of memory per client, or 256 GiB.
./iozone -i0 -c -e -w -r 8M -s ${Size}G -t $Threads -+n -+m ./threadlist
./iozone -i1 -c -e -w -r 8M -s ${Size}G -t $Threads -+n -+m ./threadlist
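For completeness, the sketch below shows one way $Size and the thread sweep could be scripted around those two commands; it is an assumed harness for illustration, not the exact script used for this characterization.
# Hypothetical wrapper: sweep thread counts in powers of two and compute $Size so the
# aggregate data is 4096 GiB, twice the 2048 GiB of total RAM across the 16 clients.
for Threads in 1 2 4 8 16 32 64 128 256 512 1024; do
    if [ $Threads -ge 16 ]; then
        Size=$(( 4096 / Threads ))   # aggregate fixed at 4096 GiB
    else
        Size=256                     # fewer than 16 threads: twice the 128 GiB RAM per client
    fi
    ./iozone -i0 -c -e -w -r 8M -s ${Size}G -t $Threads -+n -+m ./threadlist   # sequential writes
    ./iozone -i1 -c -e -w -r 8M -s ${Size}G -t $Threads -+n -+m ./threadlist   # sequential reads
done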