Performance characterization
Number of Client nodes     16
Client node                Different 13G models with different CPUs and DIMMs
Cores per client node      10-22, Total = 492
Memory per client node     8 x 128 GiB & 8 x 256 GiB, Total = 3 TiB
                           (for testing, all nodes were counted as 256 GiB, 4 TiB total)
OS                         CentOS 8.1
OS Kernel                  4.18.0-147.el8.x86_64
PixStor Software           5.1.3.1
Spectrum Scale (GPFS)      5.0.4-3
OFED Version               Mellanox OFED 5.0-2.1.8.0
Connectivity               100 GbE
Adapter                    Mellanox ConnectX-4 InfiniBand VPI EDR/100 GbE
Switch                     Dell EMC Z9100-ON
PixStor Solution with High Demand Meta-Data module (no Capacity Expansion)
This initial benchmarking used the large configuration (two R740 servers connected to four ME4084 arrays)
with the optional High Demand Meta-Data (HDMD) module (two R740 servers), but with a single ME4024 array
instead of the two arrays a large configuration would normally have. The software versions used were those
before the release versions, as listed in Table 1 and Table 2.
Sequential IOzone Performance N clients to N files
Sequential N clients to N files performance was measured with IOzone version 3.487. Tests were executed
from a single thread up to 1024 threads.
Caching effects were minimized by setting the GPFS page pool tunable to 16 GiB and using files bigger than
two times that size. It is important to note that for GPFS this tunable sets the maximum amount of memory
used for caching data, regardless of the amount of RAM installed and free. Also important to note is that,
while in previous Dell EMC HPC solutions the block size for large sequential transfers is 1 MiB, GPFS was
formatted with 8 MiB blocks, and therefore that value is used in the benchmark for optimal performance. That
may look too large and appear to waste too much space, but GPFS uses subblock allocation to prevent that
situation. In the current configuration, each block was subdivided into 256 subblocks of 32 KiB each.
The following commands were used to execute the benchmark for writes and reads, where Threads was the
variable with the number of threads used (1 to 1024, incremented in powers of two), and threadlist was the
file that allocated each thread to a different node, using round robin to spread them homogeneously across
the 16 compute nodes.
./iozone -i0 -c -e -w -r 8M -s 128G -t $Threads -+n -+m ./threadlist
./iozone -i1 -c -e -w -r 8M -s 128G -t $Threads -+n -+m ./threadlist
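For reference, the sketch below shows one way the thread sweep and the threadlist file could be driven; it is
an assumption-laden example, not the script used for the tests. The hostnames (compute-01 through
compute-16), the working directory, and the path to the iozone executable are placeholders that must match
the actual test bed. Each line of an IOzone -+m client file lists the client name, the working directory on that
client, and the path to the iozone executable.
# Hypothetical sweep: round-robin the threads across 16 nodes, then run writes and reads
for Threads in 1 2 4 8 16 32 64 128 256 512 1024; do
  rm -f ./threadlist
  for i in $(seq 1 $Threads); do
    node=$(printf "compute-%02d" $(( (i - 1) % 16 + 1 )))    # compute-01 ... compute-16, assumed names
    echo "$node /mnt/pixstor/iozone_test /usr/local/bin/iozone" >> ./threadlist
  done
  ./iozone -i0 -c -e -w -r 8M -s 128G -t $Threads -+n -+m ./threadlist   # sequential writes
  ./iozone -i1 -c -e -w -r 8M -s 128G -t $Threads -+n -+m ./threadlist   # sequential reads
done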