Administrator Guide

Performance characterization

23 Dell EMC Ready Solution for HPC PixStor Storage | Document ID

Random small blocks IOzone Performance N clients to N files

Random N clients to N files performance was measured with IOzone version 3.487. Tests varied from a single thread up to 1024 threads. This benchmark used 4 KiB blocks to emulate small-block traffic.

Caching effects were minimized by setting the GPFS page pool tunable to 16 GiB and using files twice that size (32 GiB). The first performance test section explains in more detail why this is effective on GPFS.
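The sizing rule can be written out as a small calculation. This is a minimal sketch: the variable and function names (PAGEPOOL_GIB, FILE_GIB, dataset_gib) are hypothetical, and it assumes, as in the commands in this section, that each IOzone thread works on its own file of the size given with -s.

```shell
#!/usr/bin/env bash
# Sketch of the cache-sizing rule: each IOzone file is twice the GPFS page
# pool, so a single thread's file cannot fit in its node's page pool.
# All names here are hypothetical, for illustration only.
set -euo pipefail

PAGEPOOL_GIB=16                        # GPFS pagepool value used in the tests
FILE_GIB=$(( 2 * PAGEPOOL_GIB ))       # per-thread file size, passed as -s ${FILE_GIB}G

# Aggregate data set for a given thread count (one file per thread).
dataset_gib() { echo $(( $1 * FILE_GIB )); }

echo "iozone file size: -s ${FILE_GIB}G"
echo "data set at 1024 threads: $(dataset_gib 1024) GiB"
```

At the largest thread count this amounts to 1024 x 32 GiB = 32768 GiB (32 TiB) of data spread across the file system.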

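The following commands were used to execute the benchmark in random IO mode for both writes and reads, where Threads was the variable holding the number of threads used (1 to 1024, incremented in powers of two), and threadlist was the file that assigned each thread to a node, using round robin to spread the threads homogeneously across the 16 compute nodes. The first command creates the 32 GiB files with 8 MiB sequential writes (-i0) and keeps them (-w); the second runs the 4 KiB random read and write test (-i2) on those files and reports the results in operations per second (-O).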

./iozone -i0 -c -e -w -r 8M -s 32G -t $Threads -+n -+m ./threadlist

./iozone -i2 -c -O -w -r 4K -s 32G -t $Threads -+n -+m ./threadlist
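The sweep over thread counts, including the round-robin threadlist, might be scripted roughly as follows. This is a sketch, not the script used for these results: the node names (node01 to node16), the working directory /mmfs1/perftest and the iozone path /usr/local/bin/iozone are hypothetical placeholders, and regenerating threadlist with one line per thread for every run is an assumption. Each line follows IOzone's -+m client file format of client name, working directory and path to the iozone executable. By default the commands are only printed; setting DRY_RUN=0 executes them.

```shell
#!/usr/bin/env bash
# Sketch of the N-to-N random test sweep. Node names, working directory and
# iozone path are hypothetical placeholders; replace with real values.
set -euo pipefail

NODES=16                          # compute nodes used in the tests
WORKDIR=/mmfs1/perftest           # hypothetical GPFS test directory
IOZONE=/usr/local/bin/iozone      # hypothetical iozone path on every client
DRY_RUN=${DRY_RUN:-1}             # 1 = print commands only, 0 = execute them

# make_threadlist THREADS FILE
# Writes the -+m client file: one "<client> <workdir> <iozone>" line per thread.
# Thread i is placed on node (i mod NODES) + 1, i.e. round robin.
make_threadlist() {
    local threads=$1 file=$2 i
    : > "$file"
    for (( i = 0; i < threads; i++ )); do
        printf 'node%02d %s %s\n' $(( i % NODES + 1 )) "$WORKDIR" "$IOZONE" >> "$file"
    done
}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# Thread counts 1, 2, 4, ..., 1024. For each one: create the 32 GiB files
# with 8 MiB sequential writes (-i0, -w keeps them), then run the 4 KiB
# random read/write test on those files (-i2, -O reports ops/s).
sweep() {
    local Threads
    for (( Threads = 1; Threads <= 1024; Threads *= 2 )); do
        make_threadlist "$Threads" ./threadlist
        run ./iozone -i0 -c -e -w -r 8M -s 32G -t "$Threads" -+n -+m ./threadlist
        run ./iozone -i2 -c -O -w -r 4K -s 32G -t "$Threads" -+n -+m ./threadlist
    done
}

sweep
```

With 1024 threads, round robin places 64 threads on each of the 16 nodes.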

Figure 11 N to N Random Performance

From the results we can observe that write performance starts at a high value of almost 8.2K IOPS and rises steadily up to 128 threads, where it reaches a plateau and remains close to the maximum value of 16.2K IOPS. Read performance, on the other hand, starts very low, at just over 200 IOPS, and increases almost linearly with the number of threads used (keep in mind that the number of threads is doubled for each data point), reaching a maximum of 20.4K IOPS at 512 threads without signs of leveling off. However, running more threads on the current 16 compute nodes, each with two 18-core CPUs, is limited because there are not enough cores to run the maximum number of IOzone threads (1024) without incurring context switching (16 x 2 x 18 = 576 cores), which limits performance considerably. A future test with more compute nodes could check what random read performance can be achieved with 1024 threads with IOzone, or IOR could be used to investigate the behavior with more than 1024 threads.
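The core-count limit can be checked per node with a short calculation. This sketch assumes the round-robin placement described above, so the busiest node runs ceil(Threads / 16) IOzone threads on its 2 x 18 = 36 cores; the function name core_budget is hypothetical.

```shell
#!/usr/bin/env bash
# Worked arithmetic for the core limit: 16 nodes x 2 CPUs x 18 cores = 576
# cores. Assumes round-robin placement of threads (hypothetical helper names).
set -euo pipefail

NODES=16
SOCKETS=2
CORES_PER_SOCKET=18
CORES_PER_NODE=$(( SOCKETS * CORES_PER_SOCKET ))   # 36
TOTAL_CORES=$(( NODES * CORES_PER_NODE ))          # 576

core_budget() {
    local t per_node status
    for (( t = 1; t <= 1024; t *= 2 )); do
        per_node=$(( (t + NODES - 1) / NODES ))    # threads on the busiest node
        if (( per_node > CORES_PER_NODE )); then
            status="oversubscribed"
        else
            status="fits"
        fi
        printf '%5d threads: %3d per node, %s\n' "$t" "$per_node" "$status"
    done
}

echo "total cores: $TOTAL_CORES"
core_budget
```

Only the 1024-thread point exceeds the 36 cores of each node (64 threads per node), while 512 threads need 32 per node. This is consistent with read performance peaking at 512 threads in these tests.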