Administrator Guide

Performance characterization
23 Dell EMC Ready Solution for HPC PixStor Storage | Document ID
Random small blocks IOzone Performance N clients to N files
Random N clients to N files performance was measured with IOzone version 3.487. The tests varied from a
single thread up to 1024 threads. This benchmark used 4 KiB blocks to emulate small-block traffic.
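Because each operation in this test moves one 4 KiB record, an IOPS figure converts directly into throughput.
The minimal Python sketch below applies that conversion to the peak values reported for this test. The helper
`iops_to_mib_per_s` is hypothetical and not part of IOzone, and it assumes every operation transfers exactly one
full record:

```python
# Hypothetical helper: convert IOPS into throughput for a fixed transfer size.
# Assumes every operation moves exactly one record of `block_bytes` bytes,
# which matches the 4 KiB record size used in this random test.

KIB = 1024
MIB = 1024 * 1024


def iops_to_mib_per_s(iops: float, block_bytes: int = 4 * KIB) -> float:
    """Return the throughput in MiB/s implied by `iops` operations per second."""
    return iops * block_bytes / MIB


if __name__ == "__main__":
    # Peak values quoted for this test: 16.2K write IOPS, 20.4K read IOPS.
    for label, iops in (("write", 16_200), ("read", 20_400)):
        print(f"{label}: {iops} IOPS x 4 KiB = {iops_to_mib_per_s(iops):.2f} MiB/s")
```

Even at its peak, small-block random traffic therefore moves only tens of MiB/s, which is why IOPS rather than
bandwidth is the relevant metric for this test.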
Caching effects were minimized by setting the GPFS page pool tunable to 16 GiB and using files twice that
size (32 GiB). The first performance test section explains in more detail why this is effective on GPFS.
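The following commands were used to execute the benchmark in random IO mode for both writes and reads,
where Threads was the variable holding the number of threads used (1 to 1024, incremented in powers of two),
and threadlist was the file that assigned each thread to a compute node, using round robin to spread the
threads homogeneously across the 16 compute nodes. The first command creates the 32 GiB test files with 8 MiB
records (-i0) and keeps them (-w). The second performs the random reads and writes (-i2) on those files with
4 KiB records, reporting results in operations per second (-O).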
./iozone -i0 -c -e -w -r 8M -s 32G -t $Threads -+n -+m ./threadlist
./iozone -i2 -c -O -w -r 4K -s 32G -t $Threads -+n -+m ./threadlist
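A threadlist like the one described above could be generated with a short script. The Python sketch below is
hypothetical: the node names node01 to node16, the working directory and the iozone path are placeholders, not
the configuration of the tested system. It relies on the IOzone cluster file format (one line per thread holding
the client name, the working directory on that client, and the path to the iozone executable) and assigns
thread i to node i mod 16:

```python
# Hypothetical sketch: build the IOzone cluster file (-+m) for a given thread
# count, assigning thread i to compute node i mod 16 (round robin).
# Node names, WORKDIR and IOZONE are placeholders, not the tested system's values.
from collections import Counter
from pathlib import Path

NODES = [f"node{n:02d}" for n in range(1, 17)]  # placeholder names of the 16 nodes
WORKDIR = "/gpfs/iozone_work"                   # placeholder working dir on GPFS
IOZONE = "/usr/local/bin/iozone"                # placeholder iozone path on clients


def threadlist_lines(threads: int, nodes: list[str] = NODES) -> list[str]:
    """Return one '<client> <working dir> <iozone path>' line per thread."""
    return [f"{nodes[i % len(nodes)]} {WORKDIR} {IOZONE}" for i in range(threads)]


def write_threadlist(threads: int, path: str = "threadlist") -> None:
    """Write the cluster file that is passed to iozone with -+m."""
    Path(path).write_text("\n".join(threadlist_lines(threads)) + "\n")


if __name__ == "__main__":
    # Thread counts used in the test: 1, 2, 4, ..., 1024 (powers of two).
    for threads in (2**k for k in range(11)):
        per_node = Counter(line.split()[0] for line in threadlist_lines(threads))
        print(f"{threads:5d} threads -> up to {max(per_node.values())} per node")
```

Calling write_threadlist(threads) before each run would keep the number of entries in ./threadlist equal to the
value passed with -t $Threads. With round robin, the per-node load differs by at most one thread for any thread
count, and with powers of two from 16 upward every node receives exactly the same number of threads.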
Figure 11 N to N Random Performance
From the results we can observe that write performance starts at a high value of almost 8.2K IOPS and rises
steadily up to 128 threads, where it reaches a plateau and remains close to the maximum value of 16.2K IOPS.
Read performance, on the other hand, starts very low, at over 200 IOPS, and increases almost linearly with the
number of threads used (keep in mind that the number of threads is doubled for each data point), reaching its
maximum of 20.4K IOPS at 512 threads without signs of saturating. However, the current 16 compute nodes, each
with two 18-core CPUs, provide only 16 x 2 x 18 = 576 cores. That is enough for 512 threads but not for the
maximum number of IOzone threads (1024) without incurring context switching, which limits performance
considerably. A future test with more compute nodes could determine what random read performance can be
achieved with 1024 threads in IOzone, or IOR could be used to investigate the behavior with more than 1024
threads.
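The core-count argument can be checked with a few lines of arithmetic. This Python sketch assumes the threads are
spread round robin (threads / 16 per node) and counts physical cores only. It ignores SMT/Hyper-Threading, which
the text does not mention, and the function name oversubscription is illustrative:

```python
# Sketch: compare IOzone threads per node with physical cores per node for the
# 16-node test cluster (2 CPUs x 18 cores each), assuming threads are spread
# round robin so that each node receives threads / 16 of them.

NODES = 16
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 18


def oversubscription(threads: int) -> float:
    """Ratio of IOzone threads per node to physical cores per node."""
    cores_per_node = SOCKETS_PER_NODE * CORES_PER_SOCKET  # 36
    threads_per_node = threads / NODES
    return threads_per_node / cores_per_node


if __name__ == "__main__":
    total_cores = NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET  # 576
    print(f"total cores: {total_cores}")
    for threads in (512, 1024):
        ratio = oversubscription(threads)
        verdict = "fits" if ratio <= 1 else "oversubscribed"
        print(f"{threads} threads: {ratio:.2f} threads per core ({verdict})")
```

At 512 threads each node runs 32 threads on 36 cores. At 1024 threads it runs 64, roughly 1.78 threads per core,
which is where context switching sets in.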