Administrator Guide

Performance characterization
22 Dell EMC Ready Solution for HPC PixStor Storage | Document ID
The following commands were used to run the benchmark for writes and reads, where $Threads is the
number of threads used (1 to 1024, incremented in powers of two) and my_hosts.$Threads is the
corresponding hostfile that places each thread on a different node, using round robin to spread them
homogeneously across the 16 compute nodes.
    mpirun --allow-run-as-root -np $Threads --hostfile my_hosts.$Threads \
        --mca btl_openib_allow_ib 1 --mca pml ^ucx --oversubscribe \
        --prefix /mmfs1/perftest/ompi /mmfs1/perftest/lanl_ior/bin/ior \
        -a POSIX -v -i 1 -d 3 -e -k -o /mmfs1/perftest/tst.file -w -s 1 -t 8m -b 128G
    mpirun --allow-run-as-root -np $Threads --hostfile my_hosts.$Threads \
        --mca btl_openib_allow_ib 1 --mca pml ^ucx --oversubscribe \
        --prefix /mmfs1/perftest/ompi /mmfs1/perftest/lanl_ior/bin/ior \
        -a POSIX -v -i 1 -d 3 -e -k -o /mmfs1/perftest/tst.file -r -s 1 -t 8m -b 128G
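The source does not show how the my_hosts.$Threads files were built. As a minimal sketch, assuming an Open MPI hostfile with one "host slots=N" line per node, the helper below spreads a given number of threads as evenly as possible over a list of nodes. The function names make_hostfile and thread_counts and the node names in the example are hypothetical; the order in which ranks are then mapped to slots depends on the mpirun mapping policy, which the source does not state.

```shell
# Print an Open MPI hostfile that spreads THREADS ranks over the given nodes.
# Usage: make_hostfile THREADS NODE [NODE...]
# The body runs in a subshell, so its variables do not leak to the caller.
make_hostfile() (
    threads=$1
    shift
    count=$#
    if [ "$count" -eq 0 ]; then
        echo "make_hostfile: no nodes given" >&2
        exit 1
    fi
    base=$(( threads / count ))    # ranks that every node receives
    extra=$(( threads % count ))   # the first $extra nodes receive one more
    idx=0
    for node in "$@"; do
        slots=$base
        if [ "$idx" -lt "$extra" ]; then
            slots=$(( base + 1 ))
        fi
        # Nodes that would receive no rank are left out of the file.
        if [ "$slots" -gt 0 ]; then
            printf '%s slots=%d\n' "$node" "$slots"
        fi
        idx=$(( idx + 1 ))
    done
)

# Print the thread counts used in the sweep: 1 to 1024 in powers of two.
thread_counts() (
    t=1
    while [ "$t" -le 1024 ]; do
        echo "$t"
        t=$(( t * 2 ))
    done
)

# Example with three hypothetical node names: 4 ranks land as 2 + 1 + 1.
make_hostfile 4 node01 node02 node03
```

To produce one file per thread count, the output can be redirected inside a loop over thread_counts, for example make_hostfile "$t" node01 ... node16 > "my_hosts.$t" for each value of t.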
Figure 10 N to 1 Sequential Performance
From the results we can observe that performance again rises very quickly with the number of clients used
and then reaches a plateau that is semi-stable for reads and very stable for writes all the way to the maximum
number of threads used in this test. Therefore, large single shared file sequential performance is stable even
for 1024 concurrent clients. Notice that the maximum read performance was 23.7 GB/s at 16 threads; very
likely the bottleneck was the InfiniBand EDR interface, while the ME4 arrays still had some extra performance
available. Read performance then decreased from that value until reaching the plateau at around 20.5
GB/s, with a momentary drop to 18.5 GB/s at 128 threads. Similarly, the maximum write performance of
16.5 GB/s was reached at 16 threads, and it is apparently low compared to the ME4 arrays' specifications.
[Chart: N to 1 Sequential I/O performance. X axis: number of concurrent threads. Y axis: throughput in GB/s. Data labels from the chart:]

Threads    PixStor Write (GB/s)    PixStor Read (GB/s)
1          3.7                     5.7
2          7.3                     11.0
4          14.8                    17.9
8          16.3                    22.8
16         16.5                    23.8
32         16.4                    22.1
64         16.3                    21.3
128        16.4                    18.5
256        16.3                    20.5
512        16.2                    20.8
1024       16.2                    20.5
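
The stability of the plateau can be checked with some quick arithmetic on the data labels of Figure 10. The sketch below is only an illustration: the values are the chart labels (which show 23.8 GB/s for the read peak, where the text quotes 23.7 GB/s), and the function name summarize and the variable names are hypothetical.

```shell
# Data labels read from Figure 10, in GB/s, one value per thread count.
threads="1 2 4 8 16 32 64 128 256 512 1024"
write_bw="3.7 7.3 14.8 16.3 16.5 16.4 16.3 16.4 16.3 16.2 16.2"
read_bw="5.7 11.0 17.9 22.8 23.8 22.1 21.3 18.5 20.5 20.8 20.5"

# Report the peak of a series, the thread count where it occurs, and the
# share of that peak still delivered at the largest thread count.
# Usage: summarize LABEL "THREAD COUNTS" "THROUGHPUT VALUES"
summarize() {
    awk -v label="$1" -v t="$2" -v v="$3" 'BEGIN {
        n = split(t, th, " ")
        split(v, val, " ")
        peak = 1
        for (i = 2; i <= n; i++)
            if (val[i] + 0 > val[peak] + 0) peak = i
        printf "%s: peak %.1f GB/s at %d threads, %.1f%% of peak at %d threads\n", label, val[peak], th[peak], 100 * val[n] / val[peak], th[n]
    }'
}

summarize write "$threads" "$write_bw"
summarize read "$threads" "$read_bw"
```

With these labels, writes at 1024 threads retain about 98% of their peak and reads about 86%, which matches the description of a very stable write plateau and a semi-stable read plateau.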