BIOS firmware: 1.1.0
iDRAC firmware: 2.02.01.01
Compilers: Intel C++ Composer XE 2015
MPI: Intel MPI 5.0.1
MKL: Intel MKL 11.2
High Performance Linpack: Version 2.1
Figure 2: HPL acceleration (FLOPS compared to CPU only) and efficiency on the C4130 configurations.
Figure 2 illustrates the HPL performance on the PowerEdge C4130 server. Offload execution mode was used
for all runs. In this mode, the application splits the workload: highly parallel code is offloaded to the
coprocessor, while the Xeon host processors primarily run serial code. Configuration C has two Phis connected to
each CPU, and configuration D has a single Phi connected to each CPU. ECC is enabled and Turbo mode is
disabled for all runs.
The Intel Xeon Phi coprocessor accelerates highly parallel applications like HPL. In the graphs above,
CPU-only performance is shown for reference. The compute efficiency of the CPU-only configuration is
91.6%, whereas configuration C has a compute efficiency of 75.6% and configuration D has 81.2%.
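The efficiency figures quoted above can be compared with a few lines of arithmetic. The sketch below is illustrative only: it assumes the usual HPL definition of compute efficiency (measured FLOPS divided by theoretical peak FLOPS), which the text does not spell out, and the names `hpl_efficiency`, `gap_to_cpu_only`, and `REPORTED_EFFICIENCY` are hypothetical helpers, not part of HPL or any Intel tool. Only the three percentages come from the text.

```python
from typing import Dict


def hpl_efficiency(rmax_gflops: float, rpeak_gflops: float) -> float:
    """Return compute efficiency in percent, assuming efficiency = measured / peak."""
    if rpeak_gflops <= 0:
        raise ValueError("rpeak_gflops must be positive")
    return 100.0 * rmax_gflops / rpeak_gflops


# Efficiencies (percent) as quoted in the text; no other measurements are assumed.
REPORTED_EFFICIENCY: Dict[str, float] = {
    "CPU only": 91.6,
    "Configuration C": 75.6,
    "Configuration D": 81.2,
}


def gap_to_cpu_only(efficiencies: Dict[str, float]) -> Dict[str, float]:
    """Return how many percentage points each configuration sits below CPU only."""
    baseline = efficiencies["CPU only"]
    return {
        name: round(baseline - value, 1)
        for name, value in efficiencies.items()
        if name != "CPU only"
    }


if __name__ == "__main__":
    for name, gap in gap_to_cpu_only(REPORTED_EFFICIENCY).items():
        print(f"{name}: {gap} percentage points below CPU only")
```

With the quoted values, configuration C sits 16.0 percentage points below the CPU-only efficiency and configuration D sits 10.4 points below it, so D is 5.6 points more efficient than C.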
It is well known that CPU-only configurations generally achieve higher efficiency than CPU plus
Phi configurations. Configuration D shows higher efficiency than configuration C. Compared to the CPU only