White Papers

Deep Learning Performance: Scale-up vs Scale-out
Architectures & Technologies Dell EMC | Infrastructure Solutions Group
7.2.6 PowerEdge C4140 Multi-Node Training with Different CPU Models vs 8x V100-16GB-SXM2
For the results shown in Figure 31 and Figure 33, we configured the multi-node system with
PowerEdge C4140-V100-SXM2 servers in Configuration-K (Intel Xeon 4116 CPU) and
Configuration-M (Intel Xeon 6148 CPU), respectively, and compared them against single-node
training on a non-Dell EMC 8x V100-16GB-SXM2 system.
To show the impact of the CPU on the training of deep learning workloads, we ran additional tests
with the multi-node system built from PowerEdge C4140-V100-SXM2 Configuration-M servers
with the Intel Xeon 6148 CPU. Figure 34 shows that a more advanced CPU model further boosts
GPU performance, since most of the data loading, data preprocessing, and batch transformation
tasks run on the CPU, whereas the training tasks run on the GPU.
Figure 34. Multi-node training on PowerEdge C4140-V100-SXM2 Configuration-K with Intel Xeon 4116 CPU
and on PowerEdge C4140-V100-SXM2 Configuration-M with Intel Xeon 6148 CPU, versus
single-node training on a non-Dell EMC 8x V100-16GB-SXM2
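The division of labor described in this section (input preparation on the CPU, training on the GPU) can be illustrated with a simple two-stage throughput model. The sketch below is not taken from the benchmark code: it assumes the two stages overlap fully through prefetching, so the slower stage sets the overall rate. The function names, worker counts, and all rates are hypothetical and chosen only to show the effect; they are not measurements from this paper.

```python
def pipeline_throughput(cpu_rate_per_worker: float, workers: int, gpu_rate: float) -> float:
    """Steady-state images/sec of a two-stage training input pipeline.

    Assumption: CPU preprocessing and GPU training overlap fully
    (prefetching), so the slower of the two stages limits throughput.
    """
    cpu_rate = cpu_rate_per_worker * workers  # aggregate CPU preprocessing rate
    return min(cpu_rate, gpu_rate)


def gpu_utilization(cpu_rate_per_worker: float, workers: int, gpu_rate: float) -> float:
    """Fraction of the GPUs' peak training rate that the CPU stage can feed."""
    return pipeline_throughput(cpu_rate_per_worker, workers, gpu_rate) / gpu_rate


# Hypothetical values for illustration only (not measured results).
GPU_RATE = 3000.0          # images/sec the GPUs could train if never starved
RATE_PER_WORKER = 100.0    # images/sec one CPU worker can decode and augment

# Fewer CPU workers: preprocessing (2400 images/sec) starves the GPUs.
slow_cpu = pipeline_throughput(RATE_PER_WORKER, 24, GPU_RATE)

# More CPU workers: preprocessing (4000 images/sec) outpaces the GPUs,
# so the GPUs become the limit.
fast_cpu = pipeline_throughput(RATE_PER_WORKER, 40, GPU_RATE)

print(f"fewer CPU workers: {slow_cpu:.0f} images/sec")
print(f"more CPU workers:  {fast_cpu:.0f} images/sec")
```

Under these assumptions, adding CPU capacity raises end-to-end throughput only until the CPU stage matches the GPU rate; beyond that point the GPUs are the bottleneck and a faster CPU brings no further gain.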