White Papers

Deep Learning Performance: Scale-up vs Scale-out

Architectures & Technologies Dell EMC | Infrastructure Solutions Group

7.2.6 PowerEdge C4140 Multi Node Training with Different CPU Models vs 8x V100-16GB-SXM2

For the results shown in Figure 31 and Figure 33, we configured the multi-node system with PowerEdge C4140-V100-SXM2 servers in Configuration-K (Intel Xeon 4116 CPU) and Configuration-M (Intel Xeon 6148 CPU), respectively, and compared them against single-node training on a non-Dell EMC 8x V100-16GB-SXM2 server.

To show the impact of the CPU on the training of deep learning workloads, we ran additional tests with the multi-node system configured with PowerEdge C4140-V100-SXM2 Configuration-M servers and the Intel Xeon 6148 CPU. Figure 34 shows that more advanced CPU models boost GPU performance even further, since most of the data loading, data preprocessing, and batch transformation tasks occur at the CPU level, whereas the training tasks occur at the GPU level.
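
The CPU/GPU split described above can be illustrated with a minimal sketch. It assumes the input pipeline is overlapped with training (for example, through prefetching), so that steady-state throughput is bounded by the slower of the two stages. The function names, core counts, and per-core and per-GPU rates below are hypothetical placeholders chosen for illustration only; they are not measurements from this paper.

```python
def steady_state_throughput(cpu_cores, images_per_sec_per_core,
                            gpus, images_per_sec_per_gpu):
    """Images/sec of an overlapped input pipeline plus training loop.

    Preprocessing (CPU) and training (GPU) run concurrently, so the
    slower stage bounds the steady-state rate.
    """
    cpu_rate = cpu_cores * images_per_sec_per_core  # data loading + preprocessing
    gpu_rate = gpus * images_per_sec_per_gpu        # forward/backward passes
    return min(cpu_rate, gpu_rate)


def gpu_utilization(cpu_cores, images_per_sec_per_core,
                    gpus, images_per_sec_per_gpu):
    """Fraction of peak GPU throughput that the CPU stage can sustain."""
    gpu_rate = gpus * images_per_sec_per_gpu
    achieved = steady_state_throughput(cpu_cores, images_per_sec_per_core,
                                       gpus, images_per_sec_per_gpu)
    return achieved / gpu_rate


# Hypothetical rates (assumptions, not measured values).
PER_CORE = 100.0  # images/sec one CPU core can decode and augment
PER_GPU = 800.0   # images/sec one GPU can train on
GPUS = 4          # GPUs per node

# Illustrative comparison of a lower-core-count and a higher-core-count node.
fewer_cores = steady_state_throughput(24, PER_CORE, GPUS, PER_GPU)
more_cores = steady_state_throughput(40, PER_CORE, GPUS, PER_GPU)

print(f"24 cores: {fewer_cores:.0f} images/sec, "
      f"GPU utilization {gpu_utilization(24, PER_CORE, GPUS, PER_GPU):.0%}")
print(f"40 cores: {more_cores:.0f} images/sec, "
      f"GPU utilization {gpu_utilization(40, PER_CORE, GPUS, PER_GPU):.0%}")
```

Under these assumed rates, the node with fewer cores is limited by preprocessing and leaves the GPUs partly idle, while the node with more cores becomes GPU-bound. This is one plausible mechanism by which a faster CPU raises training throughput on identical GPUs; it is not a model of the measured results.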

Figure 34. Multi-node training on PowerEdge C4140-V100-SXM2 Configuration-K with Intel Xeon 4116 CPU and on PowerEdge C4140-V100-SXM2 Configuration-M with Intel Xeon 6148 CPU, versus single-node training on non-Dell EMC 8x V100-16GB-SXM2