White Papers

Deep Learning Performance: Scale-up vs Scale-out
Architectures & Technologies Dell EMC | Infrastructure Solutions Group
7.2.5 PowerEdge C4140-M Multi-Node Training vs Non-Dell EMC 8x V100-16GB-SXM2
Figure 33. Training with PowerEdge C4140-M-V100-16GB-SXM2 (8 GPUs) multi-node versus
Non-Dell EMC SN_8x-V100-16GB-SXM2
Figure 33 above shows the throughput improvement from using a server with a higher-capacity CPU. As the table below shows, almost all the models trained on the C4140-M-V100-16GB-SXM2 with the Intel Xeon 6148 CPU (8 GPUs, multi-node) outperformed SN_8xV100. The exception was AlexNet, which still performed below SN_8xV100; however, its throughput improved significantly compared with training on the C4140-K-V100-16GB-SXM2 server with the Intel Xeon 4116. See the summary in the table below.
Model          SN_8x-V100-16GB-SXM2 (img/s)   MN PowerEdge C4140-M-V100-SXM2-16GB (img/s)   % Diff
Inception-v4   1606                           1993                                           19%
VGG-19         2449                           3205                                           24%
VGG-16         2762                           3734                                           26%
Inception-v3   3077                           3685                                           16%
ResNet-50      4852                           5904                                           18%
GoogLeNet      7894                           10801                                          27%
AlexNet        16977                          14969                                         -13%
Table 6: 8x GPU comparison between PowerEdge C4140-M multi-node and 8x SXM2
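The % Diff column appears to be computed relative to the multi-node (C4140-M) throughput, i.e. (MN - SN) / MN; this convention is inferred from the reported values, not stated in the text. A minimal sketch that reproduces the column from the table's throughput figures:

```python
# Throughput figures (images/sec) taken from Table 6.
# Key: model name -> (SN_8x-V100 throughput, MN C4140-M throughput)
results = {
    "Inception-v4": (1606, 1993),
    "VGG-19": (2449, 3205),
    "VGG-16": (2762, 3734),
    "Inception-v3": (3077, 3685),
    "ResNet-50": (4852, 5904),
    "GoogLeNet": (7894, 10801),
    "AlexNet": (16977, 14969),
}

def pct_diff(sn: float, mn: float) -> int:
    """Percentage difference relative to the multi-node result
    (assumed convention, inferred from the published numbers)."""
    return round(100 * (mn - sn) / mn)

for model, (sn, mn) in results.items():
    print(f"{model}: {pct_diff(sn, mn):+d}%")
```

Running this reproduces the table's column, including the -13% regression for AlexNet, which confirms the assumed denominator.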