White Papers

Deep Learning Performance: Scale-up vs Scale-out
Architectures & Technologies Dell EMC | Infrastructure Solutions Group
7.2.5 PowerEdge C4140-M Multi-Node Training vs Non-Dell EMC 8x V100-16GB-SXM2
Figure 33. Training with PowerEdge C4140-M-V100-16GB-SXM2 (8 GPUs) multi-node versus
Non-Dell EMC SN_8x-V100-16GB-SXM2
Figure 33 above shows the throughput improvement from using a server with a higher-capacity CPU. As the table below shows, almost all the models trained on the C4140-M-V100-16GB-SXM2 with the Intel Xeon 6148 CPU (8 GPUs, multi-node) outperformed SN_8xV100. The exception was AlexNet, which still performed below SN_8xV100; however, its throughput improved significantly compared with training on the C4140-K-V100-16GB-SXM2 server with the Intel Xeon 4116. See the summary in the table below.
Model          SN_8x-V100-16GB-SXM2 (img/s)   MN PowerEdge C4140-M-V100-SXM2-16GB (img/s)   % Diff
Inception-v4   1606                           1993                                           19%
VGG-19         2449                           3205                                           24%
VGG-16         2762                           3734                                           26%
Inception-v3   3077                           3685                                           16%
ResNet-50      4852                           5904                                           18%
GoogLeNet      7894                           10801                                          27%
AlexNet        16977                          14969                                         -13%
Table 6: 8x GPU comparison between PowerEdge C4140-M multi-node and 8x SXM2
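The % Diff column appears to be computed relative to the multi-node (C4140-M) throughput, i.e. (MN - SN) / MN; this convention is inferred from the reported values, not stated in the text. A minimal sketch that reproduces the column from the table's throughput figures:

```python
# Throughput figures (images/sec) taken from Table 6.
# Key: model name -> (SN_8x-V100 throughput, MN C4140-M throughput)
results = {
    "Inception-v4": (1606, 1993),
    "VGG-19": (2449, 3205),
    "VGG-16": (2762, 3734),
    "Inception-v3": (3077, 3685),
    "ResNet-50": (4852, 5904),
    "GoogLeNet": (7894, 10801),
    "AlexNet": (16977, 14969),
}

def pct_diff(sn: float, mn: float) -> int:
    """Percentage difference relative to the multi-node result
    (assumed convention, inferred from the published numbers)."""
    return round(100 * (mn - sn) / mn)

for model, (sn, mn) in results.items():
    print(f"{model}: {pct_diff(sn, mn):+d}%")
```

Running this reproduces the table's column, including the -13% regression for AlexNet, which confirms the assumed denominator.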