7.2.2 PowerEdge C4140-K-V100-16GB and V100-32GB: SXM2 Multi Node
Figure 27: Training with PowerEdge C4140-V100-16&32GB-SXM2 in multi-node
PowerEdge C4140-V100-16GB-SXM2 and PowerEdge C4140-V100-32GB-SXM2 servers, with 4 GPUs each, were configured in a multi-node setup to run TensorFlow in distributed mode, extract throughput, and determine scaling efficiency. The GPUs scale very well, reaching 97% scaling efficiency within a node and 90% across nodes. The ideal performance is computed by multiplying the single-GPU throughput by the number of GPUs in the system. See Figure 28.
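
The scaling-efficiency calculation described above can be summarized in a short sketch. The Python below is illustrative only; the throughput values are hypothetical placeholders chosen to reproduce roughly 97% and 90% efficiency, not measurements from this white paper.

```python
def ideal_throughput(single_gpu_imgs_per_sec: float, num_gpus: int) -> float:
    """Ideal (linear) throughput: single-GPU throughput x number of GPUs."""
    return single_gpu_imgs_per_sec * num_gpus

def scaling_efficiency(measured_imgs_per_sec: float,
                       single_gpu_imgs_per_sec: float,
                       num_gpus: int) -> float:
    """Measured throughput as a fraction of the ideal linear throughput."""
    return measured_imgs_per_sec / ideal_throughput(single_gpu_imgs_per_sec, num_gpus)

# Hypothetical example values (images/sec), not taken from the paper:
single_gpu = 360.0
print(scaling_efficiency(1397.0, single_gpu, 4))   # ~0.97 within a node (4 GPUs)
print(scaling_efficiency(2592.0, single_gpu, 8))   # ~0.90 across two nodes (8 GPUs)
```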