7.2.2 PowerEdge C4140-K-V100-16GB and V100-32GB: SXM2 Multi Node
Figure 27: Training with PowerEdge C4140-V100-16&32GB-SXM2 in multi-node
PowerEdge C4140-V100-16GB-SXM2 and PowerEdge C4140-V100-32GB-SXM2 servers, with 4 GPUs each, were configured in a multi-node setup to run TensorFlow in distributed mode, extract throughput, and determine scaling efficiency. The GPUs scale very well, reaching 97% scaling efficiency within a node and 90% across nodes. The ideal performance is computed by multiplying the single-GPU throughput by the number of GPUs in the system. See Figure 28.
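
The scaling-efficiency calculation described above can be summarized in a short sketch. The Python below is illustrative only; the throughput values are hypothetical placeholders chosen to reproduce roughly 97% and 90% efficiency, not measurements from this white paper.

```python
def ideal_throughput(single_gpu_imgs_per_sec: float, num_gpus: int) -> float:
    """Ideal (linear) throughput: single-GPU throughput x number of GPUs."""
    return single_gpu_imgs_per_sec * num_gpus

def scaling_efficiency(measured_imgs_per_sec: float,
                       single_gpu_imgs_per_sec: float,
                       num_gpus: int) -> float:
    """Measured throughput as a fraction of the ideal linear throughput."""
    return measured_imgs_per_sec / ideal_throughput(single_gpu_imgs_per_sec, num_gpus)

# Hypothetical example values (images/sec), not taken from the paper:
single_gpu = 360.0
print(scaling_efficiency(1397.0, single_gpu, 4))   # ~0.97 within a node (4 GPUs)
print(scaling_efficiency(2592.0, single_gpu, 8))   # ~0.90 across two nodes (8 GPUs)
```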