
Deep Learning Performance: Scale-up vs Scale-out
Architectures & Technologies Dell EMC | Infrastructure Solutions Group
7.2 Throughput (images/s) – Multi-Node
7.2.1 PowerEdge C4130-P100-16GB-PCIe – Multi-Node
PowerEdge C4130 servers, each with four P100-PCIe GPUs, were connected in a multi-node configuration
over InfiniBand with RDMA to run TensorFlow in distributed mode.
Figure 25: Training with PowerEdge C4130-P100-16GB-PCIe in a multi-node configuration
The PowerEdge C4130 scales well, reaching 97% scaling efficiency within a node and 92% across
nodes. Ideal performance is computed by multiplying the single-GPU throughput by the number of
GPUs in the system; scaling efficiency is the measured throughput expressed as a fraction of that
ideal. See Figure 26.
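The ideal-throughput rule above, and the efficiency figure derived from it, can be illustrated with a short Python sketch. It assumes scaling efficiency is measured throughput divided by ideal (linear) throughput. The function names and all sample numbers are hypothetical and are not measurements from this paper.

```python
def ideal_throughput(single_gpu_images_per_sec: float, num_gpus: int) -> float:
    """Ideal (linear) throughput: single-GPU rate multiplied by the GPU count."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return single_gpu_images_per_sec * num_gpus


def scaling_efficiency(measured_images_per_sec: float,
                       single_gpu_images_per_sec: float,
                       num_gpus: int) -> float:
    """Fraction of ideal throughput actually achieved (1.0 means perfect linear scaling)."""
    ideal = ideal_throughput(single_gpu_images_per_sec, num_gpus)
    return measured_images_per_sec / ideal


if __name__ == "__main__":
    # Hypothetical values for illustration only, not results from the paper.
    single = 200.0     # assumed images/s on one GPU
    gpus = 8           # assumed: two nodes x four GPUs each
    measured = 1472.0  # assumed measured images/s across both nodes
    print(f"ideal = {ideal_throughput(single, gpus):.0f} images/s")
    print(f"efficiency = {scaling_efficiency(measured, single, gpus):.0%}")
```

The same calculation applies within a node (with `num_gpus` equal to the four GPUs of one C4130) and across nodes (with `num_gpus` equal to the total GPU count), which is how separate intra-node and inter-node efficiencies can be reported.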