White Papers

Deep Learning Performance: Scale-up vs Scale-out

Architectures & Technologies Dell EMC | Infrastructure Solutions Group

7.2 Throughput images/s – Multi Node

7.2.1 PowerEdge C4130-P100 16GB PCIe – Multi Node

Multiple PowerEdge C4130 servers, each with four P100-PCIe GPUs, were connected over InfiniBand with RDMA to run TensorFlow in distributed (multi-node) mode.

Figure 25: Training with PowerEdge C4130-P100-16GB-PCIe in multi-node configuration

The PowerEdge C4130 scales well, with 97% scaling efficiency within a node and 92% across nodes. The ideal performance is computed by multiplying the single-GPU throughput by the number of GPUs in the system, and scaling efficiency is the measured throughput divided by that ideal. See Figure 26.
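The ideal-throughput and efficiency calculation can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used for this paper: the function names are hypothetical, and the node count, single-GPU throughput, and measured throughput below are assumed example values chosen only to show the arithmetic, not measurements from Figures 25 or 26.

```python
def ideal_throughput(single_gpu_images_per_sec: float, num_gpus: int) -> float:
    """Ideal (linear) throughput: single-GPU throughput times the GPU count."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return single_gpu_images_per_sec * num_gpus


def scaling_efficiency(measured_images_per_sec: float,
                       single_gpu_images_per_sec: float,
                       num_gpus: int) -> float:
    """Measured throughput as a fraction of the ideal linear throughput."""
    ideal = ideal_throughput(single_gpu_images_per_sec, num_gpus)
    return measured_images_per_sec / ideal


# Hypothetical example values (assumptions, not results from this paper):
# 4 C4130 nodes x 4 P100 GPUs per node = 16 GPUs in total.
gpus_per_node = 4
nodes = 4
single_gpu = 200.0   # images/s on a single GPU (assumed)
measured = 2944.0    # images/s measured across all GPUs (assumed)

gpus = gpus_per_node * nodes
ideal = ideal_throughput(single_gpu, gpus)
eff = scaling_efficiency(measured, single_gpu, gpus)
print(f"ideal = {ideal:.0f} images/s, efficiency = {eff:.0%}")
```

With these assumed inputs the ideal is 3200 images/s and the efficiency is 92%. The same two functions apply to the single-node case by setting `nodes = 1`.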