Administrator Guide

12 Deep Learning Performance Scale-Out
Figure 10: Multi Node PowerEdge C4140-M. ResNet-50 BS 256 TF 1.10 vs TF 1.14 vs TF 1.14 + XLA
ResNet-50 with TF 1.14 + XLA + GPUDirect RDMA
Another feature explored in our previous paper was GPUDirect RDMA, which provides a direct
P2P (Peer-to-Peer) data path between the GPU memory of different nodes through a Mellanox
HCA device. In this test, we enabled it by adding the NCCL flag -x NCCL_NET_GDR_LEVEL=3 at
the script level (this variable replaced NCCL_IB_CUDA_SUPPORT in NCCL v2.4.0). The
NCCL_NET_GDR_LEVEL variable controls when GPUDirect RDMA is used between a NIC and a GPU.
For example, level 3 uses GPUDirect RDMA when the GPU and the NIC are on the same PCI root
complex [4].
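
To make the setting concrete, the following is a minimal sketch of how such a launch command might be assembled. It assumes the job is started with Open MPI's mpirun, whose -x option exports an environment variable to every rank. The helper name build_mpirun_command, the host names, the GPU count per node, and the training script name are hypothetical and are not taken from the test setup described here.

```python
import shlex


def build_mpirun_command(hosts, gpus_per_node, train_cmd, gdr_level=3):
    """Build an Open MPI launch command that exports NCCL_NET_GDR_LEVEL.

    hosts         -- list of node host names (hypothetical in this sketch)
    gpus_per_node -- number of ranks (one per GPU) to start on each node
    train_cmd     -- training command as a list of arguments
    gdr_level     -- value assigned to NCCL_NET_GDR_LEVEL
    """
    total_ranks = len(hosts) * gpus_per_node
    # Open MPI accepts "host:slots" pairs separated by commas after -H.
    host_list = ",".join(f"{host}:{gpus_per_node}" for host in hosts)
    return [
        "mpirun",
        "-np", str(total_ranks),
        "-H", host_list,
        # -x exports the variable to the environment of every launched rank.
        "-x", f"NCCL_NET_GDR_LEVEL={int(gdr_level)}",
        *train_cmd,
    ]


# Hypothetical two-node run with four GPUs per node.
cmd = build_mpirun_command(["node1", "node2"], 4, ["python", "train.py"])
print(shlex.join(cmd))
```

The sketch only builds and prints the command. Passing the returned list to subprocess.run would start the job on a cluster where mpirun and the listed hosts are available.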