Administrator Guide

13 Deep Learning Performance Scale-Out
Figure 11: Multi Node PowerEdge C4140-M. ResNet-50 with TF 1.14 + XLA + GPUDirect RDMA
Figure 11 shows the ResNet-50 results with TF 1.14 and XLA enabled, with and without
GPUDirect RDMA. We did not observe a meaningful performance gain from GPUDirect RDMA
across nodes: throughput remained essentially the same with and without it, so we did not
explore it further in this paper. This does not mean that GPUDirect RDMA cannot help in
scale-out deployments, only that it showed no measurable gain in our tests.
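The comparison above amounts to checking the relative throughput difference between two runs. A minimal sketch of that arithmetic follows; the function names and the 2% tolerance are assumptions for illustration, and the numbers in the usage note are placeholders, not measurements from this paper.

```python
def relative_gain(baseline, candidate):
    """Fractional throughput change of `candidate` relative to `baseline`.

    Both arguments are throughputs in the same unit (for example images/sec).
    """
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    return (candidate - baseline) / baseline


def is_negligible(baseline, candidate, tolerance=0.02):
    """True when the change is within `tolerance` (assumed 2%, hypothetical)."""
    return abs(relative_gain(baseline, candidate)) <= tolerance
```

For example, with placeholder throughputs of 1000.0 images/sec without GPUDirect RDMA and 1010.0 with it, `relative_gain(1000.0, 1010.0)` is about 0.01, which `is_negligible` treats as no meaningful gain under the assumed tolerance.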
ResNet-50’s configuration for better performance
Figure 12 summarizes the results of the different configurations explored in the short tests.
Based on these tests, the best performance was achieved with the following
combination:
ResNet-50 + BS 256 + TF 1.14 + XLA enabled
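The paper does not state how XLA was switched on for these runs. One documented way in TensorFlow 1.x is to request auto-clustering through the `TF_XLA_FLAGS` environment variable before the training process starts. The sketch below restates the combination above as plain data and prepares such an environment; `BEST_CONFIG` and `xla_env` are hypothetical names, and this is an illustration under those assumptions rather than the setup used in the tests.

```python
import os

# The best-performing combination reported above, restated as plain data.
# The source gives "BS 256" without saying whether it is per GPU or global,
# so no such meaning is assumed here.
BEST_CONFIG = {
    "model": "resnet50",
    "batch_size": 256,
    "tensorflow": "1.14",
    "xla": True,
}


def xla_env(base_env=None):
    """Return a copy of the environment that requests XLA auto-clustering.

    Assumption: XLA is enabled via TF_XLA_FLAGS=--tf_xla_auto_jit=2.
    An auto_jit setting already present in the environment is left as-is.
    """
    env = dict(os.environ if base_env is None else base_env)
    flags = env.get("TF_XLA_FLAGS", "").split()
    if not any(flag.startswith("--tf_xla_auto_jit") for flag in flags):
        flags.append("--tf_xla_auto_jit=2")
    env["TF_XLA_FLAGS"] = " ".join(flags)
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when launching a training script, which keeps the XLA setting out of the script itself.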