Administrator Guide
Deep Learning Performance Scale-Out
GPUs
▪ 1-8
Performance Metrics
▪ Throughput (images/second)
▪ Training to convergence at 76.2% top-1 accuracy
Dataset
▪ ILSVRC2012 (ImageNet)
Environment
▪ Docker
Table 1: Benchmark Setup
Software Stack
Table 2 shows the software stack configuration used to build the environment for the tests presented in paper [0] and for the current tests.
Software Stack                  Previous Tests                          Current Tests
Test Date                       February 2019                           January 2020
OS                              Ubuntu 16.04.4 LTS                      Ubuntu 18.04.3 LTS
Kernel                          GNU/Linux 4.4.0-128-generic x86_64      GNU/Linux 4.15.0-69-generic x86_64
NVIDIA driver                   396.26                                  440.33.01
CUDA                            9.1.85                                  10.0
cuDNN                           7.1.3                                   7.6.5
NCCL                            2.2.15                                  2.5.6
TensorFlow                      1.10                                    1.14
Horovod                         0.15.2                                  0.19.0
Python                          2.7                                     2.7
Open MPI                        3.0.1                                   4.0.0
Mellanox OFED                   4.3-1                                   4.7-3
GPUDirect RDMA                  1.0-7                                   1.0-8
Single Node - Docker Container  tensorflow/tensorflow:nightly-gpu-py3   nvidia/cuda:10.0-devel-ubuntu18.04
Multi Node - Docker Container   nvidia/cuda:9.1-devel-ubuntu16.04       nvidia/cuda:10.0-devel-ubuntu18.04
built from
Benchmark scripts               tf_cnn_benchmarks                       tf_cnn_benchmarks
Table 2: OS & Driver Configurations
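For the multi-node runs, the stack in Table 2 is driven through Open MPI and Horovod against the tf_cnn_benchmarks scripts. A representative launch command might look like the following sketch; the hostnames, process counts, and benchmark flags are illustrative assumptions, not the exact invocation used in the tests.

```shell
# Illustrative only: hostnames, process counts, and batch size are assumptions.
# 16 processes across two 8-GPU servers, one process per GPU.
mpirun -np 16 \
    -H server1:8,server2:8 \
    -bind-to none -map-by slot \
    -x LD_LIBRARY_PATH -x PATH -x NCCL_DEBUG=INFO \
    -mca pml ob1 -mca btl ^openib \
    python tf_cnn_benchmarks.py \
        --model=resnet50 \
        --batch_size=256 \
        --use_fp16 \
        --variable_update=horovod
```

The `-mca btl ^openib` setting leaves InfiniBand traffic to NCCL rather than to Open MPI's byte-transfer layer, as recommended in the Horovod documentation for NCCL-based all-reduce.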
Distributed Setup
The tests were run in a Docker environment. Figure 1 below shows the different logical layers involved in the software stack configuration. Each server is connected to the InfiniBand switch; each host has the Mellanox OFED for Ubuntu, Docker CE, and the GPUDirect RDMA API installed, and runs a container image built with Horovod, Mellanox OFED, and other supporting libraries. To build the extended container image, we took the Horovod Dockerfile and modified it by adding the installation of the Mellanox OFED drivers [2]. The image was built from nvidia/cuda:10.0-devel-ubuntu18.04.
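A minimal sketch of such a modified Dockerfile is shown below. The base image and the TensorFlow, Horovod, and OFED versions match Table 2, but the OFED download URL, installer options, and package setup are illustrative assumptions rather than the exact build recipe used.

```dockerfile
# Sketch only: versions taken from Table 2; the OFED tarball name/URL and
# installer flags are assumptions, not the exact recipe used in the tests.
FROM nvidia/cuda:10.0-devel-ubuntu18.04

# Basic build tools and Python 2.7 (per Table 2)
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential wget ca-certificates \
        python python-pip python-setuptools && \
    rm -rf /var/lib/apt/lists/*

# Install the Mellanox OFED user-space libraries inside the container
# (4.7-3 per Table 2; exact tarball name is an assumption)
RUN wget -q http://content.mellanox.com/ofed/MLNX_OFED-4.7-3.2.9.0/MLNX_OFED_LINUX-4.7-3.2.9.0-ubuntu18.04-x86_64.tgz && \
    tar -xzf MLNX_OFED_LINUX-4.7-3.2.9.0-ubuntu18.04-x86_64.tgz && \
    MLNX_OFED_LINUX-4.7-3.2.9.0-ubuntu18.04-x86_64/mlnxofedinstall \
        --user-space-only --without-fw-update -q && \
    rm -rf MLNX_OFED_LINUX-4.7-3.2.9.0-*

# TensorFlow and Horovod versions from Table 2, with NCCL-based all-reduce
RUN pip install tensorflow-gpu==1.14 && \
    HOROVOD_GPU_ALLREDUCE=NCCL pip install horovod==0.19.0
```

The `--user-space-only` option installs only the OFED user-space libraries, since the kernel-side drivers are provided by the host's OFED installation.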