Administrator Guide
Deep Learning Performance Scale-Out
GPUs
▪ 1-8
Performance Metrics
▪ Throughput (images/second)
▪ Training to convergence at 76.2% top-1 accuracy
Dataset
▪ ILSVRC2012 (ImageNet)
Environment
▪ Docker
Table 1: Benchmark Setup
Software Stack
Table 2 shows the software stack configuration used to build the environment for the tests presented in paper [0] and for the current tests.
Software Stack                  Previous Tests                          Current Tests
Test Date                       February 2019                           January 2020
OS                              Ubuntu 16.04.4 LTS                      Ubuntu 18.04.3 LTS
Kernel                          GNU/Linux 4.4.0-128-generic x86_64      GNU/Linux 4.15.0-69-generic x86_64
NVIDIA driver                   396.26                                  440.33.01
CUDA                            9.1.85                                  10.0
cuDNN                           7.1.3                                   7.6.5
NCCL                            2.2.15                                  2.5.6
TensorFlow                      1.10                                    1.14
Horovod                         0.15.2                                  0.19.0
Python                          2.7                                     2.7
Open MPI                        3.0.1                                   4.0.0
Mellanox OFED                   4.3-1                                   4.7-3
GPUDirect RDMA                  1.0-7                                   1.0-8
Single Node - Docker Container  tensorflow/tensorflow:nightly-gpu-py3   nvidia/cuda:10.0-devel-ubuntu18.04
Multi Node - Docker Container   nvidia/cuda:9.1-devel-ubuntu16.04       nvidia/cuda:10.0-devel-ubuntu18.04
built from
Benchmark scripts               tf_cnn_benchmarks                       tf_cnn_benchmarks
Table 2: OS & Driver Configurations
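For the multi-node runs, the stack in Table 2 is driven through Open MPI and Horovod against the tf_cnn_benchmarks scripts. A representative launch command might look like the following sketch; the hostnames, process counts, and benchmark flags are illustrative assumptions, not the exact invocation used in the tests.

```shell
# Illustrative only: hostnames, process counts, and batch size are assumptions.
# 16 processes across two 8-GPU servers, one process per GPU.
mpirun -np 16 \
    -H server1:8,server2:8 \
    -bind-to none -map-by slot \
    -x LD_LIBRARY_PATH -x PATH -x NCCL_DEBUG=INFO \
    -mca pml ob1 -mca btl ^openib \
    python tf_cnn_benchmarks.py \
        --model=resnet50 \
        --batch_size=256 \
        --use_fp16 \
        --variable_update=horovod
```

The `-mca btl ^openib` setting leaves InfiniBand traffic to NCCL rather than to Open MPI's byte-transfer layer, as recommended in the Horovod documentation for NCCL-based all-reduce.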
Distributed Setup
The tests were run in a Docker environment. Figure 1 below shows the different logical layers involved in the software stack configuration. Each server is connected to the InfiniBand switch; each host has the Mellanox OFED for Ubuntu, Docker CE, and the GPUDirect RDMA API installed, and runs a container image built with Horovod, Mellanox OFED, and other supporting libraries. To build the extended container image, we took the Horovod Dockerfile and modified it by adding the installation of the Mellanox OFED drivers [2]. The image was built from nvidia/cuda:10.0-devel-ubuntu18.04.
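A minimal sketch of such a modified Dockerfile is shown below. The base image and the TensorFlow, Horovod, and OFED versions match Table 2, but the OFED download URL, installer options, and package setup are illustrative assumptions rather than the exact build recipe used.

```dockerfile
# Sketch only: versions taken from Table 2; the OFED tarball name/URL and
# installer flags are assumptions, not the exact recipe used in the tests.
FROM nvidia/cuda:10.0-devel-ubuntu18.04

# Basic build tools and Python 2.7 (per Table 2)
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential wget ca-certificates \
        python python-pip python-setuptools && \
    rm -rf /var/lib/apt/lists/*

# Install the Mellanox OFED user-space libraries inside the container
# (4.7-3 per Table 2; exact tarball name is an assumption)
RUN wget -q http://content.mellanox.com/ofed/MLNX_OFED-4.7-3.2.9.0/MLNX_OFED_LINUX-4.7-3.2.9.0-ubuntu18.04-x86_64.tgz && \
    tar -xzf MLNX_OFED_LINUX-4.7-3.2.9.0-ubuntu18.04-x86_64.tgz && \
    MLNX_OFED_LINUX-4.7-3.2.9.0-ubuntu18.04-x86_64/mlnxofedinstall \
        --user-space-only --without-fw-update -q && \
    rm -rf MLNX_OFED_LINUX-4.7-3.2.9.0-*

# TensorFlow and Horovod versions from Table 2, with NCCL-based all-reduce
RUN pip install tensorflow-gpu==1.14 && \
    HOROVOD_GPU_ALLREDUCE=NCCL pip install horovod==0.19.0
```

The `--user-space-only` option installs only the OFED user-space libraries, since the kernel-side drivers are provided by the host's OFED installation.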