
As in our previous deep learning blog, we use the three most popular deep learning frameworks: NVIDIA’s fork of Caffe (NV-Caffe),
MXNet and TensorFlow. Both NV-Caffe and MXNet have been optimized for V100. TensorFlow does not yet have an official release that
supports V100, so we applied patches obtained from the TensorFlow developers to optimize it for V100 in these tests as well. For
the dataset, we again use the ILSVRC 2012 dataset, which contains 1,281,167 training images and 50,000 validation images. For the
neural network, we chose ResNet-50 because it is computationally intensive. To get the best performance, we used the CUDA 9 RC
compiler and the cuDNN library in all three frameworks, since both are optimized for V100. The testing platform is Dell EMC’s
PowerEdge C4130 server.
The C4130 server supports multiple configurations; we evaluated the PCIe GPUs (V100-PCIe and P100-PCIe) in configuration G and the
SXM2 GPUs (V100-SXM2 and P100-SXM2) in configuration K. The differences between configuration G and configuration K are shown in
Figure 1. There are two main differences: in configuration G, two x16 PCIe links connect the two CPUs to the four GPUs, whereas in
configuration K a single x16 PCIe link connects one CPU to all four GPUs; and the GPUs communicate with each other over PCIe in
configuration G but over NVLink in configuration K. The remaining hardware and software details are listed in Table 2.
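
Which interconnect a node actually exposes can be checked from the operating system before running any benchmark. The snippet below is a minimal sketch (assuming only that the nvidia-smi utility installed with the GPU driver is on the PATH); on a configuration K system the topology matrix reports NVLink (NV#) connections between GPU pairs, while on configuration G it reports PCIe paths only.

# Minimal sketch: print the GPU interconnect topology reported by the driver.
# Assumes the nvidia-smi utility (shipped with the NVIDIA driver) is on the PATH.
import subprocess

# "nvidia-smi topo -m" prints a matrix describing how each GPU pair is
# connected, e.g. NV1/NV2 for NVLink and PIX/PHB/SYS for PCIe paths.
print(subprocess.check_output(["nvidia-smi", "topo", "-m"]).decode())
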
Figure 1: Comparison between configuration G and configuration K
Table 2: The hardware configuration and software details
Platform: PowerEdge C4130 (configuration G and configuration K)
CPU: 2 x Intel Xeon E5-2690 v4 @ 2.6 GHz (Broadwell)
Memory: 256 GB DDR4 @ 2400 MHz
Disk: 9 TB HDD
GPU: V100-PCIe, V100-SXM2, P100-PCIe, P100-SXM2

Software and Firmware
Operating System: RHEL 7.3 x86_64
Linux Kernel: 3.10.0-514.26.2.el7.x86_64
BIOS: 2.4.2
CUDA compiler and GPU driver: CUDA 9.0-RC (driver 384.59)
NCCL: 2.0
Python: 2.7.5

Deep Learning Libraries and Frameworks
cuDNN: 7.0
TensorRT: 3.0.0
NV-Caffe: 0.16.3
MXNet: 0.11.0
TensorFlow: 1.2.1-rc1
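
Before collecting any numbers, it is worth confirming at runtime that each framework sees the versions and the GPUs listed in Table 2. A minimal sketch (assuming the TensorFlow and MXNet Python packages from the table are installed) is:

# Minimal sketch: confirm framework versions and GPU visibility at runtime.
# Assumes the TensorFlow and MXNet builds listed in Table 2 are installed.
from __future__ import print_function
import tensorflow as tf
import mxnet as mx
from tensorflow.python.client import device_lib

print("TensorFlow:", tf.__version__)   # expected: 1.2.1-rc1
print("MXNet:", mx.__version__)        # expected: 0.11.0

gpus = [d.name for d in device_lib.list_local_devices() if d.device_type == "GPU"]
print("Visible GPUs:", gpus)           # a four-GPU C4130 should list four devices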
[Figure 1, configuration G panel: 2 CPU / 4 GPU, 2 virtual switches, 2 GPUs per CPU; each CPU connects over x16 to a PCIe Gen3 96-lane switch serving two GPUs (GPU1/GPU2 and GPU3/GPU4) plus a low-profile slot.]
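
The practical effect of the PCIe-versus-NVLink difference between the two configurations can be approximated with a simple device-to-device copy timing. The sketch below uses MXNet’s NDArray API to copy a tensor from GPU 0 to GPU 1 and reports the achieved bandwidth; the transfer size is an arbitrary choice, whether the copy takes a direct peer-to-peer path depends on the driver and framework build, and the result is not a measured number from this study.

# Minimal sketch: rough GPU0 -> GPU1 copy bandwidth using MXNet NDArrays.
# Not the benchmark used in this study; results depend on the actual topology.
import time
import mxnet as mx

size_mb = 256
n = size_mb * 1024 * 1024 // 4                  # number of float32 elements
src = mx.nd.ones((n,), ctx=mx.gpu(0))
mx.nd.waitall()                                 # finish allocation before timing

start = time.time()
for _ in range(10):
    dst = src.copyto(mx.gpu(1))                 # device-to-device copy
mx.nd.waitall()                                 # wait for the async copies to finish
elapsed = time.time() - start

print("~%.1f GB/s" % (10 * size_mb / 1024.0 / elapsed))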