Figure 4: Performance of V100 vs P100 with TensorFlow
[Chart data, TensorFlow ResNet-50 with 4 GPUs, in images/sec: P100 811 vs V100 1082 on PCIe (config G); P100 785 vs V100 1065 on SXM2 (config K)]
Table 4: Improvement of V100 compared to P100

Form Factor   Precision   Framework    V100 vs P100 (%)
PCIe          FP32        NV-Caffe     42.23%
PCIe          FP32        MXNet        44.67%
PCIe          FP32        TensorFlow   31.60%
PCIe          FP16        NV-Caffe     110.68%
PCIe          FP16        MXNet        126.26%
SXM2          FP32        NV-Caffe     40.72%
SXM2          FP32        MXNet        49.76%
SXM2          FP32        TensorFlow   33.14%
SXM2          FP16        NV-Caffe     100.56%
SXM2          FP16        MXNet        114.33%
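As a sanity check on how the percentages in Table 4 relate to raw throughput, the snippet below (a sketch, assuming the improvement is computed as (V100 - P100) / P100) recomputes the TensorFlow improvement from the 4-GPU bar values in Figure 4; the small gap versus the table's 31.60% and 33.14% entries suggests the table may aggregate over batch sizes beyond the single configuration plotted.

    def improvement_pct(p100_ips: float, v100_ips: float) -> float:
        """Percentage improvement of V100 over P100, from images/sec."""
        return (v100_ips - p100_ips) / p100_ips * 100

    # 4-GPU TensorFlow ResNet-50 throughputs from Figure 4 (images/sec)
    print(f"PCIe (config G): {improvement_pct(811, 1082):.2f}%")  # 33.42%
    print(f"SXM2 (config K): {improvement_pct(785, 1065):.2f}%")  # 35.67%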
Since V100 supports both deep learning training and inference, we also tested the inference performance of V100 using the latest
TensorRT 3.0.0. The testing was done in FP16 mode on both V100-SXM2 and P100-PCIe, and the result is shown in Figure 5. We used
batch size 39 for V100 and 10 for P100; different batch sizes were chosen so that the inference latencies of the two GPUs would be close
to each other (~7 ms in the figure). The result shows that at comparable latency, the inference throughput of V100 is 3.7x that of P100.
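To make the batch-size choice concrete: throughput is batch size divided by per-batch latency, so holding latency near 7 ms on both cards makes the throughput ratio track the batch-size ratio (39/10 = 3.9, in line with the measured 3.7x once the small remaining latency differences are accounted for). The sketch below shows one way such numbers might be collected, assuming a generic infer(batch) callable standing in for an inference engine; it is illustrative only and not the TensorRT 3.0.0 API.

    import time

    def measure(infer, batch, n_warmup=10, n_iters=100):
        # infer: hypothetical callable running inference on one batch
        # batch: input batch; len(batch) is treated as the batch size
        for _ in range(n_warmup):             # warm up clocks and caches
            infer(batch)
        start = time.perf_counter()
        for _ in range(n_iters):
            infer(batch)
        elapsed = time.perf_counter() - start
        latency_ms = elapsed / n_iters * 1e3  # average per-batch latency
        images_per_sec = len(batch) * n_iters / elapsed
        return latency_ms, images_per_sec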