Figure 4: Performance of V100 vs P100 with TensorFlow
[Chart data, TensorFlow ResNet-50 with 4 GPUs, in images/sec: P100 811 vs V100 1082 on PCIe (config G); P100 785 vs V100 1065 on SXM2 (config K)]
Table 4: Improvement of V100 compared to P100

Form Factor   Precision   Framework    V100 vs P100 (%)
PCIe          FP32        NV-Caffe     42.23%
PCIe          FP32        MXNet        44.67%
PCIe          FP32        TensorFlow   31.60%
PCIe          FP16        NV-Caffe     110.68%
PCIe          FP16        MXNet        126.26%
SXM2          FP32        NV-Caffe     40.72%
SXM2          FP32        MXNet        49.76%
SXM2          FP32        TensorFlow   33.14%
SXM2          FP16        NV-Caffe     100.56%
SXM2          FP16        MXNet        114.33%
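As a sanity check on how the percentages in Table 4 relate to raw throughput, the snippet below (a sketch, assuming the improvement is computed as (V100 - P100) / P100) recomputes the TensorFlow improvement from the 4-GPU bar values in Figure 4; the small gap versus the table's 31.60% and 33.14% entries suggests the table may aggregate over batch sizes beyond the single configuration plotted.

    def improvement_pct(p100_ips: float, v100_ips: float) -> float:
        """Percentage improvement of V100 over P100, from images/sec."""
        return (v100_ips - p100_ips) / p100_ips * 100

    # 4-GPU TensorFlow ResNet-50 throughputs from Figure 4 (images/sec)
    print(f"PCIe (config G): {improvement_pct(811, 1082):.2f}%")  # 33.42%
    print(f"SXM2 (config K): {improvement_pct(785, 1065):.2f}%")  # 35.67%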
Since V100 supports both deep learning training and inference, we also tested the inference performance of V100 using the latest
TensorRT 3.0.0. The testing was done in FP16 mode on both V100-SXM2 and P100-PCIe, and the result is shown in Figure 5. We used
batch size 39 for V100 and 10 for P100; different batch sizes were chosen so that the inference latencies of the two GPUs would be close
to each other (~7 ms in the figure). The result shows that at comparable latency, the inference throughput of V100 is 3.7x that of P100.
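To make the batch-size choice concrete: throughput is batch size divided by per-batch latency, so holding latency near 7 ms on both cards makes the throughput ratio track the batch-size ratio (39/10 = 3.9, in line with the measured 3.7x once the small remaining latency differences are accounted for). The sketch below shows one way such numbers might be collected, assuming a generic infer(batch) callable standing in for an inference engine; it is illustrative only and not the TensorRT 3.0.0 API.

    import time

    def measure(infer, batch, n_warmup=10, n_iters=100):
        # infer: hypothetical callable running inference on one batch
        # batch: input batch; len(batch) is treated as the batch size
        for _ in range(n_warmup):             # warm up clocks and caches
            infer(batch)
        start = time.perf_counter()
        for _ in range(n_iters):
            infer(batch)
        elapsed = time.perf_counter() - start
        latency_ms = elapsed / n_iters * 1e3  # average per-batch latency
        images_per_sec = len(batch) * n_iters / elapsed
        return latency_ms, images_per_sec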