Figure 5: ResNet-50 inference performance on V100 vs. P100.
Conclusions and Future Work
After evaluating V100 performance with three popular deep learning frameworks, we conclude that in training the V100 is more than 40% faster than the P100 in FP32 and more than 100% faster in FP16, and in inference the V100 is 3.7x faster than the P100. This demonstrates the performance benefit of using the V100 tensor cores. In future work, we will evaluate different data type combinations with FP16 and study the accuracy impact of FP16 in deep learning training. We will also evaluate TensorFlow with FP16 once support is added to the software. Finally, we plan to scale training to multiple nodes with these frameworks.
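The accuracy question raised for future work stems from FP16's narrow 10-bit mantissa: naively accumulating many small values in FP16 stalls once the sum's rounding step exceeds the addend, which is why mixed-precision recipes keep accumulators in FP32. A minimal NumPy sketch of the effect (illustrative only, not from the paper's experiments):

```python
import numpy as np

# FP16 has a 10-bit mantissa (~3 decimal digits), so summing many
# small gradient-sized values purely in FP16 stalls: once half an ulp
# of the running sum exceeds the addend, further additions are lost.
vals = np.full(10_000, 1e-4, dtype=np.float16)

fp16_sum = np.float16(0.0)
for v in vals:
    fp16_sum = np.float16(fp16_sum + v)   # pure-FP16 accumulation

# The common mixed-precision remedy: store values in FP16 but keep
# the accumulator in FP32.
fp32_sum = np.float32(0.0)
for v in vals:
    fp32_sum += np.float32(v)             # FP32 accumulation

print(f"pure FP16 sum:  {float(fp16_sum):.4f}")   # stalls well below 1.0
print(f"FP32 accum sum: {float(fp32_sum):.4f}")   # close to the true 1.0
```

Training frameworks apply the same idea via FP32 master weights and loss scaling; the planned accuracy study would quantify how well such mitigations hold up across workloads.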
[Figure 5 chart data: throughput of 1530 images/sec on P100 vs. 5623 images/sec on V100 ("3.7x faster inference on V100 vs P100"), with latency in ms (6.54 and 6.94) on the secondary axis.]