Figure 5: ResNet-50 inference performance on V100 vs. P100.
Conclusions and Future Work
After evaluating V100 performance with three popular deep learning frameworks, we conclude that in training the V100 is more than 40% faster than the P100 in FP32 and more than 100% faster in FP16, and in inference the V100 is 3.7x faster than the P100. This demonstrates the performance benefit of using the V100 tensor cores. In future work, we will evaluate different data type combinations with FP16 and study the accuracy impact of FP16 in deep learning training. We will also evaluate TensorFlow with FP16 once support is added to the software. Finally, we plan to scale training to multiple nodes with these frameworks.
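The accuracy question raised for future work stems from FP16's narrow 10-bit mantissa: naively accumulating many small values in FP16 stalls once the sum's rounding step exceeds the addend, which is why mixed-precision recipes keep accumulators in FP32. A minimal NumPy sketch of the effect (illustrative only, not from the paper's experiments):

```python
import numpy as np

# FP16 has a 10-bit mantissa (~3 decimal digits), so summing many
# small gradient-sized values purely in FP16 stalls: once half an ulp
# of the running sum exceeds the addend, further additions are lost.
vals = np.full(10_000, 1e-4, dtype=np.float16)

fp16_sum = np.float16(0.0)
for v in vals:
    fp16_sum = np.float16(fp16_sum + v)   # pure-FP16 accumulation

# The common mixed-precision remedy: store values in FP16 but keep
# the accumulator in FP32.
fp32_sum = np.float32(0.0)
for v in vals:
    fp32_sum += np.float32(v)             # FP32 accumulation

print(f"pure FP16 sum:  {float(fp16_sum):.4f}")   # stalls well below 1.0
print(f"FP32 accum sum: {float(fp32_sum):.4f}")   # close to the true 1.0
```

Training frameworks apply the same idea via FP32 master weights and loss scaling; the planned accuracy study would quantify how well such mitigations hold up across workloads.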
[Figure 5 chart data: throughput of 1530 images/sec on P100 vs. 5623 images/sec on V100 ("3.7x faster inference on V100 vs P100"), with latency in ms (6.54 and 6.94) on the secondary axis.]