
Our INT8 calibration methodology follows the "8-bit Inference with TensorRT" presentation from GTC 2017. We used the ILSVRC2012 validation dataset for both calibration and benchmarking. The validation dataset contains 50,000 images, which we divided into batches of 25 images each. The first 50 batches (1,250 images) were used for calibration, and the remaining images were used for accuracy measurement.
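To make the calibration workflow concrete, below is a minimal sketch of an INT8 entropy calibrator and engine build. It uses the modern TensorRT Python API (the 2017 experiments used the then-current TensorRT C++ API, which differs); the model file name, the batch loader, and the random stand-in data are assumptions for illustration, not part of the original study.

```python
import numpy as np
import pycuda.autoinit  # noqa: F401 -- initializes a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

BATCH = 25          # 25 images per batch, as in the experiment
CALIB_BATCHES = 50  # first 50 batches used for calibration


def load_calibration_batches():
    # Stand-in for real ILSVRC2012 preprocessing: yields random data so the
    # sketch is self-contained; substitute preprocessed validation images.
    for _ in range(CALIB_BATCHES):
        yield np.random.rand(BATCH, 3, 224, 224).astype(np.float32)


class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds calibration batches to TensorRT so it can choose INT8 scales."""

    def __init__(self, batches):
        super().__init__()
        self.batches = iter(batches)  # iterable of (BATCH, 3, 224, 224) float32 arrays
        self.dev_buf = cuda.mem_alloc(BATCH * 3 * 224 * 224 * 4)

    def get_batch_size(self):
        return BATCH

    def get_batch(self, names):
        try:
            data = next(self.batches)
        except StopIteration:
            return None  # no more data: calibration is done
        cuda.memcpy_htod(self.dev_buf, np.ascontiguousarray(data))
        return [int(self.dev_buf)]

    def read_calibration_cache(self):
        return None  # always recalibrate in this sketch

    def write_calibration_cache(self, cache):
        pass  # a real run would persist the cache to disk


logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
# Hypothetical model file, assumed to have a static (25, 3, 224, 224) input.
with open("resnet50.onnx", "rb") as f:
    parser.parse(f.read())

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)
config.int8_calibrator = EntropyCalibrator(load_calibration_batches())
engine_bytes = builder.build_serialized_network(network, config)
```

During the build, TensorRT runs the calibration batches through the network, collects activation histograms, and picks per-tensor INT8 scale factors that minimize information loss, which is why only a small slice of the validation set is needed for calibration.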
Several pre-trained neural network models were used in our experiments: ResNet-50, ResNet-101, ResNet-152, VGG-16, VGG-19, GoogLeNet, and AlexNet. Both top-1 and top-5 accuracies were recorded for FP32 and INT8, and the difference between the two precisions was calculated. The results are shown in Table 3. The accuracy difference between FP32 and INT8 is between 0.02% and 0.18%, so the accuracy loss from INT8 quantization is minimal, while INT8 delivers roughly a 3x speedup.
Table 3: The accuracy comparison between FP32 and INT8
| Network    | FP32 Top-1 | FP32 Top-5 | INT8 Top-1 | INT8 Top-5 | Difference Top-1 | Difference Top-5 |
|------------|------------|------------|------------|------------|------------------|------------------|
| ResNet-50  | 72.90%     | 91.14%     | 72.84%     | 91.08%     | 0.07%            | 0.06%            |
| ResNet-101 | 74.33%     | 91.95%     | 74.31%     | 91.88%     | 0.02%            | 0.07%            |
| ResNet-152 | 74.90%     | 92.21%     | 74.84%     | 92.16%     | 0.06%            | 0.05%            |
| VGG-16     | 68.35%     | 88.45%     | 68.30%     | 88.42%     | 0.05%            | 0.03%            |
| VGG-19     | 68.47%     | 88.46%     | 68.38%     | 88.42%     | 0.09%            | 0.03%            |
| GoogLeNet  | 68.95%     | 89.12%     | 68.77%     | 89.00%     | 0.18%            | 0.12%            |
| AlexNet    | 56.82%     | 79.99%     | 56.79%     | 79.94%     | 0.03%            | 0.06%            |
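For reference, top-1 counts a prediction as correct when the highest-scoring class matches the ground-truth label, and top-5 when the label appears among the five highest-scoring classes; the Difference columns are simply FP32 accuracy minus INT8 accuracy. A minimal NumPy sketch of this metric (a hypothetical helper, not code from the original study):

```python
import numpy as np


def topk_accuracy(logits, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores.

    logits: (N, num_classes) array of model outputs
    labels: (N,) array of ground-truth class indices
    """
    topk = np.argsort(logits, axis=1)[:, -k:]        # indices of the k largest scores
    correct = (topk == labels[:, None]).any(axis=1)  # is the label among them?
    return correct.mean()


# Illustrative usage with random data standing in for real model outputs:
rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 1000))   # 1,000 images, 1,000 ImageNet classes
labels = rng.integers(0, 1000, size=1000)
top1 = topk_accuracy(logits, labels, k=1)
top5 = topk_accuracy(logits, labels, k=5)
```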
Conclusions
In this blog, we compared inference performance of the P40 and P4 GPUs in the latest Dell EMC PowerEdge R740 server and found that the P40 delivers roughly 2x the inference performance of the P4. The P4, however, is more power efficient, with roughly 1.5x the performance per watt of the P40. We also showed that, with the NVIDIA TensorRT library, INT8 achieves accuracy comparable to FP32 while delivering roughly 3x higher inference performance.