
Our INT8 calibration methodology follows the "8-bit Inference with TensorRT" presentation from GTC 2017. We used the ILSVRC2012 validation dataset for both calibration and benchmarking. The validation dataset contains 50,000 images, which we divided into batches of 25 images each. The first 50 batches (1,250 images) were used for calibration, and the remaining images were used for accuracy measurement.
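To make the calibration workflow concrete, below is a minimal sketch of an INT8 entropy calibrator and engine build. It uses the modern TensorRT Python API (the 2017 experiments used the then-current TensorRT C++ API, which differs); the model file name, the batch loader, and the random stand-in data are assumptions for illustration, not part of the original study.

```python
import numpy as np
import pycuda.autoinit  # noqa: F401 -- initializes a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

BATCH = 25          # 25 images per batch, as in the experiment
CALIB_BATCHES = 50  # first 50 batches used for calibration


def load_calibration_batches():
    # Stand-in for real ILSVRC2012 preprocessing: yields random data so the
    # sketch is self-contained; substitute preprocessed validation images.
    for _ in range(CALIB_BATCHES):
        yield np.random.rand(BATCH, 3, 224, 224).astype(np.float32)


class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds calibration batches to TensorRT so it can choose INT8 scales."""

    def __init__(self, batches):
        super().__init__()
        self.batches = iter(batches)  # iterable of (BATCH, 3, 224, 224) float32 arrays
        self.dev_buf = cuda.mem_alloc(BATCH * 3 * 224 * 224 * 4)

    def get_batch_size(self):
        return BATCH

    def get_batch(self, names):
        try:
            data = next(self.batches)
        except StopIteration:
            return None  # no more data: calibration is done
        cuda.memcpy_htod(self.dev_buf, np.ascontiguousarray(data))
        return [int(self.dev_buf)]

    def read_calibration_cache(self):
        return None  # always recalibrate in this sketch

    def write_calibration_cache(self, cache):
        pass  # a real run would persist the cache to disk


logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
# Hypothetical model file, assumed to have a static (25, 3, 224, 224) input.
with open("resnet50.onnx", "rb") as f:
    parser.parse(f.read())

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)
config.int8_calibrator = EntropyCalibrator(load_calibration_batches())
engine_bytes = builder.build_serialized_network(network, config)
```

During the build, TensorRT runs the calibration batches through the network, collects activation histograms, and picks per-tensor INT8 scale factors that minimize information loss, which is why only a small slice of the validation set is needed for calibration.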
Several pre-trained neural network models were used in our experiments: ResNet-50, ResNet-101, ResNet-152, VGG-16, VGG-19, GoogLeNet, and AlexNet. Both top-1 and top-5 accuracies were recorded for FP32 and INT8, and the difference between the two precisions was calculated. The results are shown in Table 3. The accuracy difference between FP32 and INT8 is between 0.02% and 0.18%, so the accuracy loss from INT8 quantization is minimal, while INT8 delivers roughly a 3x speedup.
Table 3: The accuracy comparison between FP32 and INT8
| Network    | FP32 Top-1 | FP32 Top-5 | INT8 Top-1 | INT8 Top-5 | Difference Top-1 | Difference Top-5 |
|------------|------------|------------|------------|------------|------------------|------------------|
| ResNet-50  | 72.90%     | 91.14%     | 72.84%     | 91.08%     | 0.07%            | 0.06%            |
| ResNet-101 | 74.33%     | 91.95%     | 74.31%     | 91.88%     | 0.02%            | 0.07%            |
| ResNet-152 | 74.90%     | 92.21%     | 74.84%     | 92.16%     | 0.06%            | 0.05%            |
| VGG-16     | 68.35%     | 88.45%     | 68.30%     | 88.42%     | 0.05%            | 0.03%            |
| VGG-19     | 68.47%     | 88.46%     | 68.38%     | 88.42%     | 0.09%            | 0.03%            |
| GoogLeNet  | 68.95%     | 89.12%     | 68.77%     | 89.00%     | 0.18%            | 0.12%            |
| AlexNet    | 56.82%     | 79.99%     | 56.79%     | 79.94%     | 0.03%            | 0.06%            |
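For reference, top-1 counts a prediction as correct when the highest-scoring class matches the ground-truth label, and top-5 when the label appears among the five highest-scoring classes; the Difference columns are simply FP32 accuracy minus INT8 accuracy. A minimal NumPy sketch of this metric (a hypothetical helper, not code from the original study):

```python
import numpy as np


def topk_accuracy(logits, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores.

    logits: (N, num_classes) array of model outputs
    labels: (N,) array of ground-truth class indices
    """
    topk = np.argsort(logits, axis=1)[:, -k:]        # indices of the k largest scores
    correct = (topk == labels[:, None]).any(axis=1)  # is the label among them?
    return correct.mean()


# Illustrative usage with random data standing in for real model outputs:
rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 1000))   # 1,000 images, 1,000 ImageNet classes
labels = rng.integers(0, 1000, size=1000)
top1 = topk_accuracy(logits, labels, k=1)
top5 = topk_accuracy(logits, labels, k=5)
```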
Conclusions
In this blog, we compared inference performance of the P40 and P4 GPUs in the latest Dell EMC PowerEdge R740 server and found that the P40 delivers roughly 2x the inference performance of the P4. The P4, however, is more power efficient, with roughly 1.5x the performance per watt of the P40. We also showed that, with the NVIDIA TensorRT library, INT8 achieves accuracy comparable to FP32 while delivering roughly 3x higher inference performance.