• Throughput (images/sec): ~315
• Latency (ms): 0.00704 s × 1000 = ~7
5.4 Benchmarking CheXNet Model Inference with Official ResnetV2_50
To benchmark our custom CheXNet model against a well-known model, we replicated the same TF-TRT INT8 integration inference tests using the official pre-trained ResNet-50 v2 model (FP32, accuracy 76.47%) [6]. The model was downloaded in SavedModel format, produced with the Estimator API during training in FP32 precision mode; this version also accepts input tensors in channels-first (CHW) format. See the TensorFlow performance guide for more details [18].
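As an illustration, the following sketch shows how such a SavedModel could be converted and calibrated to a TF-TRT INT8 optimized SavedModel with the TrtGraphConverter API available in TensorFlow 1.14 and later. The directory paths, batch sizes, and the input/output tensor names are assumptions for illustration, and the random calibration data would be replaced with representative preprocessed images in practice.

import numpy as np
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

SAVED_MODEL_DIR = "resnet_v2_50_fp32_savedmodel"   # assumed path to the downloaded SavedModel
OUTPUT_DIR = "resnet_v2_50_trt_int8"               # assumed output directory
MAX_BATCH_SIZE = 128                               # assumed largest inference batch size
CALIB_BATCH_SIZE = 32                              # assumed calibration batch size

converter = trt.TrtGraphConverter(
    input_saved_model_dir=SAVED_MODEL_DIR,
    max_batch_size=MAX_BATCH_SIZE,
    precision_mode="INT8",        # quantize eligible subgraphs to INT8
    use_calibration=True)         # collect dynamic ranges from calibration data
converter.convert()

def feed_dict_fn():
    # Channels-first (NCHW) input, as accepted by this version of the model.
    # Random data keeps the sketch self-contained; real calibration should use
    # representative preprocessed images.
    images = np.random.random_sample((CALIB_BATCH_SIZE, 3, 224, 224)).astype(np.float32)
    return {"input:0": images}                     # assumed input tensor name

converter.calibrate(
    fetch_names=["softmax_tensor:0"],              # assumed output tensor name
    num_runs=10,
    feed_dict_fn=feed_dict_fn)

converter.save(OUTPUT_DIR)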
Figure 13. Throughput of CheXNet TF-TRT INT8 versus ResnetV2_50 TF-TRT INT8 Inference
Figure 13 shows that our custom CheXNet model and the official ResnetV2_50 model performed closely when running optimized inference with TF-TRT INT8 integration. It is good practice to benchmark custom models against official models, since it helps us decide whether to go back and retrain the model or move forward with the optimized version.
Figure 14 shows that the latency of both models was also similar across different batch sizes. Lower latency is better, especially for critical real-time applications where milliseconds matter.
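For reference, a minimal timing loop of the kind used to collect such throughput and latency numbers is sketched below; it loads the TF-TRT optimized SavedModel and times inference across several batch sizes. The tensor names, warm-up and iteration counts, and the use of random input data are assumptions for illustration.

import time
import numpy as np
import tensorflow as tf

TRT_MODEL_DIR = "resnet_v2_50_trt_int8"   # assumed path to the TF-TRT optimized SavedModel
WARMUP_RUNS, TIMED_RUNS = 10, 50          # assumed run counts

with tf.Session(graph=tf.Graph()) as sess:
    tf.saved_model.loader.load(sess, [tf.saved_model.tag_constants.SERVING], TRT_MODEL_DIR)
    for batch_size in (1, 2, 4, 8, 16, 32, 64, 128):
        # Channels-first (NCHW) dummy input; a real benchmark would feed preprocessed images.
        images = np.random.random_sample((batch_size, 3, 224, 224)).astype(np.float32)
        feed = {"input:0": images}                        # assumed input tensor name
        for _ in range(WARMUP_RUNS):                      # warm up TensorRT engines and caches
            sess.run("softmax_tensor:0", feed_dict=feed)  # assumed output tensor name
        start = time.time()
        for _ in range(TIMED_RUNS):
            sess.run("softmax_tensor:0", feed_dict=feed)
        latency = (time.time() - start) / TIMED_RUNS      # seconds per batch
        print("batch %3d: latency %.2f ms, throughput %.0f images/sec"
              % (batch_size, latency * 1000.0, batch_size / latency))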