• Throughput (images/sec): ~315
• Latency (ms): 0.00704 s × 1000 = ~7
5.4 Benchmarking CheXNet Model Inference with Official ResnetV2_50
To benchmark our custom CheXNet model against a well-known model, we replicated the same TF-TRT INT8 integration inference tests using the official pre-trained ResNet-50 v2 model (FP32, accuracy 76.47%) [6]. The model was downloaded in SavedModel format, produced with the Estimator API during training in FP32 precision mode; this version also accepts input tensors in channels-first (CHW) format. See the TensorFlow performance guide for more details [18].
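As an illustration, the following sketch shows how such a SavedModel could be converted and calibrated to a TF-TRT INT8 optimized SavedModel with the TrtGraphConverter API available in TensorFlow 1.14 and later. The directory paths, batch sizes, and the input/output tensor names are assumptions for illustration, and the random calibration data would be replaced with representative preprocessed images in practice.

import numpy as np
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

SAVED_MODEL_DIR = "resnet_v2_50_fp32_savedmodel"   # assumed path to the downloaded SavedModel
OUTPUT_DIR = "resnet_v2_50_trt_int8"               # assumed output directory
MAX_BATCH_SIZE = 128                               # assumed largest inference batch size
CALIB_BATCH_SIZE = 32                              # assumed calibration batch size

converter = trt.TrtGraphConverter(
    input_saved_model_dir=SAVED_MODEL_DIR,
    max_batch_size=MAX_BATCH_SIZE,
    precision_mode="INT8",        # quantize eligible subgraphs to INT8
    use_calibration=True)         # collect dynamic ranges from calibration data
converter.convert()

def feed_dict_fn():
    # Channels-first (NCHW) input, as accepted by this version of the model.
    # Random data keeps the sketch self-contained; real calibration should use
    # representative preprocessed images.
    images = np.random.random_sample((CALIB_BATCH_SIZE, 3, 224, 224)).astype(np.float32)
    return {"input:0": images}                     # assumed input tensor name

converter.calibrate(
    fetch_names=["softmax_tensor:0"],              # assumed output tensor name
    num_runs=10,
    feed_dict_fn=feed_dict_fn)

converter.save(OUTPUT_DIR)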
Figure 13. Throughput of CheXNet TF-TRT INT8 versus ResnetV2_50 TF-TRT INT8 Inference
Figure 13 shows that our custom CheXNet model and the official ResnetV2_50 model performed closely when running optimized inference with TF-TRT INT8 integration. It is good practice to benchmark custom models against official models, since it helps us decide whether to go back and retrain the model or move forward with the optimized version.
Figure 14 shows that the latency of both models was also similar across different batch sizes. Lower latency is better, especially for critical real-time applications where milliseconds matter.
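For reference, a minimal timing loop of the kind used to collect such throughput and latency numbers is sketched below; it loads the TF-TRT optimized SavedModel and times inference across several batch sizes. The tensor names, warm-up and iteration counts, and the use of random input data are assumptions for illustration.

import time
import numpy as np
import tensorflow as tf

TRT_MODEL_DIR = "resnet_v2_50_trt_int8"   # assumed path to the TF-TRT optimized SavedModel
WARMUP_RUNS, TIMED_RUNS = 10, 50          # assumed run counts

with tf.Session(graph=tf.Graph()) as sess:
    tf.saved_model.loader.load(sess, [tf.saved_model.tag_constants.SERVING], TRT_MODEL_DIR)
    for batch_size in (1, 2, 4, 8, 16, 32, 64, 128):
        # Channels-first (NCHW) dummy input; a real benchmark would feed preprocessed images.
        images = np.random.random_sample((batch_size, 3, 224, 224)).astype(np.float32)
        feed = {"input:0": images}                        # assumed input tensor name
        for _ in range(WARMUP_RUNS):                      # warm up TensorRT engines and caches
            sess.run("softmax_tensor:0", feed_dict=feed)  # assumed output tensor name
        start = time.time()
        for _ in range(TIMED_RUNS):
            sess.run("softmax_tensor:0", feed_dict=feed)
        latency = (time.time() - start) / TIMED_RUNS      # seconds per batch
        print("batch %3d: latency %.2f ms, throughput %.0f images/sec"
              % (batch_size, latency * 1000.0, batch_size / latency))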