Figure 15. Throughput Native TensorFlow FP32 versus TF-TRT 5.0 Integration INT8
Figure 16 shows the latency curve for each inference configuration; the lower the latency, the better the performance. In this case, the TF-TRT INT8 implementation achieved the lowest inference time across all batch sizes.
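
The latency comparison above can be reproduced by timing repeated inference runs on the converted graph. The sketch below is a minimal illustration, not the paper's benchmark script: it uses the TF-TRT integration API from TensorFlow 1.x (`tensorflow.contrib.tensorrt`), and the graph file name, tensor names, and batch sizes are assumptions for the CheXNet (DenseNet-121, 224x224 input) model.

```python
# Minimal latency-measurement sketch for a TF-TRT INT8 graph (assumed names).
import time

import numpy as np
import tensorflow as tf
from tensorflow.contrib import tensorrt as trt  # TF 1.x / TensorRT 5.0 integration

# Load a frozen CheXNet graph (file name is an assumption).
with tf.gfile.GFile("chexnet_frozen.pb", "rb") as f:
    frozen_graph = tf.GraphDef()
    frozen_graph.ParseFromString(f.read())

# Convert supported subgraphs to TensorRT engines in INT8 mode.
# Note: INT8 additionally requires a calibration pass
# (trt.calib_graph_to_infer_graph) before deployment; omitted here for brevity.
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=["predictions/Softmax"],       # assumed output tensor name
    max_batch_size=32,
    max_workspace_size_bytes=1 << 30,
    precision_mode="INT8")

with tf.Graph().as_default() as graph:
    tf.import_graph_def(trt_graph, name="")
    input_t = graph.get_tensor_by_name("input_1:0")             # assumed input name
    output_t = graph.get_tensor_by_name("predictions/Softmax:0")

with tf.Session(graph=graph) as sess:
    for batch_size in (1, 2, 4, 8, 16, 32):
        images = np.random.rand(batch_size, 224, 224, 3).astype(np.float32)
        sess.run(output_t, feed_dict={input_t: images})          # warm-up run
        start = time.time()
        for _ in range(50):
            sess.run(output_t, feed_dict={input_t: images})
        latency_ms = (time.time() - start) / 50 * 1000.0
        print("batch %2d: %.2f ms per inference call" % (batch_size, latency_ms))
```

Averaging over many timed runs after a warm-up call avoids counting one-time engine build and memory-allocation costs in the reported per-batch latency.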