White Papers

ManualsBrandsDell ManualsConverged InfrastructureServers Solution Resources

38 CheXNet – Inference with Nvidia T4 on Dell EMC PowerEdge R7425 | Document ID

See the Table 5 with the consolidated results of the CheXNet Inference in Native TensorFlow

FP32 mode versus TF-TRT 5.0 Integration INT8int8, in terms of throughput and latency. We

observed the huge different when running the test in different configurations. For speedup

factors see the next tables.

Table 5. Throughput and Latency Native TensorFlow FP32 versus TF-TRT 5.0 Integration INT8

Batch

Size

TF-TRT INT8

Native TensorFlow FP32-GPU

Native TensorFlow FP32- CPU Only

Throughput

(img/sec)

Latency

(ms)

Throughput

(img/sec)

Latency

(ms)

Throughput

(img/sec)

Latency

(ms)

315

142

115

544

198

195

901

251

292

1281

284

431

1456

307

755

1549

329

1356

In Table 6 we have calculated the speedup factor of TF-TRT 5.0 Integration INT8 versus

Native TensorFlow FP32-GPU. The server PowerEdge R7425-T4 performed in average up

to 4X faster than native TensorFlow-GPU when accelerating the workloads with TF-TRT

Integration.

Table 6. PowerEdge R7425-T4 Speedup Factor with TF-TRT versus native TensorFlow-GPU

Batch Size

TF-TRT INT8

Native TensorFlow FP32-

GPU

Speedup Factor X

Throughput (img/sec)

315

142

544

198

901

251

1281

284

1456

307

1549

329

Average

In Table 7 we have calculated the speedup factor of TF-TRT 5.0 Integration INT8 versus

Native TensorFlow FP32-CPU Only. The server PowerEdge R7425-T4 performed in average

up to 58X faster than native TensorFlow-CPU Only when accelerating the workloads with

TF-TRT Integration

Table 7. PowerEdge R7425-T4 Speedup Factor with TF-TRT versus native TensorFlow-CPU Only

Batch Size

TF-TRT INT8

Native TensorFlow FP32-

CPU Only

Speedup Factor X

Throughput (img/sec)

315

35X

544

51X

901

63X

1281

67X

1456

66X

1549

63X

Average

58X