Executive summary
The healthcare industry has been one of the leading-edge industries in adopting machine
learning and deep learning techniques to improve diagnosis, achieve higher detection
accuracy, and reduce the overall cost of misdiagnosis. Deep learning consists of two phases:
training and inference. Training involves learning a neural network model from a given
training dataset over a certain number of training iterations, guided by a loss function. The
output of this phase, the learned model, is then used in the inference phase to make
predictions on new data. For the training phase, we leveraged the CheXNet model developed
by the Stanford University ML Group, which outperformed a panel of radiologists at detecting
pneumonia [1]. We used the National Institutes of Health (NIH) Chest X-ray dataset, which
consists of 112,120 images labeled with 14 different thoracic diseases, including pneumonia.
Each image is labeled with one or more pathologies, making this a multi-label classification
problem. Images in the Chest X-ray dataset are 3-channel (RGB) with dimensions of
1024x1024 pixels.
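To make the multi-label setup concrete, the sketch below (an illustration we provide, not code
from the original training run) builds a DenseNet-121 network, the architecture CheXNet is
based on [1], with 14 independent sigmoid outputs and a binary cross-entropy loss; the
224x224 input size is an assumption, since the 1024x1024 source images are typically
downscaled before training:

import tensorflow as tf

NUM_CLASSES = 14   # thoracic pathologies in the NIH Chest X-ray dataset
IMG_SIZE = 224     # assumed resize of the 1024x1024 source images

# DenseNet-121 backbone, the architecture behind CheXNet [1]
base = tf.keras.applications.DenseNet121(
    include_top=False, weights=None, pooling="avg",
    input_shape=(IMG_SIZE, IMG_SIZE, 3))

# One sigmoid per pathology: an image can carry several labels at once,
# so the 14 outputs are independent rather than a softmax over classes
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="sigmoid")(base.output)
model = tf.keras.Model(base.input, outputs)

# Binary cross-entropy treats each label as its own yes/no decision,
# the standard loss for multi-label classification
model.compile(optimizer="adam", loss="binary_crossentropy")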
We trained the CheXNet model on the NIH Chest X-ray dataset using a Dell EMC PowerEdge
C4140 server with NVIDIA V100-SXM2 GPUs. For inference we used NVIDIA TensorRT™, a
high-performance deep learning inference optimizer and runtime that delivers low latency and
high throughput. In this project, we used the CheXNet model as a reference to train a custom
model from scratch that classifies 14 different thoracic diseases, and we used TensorRT™ to
optimize the model and accelerate its inference.
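The sketch below illustrates this optimization step using TF-TRT, the TensorRT integration
shipped with TensorFlow; it assumes the TensorFlow 2.x converter API, and the model
directories are hypothetical placeholders. FP16 precision is chosen because half-precision
math maps well onto the T4's Tensor Cores:

from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Hypothetical paths; substitute the directories used in your environment
SAVED_MODEL_DIR = "chexnet_saved_model"
TRT_MODEL_DIR = "chexnet_tensorrt"

# Lower precision from FP32 to FP16, which T4 Tensor Cores execute natively
params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
    precision_mode=trt.TrtPrecisionMode.FP16)

# Rewrite the graph, replacing supported subgraphs with TensorRT engines
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir=SAVED_MODEL_DIR,
    conversion_params=params)
converter.convert()
converter.save(TRT_MODEL_DIR)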
The objective is to show how the PowerEdge R7425 can be used as a scale-up inference
server to run production-level deep learning inference workloads. Here we show how to train
the CheXNet model and run optimized inference with NVIDIA TensorRT™ on the Dell EMC
PowerEdge R7425 server.
The topics are presented from a development perspective, explaining the different
TensorRT™ implementation tools at the coding level used to optimize inference for the
CheXNet model. During the tests, we ran inference workloads on the PowerEdge R7425
under several configurations. TensorFlow was used as the primary framework to train the
model and run the inferences; performance was measured in terms of throughput
(images/sec) and latency (milliseconds).
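As an illustration of how such measurements can be collected, the following minimal
benchmark loop times repeated inference calls against a loaded SavedModel; the model path,
batch size, and input resolution are assumptions for illustration, not the exact test
configuration:

import time
import numpy as np
import tensorflow as tf

BATCH_SIZE = 32        # assumed batch size for illustration
NUM_BATCHES = 100
IMG_SIZE = 224         # assumed input resolution after preprocessing

# Load the (TensorRT-optimized) SavedModel; the path is hypothetical
model = tf.saved_model.load("chexnet_tensorrt")
infer = model.signatures["serving_default"]
images = tf.constant(
    np.random.rand(BATCH_SIZE, IMG_SIZE, IMG_SIZE, 3).astype(np.float32))

# Warm-up runs let TF-TRT finish building engines before timing starts
for _ in range(10):
    infer(images)

start = time.perf_counter()
for _ in range(NUM_BATCHES):
    infer(images)
elapsed = time.perf_counter() - start

print(f"Throughput: {BATCH_SIZE * NUM_BATCHES / elapsed:.1f} images/sec")
print(f"Latency:    {1000 * elapsed / NUM_BATCHES:.2f} ms per batch")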