Executive summary
The healthcare industry has been one of the leading-edge industries in adopting machine
learning and deep learning techniques to improve diagnosis, achieve higher detection
accuracy, and reduce the overall cost of misdiagnosis. Deep learning consists of two phases:
training and inference. Training involves learning a neural network model from a given
training dataset over a certain number of training iterations, guided by a loss function. The
output of this phase, the learned model, is then used in the inference phase to make
predictions on new data. For the training phase, we leveraged the CheXNet model developed
by the Stanford University ML Group, which outperformed a panel of radiologists at detecting
pneumonia [1]. We used the National Institutes of Health (NIH) Chest X-ray dataset, which
consists of 112,120 images labeled with 14 different thoracic diseases, including pneumonia.
Each image is labeled with one or more pathologies, making this a multi-label classification
problem. Images in the Chest X-ray dataset are 3-channel (RGB) with dimensions of
1024x1024 pixels.
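To make the multi-label setup concrete, the sketch below (an illustration we provide, not code
from the original training run) builds a DenseNet-121 network, the architecture CheXNet is
based on [1], with 14 independent sigmoid outputs and a binary cross-entropy loss; the
224x224 input size is an assumption, since the 1024x1024 source images are typically
downscaled before training:

import tensorflow as tf

NUM_CLASSES = 14   # thoracic pathologies in the NIH Chest X-ray dataset
IMG_SIZE = 224     # assumed resize of the 1024x1024 source images

# DenseNet-121 backbone, the architecture behind CheXNet [1]
base = tf.keras.applications.DenseNet121(
    include_top=False, weights=None, pooling="avg",
    input_shape=(IMG_SIZE, IMG_SIZE, 3))

# One sigmoid per pathology: an image can carry several labels at once,
# so the 14 outputs are independent rather than a softmax over classes
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="sigmoid")(base.output)
model = tf.keras.Model(base.input, outputs)

# Binary cross-entropy treats each label as its own yes/no decision,
# the standard loss for multi-label classification
model.compile(optimizer="adam", loss="binary_crossentropy")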
We trained the CheXNet model on the NIH Chest X-ray dataset using a Dell EMC PowerEdge
C4140 server with NVIDIA V100-SXM2 GPUs. For inference we used NVIDIA TensorRT™, a
high-performance deep learning inference optimizer and runtime that delivers low latency and
high throughput. In this project, we used the CheXNet model as a reference to train a custom
model from scratch that classifies 14 different thoracic diseases, and we used TensorRT™ to
optimize the model and accelerate its inference.
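The sketch below illustrates this optimization step using TF-TRT, the TensorRT integration
shipped with TensorFlow; it assumes the TensorFlow 2.x converter API, and the model
directories are hypothetical placeholders. FP16 precision is chosen because half-precision
math maps well onto the T4's Tensor Cores:

from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Hypothetical paths; substitute the directories used in your environment
SAVED_MODEL_DIR = "chexnet_saved_model"
TRT_MODEL_DIR = "chexnet_tensorrt"

# Lower precision from FP32 to FP16, which T4 Tensor Cores execute natively
params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
    precision_mode=trt.TrtPrecisionMode.FP16)

# Rewrite the graph, replacing supported subgraphs with TensorRT engines
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir=SAVED_MODEL_DIR,
    conversion_params=params)
converter.convert()
converter.save(TRT_MODEL_DIR)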
The objective is to show how the PowerEdge R7425 can be used as a scale-up inference
server to run production-level deep learning inference workloads. Here we show how to train
the CheXNet model and run optimized inference with NVIDIA TensorRT™ on the Dell EMC
PowerEdge R7425 server.
The topics are presented from a development perspective, explaining the different
TensorRT™ implementation tools at the coding level used to optimize inference for the
CheXNet model. During the tests, we ran inference workloads on the PowerEdge R7425
under several configurations. TensorFlow was used as the primary framework to train the
model and run the inferences; performance was measured in terms of throughput
(images/sec) and latency (milliseconds).
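As an illustration of how such measurements can be collected, the following minimal
benchmark loop times repeated inference calls against a loaded SavedModel; the model path,
batch size, and input resolution are assumptions for illustration, not the exact test
configuration:

import time
import numpy as np
import tensorflow as tf

BATCH_SIZE = 32        # assumed batch size for illustration
NUM_BATCHES = 100
IMG_SIZE = 224         # assumed input resolution after preprocessing

# Load the (TensorRT-optimized) SavedModel; the path is hypothetical
model = tf.saved_model.load("chexnet_tensorrt")
infer = model.signatures["serving_default"]
images = tf.constant(
    np.random.rand(BATCH_SIZE, IMG_SIZE, IMG_SIZE, 3).astype(np.float32))

# Warm-up runs let TF-TRT finish building engines before timing starts
for _ in range(10):
    infer(images)

start = time.perf_counter()
for _ in range(NUM_BATCHES):
    infer(images)
elapsed = time.perf_counter() - start

print(f"Throughput: {BATCH_SIZE * NUM_BATCHES / elapsed:.1f} images/sec")
print(f"Latency:    {1000 * elapsed / NUM_BATCHES:.2f} ms per batch")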