TensorFlow™
TensorFlow™ is an open source software library for high performance numerical
computation. Its flexible architecture allows easy deployment of computation across a variety
of platforms (CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile and
edge devices. Originally developed by researchers and engineers from the Google Brain
team within Google’s AI organization, it comes with strong support for machine learning and
deep learning, and its flexible numerical computation core is used across many other
scientific domains.
Transfer Learning
Transfer Learning is a technique that shortens the training process by taking a portion of a
previously trained model and reusing it in a new neural network model. The pre-trained
weights are used to initialize the new model, and training continues from that starting point.
Transfer Learning is especially useful when training on small datasets, as sketched below.
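The following is a minimal sketch of this idea using the Keras API bundled with TensorFlow; the DenseNet121 backbone, 224x224 input size, and 14-class sigmoid head are illustrative assumptions rather than the exact configuration used in this paper.

    import tensorflow as tf

    # Load a backbone pre-trained on ImageNet, without its classification head
    # (DenseNet121 and the 224x224 input size are illustrative choices).
    base = tf.keras.applications.DenseNet121(
        weights="imagenet", include_top=False, input_shape=(224, 224, 3))

    # Freeze the pre-trained weights so only the new head is trained at first.
    base.trainable = False

    # Attach a new classifier head for the target task (num_classes is a placeholder).
    num_classes = 14
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(num_classes, activation="sigmoid"),
    ])

    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    # model.fit(train_dataset, epochs=5)  # train only the new head on the small dataset

Because the backbone already encodes general visual features, only the small new head must be learned from the target dataset, which is what makes the approach practical for limited data.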
TensorRT™
Nvidia TensorRT™ is a high-performance deep learning inference optimizer and runtime that
delivers low latency and high throughput for production model deployment. TensorRT™
has been used successfully in a wide range of applications, including autonomous vehicles,
robotics, video analytics, and automatic speech recognition. TensorRT™ supports
Turing Tensor Cores and expands the set of neural network optimizations for multi-precision
workloads. With TensorRT 5, deep learning applications can be optimized and calibrated for
lower precision with high throughput and accuracy for production deployment.
Figure 2: TensorRT™ scheme. Source: Nvidia
In Figure 2 we present the general scheme of how TensorRT™ works. TensorRT™
optimizes an already trained neural network by combining layers, fusing tensors, and
optimizing kernel selection to improve latency, throughput, power efficiency, and memory
consumption. It can also generate runtime engines in lower precision to increase
performance; a rough sketch of this conversion workflow is shown below.
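As an illustration of this workflow, the snippet below uses the TF-TRT integration shipped with TensorFlow 1.14-era releases (matching TensorRT 5) to rebuild a trained SavedModel with FP16 engines; the directory paths, batch size, and precision mode are placeholder assumptions, not the exact settings used in this study.

    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    # Convert a trained SavedModel into a TF-TRT optimized graph.
    # Directory paths and max_batch_size are placeholders for illustration.
    converter = trt.TrtGraphConverter(
        input_saved_model_dir="/models/chexnet_saved_model",
        max_batch_size=8,
        precision_mode="FP16")    # lower precision for higher throughput

    converter.convert()           # fuses layers and builds TensorRT engines
    converter.save("/models/chexnet_trt_fp16")  # optimized model for deployment

The saved output can then be loaded like any other SavedModel, with the TensorRT-optimized segments executed by the TensorRT runtime at inference time.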
CheXNet