TensorFlow™
TensorFlow™ is an open source software library for high performance numerical
computation. Its flexible architecture allows easy deployment of computation across a variety
of platforms (CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile and
edge devices. Originally developed by researchers and engineers from the Google Brain
team within Google’s AI organization, it comes with strong support for machine learning and
deep learning, and its flexible numerical computation core is used across many other
scientific domains.
Transfer Learning
Transfer Learning is a technique that shortens the training process by taking a portion of a
previously trained model and reusing it in a new neural network model. The pre-trained
weights are used to initialize the new model, and training continues from that starting point.
Transfer Learning is especially useful when training on small datasets, as sketched below.
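The following is a minimal sketch of this idea using the Keras API bundled with TensorFlow; the DenseNet121 backbone, 224x224 input size, and 14-class sigmoid head are illustrative assumptions rather than the exact configuration used in this paper.

    import tensorflow as tf

    # Load a backbone pre-trained on ImageNet, without its classification head
    # (DenseNet121 and the 224x224 input size are illustrative choices).
    base = tf.keras.applications.DenseNet121(
        weights="imagenet", include_top=False, input_shape=(224, 224, 3))

    # Freeze the pre-trained weights so only the new head is trained at first.
    base.trainable = False

    # Attach a new classifier head for the target task (num_classes is a placeholder).
    num_classes = 14
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(num_classes, activation="sigmoid"),
    ])

    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    # model.fit(train_dataset, epochs=5)  # train only the new head on the small dataset

Because the backbone already encodes general visual features, only the small new head must be learned from the target dataset, which is what makes the approach practical for limited data.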
TensorRT™
Nvidia TensorRT™ is a high-performance deep learning inference optimizer and runtime that
delivers low latency and high throughput for production model deployment. TensorRT™
has been used successfully in a wide range of applications, including autonomous vehicles,
robotics, video analytics, and automatic speech recognition. TensorRT™ supports
Turing Tensor Cores and expands the set of neural network optimizations for multi-precision
workloads. With TensorRT 5, deep learning applications can be optimized and calibrated for
lower precision with high throughput and accuracy for production deployment.
Figure 2: TensorRT™ scheme. Source: Nvidia
In Figure 2 we present the general scheme of how TensorRT™ works. TensorRT™
optimizes an already trained neural network by combining layers, fusing tensors, and
optimizing kernel selection to improve latency, throughput, power efficiency, and memory
consumption. It can also generate runtime engines in lower precision to increase
performance; a rough sketch of this conversion workflow is shown below.
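As an illustration of this workflow, the snippet below uses the TF-TRT integration shipped with TensorFlow 1.14-era releases (matching TensorRT 5) to rebuild a trained SavedModel with FP16 engines; the directory paths, batch size, and precision mode are placeholder assumptions, not the exact settings used in this study.

    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    # Convert a trained SavedModel into a TF-TRT optimized graph.
    # Directory paths and max_batch_size are placeholders for illustration.
    converter = trt.TrtGraphConverter(
        input_saved_model_dir="/models/chexnet_saved_model",
        max_batch_size=8,
        precision_mode="FP16")    # lower precision for higher throughput

    converter.convert()           # fuses layers and builds TensorRT engines
    converter.save("/models/chexnet_trt_fp16")  # optimized model for deployment

The saved output can then be loaded like any other SavedModel, with the TensorRT-optimized segments executed by the TensorRT runtime at inference time.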
CheXNet