Table 1: Hardware configuration and software details

Platform: PowerEdge C4130 (configuration G)
Processor: 2 x Intel Xeon CPU E5-2690 v4 @ 2.6 GHz (Broadwell)
Memory: 256 GB DDR4 @ 2400 MHz
Disk: 400 GB SSD
GPU: 4 x Tesla P40 with 24 GB GPU memory

Software and Firmware:
Operating System: Ubuntu 14.04
BIOS: 2.3.3
CUDA and driver version: 8.0.44 (driver 375.20)
TensorRT version: 2.0 EA
Table 2: Comparison between Tesla M40 and P40

                   Tesla M40    Tesla P40
INT8 (TIOP/s)      N/A          47.0
FP32 (TFLOP/s)     6.8          11.8
Performance Evaluation
In this section, we present the inference performance of TensorRT on GoogLeNet and AlexNet. We also implemented the benchmark with MPI so that it can be run on multiple P40 GPUs within a single node. We then compare the performance of the P40 with that of the M40, and lastly we show the performance impact of different batch sizes.
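
As a minimal sketch of the multi-GPU pattern (not the paper's actual benchmark code), the example below launches one MPI rank per GPU: each rank pins itself to a device with cudaSetDevice, runs its own timed inference loop, and rank 0 sums the per-rank throughputs with MPI_Reduce. The variable localImagesPerSec is a hypothetical placeholder for whatever throughput each rank measures.

```cpp
// Minimal sketch: one MPI rank per GPU within a node.
#include <mpi.h>
#include <cuda_runtime.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Bind this rank to one of the GPUs in the node (e.g., 4x P40).
    int numGpus = 0;
    cudaGetDeviceCount(&numGpus);
    cudaSetDevice(rank % numGpus);

    // ... build the TensorRT engine and time the inference loop here ...
    double localImagesPerSec = 0.0;  // hypothetical: measured by this rank

    // Aggregate throughput across all ranks/GPUs on rank 0.
    double totalImagesPerSec = 0.0;
    MPI_Reduce(&localImagesPerSec, &totalImagesPerSec, 1, MPI_DOUBLE,
               MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) {
        printf("Aggregate throughput: %.1f images/sec\n", totalImagesPerSec);
    }

    MPI_Finalize();
    return 0;
}
```

Launched with, for example, mpirun -np 4, this maps one rank to each of the four P40s in the C4130.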
Figure 1 shows the inference performance with the TensorRT library for both GoogLeNet and AlexNet. We
can see that INT8 mode is ~3x faster than FP32 for both neural networks. This is expected, since the
theoretical speedup of INT8 over FP32 on the P40 is 4x (47.0 TIOP/s vs. 11.8 TFLOP/s in Table 2), and
that peak assumes only multiplications are performed with no other overhead. In practice there are
kernel launches, occupancy limits, data movement, and math other than multiplications, so the observed
speedup is reduced to about 3x.
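
The INT8 results depend on building the engine in INT8 mode. The snippet below is a minimal sketch assuming the TensorRT 2.x C++ builder API (setInt8Mode and setInt8Calibrator), not the paper's benchmark code; buildInt8Engine is an illustrative name, and the calibrator argument stands for a user-supplied implementation of nvinfer1::IInt8Calibrator that TensorRT queries during calibration to choose quantization scale factors.

```cpp
// Sketch: building an INT8 engine with the TensorRT 2.x builder API.
#include "NvInfer.h"

nvinfer1::ICudaEngine* buildInt8Engine(nvinfer1::IBuilder& builder,
                                       nvinfer1::INetworkDefinition& network,
                                       nvinfer1::IInt8Calibrator* calibrator,
                                       int maxBatchSize) {
    builder.setMaxBatchSize(maxBatchSize);    // largest batch the engine will serve
    builder.setMaxWorkspaceSize(1 << 30);     // scratch space available to layer kernels
    builder.setInt8Mode(true);                // request INT8 kernels (supported on P40)
    builder.setInt8Calibrator(calibrator);    // calibration data -> per-tensor scales
    return builder.buildCudaEngine(network);  // falls back to FP32 where INT8 is unavailable
}
```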