White Papers

Figure 1: Inference performance with TensorRT library
Dell’s PowerEdge C4130 supports up to 4 GPUs in a server. To make use of all GPUs, we implemented the
inference benchmark with MPI so that each MPI process drives one GPU. Figure 2 and Figure 3 show
the multi-GPU inference performance on GoogLeNet and AlexNet, respectively. With multiple
GPUs, near-linear speedup was achieved for both neural networks. This is because each GPU processes its
own set of images independently, with no communication or synchronization among the GPUs.
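The one-process-per-GPU pattern described above can be sketched as follows. This is a minimal illustration, not Dell’s actual benchmark code: with mpi4py the rank would come from `MPI.COMM_WORLD.Get_rank()`, but here four ranks are simulated so the sketch runs without an MPI installation; the function names are hypothetical.

```python
# Hedged sketch of the one-MPI-process-per-GPU benchmark layout.
# Assumption: rank-to-device mapping and the throughput-summing logic
# are illustrative, not taken from the white paper's code.

GPUS_PER_NODE = 4  # PowerEdge C4130 holds up to 4 GPUs


def device_for_rank(rank, gpus_per_node=GPUS_PER_NODE):
    """Pin each MPI rank to its own GPU; ranks never share a device."""
    return rank % gpus_per_node


def aggregate_throughput(per_gpu_images_per_sec):
    """With no inter-GPU communication, total throughput is simply the sum."""
    return sum(per_gpu_images_per_sec)


if __name__ == "__main__":
    # Simulate 4 ranks on one node, each handling its own image stream.
    for rank in range(4):
        print(f"rank {rank} -> GPU {device_for_rank(rank)}")
    # Each P40 sustains ~6406 images/sec on GoogLeNet INT8 (Figure 2),
    # so 4 independent GPUs should approach 4x that figure.
    print(aggregate_throughput([6406] * 4))
```

Because the processes never exchange data, aggregate throughput scales with GPU count until some shared resource (PCIe, host CPU, storage) saturates.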
[Figure 1 chart: TensorRT Inference Performance, batch size=128; images/sec (higher is better)]

    Network     FP32    INT8
    GoogLeNet   2215    6410
    AlexNet     5198    16292

Figure 2: Multi-GPU inference performance with TensorRT GoogLeNet
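The Figure 1 throughputs imply roughly a 3x gain from INT8 quantization. The short calculation below pairs the bars by series (GoogLeNet INT8 ≈ 6410 images/sec, which matches the single-GPU INT8 figure in Figure 2); the pairing is inferred from that cross-check, not stated explicitly in the chart residue.

```python
# INT8-over-FP32 throughput ratio from the Figure 1 numbers (images/sec).
fp32 = {"GoogLeNet": 2215, "AlexNet": 5198}
int8 = {"GoogLeNet": 6410, "AlexNet": 16292}

for net in fp32:
    ratio = int8[net] / fp32[net]
    print(f"{net}: INT8 is {ratio:.2f}x FP32")
```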
[Figure 2 chart: TensorRT GoogLeNet on Multi-GPU, INT8, batch size=128; images/sec (higher is better) and speedup vs. number of GPUs]

    GPUs     Images/sec   Speedup
    1 P40    6406         1.000
    2 P40    12808        1.999
    4 P40    25592        3.995
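The near-linear scaling claim can be checked directly from the Figure 2 throughputs; the snippet below derives the speedup and parallel-efficiency figures from the reported images/sec values (the computation itself is added here for illustration).

```python
# Speedup and parallel efficiency for GoogLeNet INT8 inference (Figure 2).
throughput = {1: 6406, 2: 12808, 4: 25592}  # GPUs -> images/sec
base = throughput[1]

for gpus, ips in sorted(throughput.items()):
    speedup = ips / base          # relative to one P40
    efficiency = speedup / gpus   # 1.0 means perfectly linear scaling
    print(f"{gpus} x P40: speedup {speedup:.3f}, efficiency {efficiency:.1%}")
```

The efficiencies stay above 99%, consistent with the GPUs running fully independent image streams.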