White Papers

Figure 1: Inference performance with TensorRT library
Dell’s PowerEdge C4130 supports up to 4 GPUs in a server. To make use of all GPUs, we implemented the
inference benchmark with MPI so that each MPI process drives one GPU. Figure 2 and Figure 3 show
the multi-GPU inference performance on GoogLeNet and AlexNet, respectively. With multiple
GPUs, near-linear speedup was achieved for both neural networks. This is because each GPU processes its
own set of images independently, with no communication or synchronization among the GPUs.
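The one-process-per-GPU pattern described above can be sketched as follows. This is a minimal illustration, not Dell’s actual benchmark code: with mpi4py the rank would come from `MPI.COMM_WORLD.Get_rank()`, but here four ranks are simulated so the sketch runs without an MPI installation; the function names are hypothetical.

```python
# Hedged sketch of the one-MPI-process-per-GPU benchmark layout.
# Assumption: rank-to-device mapping and the throughput-summing logic
# are illustrative, not taken from the white paper's code.

GPUS_PER_NODE = 4  # PowerEdge C4130 holds up to 4 GPUs


def device_for_rank(rank, gpus_per_node=GPUS_PER_NODE):
    """Pin each MPI rank to its own GPU; ranks never share a device."""
    return rank % gpus_per_node


def aggregate_throughput(per_gpu_images_per_sec):
    """With no inter-GPU communication, total throughput is simply the sum."""
    return sum(per_gpu_images_per_sec)


if __name__ == "__main__":
    # Simulate 4 ranks on one node, each handling its own image stream.
    for rank in range(4):
        print(f"rank {rank} -> GPU {device_for_rank(rank)}")
    # Each P40 sustains ~6406 images/sec on GoogLeNet INT8 (Figure 2),
    # so 4 independent GPUs should approach 4x that figure.
    print(aggregate_throughput([6406] * 4))
```

Because the processes never exchange data, aggregate throughput scales with GPU count until some shared resource (PCIe, host CPU, storage) saturates.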
[Figure 1 chart: TensorRT Inference Performance, batch size=128; images/sec (higher is better)]

    Network     FP32    INT8
    GoogLeNet   2215    6410
    AlexNet     5198    16292

Figure 2: Multi-GPU inference performance with TensorRT GoogLeNet
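The Figure 1 throughputs imply roughly a 3x gain from INT8 quantization. The short calculation below pairs the bars by series (GoogLeNet INT8 ≈ 6410 images/sec, which matches the single-GPU INT8 figure in Figure 2); the pairing is inferred from that cross-check, not stated explicitly in the chart residue.

```python
# INT8-over-FP32 throughput ratio from the Figure 1 numbers (images/sec).
fp32 = {"GoogLeNet": 2215, "AlexNet": 5198}
int8 = {"GoogLeNet": 6410, "AlexNet": 16292}

for net in fp32:
    ratio = int8[net] / fp32[net]
    print(f"{net}: INT8 is {ratio:.2f}x FP32")
```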
[Figure 2 chart: TensorRT GoogLeNet on Multi-GPU, INT8, batch size=128; images/sec (higher is better) and speedup vs. number of GPUs]

    GPUs     Images/sec   Speedup
    1 P40    6406         1.000
    2 P40    12808        1.999
    4 P40    25592        3.995
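The near-linear scaling claim can be checked directly from the Figure 2 throughputs; the snippet below derives the speedup and parallel-efficiency figures from the reported images/sec values (the computation itself is added here for illustration).

```python
# Speedup and parallel efficiency for GoogLeNet INT8 inference (Figure 2).
throughput = {1: 6406, 2: 12808, 4: 25592}  # GPUs -> images/sec
base = throughput[1]

for gpus, ips in sorted(throughput.items()):
    speedup = ips / base          # relative to one P40
    efficiency = speedup / gpus   # 1.0 means perfectly linear scaling
    print(f"{gpus} x P40: speedup {speedup:.3f}, efficiency {efficiency:.1%}")
```

The efficiencies stay above 99%, consistent with the GPUs running fully independent image streams.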