White Papers

Figure 3: Multi-GPU inference performance with TensorRT AlexNet
To highlight the performance advantage of the P40 GPU and its native INT8 support, we compared inference performance between the P40 and the previous-generation M40 GPU. The results are shown in Figure 5 and Figure 6 for GoogLeNet and AlexNet, respectively. In FP32 mode, the P40 is 1.7x faster than the M40, and the P40 in INT8 mode is 4.4x faster than the M40 in FP32 mode.
Figure 4: Inference performance comparison between P40 and M40
[Chart: TensorRT AlexNet on Multi-GPU (INT8, batch size=128). X-axis: number of GPUs; left Y-axis: images/sec (higher is better); right Y-axis: speedup.]
  1 P40: 16,270 images/sec (1.00x speedup)
  2 P40: 32,386 images/sec (1.99x speedup)
  4 P40: 64,691 images/sec (3.98x speedup)
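The near-linear scaling in the multi-GPU chart can be verified with a quick calculation; this is a sketch using the throughput values (images/sec) read from the figure:

```python
# Multi-GPU scaling for TensorRT AlexNet (INT8, batch size=128).
# Throughput values (images/sec) are taken from the chart above.
throughput = {1: 16270, 2: 32386, 4: 64691}

base = throughput[1]  # single-GPU baseline
for gpus, ips in throughput.items():
    speedup = ips / base
    efficiency = speedup / gpus  # fraction of ideal linear scaling
    print(f"{gpus} P40: {ips} img/s, "
          f"{speedup:.2f}x speedup, {efficiency:.1%} scaling efficiency")
```

Dividing each throughput by the single-GPU baseline reproduces the 1.99x and 3.98x speedups in the chart, i.e. above 99% scaling efficiency at 4 GPUs.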
[Chart: P40 vs M40 for GoogLeNet with TensorRT (batch size=128). X-axis: operation mode (FP32, INT8); Y-axis: images/sec (higher is better).]
  FP32: M40 1,446 images/sec; P40 2,215 images/sec
  INT8: P40 6,410 images/sec
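The 4.4x claim in the text follows directly from the GoogLeNet chart values; a minimal check, using the throughput numbers read from the figure:

```python
# INT8-vs-FP32 comparison for GoogLeNet with TensorRT (batch size=128).
# Values (images/sec) are taken from the chart above.
m40_fp32 = 1446  # M40 in FP32 mode
p40_int8 = 6410  # P40 in INT8 mode

ratio = p40_int8 / m40_fp32
print(f"P40 INT8 vs M40 FP32: {ratio:.1f}x")
```

The ratio 6,410 / 1,446 rounds to 4.4, matching the speedup quoted in the text.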