White Papers

Figure 3: Multi-GPU inference performance with TensorRT AlexNet
To highlight the performance advantage of the P40 GPU and its native INT8 support, we compared inference performance between the P40 and the previous-generation M40 GPU. The results are shown in Figure 5 and Figure 6 for GoogLeNet and AlexNet, respectively. In FP32 mode, the P40 is 1.7x faster than the M40, and the P40 in INT8 mode is 4.4x faster than the M40 in FP32 mode.
Figure 4: Inference performance comparison between P40 and M40
[Chart: TensorRT AlexNet on Multi-GPU (INT8, batch size=128). X-axis: number of GPUs; left Y-axis: images/sec (higher is better); right Y-axis: speedup.]
  1 P40: 16,270 images/sec (1.00x speedup)
  2 P40: 32,386 images/sec (1.99x speedup)
  4 P40: 64,691 images/sec (3.98x speedup)
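The near-linear scaling in the multi-GPU chart can be verified with a quick calculation; this is a sketch using the throughput values (images/sec) read from the figure:

```python
# Multi-GPU scaling for TensorRT AlexNet (INT8, batch size=128).
# Throughput values (images/sec) are taken from the chart above.
throughput = {1: 16270, 2: 32386, 4: 64691}

base = throughput[1]  # single-GPU baseline
for gpus, ips in throughput.items():
    speedup = ips / base
    efficiency = speedup / gpus  # fraction of ideal linear scaling
    print(f"{gpus} P40: {ips} img/s, "
          f"{speedup:.2f}x speedup, {efficiency:.1%} scaling efficiency")
```

Dividing each throughput by the single-GPU baseline reproduces the 1.99x and 3.98x speedups in the chart, i.e. above 99% scaling efficiency at 4 GPUs.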
[Chart: P40 vs M40 for GoogLeNet with TensorRT (batch size=128). X-axis: operation mode (FP32, INT8); Y-axis: images/sec (higher is better).]
  FP32: M40 1,446 images/sec; P40 2,215 images/sec
  INT8: P40 6,410 images/sec
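The 4.4x claim in the text follows directly from the GoogLeNet chart values; a minimal check, using the throughput numbers read from the figure:

```python
# INT8-vs-FP32 comparison for GoogLeNet with TensorRT (batch size=128).
# Values (images/sec) are taken from the chart above.
m40_fp32 = 1446  # M40 in FP32 mode
p40_int8 = 6410  # P40 in INT8 mode

ratio = p40_int8 / m40_fp32
print(f"P40 INT8 vs M40 FP32: {ratio:.1f}x")
```

The ratio 6,410 / 1,446 rounds to 4.4, matching the speedup quoted in the text.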