
[Chart for Figure 5: P40 vs M40 for AlexNet with TensorRT (batch size = 128). Images/sec (higher is better) by operations mode: M40 FP32: 3200; P40 FP32: 5198; P40 INT8: 16292]
Figure 5: Inference performance comparison between P40 and M40
Deep learning inference can be applied in different scenarios. Some scenarios require a large batch size, while others require no batching at all (i.e., a batch size of 1). We therefore also measured how performance varies with batch size; the results are shown in Figure 6. Note that the purpose here is not to compare the performance of GoogLeNet and AlexNet; rather, it is to examine how performance changes with batch size for each neural network. It can be seen that without batching the inference performance is very low, because the GPU is not given enough work to keep it busy. The larger the batch size, the higher the inference throughput, although the gains diminish: AlexNet, for example, speeds up about 20x from batch size 1 to 128, but only about 10% more from 128 to 4096. At a batch size of 4096, GoogLeNet failed to run because the GPU memory it requires exceeds the memory available on the GPU. AlexNet was still able to run because it is a less complex network than GoogLeNet and therefore needs less GPU memory. The largest usable batch size is thus limited only by GPU memory.
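
To make the batch-size sweep concrete, the following is a minimal Python sketch of the measurement methodology. The helper names (measure_throughput, run_batch) are hypothetical, and the placeholder workload merely stands in for a real TensorRT execution context (in the TensorRT releases of this era, the maximum batch size is fixed when the engine is built); this is a sketch under those assumptions, not the benchmark harness actually used here.

import time

def measure_throughput(run_batch, batch_size, n_batches=50):
    """Return throughput in images/sec for one batch size."""
    run_batch(batch_size)                  # warm-up pass, not timed
    start = time.perf_counter()
    for _ in range(n_batches):
        run_batch(batch_size)
    elapsed = time.perf_counter() - start
    return batch_size * n_batches / elapsed

if __name__ == "__main__":
    # Placeholder workload so the sketch runs anywhere; a real benchmark
    # would invoke the TensorRT execution context here, on an engine
    # built with a max batch size of at least the value being tested.
    def run_batch(batch_size):
        sum(i * i for i in range(batch_size * 100))

    for bs in (1, 32, 64, 128, 256, 512, 1024, 2048, 4096):
        ips = measure_throughput(run_batch, bs)
        print(f"batch size {bs:4d}: {ips:12.1f} images/sec")

Timing only repeated inference calls (after a warm-up pass) keeps one-time costs such as engine construction out of the images/sec figure, which is what makes the per-batch-size comparison in Figure 6 meaningful.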
Figure 6: Inference performance with different batch sizes
[Chart data for Figure 6: 1x P40, INT8, AlexNet and GoogLeNet. Images/sec (higher is better) by batch size:
Batch size   1     32      64      128     256     512     1024    2048    4096
AlexNet      791   11464   14438   16292   17101   17502   17768   17951   18011
GoogLeNet    594   5417    6059    6410    6630    6763    6822    6842    (out of GPU memory)]