White Papers
Dell - Internal Use - Confidential
Figure 2: The performance of inference with GoogLeNet on P40 and P4
Figure 3: P40 vs P4 for AlexNet with different batch sizes
In our previous blog, we compared inference performance using FP32 and INT8 and concluded that INT8 is ~3x faster than FP32. In this study, we also compare the accuracy of the two precisions to verify that INT8 achieves accuracy comparable to FP32. We used the latest TensorRT 2.1 GA release for this benchmarking. To make the INT8 data encode as much of the information in the FP32 data as possible, TensorRT applies a calibration method that converts FP32 to INT8 while minimizing the loss of information. More details on this calibration method can be found in the presentation “8-bit
[Chart: “P40 vs P4 for GoogLeNet on R740 (batch_size=128)”; x-axis: 1 to 4 GPUs; axes: Relative Images/sec and Relative Images/sec/Watt; series: Perf - P40, Perf - P4, Perf/Watt - P40, Perf/Watt - P4.]
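The calibration step described above can be illustrated conceptually. The sketch below is NOT the TensorRT API; it shows the underlying idea of symmetric linear quantization with a saturation threshold. TensorRT's entropy calibration selects a per-tensor threshold T such that mapping FP32 values into the INT8 range [-127, 127] loses as little information as possible; the `threshold` parameter here is a stand-in for that calibrated value.

```python
def quantize_int8(values, threshold):
    """Map FP32 values to INT8 with scale = threshold / 127.

    Values outside [-threshold, threshold] saturate at -127/127,
    mirroring how calibrated quantization clamps outliers.
    """
    scale = threshold / 127.0
    out = []
    for v in values:
        q = round(v / scale)
        q = max(-127, min(127, q))  # saturate beyond the threshold
        out.append(int(q))
    return out

def dequantize_int8(qvalues, threshold):
    """Recover approximate FP32 values from the INT8 encoding."""
    scale = threshold / 127.0
    return [q * scale for q in qvalues]

# Sample activations; 3.0 lies outside the threshold and saturates.
fp32 = [0.0, 0.6, -1.1, 3.0]
q = quantize_int8(fp32, threshold=2.0)
approx = dequantize_int8(q, threshold=2.0)
```

Values within the threshold round-trip with small error, while outliers are clamped; entropy calibration chooses the threshold that best trades clipping error against rounding error across the tensor's observed distribution.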
[Chart: “P40 vs P4 for AlexNet (with 1 GPU)”; x-axis: batch size from 1 to 4096; axes: Relative Images/sec and Relative Images/sec/Watt; series: P40, P4, Perf/Watt-P40, Perf/Watt-P4.]