White Papers

Deep Learning Performance: Scale-up vs Scale-out

Architectures & Technologies Dell EMC | Infrastructure Solutions Group

Figure 20: PowerEdge C4140-V100-SXM2 Configuration-K vs PowerEdge C4140-V100-SXM2 Configuration-M

As shown in Figure 21 below, the number of CPU cores does affect throughput. The largest difference appears when running AlexNet.

7.1.8 What role does the CPU play in deep learning?

The CPU plays a major role in the initial phase of training, called data preprocessing. The steps below work like an instruction pipeline, with the following four operations running in parallel, each on a different batch:

a. Train on batch n (on GPUs)

b. Copy batch n+1 to GPU memory

c. Transform batch n+2 (on CPU)

d. Load batch n+3 from disk (on CPU)
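The four overlapping steps above can be sketched as a chain of worker threads connected by bounded queues. This is a minimal illustration under stated assumptions, not how any particular framework implements its input pipeline: run_pipeline, the demo_* functions and the depth parameter are hypothetical names, and the stage bodies are stand-ins for real disk reads, CPU transforms, host-to-device copies and GPU training steps.

```python
import queue
import threading

_DONE = object()  # unique end-of-stream marker passed from stage to stage


def _stage(work, inbox, outbox):
    """Run `work` on every item from `inbox`; forward results to `outbox`."""
    while True:
        item = inbox.get()
        if item is _DONE:
            if outbox is not None:
                outbox.put(_DONE)  # tell the next stage to stop as well
            return
        result = work(item)
        if outbox is not None:
            outbox.put(result)


def run_pipeline(num_batches, load, transform, copy_to_gpu, train, depth=1):
    """Overlap load -> transform -> copy -> train across consecutive batches."""
    q_index = queue.Queue()                # batch indices waiting to be loaded
    q_raw = queue.Queue(maxsize=depth)     # loaded, not yet transformed
    q_ready = queue.Queue(maxsize=depth)   # transformed, not yet copied
    q_device = queue.Queue(maxsize=depth)  # copied, waiting to be trained on
    results = []

    def train_and_record(batch):
        results.append(train(batch))

    stages = [
        (load, q_index, q_raw),              # d. load batch from disk (CPU)
        (transform, q_raw, q_ready),         # c. transform batch (CPU)
        (copy_to_gpu, q_ready, q_device),    # b. copy batch to GPU memory
        (train_and_record, q_device, None),  # a. train on batch (GPU)
    ]
    threads = [threading.Thread(target=_stage, args=s) for s in stages]
    for t in threads:
        t.start()
    for i in range(num_batches):
        q_index.put(i)
    q_index.put(_DONE)
    for t in threads:
        t.join()
    return results


# Stand-in stage functions (assumptions for illustration only).
def demo_load(i):
    return [i, i + 1]


def demo_transform(batch):
    return [2 * x for x in batch]


def demo_copy_to_gpu(batch):
    return tuple(batch)


def demo_train(batch):
    return sum(batch)


if __name__ == "__main__":
    print(run_pipeline(5, demo_load, demo_transform, demo_copy_to_gpu, demo_train))
```

Because each stage is a single thread reading from a first-in, first-out queue, batches reach the training step in their original order. The bounded queues keep a stage from running more than depth batches ahead of the next one, which roughly mirrors the n, n+1, n+2, n+3 staggering in the list. Python threads only overlap work that releases the interpreter lock, such as blocking I/O, so a production pipeline would typically rely on the framework's own data loader or on separate processes.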

The data processing loop during training is:

a. Load mini-batch

b. Preprocess mini-batch

c. Train on mini-batch
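The loop above, run one step at a time, can be sketched as follows. The per-stage timing is an addition for illustration: it gives a rough view of how much wall-clock time goes to the CPU-side steps (load, preprocess) compared with the training step. The function name timed_training_loop and its three callable parameters are hypothetical.

```python
import time


def timed_training_loop(num_batches, load, preprocess, train):
    """Run load -> preprocess -> train sequentially and time each stage."""
    seconds = {"load": 0.0, "preprocess": 0.0, "train": 0.0}
    outputs = []
    for i in range(num_batches):
        t0 = time.perf_counter()
        raw = load(i)                 # a. load mini-batch
        t1 = time.perf_counter()
        batch = preprocess(raw)       # b. preprocess mini-batch
        t2 = time.perf_counter()
        outputs.append(train(batch))  # c. train on mini-batch
        t3 = time.perf_counter()
        seconds["load"] += t1 - t0
        seconds["preprocess"] += t2 - t1
        seconds["train"] += t3 - t2
    return outputs, seconds
```

If the load and preprocess totals are comparable to or larger than the train total, the accelerator is likely waiting on the CPU, which is the situation the overlapped pipeline is meant to avoid. With asynchronous GPU execution, the train timing is only meaningful if the train callable waits for the device to finish before returning.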