performance of P100-PCIe with that of CPU and K80 GPUs for this application. Within one node, four P100-PCIe GPUs are 6.6x faster than two E5-2690 v4 CPUs and 1.4x faster than four K80 GPUs.
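The two ratios above also imply how four K80 GPUs compare with the two CPUs. The following is a minimal sketch of that arithmetic, assuming both ratios were measured on the same workload within the same node; the variable names are illustrative and do not come from the paper.

```python
# Ratios quoted in the text for a single node.
p100_vs_cpu = 6.6  # 4 P100-PCIe relative to 2 E5-2690 v4 CPUs
p100_vs_k80 = 1.4  # 4 P100-PCIe relative to 4 K80 GPUs

# Assumption: both ratios share the same P100 measurement, so dividing
# them cancels it and leaves the implied K80-versus-CPU ratio.
k80_vs_cpu = p100_vs_cpu / p100_vs_k80

print(f"Implied 4 K80 vs 2 CPUs: {k80_vs_cpu:.1f}x")
```

The derived figure is only as precise as the two rounded inputs, so it should be read as an estimate rather than a measured result.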
Figure 5: LAMMPS Performance on P100-PCIe
Figure 6: Comparison between Configuration G and Configuration B
Figure 5 data: LAMMPS performance with the lj dataset (timesteps/s, higher is better; speedup over 1 P100)

GPUs      Timesteps/s   Speedup
1 P100    202           1.0
2 P100    355           1.8
4 P100    505           2.5
8 P100    801           4.0
12 P100   1095          5.4
16 P100   1172          5.8
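The speedup values in Figure 5 can be reproduced from the timesteps/s labels by dividing each throughput by the single-GPU throughput. The sketch below does that and also derives a parallel efficiency (speedup divided by GPU count). The efficiency column is an added illustration, not a figure reported in the paper, and the function and variable names are hypothetical.

```python
# Throughput (timesteps/s) transcribed from the Figure 5 data labels,
# keyed by the number of P100 GPUs.
timesteps_per_s = {1: 202, 2: 355, 4: 505, 8: 801, 12: 1095, 16: 1172}


def scaling_table(throughput):
    """Return (gpus, speedup, efficiency) rows relative to the 1-GPU run."""
    base = throughput[1]
    rows = []
    for gpus in sorted(throughput):
        speedup = throughput[gpus] / base
        # Efficiency is an assumption added for illustration:
        # 1.0 would mean perfectly linear scaling.
        efficiency = speedup / gpus
        rows.append((gpus, round(speedup, 1), round(efficiency, 2)))
    return rows


rows = scaling_table(timesteps_per_s)
for gpus, speedup, efficiency in rows:
    print(f"{gpus:>2} P100: speedup {speedup:.1f}x, efficiency {efficiency:.2f}")
```

Rounded to one decimal place, the computed speedups match the values plotted in the figure. The efficiency column makes the flattening of the curve between 12 and 16 GPUs easier to see.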
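Figure 6 diagram labels:

Configuration G: 2 CPU / 4 GPU, 2 virtual switches, 2 GPUs per CPU. A PCIe Gen3 96-lane switch sits between the CPUs and the GPUs, with GPU1 and GPU2 grouped with CPU1 and GPU3 and GPU4 grouped with CPU2. LP Slots #1 and #2 are also shown. Links are labeled x16 and x8.

Configuration B: 2 CPU, 4:1 switched. GPU1, GPU2, GPU3, and GPU4 share a single PCIe Gen3 96-lane switch. LP Slots #1 and #2, CPU1, and CPU2 are also shown. Links are labeled x16 and x8.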