Figure 7: LAMMPS Performance Comparison
HOOMD-blue
HOOMD-blue (Highly Optimized Object-oriented Many-particle Dynamics - blue) is a general-purpose
molecular dynamics simulator. Figure 8 shows the HOOMD-blue performance; note that the y-axis uses a
logarithmic scale. One P100 is 13.4x faster than the dual-CPU configuration. Two P100s give a 1.5x
speedup over a single P100, which is reasonable. From 4 to 16 P100s, however, the speedup over a single
P100 only grows from 2.1x to 3.9x, which is low. The reason is that, like LAMMPS, this application
involves a lot of communication among all the GPUs in use. Based on the LAMMPS analysis,
configuration B should reduce this communication bottleneck significantly. To verify this, we ran the same
application again on a configuration B server. With 4 P100s, the performance metric “hours for 10e6 steps”
dropped to 10.2 from 11.73 in configuration G, a 13% performance improvement,
and the speedup over 1 P100 improved from 2.1x to 2.4x.
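The arithmetic behind these figures can be checked with a short sketch. It uses only the numbers quoted above. The function names are illustrative, and the parallel-efficiency measure (speedup divided by GPU count) is a standard definition introduced here as an assumption; it is not a metric reported in this study.

```python
# Sketch: re-derive the HOOMD-blue figures quoted in the text.
# All input numbers come from the paragraph above; function names are hypothetical.

def relative_improvement(old_hours, new_hours):
    """Fractional reduction in run time when moving from old to new."""
    return (old_hours - new_hours) / old_hours

def rescaled_speedup(old_speedup, old_hours, new_hours):
    """Speedup over the same baseline after the run time drops to new_hours."""
    return old_speedup * old_hours / new_hours

def parallel_efficiency(speedup, gpu_count):
    """Assumed definition: speedup over 1 GPU divided by the number of GPUs."""
    return speedup / gpu_count

hours_g = 11.73   # 4 P100, configuration G, hours for 10e6 steps
hours_b = 10.2    # 4 P100, configuration B, hours for 10e6 steps
speedup_g = 2.1   # 4 P100 vs 1 P100 in configuration G

improvement = relative_improvement(hours_g, hours_b)
speedup_b = rescaled_speedup(speedup_g, hours_g, hours_b)

# Speedups over 1 P100 quoted in the text, keyed by GPU count.
quoted_speedups = {2: 1.5, 4: 2.1, 16: 3.9}
efficiency = {n: parallel_efficiency(s, n) for n, s in quoted_speedups.items()}

print(f"improvement in configuration B: {improvement:.0%}")
print(f"speedup with 4 P100 in configuration B: {speedup_b:.1f}x")
for n, e in efficiency.items():
    print(f"{n} P100: parallel efficiency {e:.0%}")
```

Under these assumptions the sketch reproduces the 13% improvement and the 2.4x speedup, and the efficiency falls from 75% with 2 P100s to roughly 24% with 16, which is one way to quantify why the scaling is described as low.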
[Figure 7 chart: LAMMPS Performance with lj Dataset. x-axis: Number of Nodes (1, 2, 4); y-axis: Timesteps/s (higher is better); series: 2x2690 v4 CPU, K80, P100-PCIe. Data labels as extracted: 77, 164, 334, 351, 544, 689, 505, 801, 1172.]