
GPUs are typically programmed with CUDA, OpenACC or the OpenCL framework. Most of the life sciences community is not familiar
with these frameworks, so few biologists or bioinformaticians can make efficient use of GPU architectures. Nevertheless, GPUs have
been making inroads into the molecular dynamics and electron microscopy fields. These fields require heavy computational work to
simulate biomolecular structures and their interactions, or to reconstruct 3D structures from the millions of 2D images generated by an
electron microscope.
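As a rough illustration of why these frameworks are unfamiliar territory for most life scientists, the sketch below shows the kind of boilerplate even a trivial GPU offload requires in CUDA C. It is a hypothetical example written for this discussion only, not code from Amber, HOOMD-blue or NAMD; the kernel simply scales an array of force components on the device.

// Minimal CUDA sketch: scale an array of force components on the GPU.
// Real MD codes use far more elaborate kernels, but the explicit memory
// management and thread indexing below are what newcomers must learn first.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void scale_forces(float *f, float s, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;        // global thread index
    if (i < n)
        f[i] *= s;                                         // one element per thread
}

int main()
{
    const int n = 1 << 20;                                 // 1M elements (arbitrary size)
    const size_t bytes = n * sizeof(float);

    float *h_f = (float *)malloc(bytes);                   // host buffer
    for (int i = 0; i < n; ++i) h_f[i] = 1.0f;

    float *d_f;
    cudaMalloc(&d_f, bytes);                               // device buffer
    cudaMemcpy(d_f, h_f, bytes, cudaMemcpyHostToDevice);   // copy host -> device

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scale_forces<<<blocks, threads>>>(d_f, 0.5f, n);       // launch kernel

    cudaMemcpy(h_f, d_f, bytes, cudaMemcpyDeviceToHost);   // copy device -> host
    printf("f[0] = %f\n", h_f[0]);

    cudaFree(d_f);
    free(h_f);
    return 0;
}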
We selected three different molecular dynamics applications to run tests on a PowerEdge C4130 with four K80s. The applications are
Amber 16, HOOMD-blue and NAMD [14, 15, 16].
The HPC-focused Tesla K80 GPU provides 8.74/2.91 TFLOPS (single/double
precision) of compute capacity, which is 31%-75% more than the K40, the previous
Tesla card [17].
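The 31%-75% range is consistent with NVIDIA's published peak figures, which are not restated in this paper: roughly 5.6/1.87 TFLOPS for the K80 at its base clock versus about 4.29/1.43 TFLOPS for the K40, and the 8.74/2.91 TFLOPS quoted above with GPU Boost versus about 5.04/1.66 TFLOPS for a boosted K40. Comparing like with like:

5.6 / 4.29 ≈ 1.87 / 1.43 ≈ 1.31    (base clock vs. base clock, roughly a 31% gain)
8.74 / 5.04 ≈ 1.73,  2.91 / 1.66 ≈ 1.75    (GPU Boost vs. GPU Boost, roughly a 73%-75% gain)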
A benchmark suite is available from the Amber and HOOMD-blue web sites, and we used these suites with minor adjustments for the
benchmarks. The command line and the configuration file used to test NAMD performance are listed in APPENDIX A. All tests were
repeated on local storage and on Lustre.
Molecular dynamics application test configuration
A single PowerEdge C4130 with dual Intel® Xeon® E5-2690 v3 processors, 128 GB of DDR4 2133 MHz memory and four K80s in the C
configuration [17]. Both local storage and Lustre, connected through Intel® OPA, were tested.
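Before running the benchmarks, a quick device enumeration is a useful sanity check on such a node. The sketch below is an illustrative check written for this discussion, not part of the benchmark scripts; note that each K80 card carries two GK210 GPUs, so the raw CUDA device count may differ from the per-card numbering used in the figures.

// List the CUDA devices visible on the node and their memory sizes.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);                       // number of visible CUDA devices
    printf("CUDA devices visible: %d\n", count);

    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);            // query name and memory per device
        printf("  device %d: %s, %.1f GiB\n",
               d, prop.name, prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    }
    return 0;
}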
Amber benchmark suite
This suite includes the Joint Amber-CHARMM (JAC) benchmark, which considers dihydrofolate reductase (DHFR) in an explicit water bath
with cubic periodic boundary conditions. The major assumptions are that the DHFR molecule is present in water without surface effects
and that its motion follows the microcanonical (NVE) ensemble, which holds the amount of substance (N), volume (V) and energy (E)
constant. Hence, the sum of kinetic (KE) and potential energy (PE) is conserved; in other words, temperature (T) and pressure (P) are
unregulated. The JAC benchmark repeats the simulations with the isothermal-isobaric (NPT) ensemble, in which N, P and T are held
constant. This ensemble corresponds most closely to laboratory conditions, with a flask open to ambient temperature and pressure. Besides
these settings, particle mesh Ewald (PME) is the algorithm chosen to calculate electrostatic forces in the molecular dynamics simulations.
Other biomolecules simulated in this benchmark suite are Factor IX (one of the serine proteases of the coagulation system), cellulose and
Satellite Tobacco Mosaic Virus (STMV). Here, we report the results from the DHFR and STMV data.
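To make these ensemble and electrostatics assumptions concrete, the standard textbook statements (not taken from the Amber input files themselves) are:

E_total = KE + PE = Σ_i (1/2) m_i v_i² + U(r_1, ..., r_N) = constant    (NVE: total energy conserved along the trajectory)
E_elec = E_direct + E_reciprocal + E_self    (PME: short-range sum in real space, long-range sum on an FFT mesh, plus a self-energy correction)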
The test results from the Amber benchmark suite, comprising three simulations (two JAC production simulations with DHFR and one
STMV production simulation), are illustrated in Figure 8. The “CPU 28 cores” label refers to Amber simulations run strictly on CPUs; in
this case, all 28 cores are used. The “single GPU 0, 1, 2 and 3” labels refer to tests performed on individual GPUs separately, whereas
the “concurrent GPU 0, 1, 2 and 3” tests were run simultaneously. The “2 GPUs” and “4 GPUs” labels indicate that multiple GPUs were
used to solve a single job. Finally, “concurrent two GPU 0 1” and “concurrent two GPU 2 3” are two-GPU tests that ran simultaneously.
Each test set has two performance measurements from different storage settings: local versus Lustre storage. Overall, the GPUs can take
advantage of Lustre storage, and Lustre helps significantly when multiple GPUs are used for a single job; the 4-GPU tests across the different
simulations show a speed gain of more than 10%.
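For reference, the speed gain quoted here is assumed to be computed from the simulation throughput (for example, the ns/day figure Amber reports), i.e.

gain (%) = (throughput_Lustre / throughput_local − 1) × 100

so that, with purely hypothetical numbers, a 4-GPU run delivering 22 ns/day on Lustre versus 20 ns/day on local storage would correspond to a 10% gain.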