The tests used 64 C6420 and 63 C6320 compute nodes (64 C6320s for the H600 tests). The number of samples per node was increased
to reach the desired total number of samples processed concurrently. For the C6320 (13G), three samples per node was the maximum
each node could process. The 64-, 104-, and 126-sample results for the 13G system (blue) were obtained with two samples per node,
while the 129-, 156-, 180-, 189-, and 192-sample results were obtained with three samples per node. For the C6420 (14G), the tests
were performed with a maximum of five samples per node, and the 14G plot was generated by processing 1, 2, 3, 4, and 5 samples per
node. The number of samples per node is limited by the amount of memory in a system; the 13G and 14G systems had 128 GB and
192 GB of RAM, respectively. The C6420s scale better than the C6320s. The 13G server with Broadwell CPUs appears to be more
sensitive to the number of samples loaded onto the system, as shown by the 126- versus 129-sample results on all the storage
systems tested in this study.
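The sample totals quoted above are the product of the node count and the samples per node. The following is a minimal Python sketch of that arithmetic. The function names are hypothetical, and the node counts for the intermediate data points (for example, 104 or 156 samples) are inferred from the totals, not stated in the text.

```python
def total_samples(nodes: int, samples_per_node: int) -> int:
    """Total samples processed concurrently across the cluster."""
    return nodes * samples_per_node


def nodes_needed(total: int, samples_per_node: int) -> int:
    """Node count implied by a total at a fixed per-node load (inferred)."""
    nodes, remainder = divmod(total, samples_per_node)
    if remainder:
        raise ValueError(f"{total} is not a multiple of {samples_per_node}")
    return nodes


# Figures taken from the text: 63 C6320s (64 for the H600 tests), 64 C6420s.
print(total_samples(63, 2))  # 126
print(total_samples(63, 3))  # 189
print(total_samples(64, 3))  # 192
print(total_samples(64, 5))  # 320

# Inferred node counts for intermediate 13G data points.
print(nodes_needed(104, 2))  # 52
print(nodes_needed(129, 3))  # 43
```

Under this reading, the 126- and 129-sample tests differ mainly in per-node load (two versus three samples per node), which is consistent with the sensitivity noted for the 13G servers.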
The Dell EMC PowerEdge C6420 delivers at least a 12% performance gain over the previous generation. Each C6420 compute node
with 192 GB of RAM can process about seven 30x whole human genomes per day. This number could be increased if the C6420
compute node is configured with more memory. In addition to the improvement on the 14G server side, four Isilon F800 nodes in a 4U
chassis can support 64 C6420s processing 320 30x whole human genomes concurrently.
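As a rough illustration of what these figures imply, the sketch below converts the per-node daily throughput into a batch wall time and a cluster-wide daily rate. It assumes the figure of about seven genomes per day was measured at five concurrent samples per node and that throughput scales linearly across 64 nodes. Neither assumption is stated in the text, and the function names are hypothetical.

```python
HOURS_PER_DAY = 24


def batch_hours(samples_per_node: int, genomes_per_node_per_day: float) -> float:
    """Wall time for one batch, assuming all samples on a node finish together."""
    return samples_per_node * HOURS_PER_DAY / genomes_per_node_per_day


def cluster_genomes_per_day(nodes: int, genomes_per_node_per_day: float) -> float:
    """Cluster throughput, assuming perfectly linear scaling (an assumption)."""
    return nodes * genomes_per_node_per_day


# Five concurrent samples per node at about seven genomes per node per day.
print(round(batch_hours(5, 7), 1))     # 17.1 hours per batch of five
print(cluster_genomes_per_day(64, 7))  # 448 genomes per day across 64 nodes
```

These are back-of-the-envelope estimates only; measured throughput depends on the storage and interconnect configuration.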
MOLECULAR DYNAMICS SIMULATION SOFTWARE PERFORMANCE
Over the past decade, GPUs have become popular in scientific computing because of their ability to exploit a high degree of
parallelism. NVIDIA has a handful of life sciences applications optimized to run on its general-purpose GPUs. Unfortunately,
these GPUs can only be programmed with the CUDA, OpenACC, and OpenCL frameworks. Most of the life sciences community is not
familiar with these frameworks, so few biologists or bioinformaticians can make efficient use of GPU architectures. However, GPUs
have been making inroads into the molecular dynamics and electron microscopy fields. These fields require heavy computational work
to simulate biomolecular structures or their interactions and to reconstruct 3D images from millions of 2D images generated by an
electron microscope.
We selected two molecular dynamics applications, Amber and LAMMPS (12) (13), to test on a PowerEdge C4130 with P100 and
V100 GPUs.
Molecular dynamics application test configuration
The test system was a Dell EMC PowerEdge C4130 with dual Intel® Xeon® E5-2690 v4 processors, 256 GB of DDR4 2400 MHz memory,
and P100 and V100 GPUs in the G and K configurations (14) (15). Unfortunately, we were not able to complete integration tests with the
Dell EMC Ready Bundle for HPC Life Sciences under various interconnect settings. However, we believe the integration tests will not
pose any problem, since molecular dynamics applications are not bound by either storage I/O or interconnect bandwidth.
The NVIDIA® Tesla® V100 accelerator is one of the most advanced accelerators currently on the market and was launched
within one year of the P100 release. In fact, Dell EMC is the first in the industry to integrate the Tesla V100 and bring it to market. As
was the case with the P100, the V100 comes in two form factors: the V100-PCIe and the mezzanine version, the V100-SXM2. The Dell
EMC PowerEdge C4130 server supports both types of V100 and P100 GPU cards.
Figure 13 V100 and P100 Topologies on C4130 configuration K