White Papers

Dell - Internal Use - Confidential

The tests used 64 C6420 and 63 C6320 compute nodes (64 C6320s for the H600 tests). The number of samples per node was
increased to reach the desired total number of samples processed concurrently. For the C6320 (13G), 3 samples per node
was the maximum each node could process. The 64-, 104-, and 126-sample results for the 13G system (blue) were obtained
with 2 samples per node, while the 129-, 156-, 180-, 189-, and 192-sample results were obtained with 3 samples per node.
For the C6420 (14G), the tests were performed with a maximum of 5 samples per node; the 14G plot was generated by
processing 1, 2, 3, 4, and 5 samples per node. The number of samples per node is limited by the amount of memory in a
system: the 13G and 14G systems had 128 GB and 192 GB of RAM, respectively. The C6420s scale better than the C6320s.
The 13G server with Broadwell CPUs appears to be more sensitive to the number of samples loaded onto the system, as
shown by the 126- versus 129-sample results on all the storage systems tested in this study.

The Dell EMC PowerEdge C6420 delivers at least a 12% performance gain over the previous generation. Each C6420 compute
node with 192 GB of RAM can process about seven 30x whole human genomes per day. This number could increase if the
C6420 compute node were configured with more memory. In addition to the improvement on the 14G server side, four Isilon
F800 nodes in a 4U chassis can support 64 C6420s processing 320 30x whole human genomes concurrently.
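
The sample counts quoted above follow from multiplying the node count by the samples per node. The short Python sketch below reproduces that arithmetic for the full-cluster cases. The function and variable names are illustrative and do not come from the test harness. The memory-per-sample figures are simple quotients of installed RAM, not measured memory usage, and pairing the 192-sample point with 64 C6320 nodes is an inference from the node counts stated in the text.

```python
# Node counts, samples per node, and RAM sizes are taken from the text;
# all names here are illustrative only.

def concurrent_samples(nodes: int, samples_per_node: int) -> int:
    """Total samples in flight when every node runs the same number of samples."""
    return nodes * samples_per_node

# 13G (C6320): full-cluster cases mentioned in the text.
c6320_runs = {
    "63 nodes x 2": concurrent_samples(63, 2),
    "63 nodes x 3": concurrent_samples(63, 3),
    "64 nodes x 3": concurrent_samples(64, 3),  # assumed to be the 64-node H600 case
}

# 14G (C6420): 64 nodes at 1 to 5 samples per node.
c6420_runs = [concurrent_samples(64, n) for n in range(1, 6)]

# Installed RAM divided by the maximum samples per node (a rough quotient,
# not a measurement of what each sample actually consumes).
gb_per_sample_13g = 128 / 3
gb_per_sample_14g = 192 / 5

if __name__ == "__main__":
    for label, total in c6320_runs.items():
        print(f"C6320, {label}: {total} samples")
    print("C6420, 64 nodes x 1..5:", c6420_runs)
    print(f"RAM per sample: 13G {gb_per_sample_13g:.1f} GB, 14G {gb_per_sample_14g:.1f} GB")
```

The last entry of `c6420_runs` is the 320 concurrent genomes quoted for 64 C6420s on four Isilon F800 nodes.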

MOLECULAR DYNAMICS SIMULATION SOFTWARE PERFORMANCE

Over the past decade, GPUs have become popular in scientific computing because of their ability to exploit a high
degree of parallelism. NVIDIA has a handful of life sciences applications optimized to run on its general-purpose GPUs.
Unfortunately, these GPUs can only be programmed with CUDA, OpenACC, or the OpenCL framework. Most of the life sciences
community is not familiar with these frameworks, so few biologists or bioinformaticians can make efficient use of GPU
architectures. However, GPUs have been making inroads into the molecular dynamics and electron microscopy fields. These
fields require heavy computational work to simulate biomolecular structures or their interactions, and to reconstruct
3D images from millions of 2D images generated by an electron microscope.

We selected two molecular dynamics applications, Amber and LAMMPS (12) (13), to test on a PowerEdge C4130 with P100 and
V100 GPUs.

Molecular dynamics application test configuration

The test system was a Dell EMC PowerEdge C4130 with dual Intel Xeon E5-2690 v4 processors, 256 GB of DDR4 2400 MHz
memory, and P100 and V100 GPUs in the G and K configurations (14) (15). Unfortunately, we were not able to complete
integration tests with the Dell EMC Ready Bundle for HPC Life Sciences under various interconnect settings. However, we
believe the integration tests will not pose any problem, since molecular dynamics applications are not bound by either
storage I/O or interconnect bandwidth.

The NVIDIA Tesla V100 accelerator is one of the most advanced accelerators currently on the market and was launched
within one year of the P100 release. In fact, Dell EMC is the first in the industry to integrate the Tesla V100 and
bring it to market. Like the P100, the V100 comes in two form factors: V100-PCIe and the mezzanine version, V100-SXM2.
The Dell EMC PowerEdge C4130 server supports both form factors of the V100 and P100 GPU cards.

Figure 13 V100 and P100 Topologies on C4130 configuration K