White Papers

EXECUTIVE SUMMARY
In October 2015, Dell announced the Genomic Data Analysis Platform (GDAP) v2.0 to address the growing need for rapid genomic
analysis driven by the availability of next-generation sequencing (NGS) technologies [1]. Following the successful implementation of GDAP
v2.0, which is capable of processing up to 163 genomes per day¹ while consuming 2 kilowatt-hours (kWh) per genome, we started to
explore life science domains beyond genomics. In addition to NGS, other instruments used in life science research, such as mass
spectrometers and electron microscopes, also produce much more data than before. The analyses required to extract information from
these various data formats are highly variable, so it is difficult to optimize a single system architecture to suit all the different use
cases. The Dell EMC HPC System for Life Sciences v1.1 is a flexible high performance computing environment designed to address
the computational challenges in genomic sequencing analysis, bioinformatics and computational biology.
AUDIENCE
This document is intended for organizations interested in accelerating genomic research with advanced computing and data
management solutions. System administrators, solution architects, and others within those organizations constitute the target audience.
INTRODUCTION
In the last decade, modern society has continued to improve the quality of life through better healthcare, sustainable production and
consumption of food and energy, and protection of the environment. All these societal advancements are closely tied to progress in the life
sciences. Bioinformatics emerged from the massive amount of data now available in these fields, and advances in HPC
play a key role in finding solutions.
Bioinformatics is an interdisciplinary field that develops methods and software tools
for understanding biological data. As an interdisciplinary field of science,
bioinformatics combines computer science, statistics, mathematics, and engineering
to analyze and interpret biological data [2].
Here are some examples of how HPC has been helping the life sciences:
NGS data analysis: Affordable genome sequencing and ‘omics’ technologies have driven the need for new bioinformatics
methods and HPC.
Understanding biochemical reactions in biomolecular systems: HPC is also a key component in advancing our understanding
of biochemical reactions and discovering new therapeutic molecules using molecular dynamics/mechanics and quantum
mechanics.
Modeling/simulating biological networks: HPC is required to model biological networks and simulate how networks behave in
response to perturbations. This area is frequently referred to as pathway analysis, and Boolean networks are used as a simulation
platform. The results can help identify adverse outcomes for various disease treatments.
Constructing the 3D structural images of biomolecules from electron microscopy: Cryo-electron microscopy (Cryo-EM) is
gaining popularity in structural biology due to recent improvements in its resolution. The technique uses electron beams to
image frozen biological molecules and generates terabytes of data. HPC is an essential tool to reconstruct images of
biomolecules from the large volume of data.
Simulation of human organ functions: HPC is used to integrate diagnostic images, DNA/RNA profiles and protein expression
data into organ function simulation.
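To make the Boolean network approach mentioned under pathway analysis above more concrete, the following is a minimal sketch in Python, not a production tool. The three-node network, the node names (A, B, C), the update rules, and the helper names `step` and `find_attractor` are all hypothetical and chosen only for illustration. Each node is either on or off, all nodes are updated synchronously, and a perturbation is modeled by pinning a node to a fixed value.

```python
# Hypothetical three-node network; node names and rules are illustrative only.
RULES = {
    "A": lambda s: s["A"],                 # input node keeps its current value
    "B": lambda s: s["A"] and not s["C"],  # activated by A, inhibited by C
    "C": lambda s: s["B"],                 # activated by B
}


def step(state, pinned=None):
    """Apply every rule synchronously; pinned nodes keep their forced value."""
    pinned = pinned or {}
    nxt = {node: bool(rule(state)) for node, rule in RULES.items()}
    nxt.update(pinned)
    return nxt


def find_attractor(state, pinned=None, max_steps=100):
    """Iterate until a state repeats, then return the repeating cycle of states."""
    pinned = pinned or {}
    state = {**state, **pinned}
    seen = []
    for _ in range(max_steps):
        if state in seen:
            return seen[seen.index(state):]
        seen.append(state)
        state = step(state, pinned)
    raise RuntimeError("no attractor found within max_steps")


if __name__ == "__main__":
    start = {"A": True, "B": False, "C": False}
    unperturbed = find_attractor(start)
    knockout_c = find_attractor(start, pinned={"C": False})  # simulated knockout of C
    print("unperturbed attractor length:", len(unperturbed))
    print("C knockout attractor length:", len(knockout_c))
```

In this toy network the unperturbed system oscillates through a cycle of states, while pinning C off settles into a single steady state with B switched on. Real pathway models contain far more nodes, and the number of possible states grows exponentially with network size, which is where HPC resources become necessary.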
¹ The figure of 163 genomes per day was obtained when ‘UnifiedGenotyper’ from a previous version of the Genome Analysis Toolkit (GATK) was used. In GATK 3.5, used in this
study, ‘HaplotypeCaller’ is recommended. The pipeline was updated according to the ‘GATK Best Practices and Beyond’ guideline. GDAP v2.0 processed
133 genomes per day on the updated BWA-GATK pipeline.