White Papers

Figure 7. BWA-GATK performance results on H600 as storage usage increases.
SUMMARY
The results in this paper demonstrate that whole genome analysis on the F800 and H600 platforms scales predictably, and that
both platforms can support hundreds of simultaneous whole genome analyses in a single day. The F800 performs slightly better
than the H600 at low sample counts and much better at higher sample counts, so the F800 is recommended for the highest-
throughput environments.
If the HPC environment is used exclusively for genomic analyses, then both Isilon and Lustre are good choices, but Isilon is the
better choice when features such as backup, snapshots, and multi-protocol (SMB/NFS/HDFS) support are required. If Isilon is chosen,
the results in this paper allow a rough genomes/day calculation to guide the choice between the F800 and the H600. However,
if the HPC workload is mixed and includes MPI-based or other applications that require low-latency interconnects (InfiniBand or
Omni-Path) in order to scale well, then Lustre is the better choice.
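The rough genomes/day calculation mentioned above can be sketched as follows. This is a minimal illustration, assuming that a batch of samples starts and finishes together and that throughput is the sample count scaled to a 24-hour day. The function names and the 12-hour running time are hypothetical and are not measurements from this paper.

```python
def genomes_per_day(samples: int, wall_clock_hours: float) -> float:
    """Estimate throughput for a batch of samples processed concurrently.

    Assumption: all `samples` complete within `wall_clock_hours`, and
    batches can be run back to back over a 24-hour day.
    """
    if samples <= 0 or wall_clock_hours <= 0:
        raise ValueError("samples and wall_clock_hours must be positive")
    return samples * 24.0 / wall_clock_hours


def meets_target(samples: int, wall_clock_hours: float, target_per_day: float) -> bool:
    """Return True if the estimated throughput reaches the required genomes/day."""
    return genomes_per_day(samples, wall_clock_hours) >= target_per_day


# Hypothetical inputs for illustration only: 64 concurrent samples and an
# assumed 12-hour batch running time.
estimate = genomes_per_day(samples=64, wall_clock_hours=12.0)
print(f"{estimate:.0f} genomes/day")  # prints "128 genomes/day"
```

Running the same estimate with the measured batch running time for each platform, and comparing both against the required daily throughput, gives a first-order basis for choosing between the two.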
REFERENCES
1. Dell HPC Lustre Storage Solution: Dell HPC Lustre Storage with IEEL 3.0
2. Dell EMC HPC System for Life Sciences v1.1 (January 2017)
3. All benchmark tests were run on the Zenith cluster in the Dell HPC Innovation Lab in Round Rock, TX. Zenith ranked #292 in the
Top 500 ranking as of November 2017: https://top500.org/list/2017/11/?page=3
4. GATK Best Practices (https://software.broadinstitute.org/gatk/best-practices/; Accessed 1/24/2018)
5. Variant Calling Benchmark Not Only Human (http://en.community.dell.com/techcenter/high-performance-computing/b/genomics/archive/2016/05/27/variant-calling-benchmark-not-only-human; Accessed 1/24/2018)
6. Human genome 10x coverage data set ERR091571 (ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR091/ERR091571/; Accessed 1/24/2018)
[Figure 7 chart: "BWA-GATK v3.6 with Isilon H600, 64 Samples on 32 nodes". X axis: Storage Usage (% full), at 1%, 14%, 19%, 37%, 55%, 73%, 91%. Y axes: Genomes/Day (ticks 20 to 140) and Running Time (hours) (ticks 0 to 14). Legend, running time per pipeline step: Aligning & Sorting, Mark/Remove Duplicates, Generate Realigning Targets, Realign around InDel, Base Recalibration, HaplotypeCaller, GenotypeGVCFs, Variant Recalibration, Apply Recalibration. Additional series: Number of Genomes per Day, with data labels in extraction order: 129, 131, 134, 129, 131, 132, 127.]