To determine the maximum sample throughput achievable when using an H600 for both data input and pipeline output, batches of
increasing size were run on an increasing number of compute nodes, with either 2 or 3 samples running simultaneously on each
node. Batches of 32 to 192 samples were run on 16 to 64 compute nodes with the H600 NFS-mounted on the compute nodes. In Figure
4, the left axis shows the wall-clock time for each pipeline step along with the total run time, while the right axis shows
how many genomes per day can be processed for a given combination of sample count, samples/node ratio and compute node
count. The samples/node ratio and the number of compute nodes used for each batch are shown beneath the graph. For the
32, 64, 80, 92, 116 and 128 sample batches, 16, 32, 40, 46, 58 and 64 compute nodes were used, respectively, with a samples/node ratio
of 2. For the 156 and 192 sample batches, 52 and 64 compute nodes were used, respectively, with a samples/node ratio of 3. The
best-performing samples/node ratio depends on the number of cores and the amount of memory available on each node.
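The relationship between batch size, samples/node ratio and compute node count described above can be checked with a short sketch. The function name and the use of ceiling division are illustrative assumptions, not part of the benchmark tooling; the batch sizes and ratios are the ones reported in the text.

```python
# Batch configurations reported in the text: (samples, samples per node).
BATCHES = [(32, 2), (64, 2), (80, 2), (92, 2), (116, 2), (128, 2), (156, 3), (192, 3)]


def nodes_needed(samples: int, samples_per_node: int) -> int:
    """Compute nodes needed to run every sample in the batch at once.

    Hypothetical helper: uses ceiling division so that a batch which does
    not divide evenly still gets enough nodes.
    """
    return -(-samples // samples_per_node)


node_counts = [nodes_needed(s, r) for s, r in BATCHES]

for (samples, ratio), nodes in zip(BATCHES, node_counts):
    print(f"{samples:>3} samples at {ratio} samples/node -> {nodes} compute nodes")
```

Each computed node count matches the value quoted in the paragraph, which confirms that every batch was sized to run all of its samples concurrently.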
Figure 4. Number 10x WGS BWA-GATK performance results on H600.
The benchmark results in Figure 4 show that when running 2 samples per compute node, the total run time is between 11 and 13 hours,
while running 3 samples/node yields a run time of approximately 15 hours (156 samples). Although the run time is longer at 3
samples/node, the total throughput is higher: the 156-sample run reached 252 genomes/day. While the F800
was able to process 189 samples in 14 hours, the H600 could not process 192 samples effectively. The 192-sample
total run time was over 30 hours (147 genomes/day), a significant performance decrease compared to the 156-sample run (15 hours). If
>= 192 genomes/day is required, then an additional H600 should be added to the cluster. Alternatively, as seen in Figure 3, a single
F800 can provide that level of performance.
Comparing maximum pipeline throughput, the F800 (325 genomes/day) processes 73 more genomes/day than the H600
(252 genomes/day).
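The throughput figures quoted above can be approximately reproduced with a small calculation. This is a sketch that assumes genomes/day is simply the batch size scaled from the total run time to 24 hours; the white paper does not state its exact formula, and the function names are hypothetical. Because the run times in the text are rounded, the results land close to, but not exactly on, the reported values.

```python
def genomes_per_day(samples: int, run_hours: float) -> float:
    """Assumed throughput model: batch size scaled to a 24-hour day."""
    return samples * 24.0 / run_hours


def run_hours_for(samples: int, throughput: float) -> float:
    """Inverse of the assumed model: run time implied by a reported throughput."""
    return samples * 24.0 / throughput


# H600, 156 samples in roughly 15 hours (reported: 252 genomes/day).
h600_156 = genomes_per_day(156, 15)

# F800, 189 samples in roughly 14 hours (reported: 325 genomes/day).
f800_189 = genomes_per_day(189, 14)

# H600, 192 samples at the reported 147 genomes/day implies a run of over 30 hours.
h600_192_hours = run_hours_for(192, 147)

# Difference between the reported peak throughputs of the two systems.
peak_gap = 325 - 252

print(f"H600, 156 samples: {h600_156:.1f} genomes/day")
print(f"F800, 189 samples: {f800_189:.1f} genomes/day")
print(f"H600, 192 samples: {h600_192_hours:.1f} hours")
print(f"F800 advantage at peak: {peak_gap} genomes/day")
```

Under this assumed model the rounded run times give about 249.6 and 324 genomes/day, consistent with the reported 252 and 325, and the 192-sample run works out to a little over 31 hours.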