White Papers

Table 2: Speed-up by increasing parallelism on E5-2680 v3/DDR4-2133

Speed-up by number of cores (rows) and sequence data size in million fragments (columns)
Cores       2      10      40      70      84     122     167     208
    1    1.00    1.00    1.00    1.00    1.00    1.00    1.00    1.00
    4    3.84    3.84    3.77    3.72    3.73    3.74    3.77    3.82
    8    7.38    7.59    7.26    7.43    7.46    7.44    7.50    7.53
   12   10.14   11.30   10.72   10.97   11.09   10.96   11.06   11.15
   16   12.62   14.88   14.05   14.41   14.57   14.39   14.58   14.67
   20   14.20   18.25   17.34   17.71   17.95   17.65   17.64   18.07
   24   15.35   21.72   20.37   20.64   21.12   20.84   21.24   21.36
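One way to read Table 2 is in terms of parallel efficiency, that is, the speed-up divided by the number of cores. The following is a minimal sketch for the smallest and largest data sets only. The function and variable names are illustrative, and the Karp-Flatt metric (an estimate of the serial fraction derived from a measured speed-up) is added here for illustration; it was not part of the original tests.

```python
# Speed-up values copied from Table 2 for the smallest and largest data sets.
CORES = [1, 4, 8, 12, 16, 20, 24]
SPEEDUP = {
    2:   [1.00, 3.84, 7.38, 10.14, 12.62, 14.20, 15.35],  # 2 million fragments
    208: [1.00, 3.82, 7.53, 11.15, 14.67, 18.07, 21.36],  # 208 million fragments
}


def efficiency(speedup, cores):
    """Parallel efficiency: fraction of the ideal linear speed-up achieved."""
    return speedup / cores


def karp_flatt(speedup, cores):
    """Karp-Flatt metric: serial fraction implied by a measured speed-up.

    Only defined for cores > 1.
    """
    return (1.0 / speedup - 1.0 / cores) / (1.0 - 1.0 / cores)


if __name__ == "__main__":
    for size, row in SPEEDUP.items():
        print(f"{size} million fragments")
        for p, s in zip(CORES, row):
            line = f"  {p:2d} cores: efficiency {efficiency(s, p):.2f}"
            if p > 1:
                line += f", serial fraction {karp_flatt(s, p):.3f}"
            print(line)
```

At 24 cores this gives an efficiency of about 0.64 for 2 million fragments and about 0.89 for 208 million fragments. The same functions can be applied to the values in Table 3.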
Table 3: Speed-up by increasing parallelism on E5-2690 v4/DDR4-2400

Speed-up by number of cores (rows) and sequence data size in million fragments (columns)
Cores       2      10      40      70      84     122     167     208
    1    1.00    1.00    1.00    1.00    1.00    1.00    1.00    1.00
    4    3.48    3.70    3.49    3.48    3.52    3.55    3.53    3.59
    8    6.26    6.86    6.57    6.64    6.63    6.63    6.64    6.75
   12    8.47   10.16    9.54    9.59    9.88    9.72    9.85    9.96
   16   10.54   13.36   12.56   12.82   12.95   12.53   13.03   13.01
   20   11.68   16.40   15.37   15.67   15.97   15.64   15.69   15.94
   24   12.71   19.20   17.95   18.53   18.77   18.47   18.83   18.93
More importantly, the E5-2690 v4/DDR4-2400 system is at least 8% faster than the E5-2680 v3/DDR4-2133 system, as shown in Table 4.
This speed-up is likely due to the higher CPU clock speed and the faster memory.
Table 4: Speed-up by E5-2690 v4/DDR4-2400 in comparison to E5-2680 v3/DDR4-2133

Speed-up by number of cores (rows) and sequence data size in million fragments (columns)
Cores       2      10      40      70      84     122     167     208
    1    1.31    1.30    1.26    1.26    1.26    1.27    1.49    1.47
    4    1.19    1.25    1.17    1.18    1.19    1.20    1.40    1.38
    8    1.12    1.17    1.14    1.12    1.12    1.13    1.32    1.31
   12    1.10    1.17    1.12    1.10    1.12    1.12    1.33    1.31
   16    1.10    1.16    1.13    1.12    1.12    1.10    1.33    1.30
   20    1.08    1.17    1.12    1.11    1.12    1.12    1.33    1.29
   24    1.09    1.15    1.11    1.13    1.12    1.12    1.32    1.30
However, we chose the 18-core Intel® Xeon® Processor E5-2697 v4 for the solution, since CPUs with higher core counts are
preferable in most life science applications.
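The "at least 8%" figure can be read off Table 4, and the table can be cross-checked against Tables 2 and 3. The sketch below assumes that each Table 4 entry is the ratio of run times (older system over newer system) at the same core count; the tables do not state this explicitly. Under that assumption, the ratio at p cores equals the single-core ratio multiplied by the Table 3 speed-up and divided by the Table 2 speed-up. All names in the code are illustrative.

```python
# Data sizes in million fragments (column order of Tables 2 to 4).
SIZES = [2, 10, 40, 70, 84, 122, 167, 208]

# Table 4, copied row by row: core count -> ratio for each size in SIZES.
TABLE4 = {
    1:  [1.31, 1.30, 1.26, 1.26, 1.26, 1.27, 1.49, 1.47],
    4:  [1.19, 1.25, 1.17, 1.18, 1.19, 1.20, 1.40, 1.38],
    8:  [1.12, 1.17, 1.14, 1.12, 1.12, 1.13, 1.32, 1.31],
    12: [1.10, 1.17, 1.12, 1.10, 1.12, 1.12, 1.33, 1.31],
    16: [1.10, 1.16, 1.13, 1.12, 1.12, 1.10, 1.33, 1.30],
    20: [1.08, 1.17, 1.12, 1.11, 1.12, 1.12, 1.33, 1.29],
    24: [1.09, 1.15, 1.11, 1.13, 1.12, 1.12, 1.32, 1.30],
}

# 24-core rows of Table 2 and Table 3.
TABLE2_24 = [15.35, 21.72, 20.37, 20.64, 21.12, 20.84, 21.24, 21.36]
TABLE3_24 = [12.71, 19.20, 17.95, 18.53, 18.77, 18.47, 18.83, 18.93]


def min_gain_percent(table):
    """Smallest entry of the table, expressed as a percentage gain over 1.0."""
    smallest = min(min(row) for row in table.values())
    return round((smallest - 1.0) * 100, 1)


def predicted_ratios(single_core, s_new, s_old):
    """Ratio at p cores implied by the single-core ratio and both speed-ups.

    T_old(p) / T_new(p) = [T_old(1) / T_new(1)] * S_new(p) / S_old(p)
    """
    return [r1 * sn / so for r1, sn, so in zip(single_core, s_new, s_old)]


if __name__ == "__main__":
    print("minimum gain (%):", min_gain_percent(TABLE4))
    predicted = predicted_ratios(TABLE4[1], TABLE3_24, TABLE2_24)
    for size, pred, tab in zip(SIZES, predicted, TABLE4[24]):
        print(f"{size:4d}M fragments: predicted {pred:.3f}, Table 4 {tab:.2f}")
```

The predicted 24-core ratios agree with the Table 4 entries to within rounding (differences below 0.01), which is consistent with the assumed interpretation.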
Genomics/NGS data analysis performance
A typical variant calling pipeline consists of three major steps: 1) aligning sequence reads to a reference genome sequence; 2)
identifying regions containing SNPs/InDels; and 3) performing preliminary downstream analysis. In the tested pipeline,
BWA 0.7.2-r1039 is used for the alignment step and the Genome Analysis Toolkit (GATK) for the variant calling step. These are
considered standard tools for alignment and variant calling in whole genome or exome sequencing data analysis. The GATK version
used for the tests is 3.5, and the workflow tested was obtained from the workshop ‘GATK Best Practices and Beyond’. That
workshop introduced a new workflow with three phases.