White Papers

BWA shows stable scalability
Figure 4 shows the run times of BWA for sequence data sizes ranging from 2 to 208 million fragments (MF) and for different
numbers of threads. Oversubscription is avoided so that each thread runs on its own physical core. As shown in Figure 4 and Figure
5, BWA scales linearly with both input size and the number of cores.
Table 2 and Table 3 show the speed-up obtained by increasing the core count. The results indicate that the optimum number of threads
for BWA is between 10 and 16. Based on this observation, 13 cores are used for BWA processes throughout the tests.
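
The speed-up figures discussed above can be derived from measured run times with a simple ratio. The following is a minimal sketch, assuming speed-up is defined as the run time at the smallest core count divided by the run time at N cores, and parallel efficiency as that speed-up divided by the relative increase in cores. The function name `speedup_table` and the run times in `example_times` are hypothetical placeholders for illustration; they are not the values reported in Table 2 or Table 3.

```python
def speedup_table(run_times_hrs):
    """Map each core count to (speed-up, parallel efficiency).

    run_times_hrs: dict of {core_count: run time in hours}.
    Both metrics are relative to the smallest core count in the input.
    """
    base_cores = min(run_times_hrs)
    base_time = run_times_hrs[base_cores]
    table = {}
    for cores in sorted(run_times_hrs):
        speedup = base_time / run_times_hrs[cores]
        # Efficiency of 1.0 means perfectly linear scaling.
        efficiency = speedup / (cores / base_cores)
        table[cores] = (speedup, efficiency)
    return table


# Hypothetical run times in hours, NOT measurements from this paper.
example_times = {1: 24.0, 4: 6.0, 8: 3.0, 12: 2.0, 16: 1.6, 20: 1.5, 24: 1.5}

for cores, (speedup, efficiency) in speedup_table(example_times).items():
    print(f"{cores:>2} cores: speed-up {speedup:5.2f}x, efficiency {efficiency:4.2f}")
```

With real measurements, the core count beyond which efficiency drops sharply gives a practical upper bound on the thread count, which is the kind of reasoning behind picking a value in the 10 to 16 range.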
[Figure 4 plot: running time (hrs) versus number of cores (1, 4, 8, 12, 16, 20, 24) for data sizes of 2, 10, 40, 70, 84, 122, 167, and 208 million fragments; E5-2680 v3 with 2133 DIMM]
Figure 4: BWA's scaling behavior on Haswell CPU with different input data size
[Figure 5 plot: running time (hrs) versus number of cores (1, 4, 8, 12, 16, 20, 24) for data sizes of 2, 10, 40, 70, 84, 122, 167, and 208 million fragments; E5-2690 v4 with 2400 DIMM]
Figure 5: BWA's scaling behavior on Broadwell CPU with different input data size