White Papers

APPENDIX A: Benchmark commands
BWA scaling test command
bwa mem -M -t [number of cores] -v 1 [reference] [read fastq 1] [read fastq 2] > [sam output file]
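The scaling test amounts to repeating the command above at several core counts and timing each run. A minimal sketch of that loop, assuming bash; the helper names bwa_scaling_cmd and run_scaling_test, the DRY_RUN switch, and the output file naming are illustrative assumptions and are not taken from the benchmark:

```shell
#!/usr/bin/env bash
# Sketch only: helper names, DRY_RUN, and output naming are assumptions.

# Print the scaling-test command for a given core count.
bwa_scaling_cmd() {
  local cores="$1" ref="$2" fq1="$3" fq2="$4" out="$5"
  printf 'bwa mem -M -t %s -v 1 %s %s %s > %s\n' "$cores" "$ref" "$fq1" "$fq2" "$out"
}

# Run (or, with DRY_RUN=1, only print) the test once per core count.
run_scaling_test() {
  local ref="$1" fq1="$2" fq2="$3" cores
  shift 3
  for cores in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      bwa_scaling_cmd "$cores" "$ref" "$fq1" "$fq2" "out_${cores}.sam"
    else
      # `time` reports the elapsed time of each run on stderr.
      time bwa mem -M -t "$cores" -v 1 "$ref" "$fq1" "$fq2" > "out_${cores}.sam"
    fi
  done
}
```

For example, `DRY_RUN=1 run_scaling_test ref.fa reads_1.fastq reads_2.fastq 1 2 4 8` prints the four commands without running them, which is a cheap way to check the arguments before a long benchmark.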
BWA-GATK commands
Phase 1. Pre-processing
Step 1. Aligning and sorting
bwa mem -c 250 -M -t [number of threads] -R '@RG\tID:noID\tPL:illumina\tLB:noLB\tSM:bar' [reference chromosome] [read fastq 1] [read fastq 2] | samtools view -bu - | sambamba sort -t [number of threads] -m 30G --tmpdir [path/to/temp] -o [sorted bam output] /dev/stdin
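The -R argument is a single string whose fields are separated by the two-character sequence \t, which bwa mem converts to tabs in the SAM header. It therefore has to reach bwa with the backslashes intact. A small sketch of building that string, assuming bash; make_read_group and its argument order are hypothetical and shown only for illustration:

```shell
# Sketch only: make_read_group and its argument order are assumptions.
# Builds a read-group header line with literal "\t" separators for bwa mem -R.
make_read_group() {
  local id="$1" lib="$2" sample="$3"
  # '\\t' in the printf format emits a backslash followed by the letter t.
  printf '@RG\\tID:%s\\tPL:illumina\\tLB:%s\\tSM:%s\n' "$id" "$lib" "$sample"
}

rg=$(make_read_group noID noLB bar)
printf '%s\n' "$rg"
```

The result would then be passed quoted, as in `bwa mem ... -R "$rg" ...`, so that the shell does not split or reinterpret it.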
Step 2. Mark and remove duplicates
sambamba markdup -t [number of threads] --remove-duplicates --tmpdir=[path/to/temp] [input: sorted bam output] [output: bam without duplicates]
Step 3. Generate realigning targets
java -d64 -Xms4g -Xmx30g -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -nt [number of threads] -R [reference chromosome] -o [target list file] -I [bam without duplicates] -known [reference vcf file]
Step 4. Realigning around indels
java -d64 -Xms4g -Xmx30g -jar GenomeAnalysisTK.jar -T IndelRealigner -R [reference chromosome] -I [bam without duplicates] -targetIntervals [target list file] -known [reference vcf file] -o [realigned bam]
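Steps 3 and 4 share the same JVM prefix and are linked by the target list file: IndelRealigner consumes what RealignerTargetCreator writes. A sketch of that hand-off, assuming bash; the gatk3 wrapper, the realign function, the GATK_JAR and DRY_RUN variables, and the derived file names are all illustrative assumptions:

```shell
# Sketch only: gatk3, realign, GATK_JAR, DRY_RUN and the file names are assumptions.
GATK_JAR="${GATK_JAR:-GenomeAnalysisTK.jar}"

# Run a GATK tool with the JVM settings used in Steps 3 and 4.
# With DRY_RUN=1 the command is printed instead of executed.
gatk3() {
  local tool="$1"
  shift
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo java -d64 -Xms4g -Xmx30g -jar "$GATK_JAR" -T "$tool" "$@"
  else
    java -d64 -Xms4g -Xmx30g -jar "$GATK_JAR" -T "$tool" "$@"
  fi
}

# Create realignment targets, then realign around them.
realign() {
  local ref="$1" bam="$2" known="$3" threads="$4"
  local targets="${bam%.bam}.intervals"
  # Stop if target creation fails, so IndelRealigner never reads a stale list.
  gatk3 RealignerTargetCreator -nt "$threads" -R "$ref" -o "$targets" -I "$bam" -known "$known" &&
  gatk3 IndelRealigner -R "$ref" -I "$bam" -targetIntervals "$targets" -known "$known" -o "${bam%.bam}.realigned.bam"
}
```

Calling `DRY_RUN=1 realign ref.fa sample.dedup.bam known.vcf 8` prints both commands, which makes it easy to confirm that the same target list path appears after -o in the first and after -targetIntervals in the second.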
Step 5. Base recalibration
java -d64 -Xms4g -Xmx30g -jar GenomeAnalysisTK.jar -T BaseRecalibrator -nct [number of threads] -l INFO -R [reference chromosome] -I [realigned bam] -knownSites [reference vcf file] -o [recalibrated data table]
Step 6. Print recalibrated reads - Optional
java -d64 -Xms8g -Xmx30g -jar GenomeAnalysisTK.jar -T PrintReads -nct [number of threads] -R [reference chromosome] -I [realigned bam] -BQSR [recalibrated data table] -o [recalibrated bam]
Step 7. After base recalibration - Optional
java -d64 -Xms4g -Xmx30g -jar GenomeAnalysisTK.jar -T BaseRecalibrator -nct [number of threads] -l INFO -R [reference chromosome] -I [recalibrated bam] -knownSites [reference vcf file] -o [post recalibrated data table]
Step 8. Analyze covariates - Optional
java -d64 -Xms8g -Xmx30g -jar GenomeAnalysisTK.jar -T AnalyzeCovariates -R [reference chromosome] -before [recalibrated data table] -after [post recalibrated data table] -plots [recalibration report pdf] -csv [recalibration report csv]
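Steps 5 to 8 each consume a file written by an earlier step: the recalibration table, the recalibrated BAM, and the post-recalibration table. A small guard can confirm that each input exists and is non-empty before the next step starts. A minimal sketch, assuming bash; require_file is a hypothetical helper and is not part of the benchmark:

```shell
# Sketch only: require_file is a hypothetical helper, not part of the benchmark.
# Return 0 if every named file exists and is non-empty;
# otherwise report the first offending file on stderr and return 1.
require_file() {
  local f
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty input: $f" >&2
      return 1
    fi
  done
}
```

It would be placed in front of a step, for example `require_file "$realigned_bam" "$recal_table" && java ... -T PrintReads ...`, where the two variables stand for the Step 4 and Step 5 outputs.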
Phase 2. Variant discovery: calling germline variants
Step 1. Haplotype caller
java -d64 -Xms8g -Xmx30g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -nct [number of threads] -R [reference chromosome] -ERC GVCF -BQSR [recalibrated data table] -L [reference vcf file] -I [recalibrated bam] -o [gvcf output]