Proper use of BWA MEM on multiplexed GBS sample

Question

I have a multiplexed lane of GBS sequencing reads as a fastq file. I understand the first step is to demultiplex and trim the adapter sequences from the reads.
This yields many individual fastq files that correspond to individual libraries.
I would like to genotype my samples, but I need to map the reads first. My question is how should I perform the BWA MEM alignment?
Do I run BWA MEM on each of the resulting fastq files that correspond to individual libraries,
OR
should I run BWA MEM on the large original fastq file that contains all my sequence reads, before demultiplexing and read trimming?
Forgive my ignorance, I've only ever used TASSEL (which works great) and I'm trying to expand my understanding and flexibility.

Jvstonebridge · Answer

You always want to trim before mapping, both to make sure that the bases at each position are of sufficient quality and to avoid aligning adapter sequence to the reference genome.

I am not entirely sure what biological difference your individual libraries represent (that depends on your research question), but most likely you want to demultiplex the reads before mapping with BWA as well. As a result you would indeed get one SAM/BAM file per library. Let me know if anything is still unclear!
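To make the per-library idea concrete, here is a minimal sketch of a loop that maps each demultiplexed file on its own. Everything about the layout is an assumption on my side, not something from the question: trimmed single-end reads in demux/<sample>.fastq.gz, a reference called reference.fasta that was already indexed with `bwa index`, and 10 threads. The helper names (sample_name, read_group, map_library, map_all) are made up for illustration. The `-R` read group is optional, but tagging each BAM with its sample name usually makes downstream genotyping easier.

```shell
# Sketch only: align each demultiplexed library separately with BWA-MEM.
# Assumed layout (hypothetical): demux/<sample>.fastq.gz, reference.fasta
# already indexed with `bwa index reference.fasta`.

# Derive a sample name from a FASTQ path, e.g. demux/A01.fastq.gz -> A01
sample_name() {
    local base
    base=$(basename "$1")
    base=${base%.gz}
    base=${base%.fastq}
    base=${base%.fq}
    printf '%s\n' "$base"
}

# Build the read-group line for `bwa mem -R` (tabs written as literal \t).
read_group() {
    printf '@RG\\tID:%s\\tSM:%s\\tPL:ILLUMINA' "$1" "$1"
}

# Align one library and write a coordinate-sorted BAM named <sample>.bam.
map_library() {
    local fq=$1 ref=$2 threads=$3 sample
    sample=$(sample_name "$fq")
    bwa mem -M -t "$threads" -R "$(read_group "$sample")" "$ref" "$fq" |
        samtools sort -@ "$threads" -o "${sample}.bam" -
}

# Loop over all libraries; with DRY_RUN=1 only print what would be mapped.
map_all() {
    local fq
    for fq in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "would map $(sample_name "$fq")"
        else
            map_library "$fq" reference.fasta 10
        fi
    done
}
```

After sourcing these functions you would call something like `DRY_RUN=1 map_all demux/*.fastq.gz` to check the sample names first, then drop DRY_RUN to run the alignments.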

Law · Answer

The first thing to do is to group the FastQ files together following this post. Once you have your reads grouped, you could trim your files before mapping. Then you could perform your BWA alignment. Since you mentioned MEM, I assume your reads are at least 70bp long, which is the range BWA-MEM is designed for according to the author:
It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM.
The first algorithm is designed for Illumina sequence reads up to 100bp, while the rest two for
longer sequences ranged from 70bp to 1Mbp. BWA-MEM and BWA-SW share similar features such 
as long-read support and split alignment, but BWA-MEM, which is the latest, 
is generally recommended for high-quality queries as it is faster and more accurate.
BWA-MEM also has better performance than BWA-backtrack for 70-100bp Illumina reads.
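
If you are not sure how long your reads are after barcode and adapter trimming, a quick check like the one below can tell you whether they fall in that range. This is just a sketch: the function name read_length_summary is made up, and it assumes standard 4-line FASTQ records.

```shell
# Print the read count and the shortest, mean, and longest read length
# of a FASTQ stream read from standard input.
# Assumes 4-line FASTQ records (the sequence is line 2 of every 4).
read_length_summary() {
    awk 'NR % 4 == 2 {
             n++; len = length($0); sum += len
             if (n == 1 || len < min) min = len
             if (len > max) max = len
         }
         END {
             if (n == 0) { print "no reads"; exit }
             printf "reads=%d min=%d mean=%.1f max=%d\n", n, min, sum / n, max
         }'
}
```

Usage would be `read_length_summary < sample_r1.fastq`, or `gzip -dc sample_r1.fastq.gz | read_length_summary` for a compressed file.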

Then you could run your BWA MEM, supposing you have 10 CPUs available, like so:
# -M                 mark shorter split hits as secondary (for Picard compatibility)
# -t 10              number of CPU threads
# reference.fasta    reference genome, indexed beforehand with `bwa index reference.fasta`
# sample_r1.fastq    FastQ read 1 (can be .gz), passed as a plain argument, not via "<"
# sample_r2.fastq    optional FastQ read 2 (can be .gz); leave it out for single-end reads
# The SAM output is piped to samtools and converted to BAM on the fly.
bwa mem -M -t 10 reference.fasta sample_r1.fastq sample_r2.fastq |
    samtools view -b -@ 10 - > sample.bam

Hope this helps!
