TopHat2 versus HISAT2 inner workings

Question

In my intro to bioinformatics course, we mentioned that TopHat2 and HISAT2 will both try to align as many reads as possible to the reference genome (TopHat2 has been superseded by HISAT2). The reads that were not mapped probably cross exon boundaries. To solve this issue, both TopHat2 and HISAT2 chop up the unmapped reads into around 25bp fragments. Then:
"TopHat2 tries to align many fragments from each read using all possible exons as the search space. HISAT2 focuses on one fragment from each read, and once that fragment has been anchored it only searches locally."
My question is about that last part: "once that fragment has been anchored, [HISAT2] only searches locally." Isn't it possible that a given fragment will map to many different regions of the genome? If so, how can it be "anchored"? Do we anchor it multiple times and see which anchor makes the most sense, or does something else happen?

ribozero · Answer

I don't believe this statement is true:

"To solve this issue, both TopHat2 and HISAT2 chop up the unmapped reads into around 25bp fragments."

TopHat2 did not support spliced alignment directly. This is why it followed the general approach of chopping the read up into small parts and then aligning the sub-reads contiguously. HISAT2 and HISAT-genotype support spliced alignment directly, in that they don't pre-partition the reads into small parts but instead allow alignments to be extended dynamically across splice junctions (in a way similar to how STAR works). This is a fundamentally better approach and is one reason, among many, why TopHat/TopHat2 have been put into maintenance-only mode and have been officially superseded by HISAT2/HISAT-genotype.
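To make the contrast concrete, here is a toy Python sketch of the two strategies on a made-up genome. It is only an illustration of the general idea and not how either tool is implemented: real aligners use an FM index, tolerate mismatches, and score splice sites. The sequences, the function names (`map_segments`, `anchor_and_extend`) and the parameters (`seg_len`, `seed_len`, `max_intron`) are all assumptions of this sketch. Trying seed hits one after another is likewise just a choice made here, not a description of HISAT2's search.

```python
# Toy genome: flank + exon1 + intron + exon2 + flank (all sequences invented).
EXON1 = "ACGTTGCAAGCT"
INTRON = "GTAAGTCCCCCCTTTTAG"
EXON2 = "CATGGATCCATA"
GENOME = "TTTT" + EXON1 + INTRON + EXON2 + "GGGG"

# A read covering the last 8 bases of exon1 and the first 8 bases of exon2.
READ = EXON1[-8:] + EXON2[:8]


def map_segments(genome, read, seg_len):
    """Segment-style toy: cut the read into fixed-length pieces up front
    and exact-match each piece on its own. Returns (segment, position)
    pairs, with None for a piece that does not match contiguously."""
    hits = []
    for i in range(0, len(read), seg_len):
        seg = read[i:i + seg_len]
        pos = genome.find(seg)
        hits.append((seg, pos if pos >= 0 else None))
    return hits


def anchor_and_extend(genome, read, seed_len, max_intron):
    """Extension-style toy: anchor a seed, extend it base by base until
    the first mismatch, then look for the rest of the read only in a
    local window downstream. Returns a list of (position, length) blocks,
    or None if no seed hit can be completed."""
    seed = read[:seed_len]
    start = genome.find(seed)
    while start != -1:
        n = seed_len
        # Extend the anchor while genome and read keep agreeing.
        while (n < len(read) and start + n < len(genome)
               and genome[start + n] == read[n]):
            n += 1
        if n == len(read):
            return [(start, n)]  # the whole read aligned without a splice
        rest = read[n:]
        window_start = start + n
        window_end = min(len(genome), window_start + max_intron + len(rest))
        local = genome.find(rest, window_start, window_end)
        if local != -1:
            return [(start, n), (local, len(rest))]
        # This seed hit could not be completed; try the next one.
        start = genome.find(seed, start + 1)
    return None


print(map_segments(GENOME, READ, 6))
# [('TGCAAG', 8), ('CTCATG', None), ('GATC', 38)]
print(anchor_and_extend(GENOME, READ, 6, 30))
# [(8, 8), (34, 8)]
```

In the segment-style run, the middle piece straddles the junction and cannot be placed on its own, so the junction has to be inferred afterwards from the neighbouring pieces. In the extension-style run, the read is never pre-partitioned: the alignment grows from position 8 to the end of exon1, and the remainder is found locally at position 34, which skips the 18-base intron.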
