Bioinformatics Asked on June 21, 2021
As far as I understand, in most assembly programs the scaffolding step uses paired-end information to get from contigs (contiguous sequences) to scaffolds (longer sequences that might have some N-filled gaps).
My assembly software of choice, MEGAHIT, uses paired-end information to build the contigs, but it does not output a standard scaffold. So I am wondering the following, related things:
- Is it meaningful to run a scaffolding program on the output of MEGAHIT? I imagine there might be some instances in which paired-end information could span a gap.
- Which software would you recommend for it? (I’ve tried SOAPdenovo2 and SSPACE, but they appear not to be actively maintained, so I have the issue of ‘it doesn’t work and I can’t do anything about it’.)
- Could relevant information (e.g. two contigs being connected by paired-end information) be recovered by alternative and perhaps more user-friendly means, such as exploring the assembly graph with Bandage?
Thank you for your time!
Update 2: It looks like your approach has actually been suggested here as one way to use the PE information. I guess MEGAHIT may not really be using the PE information anyway. I still find it a bit odd, but other people do suggest it, so maybe it's worth trying what Torsten suggests.
I think that scaffolding with PE information from the same reads that were used as input to MEGAHIT is somewhat sketchy. You could still try it, but I'd be worried about artifacts, because de novo assemblers rely so heavily on heuristics.
However, I think that it is perfectly ok to scaffold using orthogonal data. Here are some examples of orthogonal data:
The tools used in each case would be somewhat different. For a little more information about how you might use this, here is a recent review of tech.
Full disclosure: I work for a company that sells Hi-C kits for such applications.
Update: I realized that I missed one part of the question. I think that visually exploring the assembly using e.g. Bandage is always a good idea. Quite possibly you can make some scaffolding decisions that way, but to me it sounds somewhat painful to do, especially in a metagenome, where there are going to be a lot of collapsed regions with multiple branches.
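If you want a quick look at which contigs are connected by paired-end information before committing to a scaffolder, one option is to count read pairs whose mates align to different contigs. Below is a minimal sketch, not a scaffolder. It assumes that you have mapped the reads back to the MEGAHIT contigs with a paired-end aligner (e.g. bwa mem or bowtie2) and have the alignments as SAM text. The function name `count_contig_links`, the `min_mapq` cutoff and the toy records are all made up for illustration.

```python
from collections import Counter


def count_contig_links(sam_lines, min_mapq=30):
    """Count read pairs whose two mates map to different contigs.

    sam_lines: iterable of SAM-formatted text lines.
    min_mapq:  hypothetical cutoff; only the first mate's MAPQ is checked,
               because the mate's MAPQ is not a mandatory SAM field.
    """
    links = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag = int(fields[1])
        rname, mapq, rnext = fields[2], int(fields[4]), fields[6]
        if not flag & 0x1:  # read is not paired
            continue
        if flag & (0x4 | 0x8 | 0x100 | 0x800):
            # read unmapped, mate unmapped, secondary or supplementary
            continue
        if not flag & 0x40:  # use only the first mate: one count per pair
            continue
        if rnext in ("=", "*") or rnext == rname:  # mate on the same contig
            continue
        if mapq < min_mapq:
            continue
        links[tuple(sorted((rname, rnext)))] += 1
    return links


# Toy SAM records (made up) just to show the behaviour.
example = [
    "@SQ\tSN:contig_1\tLN:1000",
    "r1\t97\tcontig_1\t900\t60\t100M\tcontig_2\t50\t0\t*\t*",
    "r1\t145\tcontig_2\t50\t60\t100M\tcontig_1\t900\t0\t*\t*",
    "r2\t97\tcontig_2\t20\t60\t100M\tcontig_1\t880\t0\t*\t*",
    "r3\t99\tcontig_1\t100\t60\t100M\t=\t400\t400\t*\t*",
    "r4\t97\tcontig_1\t950\t5\t100M\tcontig_3\t10\t0\t*\t*",
]
links = count_contig_links(example)
for (contig_a, contig_b), n_pairs in links.most_common():
    print(contig_a, contig_b, n_pairs)
```

Contig pairs supported by many read pairs are candidates to inspect in Bandage. In a metagenome I would treat low counts with suspicion, since repeats and closely related strains can produce spurious links.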
Answered by Maximilian Press on June 21, 2021
You can use SOAPdenovo-Fusion to scaffold contigs produced by MEGAHIT as suggested by one of the developers: https://github.com/aquaskyline/SOAPdenovo2
Answered by Robvh on June 21, 2021