Bioinformatics Asked on April 25, 2021
I’ve been using MEGAHIT to assemble metagenomes, with particular focus on specific genomic areas.
Sometimes all I get is gene fragments or pathway fragments (e.g. if I know that genes A, B, C, D and E should be together, I only get A, B and C in one contig, and maybe D and E in another). That is understandable when the sequencing depth is low, and I assume that it’s the best I can get.
However, in two of my metagenomes the genes of interest seem to be abundant (depth of 100-300+ listed for MEGAHIT contigs), and I still have the same problem. I assume that these ‘breaks’ between contigs are due to natural sequence variation, which breaks the assembly process. (Sequencing errors are also possible, but I used bbduk to clean my data prior to assembly.)
I’d like to have larger contigs because I’m curious about the gene order. Things I’ve tried (neither worked): visualizing assembly graphs with Bandage around BLAST hits, and starting with a smaller k-min (21 as opposed to 27).
So I guess I’m wondering if using even smaller k-mers might make a difference. Or if someone has another suggestion, I’m happy to listen! Thank you.
MEGAHIT settings used:
megahit -r input_files.fastq --num-cpu-threads 32 --min-contig-len 300 --presets meta-large -o output
megahit -r input_files.fastq --num-cpu-threads 32 --min-contig-len 1500 --presets meta-sensitive -o output
(meta-large starts from minimum kmer size 27; meta-sensitive from 21)
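For anyone wanting to test the smaller-k idea directly, here is a minimal sketch that skips the presets and passes an explicit list via --k-list. The range 15 to 75 in steps of 10 and the output directory name output_small_k are assumptions chosen for illustration, not recommended values; MEGAHIT requires every k to be odd, between 15 and 255, with increments of at most 28. Whether this actually bridges the breaks is an open question.

```shell
#!/usr/bin/env bash
# Hypothetical k-mer range: start below the meta-sensitive minimum of 21.
K_START=15   # must be odd and >= 15
K_END=75
K_STEP=10    # even step keeps every k odd

# Build a comma-separated k-mer list, e.g. "15,25,...".
K_LIST=""
k=$K_START
while [ "$k" -le "$K_END" ]; do
  K_LIST="${K_LIST:+$K_LIST,}$k"
  k=$((k + K_STEP))
done

# Run MEGAHIT only if it is installed and the input exists;
# otherwise just print the command that would be run.
if command -v megahit >/dev/null 2>&1 && [ -f input_files.fastq ]; then
  megahit -r input_files.fastq --num-cpu-threads 32 --min-contig-len 1500 \
    --k-list "$K_LIST" -o output_small_k
else
  echo "megahit -r input_files.fastq --num-cpu-threads 32 --min-contig-len 1500 --k-list $K_LIST -o output_small_k"
fi
```

The output directory differs from the one in the commands above because MEGAHIT will not write into a directory that already exists.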
You can try s-aligner (free for 15 days), which usually produces considerably larger contigs and a considerably higher NG50 for metagenomic data containing viruses (including phages). metaSPAdes also produces larger contigs, though shorter than those from s-aligner.
Ref. s-aligner: a greedy algorithm for non-greedy de novo genome assembly
Answered by juanjo75es on April 25, 2021