Will using smaller kmers help get larger contigs? If not, then what?

Question

I've been using MEGAHIT to assemble metagenomes, with particular focus on specific genomic areas.

Sometimes all I get is gene fragments or pathway fragments (e.g. if I know that genes A, B, C, D and E should be together, I only get A, B and C in one contig, and maybe D and E in another). That is understandable when the sequencing depth is low, and I assume that it's the best I can get.

However, in two of my metagenomes, the genes of interest seem to be abundant (depth of 100-300+ listed for MEGAHIT contigs), and I still have the same problem. I assume that these 'breaks' between contigs are due to natural sequence variation which breaks the assembly process. (While sequencing errors are also possible, I've used bbduk to clean my data prior to assembly.)
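To make the assumption about sequence variation concrete, here is a minimal Python sketch on toy data. It is not MEGAHIT's actual algorithm; the sequences, the SNP position and the function names are all made up for illustration. It builds two "strains" that differ by a single SNP and counts the k-mers found in only one of them, which is what forms a branch in a de Bruijn graph.

```python
import random


def kmers(seq, k):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def strain_specific_kmers(strain_a, strain_b, k):
    """Return (k-mers only in strain_a, k-mers only in strain_b)."""
    a, b = kmers(strain_a, k), kmers(strain_b, k)
    return a - b, b - a


# Toy data: a random 300 bp "gene" and a second strain with one SNP.
random.seed(0)  # arbitrary seed, only for reproducibility
strain_a = "".join(random.choice("ACGT") for _ in range(300))
snp_pos = 150  # hypothetical SNP position, far from both ends
swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
strain_b = strain_a[:snp_pos] + swap[strain_a[snp_pos]] + strain_a[snp_pos + 1:]

results = {}
for k in (21, 27):
    only_a, only_b = strain_specific_kmers(strain_a, strain_b, k)
    results[k] = (len(only_a), len(only_b))
    print(f"k={k}: {len(only_a)} k-mers only in A, {len(only_b)} only in B")
```

Only the windows that overlap the SNP can differ, so each strain contributes at most k strain-specific k-mers. A smaller k shortens that branch, but the branch point itself is still there.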

I'd like to have larger contigs because I'm curious about the gene order. Things I've tried that did not work: visualizing the assembly graphs with Bandage around BLAST hits, and starting with a smaller k-min (21 as opposed to 27).

So I guess I'm wondering if using even smaller k-mers might make a difference. Or if someone has another suggestion, I'm happy to listen! Thank you.

MEGAHIT settings used:

megahit -r input_files.fastq --num-cpu-threads 32 --min-contig-len 300 --presets meta-large -o output
megahit -r input_files.fastq --num-cpu-threads 32 --min-contig-len 1500 --presets meta-sensitive -o output
(meta-large starts from a minimum k-mer size of 27; meta-sensitive from 21.)

juanjo75es · Answer

You can try s-aligner (free for 15 days), which usually produces considerably larger contigs and a considerably higher NG50 for metagenomic data containing viruses (including phages). metaSPAdes also produces larger contigs, although shorter than those from s-aligner.

Ref. s-aligner: a greedy algorithm for non-greedy de novo genome assembly
