Bioinformatics Asked by Jabbath on May 13, 2021
I have two full genome assemblies for C. elegans samples collected from two different geographical areas, which I found on WormBase. These are in FASTA format. I want to go gene by gene and compare the nucleotide sequences for each gene between the two samples. My goal is to count how many single-base differences there are in each gene. However, I'm not sure how to match the sequences in the assemblies to known reference sequences for C. elegans genes (by the way, is there a good resource to download all of these at once?).
My thought was to run BLAST between the reference sequence and each genome and find the match with the best score, if any. Is this a reasonable approach, or is there some better way to do this?
You don't want BLAST here, or at least not regular BLASTn. A better method would be to use tBLASTn to map the known C. elegans proteins to the translated genome. However, I would recommend using a slightly more sophisticated approach:
Collect FASTA sequences of all known C. elegans proteins (amino acid sequences, not nucleotide). This should be easy enough to do on WormBase. For example, the protein sequences for C. elegans strain VC2010, assembly PRJEB28388, can be found here: ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release/species/c_elegans/PRJEB28388/c_elegans.PRJEB28388.WS279.protein.fa.gz
Use a tool that can model introns and exons to map a protein onto a genome. I haven't worked in this field in more than 10 years now, so it is very likely that there are other programs around today, but back in the day, I would do this using either exonerate or genewise. With exonerate, the command would be:
exonerate -m protein2genome -n 1 -t assembly.fasta -q allproteins.pep > out
With -n 1, that will give you the single best hit for each of your input proteins and should accurately map each protein to the genome. Once you have that, you can extract the matching regions from both assemblies and start comparing.
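For the final comparison step, here is a minimal Python sketch of counting single-base differences per gene. It is not part of exonerate or any WormBase tooling. It assumes you have already extracted the matched region for each gene from both assemblies and aligned each pair, so that the two sequences for a gene have the same length and use `-` for gaps. The function names (`read_fasta`, `count_single_base_differences`, `differences_per_gene`) are made up for illustration, and skipping positions with a gap or `N` is a design choice you may want to change.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {record_id: sequence}.

    The record ID is the first whitespace-delimited token after '>'.
    Sequences are upper-cased so that soft-masked bases compare equal.
    """
    records: Dict[str, List[str]] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def count_single_base_differences(seq_a: str, seq_b: str) -> int:
    """Count mismatched columns between two aligned, equal-length sequences.

    Assumption: columns where either sequence has a gap ('-') or an
    ambiguous base ('N') are skipped rather than counted as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"-", "N"}
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in skip and b not in skip
    )


def differences_per_gene(
    sample_a: Dict[str, str], sample_b: Dict[str, str]
) -> Dict[str, int]:
    """Return {gene_id: mismatch count} for genes present in both samples."""
    shared = sample_a.keys() & sample_b.keys()
    return {
        gene: count_single_base_differences(sample_a[gene], sample_b[gene])
        for gene in sorted(shared)
    }
```

To use it on real files, you would pass an open file handle to `read_fasta` for each sample, for example `read_fasta(open("sample_a_genes.aligned.fa"))`, where the file name is hypothetical. Genes found in only one sample are left out of the result, so you may want to report those separately.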
Correct answer by terdon on May 13, 2021