Bioinformatics Asked on January 30, 2021
Genome information:
My current strategy:
EST evidence:
Transcripts were assembled with Trinity into a .Total_transcript.fasta file, and cd-hit-est was used to reduce the data size with options -c 0.8 -n 5 -s 0.8.
Protein evidence:
All fungal proteins from SwissProt were downloaded and reduced using cd-hit with options -c 0.8 -n 5 -s 0.8.
Protein sets from the RefSeq genomes of this genus were downloaded, concatenated, and reduced using the same method.
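The two reduction steps above are easy to script. Below is a minimal Python sketch, assuming the RefSeq proteomes are plain FASTA files on disk. `concatenate_fasta` and `cdhit_command` are hypothetical helper names. The code only builds the cd-hit / cd-hit-est argument list (`-i`, `-o`, plus the `-c 0.8 -n 5 -s 0.8` thresholds from the question); it does not run the tool.

```python
from pathlib import Path
from typing import Iterable, List


def concatenate_fasta(inputs: Iterable[Path], output: Path) -> int:
    """Concatenate several FASTA files into one; return the number of records written."""
    n_records = 0
    with output.open("w") as out:
        for path in inputs:
            with path.open() as handle:
                for line in handle:
                    line = line.rstrip("\n")
                    if not line:
                        continue  # drop blank lines, e.g. between files
                    if line.startswith(">"):
                        n_records += 1  # each header starts a new record
                    out.write(line + "\n")  # always newline-terminate
    return n_records


def cdhit_command(program: str, infile: Path, outfile: Path,
                  identity: float = 0.8, word: int = 5,
                  length_diff: float = 0.8) -> List[str]:
    """Build argv for cd-hit or cd-hit-est using the thresholds from the question.

    identity    -> -c (sequence identity threshold)
    word        -> -n (word length)
    length_diff -> -s (length difference cutoff)
    """
    return [program, "-i", str(infile), "-o", str(outfile),
            "-c", str(identity), "-n", str(word), "-s", str(length_diff)]
```

Running it would then look something like `subprocess.run(cdhit_command("cd-hit", Path("genus_proteins.faa"), Path("genus_proteins.nr.faa")), check=True)`, where the file names are placeholders. Copying line by line, rather than joining raw file contents, avoids gluing the last sequence line of one file onto the next file's header when a file lacks a trailing newline.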
I plan to annotate the genome using Maker2 with the above evidence.
I have a doubt about my strategy: Maker2 calls Augustus for ab initio annotation, but there is no pre-trained Augustus model for my species. How should I solve this?
Could anyone share suggestions? Thanks in advance.
The Maker documentation does include some instructions for training ab initio gene predictors, but it assumes an abundant EST database is available. (Assumptions about what kinds of sequences will be available for a draft genome assembly have changed drastically in the last decade.)
It may be worth exploring an iterative approach in any case. You can do a first-pass annotation with Maker using the evidence you discussed, along with ab initio predictions from Augustus (and I'd recommend SNAP as well). Maker scores each gene model with the annotation edit distance (AED), which measures how well the gene model agrees with the evidence (0.0 is best). From this first pass, if you can identify a few hundred reliable gene models with good AED scores (or maybe even as many as one or two thousand), that should be plenty of training data. You could use these gene models to train a new Augustus model, following either Maker's instructions or Augustus's own. (In my experience, it was quite a bit of work in either case.)
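To make the AED criterion concrete: in the MAKER literature, AED is defined at the nucleotide level as 1 - (SN + SP) / 2. Here SN is the fraction of evidence bases covered by the gene model, and SP is the fraction of gene-model bases supported by evidence. The Python sketch below computes that value for one model against one evidence set. It then ranks mRNA features in a GFF3 file by an `_AED=` attribute, which is where MAKER's GFF3 output normally records the score (check your own files). The helper names, the 0.25 cutoff and the 1,000-model cap are illustrative assumptions, not values from the answer.

```python
from typing import Dict, Iterable, List, Set, Tuple

Interval = Tuple[int, int]  # 1-based inclusive (start, end), as in GFF3


def covered_bases(intervals: Iterable[Interval]) -> Set[int]:
    """Positions covered by the intervals (fine for one locus, not whole genomes)."""
    bases: Set[int] = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(model: Iterable[Interval], evidence: Iterable[Interval]) -> float:
    """Annotation edit distance: 1 - (SN + SP) / 2 at the nucleotide level."""
    pred = covered_bases(model)
    evid = covered_bases(evidence)
    if not pred or not evid:
        return 1.0  # no model or no evidence means no agreement
    overlap = len(pred & evid)
    sensitivity = overlap / len(evid)  # share of evidence explained by the model
    specificity = overlap / len(pred)  # share of the model supported by evidence
    return 1.0 - (sensitivity + specificity) / 2.0


def parse_attributes(column9: str) -> Dict[str, str]:
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs: Dict[str, str] = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def select_training_models(gff_lines: Iterable[str], max_aed: float = 0.25,
                           limit: int = 1000) -> List[str]:
    """IDs of mRNA features with _AED <= max_aed, lowest AED first, at most `limit`."""
    scored: List[Tuple[float, str]] = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue  # skips non-mRNA features and any trailing ##FASTA sequence lines
        attrs = parse_attributes(cols[8])
        if "_AED" not in attrs or "ID" not in attrs:
            continue
        score = float(attrs["_AED"])
        if score <= max_aed:
            scored.append((score, attrs["ID"]))
    scored.sort()  # lowest AED first; ties broken by ID
    return [model_id for _, model_id in scored[:limit]]
```

The returned IDs could then be used to pull those gene models out of the GFF3 for Augustus (or SNAP) training. This sketch ranks by AED only. It does not remove overlapping or near-identical models.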
Answered by Daniel Standage on January 30, 2021