How to optimally annotate a fungal genome without RNA-seq evidence?

Question

Genome information:

~50M nt
2300+ contigs
No pre-trained Augustus parameters for this species
Several well-annotated RefSeq genomes are available for this genus.
Several RNA-seq datasets are available for this genus, but none for this species.

My current strategy:
EST evidence:

I selected RNA-seq datasets covering as many species as possible and assembled each one de novo with Trinity.
Only the longest transcripts were kept.
All transcript files were concatenated into a single Total_transcript.fasta file, and cd-hit-est was run with the options -c 0.8 -n 5 -s 0.8 to reduce redundancy.
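
For readers who want to reproduce the "longest transcripts" step, here is a minimal sketch. It assumes that "longest" means the longest isoform per Trinity gene, and that headers follow Trinity's usual naming (e.g. TRINITY_DN1000_c115_g5_i1, where the part before the trailing _i<N> identifies the gene). The function names read_fasta and longest_isoform_per_gene are made up for this illustration and are not part of Trinity.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

# Assumption: Trinity isoform IDs end in "_i<N>"; stripping that suffix gives the gene ID.
_ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header without '>', sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_isoform_per_gene(
    records: Iterable[Tuple[str, str]]
) -> Dict[str, Tuple[str, str]]:
    """Map each Trinity gene ID to the (header, sequence) of its longest isoform."""
    best: Dict[str, Tuple[str, str]] = {}
    for header, seq in records:
        transcript_id = header.split()[0]  # drop "len=..." and other annotations
        gene_id = _ISOFORM_SUFFIX.sub("", transcript_id)
        # Keep the first isoform seen on ties; replace only if strictly longer.
        if gene_id not in best or len(seq) > len(best[gene_id][1]):
            best[gene_id] = (header, seq)
    return best
```

To use it on a real assembly, open the Trinity FASTA file, pass the file handle to read_fasta, and write the values of the returned dictionary back out as FASTA before concatenating and clustering.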

Protein evidence:

All fungal proteins from SwissProt were downloaded and reduced using cd-hit with the options -c 0.8 -n 5 -s 0.8.

Protein sets from the RefSeq genomes of this genus were downloaded, concatenated, and reduced using the same method.

I plan to annotate the genome using Maker2 with the evidence above.
I have one doubt about this strategy: Maker2 calls Augustus for ab initio prediction, but there is no pre-trained Augustus model for this species. How can I solve this?
Could anyone share suggestions? Thanks in advance.

Daniel Standage · Answer

The Maker documentation does include some instructions for training ab initio gene predictors, but it assumes an abundant EST database is available. (Assumptions about what kinds of sequences will be available for a draft genome assembly have changed drastically in the last decade.)
It may be worth exploring an iterative approach in any case. You can do a first-pass annotation with Maker using the evidence you discussed, along with ab initio predictions from Augustus (and I'd recommend SNAP as well). Maker scores each gene model with the annotation edit distance (AED), which represents how well the gene model agrees with the evidence (0.0 is best). From this first pass, if you can identify a few hundred reliable gene models with good AED scores (or maybe even as many as one or two thousand), that should be plenty of training data. You could use these gene models to train a new model for Augustus, using Maker's instructions or Augustus' own instructions. (In my experience, it was quite a bit of work in either case.)
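
As a rough illustration of picking out reliable first-pass models, the sketch below scans a Maker GFF3 file and collects the IDs of mRNA features whose AED is at or below a cutoff. It assumes Maker's usual output, where final models carry "maker" in the source column and the score appears as an _AED attribute in column 9. The 0.25 cutoff and the function names are placeholders chosen for this example, not values recommended by Maker.

```python
from typing import Dict, Iterable, List


def parse_attributes(column9: str) -> Dict[str, str]:
    """Split a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs: Dict[str, str] = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def select_low_aed_mrnas(gff_lines: Iterable[str], max_aed: float = 0.25) -> List[str]:
    """Return IDs of maker mRNA features with _AED <= max_aed (cutoff is an assumption)."""
    selected: List[str] = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # the embedded sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Only final Maker models, not the evidence alignments or raw predictions.
        if len(cols) != 9 or cols[1] != "maker" or cols[2] != "mRNA":
            continue
        attrs = parse_attributes(cols[8])
        if "ID" not in attrs or "_AED" not in attrs:
            continue
        if float(attrs["_AED"]) <= max_aed:
            selected.append(attrs["ID"])
    return selected
```

The resulting ID list can then be used to pull the matching gene models out of the GFF3 file as a candidate training set. It is worth inspecting how many models pass at a few different cutoffs before settling on one.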
