Bioinformatics Asked on October 3, 2021
I have around a hundred FASTA files (and will collect several thousand) with DNA sequences at 50x or higher coverage. What is a recommended method to construct a phylogenetic tree? Solutions in Python or R are sought.
I found that Phylo from Biopython only handles already calculated trees.
The obvious single answer is the R package "ape". It gives you access to PhyML for tree building and Clustal/MUSCLE for alignment building; these are external programs, so the paths to the binaries are important. It also includes several distance methods, such as NJ and BIONJ. Its distance approaches don't look mainstream to me, but I could be wrong.
ape also has some nice functions; the tree sorting in particular is very cool, and I need to read through it with much greater care. Personally, I wouldn't perform a core phylogenetic analysis within R, because the standalone programs are sufficient and the analysis is computationally intensive.
Answered by M__ on October 3, 2021
I would not look for a package for this, but instead build a small pipeline calling external tools with something like the following workflow:
Of course, this is rather general, and depending on exactly what you're doing you may want a different workflow and/or different tools. You should also explore the parameter space; do not assume the defaults are necessarily good choices.
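As a rough illustration of what "a small pipeline calling external tools" could look like in Python, here is a minimal sketch. The tool choices (MAFFT for alignment, FastTree for tree inference) are my assumptions and are not named in the answer above; the function names and file-naming scheme are hypothetical too. It also assumes all sequences have already been collected into a single multi-FASTA file. Both tools write their result to standard output, which the sketch redirects to a file.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(fasta: Path, out_dir: Path) -> dict:
    """Return {step: (command, output_path)} for alignment and tree inference.

    Assumed tools: MAFFT and FastTree. Swap in whatever suits your data.
    """
    alignment = out_dir / (fasta.stem + ".aln.fasta")
    tree = out_dir / (fasta.stem + ".nwk")
    return {
        # mafft --auto picks an alignment strategy based on data size
        "align": (["mafft", "--auto", str(fasta)], alignment),
        # FastTree -nt treats the input as a nucleotide alignment
        "tree": (["FastTree", "-nt", str(alignment)], tree),
    }


def run_pipeline(fasta: Path, out_dir: Path) -> Path:
    """Run each step in order and return the path of the Newick tree."""
    out_dir.mkdir(parents=True, exist_ok=True)
    last_output = None
    for step, (cmd, out_path) in build_commands(fasta, out_dir).items():
        if shutil.which(cmd[0]) is None:
            raise RuntimeError(f"{cmd[0]} not found on PATH (step '{step}')")
        # Both tools print their result to stdout, so capture it in a file
        with open(out_path, "w") as handle:
            subprocess.run(cmd, stdout=handle, check=True)
        last_output = out_path
    return last_output
```

Calling run_pipeline(Path("samples.fasta"), Path("results")) would then leave the alignment and the tree in the results directory, provided both programs are installed. Keeping the command construction in its own function makes it easy to add or change flags when you explore the parameter space.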
Answered by Chris_Rands on October 3, 2021
I agree with Chris Rands that a reasonable approach would be to call external tools.
However, if you really want to do the phylogeny from within Python, you could use the P4 package, which is a bit complicated to handle but gives you lots of options for building MCMC-based Bayesian phylogenies:
https://github.com/pgfoster/p4-phylogenetics
You would still need something else to align the sequences beforehand.
To visualize the tree using Python, you could use the ETE Toolkit, which is likely more powerful than what you can find in Biopython: http://etetoolkit.org/
Answered by bli on October 3, 2021
(If I understand your situation correctly.) The seqinr documentation at https://www.rdocumentation.org/packages/seqinr/versions/3.6-1/topics/read.alignment
shows how to use the function read.alignment, which can read FASTA, MSF, and other alignment formats. The docs provide this example: read.alignment(file = system.file("sequences/LTPs128_SSU_aligned_First_Two.fasta", package = "seqinr"), format = "fasta", whole.header = TRUE)
You can use the code below (which assumes the sequences are already aligned) to go from reading the alignment to computing the distances, building the neighbor-joining tree, and plotting it.
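library("seqinr")  # read.alignment(), dist.alignment()
library("ape")     # nj(), plot()
# Read an already aligned FASTA file (the file name is a placeholder)
fasta.res <- read.alignment(file = "geneticAlignment.fasta", format = "fasta")
# Pairwise distances based on sequence identity
fasta.res.dist.alignment <- dist.alignment(fasta.res, matrix = "identity")
# Build the neighbor-joining tree from the distance matrix
fasta.res.dist.alignment.nj <- nj(fasta.res.dist.alignment)
# Plot the resulting tree
plot(fasta.res.dist.alignment.nj, main = "from fasta files")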
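Since the question also asks about Python, here is a standard-library-only sketch of the same two steps: identity-based pairwise distances followed by neighbor joining. It is meant to show the idea, not to replace a tested package. All function names are my own, the distance is taken as sqrt(1 - identity), and skipping columns that contain a gap is an assumption that may not match what seqinr does. Input sequences must already be aligned, and at least three are needed.

```python
from itertools import combinations
from math import sqrt


def read_fasta(text):
    """Parse FASTA text into {name: sequence} (name = first word of header)."""
    seqs, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = ""
        elif line and name is not None:
            seqs[name] += line.upper()
    return seqs


def identity_distance(a, b):
    """sqrt(1 - fraction of identical columns); gap columns are skipped."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    same = sum(x == y for x, y in pairs)
    return sqrt(1 - same / len(pairs))


def distance_matrix(seqs):
    """Return {frozenset({name1, name2}): distance} for every pair."""
    return {frozenset((x, y)): identity_distance(seqs[x], seqs[y])
            for x, y in combinations(seqs, 2)}


def neighbor_joining(names, dist):
    """Build an unrooted neighbor-joining tree and return it as Newick."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    d = dict(dist)

    def get(a, b):
        return 0.0 if a == b else d[frozenset((a, b))]

    while len(nodes) > 3:
        n = len(nodes)
        # Row sums of the current distance matrix
        r = {a: sum(get(a, b) for b in nodes) for a in nodes}
        # Join the pair that minimises the Q criterion
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * get(*p) - r[p[0]] - r[p[1]])
        dij = get(i, j)
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        # The new internal node is labelled by its own Newick subtree
        new = f"({i}:{li:.4f},{j}:{lj:.4f})"
        rest = [k for k in nodes if k not in (i, j)]
        for k in rest:
            d[frozenset((new, k))] = 0.5 * (get(i, k) + get(j, k) - dij)
        nodes = rest + [new]

    # Connect the last three nodes to a single central node
    a, b, c = nodes
    dab, dac, dbc = get(a, b), get(a, c), get(b, c)
    la = (dab + dac - dbc) / 2
    lb = (dab + dbc - dac) / 2
    lc = (dac + dbc - dab) / 2
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"


# Tiny made-up alignment, for illustration only
example = ">s1\nACGTACGT\n>s2\nACGTACGA\n>s3\nACGAACGA\n>s4\nTCGAACGA\n"
seqs = read_fasta(example)
print(neighbor_joining(list(seqs), distance_matrix(seqs)))
```

The printed Newick string can be loaded into any tree viewer. For thousands of sequences this pure-Python version will be slow, so a dedicated tool is the better choice for real data.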
Answered by Vass on October 3, 2021