Bioinformatics Asked by CuriousTree on June 24, 2021
I am very new to bioinformatics (and Python in general), but I would like to use Python to analyse enzymes more efficiently, in terms of both structure and function, using Jupyter Notebook. What is the best program or source code for multiple sequence alignment of amino acid sequences, for example to identify conserved binding sites? I see that Biopython has a few ways of creating alignments, but I have the impression that it is more focused on nucleotide sequences.
In my personal experience, MUSCLE is the easiest program to use in conjunction with Biopython. Biopython provides a command-line wrapper for it (MuscleCommandline), so you can run it directly from Python. Download the MUSCLE build that matches your operating system from drive5 and save it somewhere you can point to by path. For example, if you are using Jupyter on Linux:
!wget https://www.drive5.com/muscle/downloads3.8.31/muscle3.8.31_i86linux64.tar.gz
!tar -xzvf muscle3.8.31_i86linux64.tar.gz
!cp muscle3.8.31_i86linux64 /usr/local/bin
!chmod 755 /usr/local/bin/muscle3.8.31_i86linux64
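Before moving on, it can help to confirm that the binary really ended up where the later code expects it. The following is a minimal sketch using only the standard library; the helper name find_muscle is hypothetical (not part of Biopython or MUSCLE), and the default path simply mirrors the install commands above.

```python
import os
import shutil


def find_muscle(candidate="/usr/local/bin/muscle3.8.31_i86linux64"):
    """Return a usable path to the MUSCLE binary, or None if it is not found.

    `candidate` is an assumption taken from the install commands above;
    change it if you saved the binary somewhere else.
    """
    # First try the explicit path: it must exist and be executable
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    # Otherwise fall back to searching PATH for the same file name
    return shutil.which(os.path.basename(candidate))


print(find_muscle() or "MUSCLE not found; check the download and copy steps")
```

If this prints a path, that path is what you pass as the executable in the next step.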
Then, you can run MUSCLE like so:
from Bio.Align.Applications import MuscleCommandline

def runMUSCLE(infile, outfile):
    # Path to the MUSCLE binary installed above
    muscle_exe = r"/usr/local/bin/muscle3.8.31_i86linux64"
    muscle_cline = MuscleCommandline(
        muscle_exe,
        input=infile,
        out=outfile,
        # Strict Clustal output (more visually pleasing); omit this for FASTA output
        clwstrict=True,
    )
    # Run MUSCLE; the alignment is written to outfile
    muscle_cline()
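Newer Biopython releases have deprecated the command-line wrappers in favour of calling the external program through Python's subprocess module. If MuscleCommandline is unavailable or emits a deprecation warning in your installation, the same MUSCLE v3 run can be sketched as below. The function names build_muscle_command and run_muscle_subprocess are hypothetical, and the flags (-in, -out, -clwstrict) are those of MUSCLE 3.8; other MUSCLE versions use different options.

```python
import subprocess


def build_muscle_command(muscle_exe, infile, outfile, clustal=True):
    """Build the argument list for a MUSCLE v3 run (flags assume MUSCLE 3.8)."""
    cmd = [muscle_exe, "-in", infile, "-out", outfile]
    if clustal:
        # Strict Clustal output instead of the default FASTA output
        cmd.append("-clwstrict")
    return cmd


def run_muscle_subprocess(infile, outfile,
                          muscle_exe="/usr/local/bin/muscle3.8.31_i86linux64"):
    """Run MUSCLE on a FASTA file and write the alignment to `outfile`."""
    cmd = build_muscle_command(muscle_exe, infile, outfile)
    # check=True raises CalledProcessError if MUSCLE exits with an error
    subprocess.run(cmd, check=True, capture_output=True, text=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.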
You don't need to tell MUSCLE that your sequences are amino acid sequences; however, keep in mind that the input file must be in FASTA format.
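Since the goal in the question is to spot conserved positions, here is a rough sketch of what could come after the alignment: read the output file with Biopython's AlignIO and list the columns where one residue dominates. The helper names conserved_columns and load_alignment and the threshold parameter are assumptions for illustration, and a simple frequency cut-off is only one crude way to define conservation.

```python
from collections import Counter


def conserved_columns(aligned_seqs, threshold=1.0):
    """Return (column, residue) pairs for conserved alignment columns.

    A column counts as conserved when its most common residue makes up at
    least `threshold` of all sequences. Columns are 0-based and gaps ("-")
    are never reported as the conserved residue.
    """
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for i in range(length):
        column = [seq[i].upper() for seq in aligned_seqs]
        counts = Counter(res for res in column if res != "-")
        if not counts:
            continue  # column contains only gaps
        residue, n = counts.most_common(1)[0]
        if n / len(column) >= threshold:
            conserved.append((i, residue))
    return conserved


def load_alignment(path, fmt="clustal"):
    """Read an alignment file (e.g. MUSCLE's Clustal output) into strings."""
    # Imported here so conserved_columns can be used without Biopython
    from Bio import AlignIO
    alignment = AlignIO.read(path, fmt)
    return [str(record.seq) for record in alignment]


# Toy example with three short, already aligned sequences
toy = ["MKT-AHG", "MKTLAHG", "MRT-AHG"]
print(conserved_columns(toy))
print(conserved_columns(toy, threshold=0.6))
```

With a real alignment you would call conserved_columns(load_alignment(outfile)), using "fasta" as the format if you did not request Clustal output.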
Answered by albertr on June 24, 2021