What is the full script to extract sequences from fasta file by using Ids in text file in python 3 and pycharm?

Question

I have a fasta file (gene.fasta)with sequences with names patterns:

>Hsc_gene_9627.t1
ATGGCACGCATTTTCCTCATTCTTTTATTGCTGCACAACATCTGCTGTGCAGCCGCTTCA
TTGCTCATTTTGAATGCCGTTACATTGGAGAAGGATGCTAATGATTATGCCGTTGGCGAT

and I have the Ids in text file (seq.txt) which are not the exact as in the fasta file:

HSC_gene_996
HSC_gene_9734
and some of the names came as 
HSC_gene_996|HSC_gene_9734

How can extract the sequences?
I am not experienced in python so please use python for dummies language :)
Thank you in advance

StupidWolf · Answer

For example, the fasta are like these:

gene.fasta:

>Hsc_gene_9627.t1
ATGGCACGCATTTTCCTCATTCTTTTATTGCTGCACA
ACATCTGCTGTGCAGCCGCTTCA
>Hsc_gene_9627.t2
GGGGTTTTCCCC
>Hsc_gene_962.t1
AGTCGTCAGTCAGTAGTCGC

seq.txt:

HSC_gene_9627
HSC_gene_9734 
HSC_gene_996|HSC_gene_9734

We use a few modules, read in the fasta:

import re
from Bio import SeqIO                                                                 
records = list(SeqIO.parse("gene.fasta","fasta"))

Read in the list:

seq = [i.rstrip().lower() for i in open('seq.txt').readlines()]

Here I use two functions, one to write fasta files, and the other to format the header for comparison:

def fasta_out(rec):
    return(">"+str(rec.description)+"n"+str(rec.seq))

def format(rec):
    return re.sub("[.][^ ]*$","",rec.description).lower()

Define the records to keep:

keep = [fasta_out(i) for i in records if format(i) in seq]

Write them out:

f=open('matched.fasta','w')
f.writelines("n".join(keep))
f.close()

We can check:

open('matched.fasta').readlines()                                                          
Out[50]: 
['>Hsc_gene_9627.t1n',
 'ATGGCACGCATTTTCCTCATTCTTTTATTGCTGCACAACATCTGCTGTGCAGCCGCTTCAn',
 '>Hsc_gene_9627.t2n',
 'GGGGTTTTCCCC']

Most likely there are some simple while, for loops that does not require modules..

What is the full script to extract sequences from fasta file by using Ids in text file in python 3 and pycharm?

One Answer

Add your own answers!

Ask a Question