Bioinformatics Asked on May 7, 2021
Hi I am trying to learnt python3 and Biopython,
I am trying to check imported fasta file before processing so far using:
sequences = SeqIO.parse("fake_fasta_file", "fasta")
if not any(sequa):
print('n SOMETHING IS WRONG WITH YOUR INPUT FILE n')
exit()
works but as soon as I enter a ‘>’ in my fake_fasta_file , as an example
nihvivicòvo-l
,mnjobùJN
, Békibpç
, mnkn
nbigp
ù
mlkn
mlnù
> ftfyfiuuogpigèi
any(sequences) returns True even if:
for seq_record in sequences:
print(seq_record)
doesnt print anything
Any better way to check for real fasta file in Biopython ?
Or maybe it my fault using any() on the iterator instead of creating
a list of seq_record and checking on that ?
any()
loops over the iterator until the first truthy item is found or the end is reached. The iterator is not reset to the start after this. Using a list will change that behaviour since a list is not exhausted like a iterator. So, remove your any()
call and you will see a sequence record printed. Your 'fake_fasta_file.fa'
is actually recognised by Biopython as valid fasta:
>>> SeqIO.read('fake_fasta_file.fa', 'fasta')
SeqRecord(seq=Seq(''), id='ftfyfiuuogpigèi', name='ftfyfiuuogpigèi', description=' ftfyfiuuogpigèi', dbxrefs=[])
Biopython ignores lines before the first line beginning with '>'
(as possible comment lines) and then starts collecting records, in this case an empty sequence record is still a record. This behaviour is deliberate from Biopython, although you could argue if it's best or not, it's unlikely to change for historical reasons.
What is real fasta? (philosophical pause) If you want to validate a particular definition of FASTA format then you can write your own parser depending on the exact specification and rules you want.
Edit, e.g.
from Bio import SeqIO
records = SeqIO.parse("myfile.fa", "fasta")
first = next(records, None)
if first is None:
print("No fasta records")
elif first.seq == "":
print("Empty sequence")
elif all(c not in "ATCGN" for c in first.seq.upper()):
print("Non ATCGN characters")
etc.
Correct answer by Chris_Rands on May 7, 2021
Get help from others!
Recent Answers
Recent Questions
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP