Bioinformatics Asked by pippo1980 on April 9, 2021
Hi, I am playing with Biopython and BioSQL.
I managed to create a MySQL database by following the "Managing local biological databases with the BioSQL module" tutorial, and loaded 2909 FASTA sequences into it with:
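from Bio import SeqIO

# server is the BioSQL connection opened with BioSeqDatabase.open_database (shown further down)
db = server["test_1"]
count = db.load(SeqIO.parse("Newtextfile-03.txt", "fasta"))
print("Loaded %i records" % count)
server.commit()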
from this file: Newtextfile-03.txt
The problem: printing just the database keys with:
for i in db:
    print(i)
works, while printing element by element with:
for i in db:
    print(i, db[i])
or with:
for key, record in db.items():
    print("Key %r maps to a sequence record %s" % (key, record))
gives an error:
2557
Traceback (most recent call last):
  File "./try_001.py", line 142, in <module>
    f()
  File "./try_001.py", line 120, in f
    print(i, db[i])
  File "......../lib/python3.8/site-packages/Bio/SeqRecord.py", line 652, in __str__
    lines.append(repr(self.seq))
  File "........//lib/python3.8/site-packages/Bio/Seq.py", line 109, in __repr__
    return f"{self.__class__.__name__}({self._data!r})"
AttributeError: 'DBSeq' object has no attribute '_data'
The error occurs for three records, namely [2164, 2522, 2557].
Inspecting Newtextfile-03.txt shows that these three are FASTA sequences of fewer than 100 amino acids.
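To list which records fall under a given length directly from the input file, a small scan is enough. This is a minimal, stdlib-only sketch with hypothetical helper names (read_fasta, short_records); it takes the identifier as the first word of each header line and counts a record as short when it has fewer than `limit` residues. With Biopython installed, `[r.id for r in SeqIO.parse("Newtextfile-03.txt", "fasta") if len(r.seq) < 100]` does the same job.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA text (hypothetical helper)."""
    ident: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if ident is not None:
                yield ident, "".join(chunks)
            words = line[1:].split()
            ident = words[0] if words else ""  # first word of the header line
            chunks = []
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    if ident is not None:
        yield ident, "".join(chunks)


def short_records(handle: TextIO, limit: int = 100) -> List[Tuple[str, int]]:
    """Return (identifier, length) for every record with fewer than `limit` residues."""
    return [(ident, len(seq)) for ident, seq in read_fasta(handle) if len(seq) < limit]


example = io.StringIO(">long_protein\n" + "M" * 150 + "\n>short_peptide\nMKTAYIAKQR\n")
print(short_records(example))  # [('short_peptide', 10)]
```

Passing `limit=61` instead would list only the records of 60 residues or fewer.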
Is there any way to resolve the error? It seems strange to me that I can load the records into the database but not retrieve them. Am I missing something about how the records are loaded?
I am working with the database using:
from BioSQL import BioSeqDatabase
import password  # my own local module holding the MySQL credentials

server = BioSeqDatabase.open_database(
    driver="MySQLdb",
    user=password.userz,
    password=password.pazzword,
    host="XXXXXXXX",
    db="bio-data-first")
db = server.new_database("test_1", description="Just for testing")
server.commit()
I am using driver="MySQLdb" because I wasn't able to get driver="mysql.connector" to work.
Biopython is version 1.78.
It's a bug in Biopython 1.78.
What those three sequences have in common is that they're shorter than 61 characters. When you print the record, Python implicitly calls __str__ on the record, which calls __repr__ on the Seq. DBSeq inherits this method from Seq, which looks like:
def __repr__(self):
    """Return (truncated) representation of the sequence for debugging."""
    if len(self) > 60:
        # Shows the last three letters as it is often useful to see if
        # there is a stop codon at the end of a sequence.
        # Note total length is 54+3+3=60
        return f"{self.__class__.__name__}('{str(self[:54])}...{str(self[-3:])}')"
    else:
        return f"{self.__class__.__name__}({self._data!r})"
...and it only tries to access the nonexistent _data attribute in the else branch, that is, for the short sequences.
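The mechanism can be reproduced without BioSQL or MySQL. The sketch below is a simplified model, not Biopython code: MiniSeq and LazySeq are hypothetical classes, the first standing in for the 1.78 Seq (which keeps its letters in _data) and the second for DBSeq (which fetches letters on demand and never sets _data). LazySeq inherits the same __repr__ branching quoted above, so repr() succeeds above 60 letters and raises AttributeError at 60 or fewer.

```python
from typing import Callable


class MiniSeq:
    """Hypothetical stand-in for the 1.78 Seq: stores the letters in _data."""

    def __init__(self, data: str):
        self._data = data

    def __len__(self) -> int:
        return len(self._data)

    def __getitem__(self, index):
        return MiniSeq(str(self)[index])

    def __str__(self) -> str:
        return self._data

    def __repr__(self) -> str:
        # Same branching as the Seq.__repr__ quoted in the answer
        if len(self) > 60:
            return f"{self.__class__.__name__}('{str(self[:54])}...{str(self[-3:])}')"
        return f"{self.__class__.__name__}({self._data!r})"


class LazySeq(MiniSeq):
    """Hypothetical stand-in for DBSeq: fetches letters on demand, never sets _data."""

    def __init__(self, fetch: Callable[[], str], length: int):
        # Deliberately skip MiniSeq.__init__, so no _data attribute is created
        self._fetch = fetch
        self._length = length

    def __len__(self) -> int:
        return self._length

    def __getitem__(self, index):
        # Slices become plain MiniSeq objects built from the fetched string
        return MiniSeq(self._fetch()[index])

    def __str__(self) -> str:
        return self._fetch()


long_seq = LazySeq(lambda: "M" * 70, 70)
short_seq = LazySeq(lambda: "MKTAYIAKQR", 10)

print(repr(long_seq))  # truncated form: the len > 60 branch never reads _data
print(str(short_seq))  # MKTAYIAKQR
try:
    repr(short_seq)  # the else branch reads self._data, which LazySeq never set
except AttributeError as exc:
    print("repr failed:", exc)
```

The point of the model is that the subclass overrides __len__, __getitem__ and __str__ but not __repr__, so only the short-sequence branch reaches the attribute the subclass never created.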
So there's a mismatch between DBSeq's lack of a _data attribute and its use of Seq.__repr__. I don't see a DBSeq class at all in the latest Biopython, so maybe this is fixed already. But there's no tag for 1.79 yet that I can see, and I can't get the very latest code to import successfully, so I'm not sure. In the meantime, if you avoid anything that calls repr() on a short record's sequence (printing the whole record does this through SeqRecord.__str__), you should be able to work around it.
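One way to follow that advice is to format records yourself from record.id and str(record.seq), never handing the record or its sequence to repr(). This is a minimal sketch under stated assumptions: describe is a hypothetical helper, the stand-in record is a types.SimpleNamespace rather than a real BioSQL record, and it assumes str() on a DBSeq returns the full sequence in 1.78 (the traceback only implicates __repr__, and the long-sequence branch of __repr__ already relies on str() of slices).

```python
from types import SimpleNamespace


def describe(key, record) -> str:
    """Build a summary line without ever calling repr() on record.seq.

    Assumes the record exposes .id and .seq, and that str(record.seq)
    returns the plain sequence string (hypothetical helper, not Biopython API).
    """
    seq = str(record.seq)  # str() is the path that works; repr() is the one that fails
    # Mimic Seq.__repr__ truncation for long sequences: 54 letters + "..." + last 3
    preview = seq if len(seq) <= 60 else seq[:54] + "..." + seq[-3:]
    return f"Key {key!r} maps to {record.id} ({len(seq)} aa): {preview}"


# Stand-in record for illustration only
fake = SimpleNamespace(id="short_peptide", seq="MKTAYIAKQR")
print(describe(2557, fake))  # Key 2557 maps to short_peptide (10 aa): MKTAYIAKQR
```

Against the real database the loop would presumably be `for key, record in db.items(): print(describe(key, record))`, which never reaches Seq.__repr__.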
Correct answer by Jesse on April 9, 2021