Biopython BioSQL error : AttributeError: 'DBSeq' object has no attribute '_data'

Question

Hi, I am playing with Biopython and BioSQL.
I managed to create a MySQL database by following the "Managing local biological databases with the BioSQL module" tutorial
and loaded 2909 FASTA sequences into it with:
from Bio import SeqIO

db = server["test_1"]
count = db.load(SeqIO.parse("Newtextfile-03.txt", "fasta"))
print("Loaded %i records" % count)
server.commit()

from this file: Newtextfile-03.txt
The problem: printing just the list of database keys with:
for i in db:
    print(i)

works, but printing each record alongside its key:
for i in db:
    print(i, db[i])

or with:
for key, record in db.items():
    print("Key %r maps to a sequence record %s" % (key, record))

gives an error:
2557 Traceback (most recent call last):
  File "./try_001.py", line 142, in <module>
    f()
  File "./try_001.py", line 120, in f
    print(i, db[i])
  File "......../lib/python3.8/site-packages/Bio/SeqRecord.py", line 652, in __str__
    lines.append(repr(self.seq))
  File "........//lib/python3.8/site-packages/Bio/Seq.py", line 109, in __repr__
    return f"{self.__class__.__name__}({self._data!r})"

AttributeError: 'DBSeq' object has no attribute '_data'

The error occurs for three records, namely [2164, 2522, 2557].
Inspecting Newtextfile-03.txt shows that these are FASTA sequences shorter than
100 amino acids.
Is there any way to resolve the error? It seems strange to me that I can load the records into the database but not retrieve them. Am I missing something about loading the records, or something similar?
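A quick way to confirm which records fall below the length cutoff is to scan the FASTA file directly. The helper below is hypothetical (`short_fasta_records` is not part of Biopython) and is written in plain Python so it runs without a database; the default `max_len=60` is the cutoff used by `Seq.__repr__` in Biopython 1.78, as quoted in the accepted answer.

```python
from typing import Iterable, Iterator, Tuple


def short_fasta_records(lines: Iterable[str], max_len: int = 60) -> Iterator[Tuple[str, int]]:
    """Yield (record_id, length) for every FASTA record with at most max_len residues."""
    record_id = None
    length = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record.
            if record_id is not None and length <= max_len:
                yield record_id, length
            parts = line[1:].split()
            record_id = parts[0] if parts else ""
            length = 0
        else:
            length += len(line)  # sequence lines may be wrapped over several lines
    # Emit the final record in the file.
    if record_id is not None and length <= max_len:
        yield record_id, length


example = """>prot_a long enough
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQ
>prot_b short
MSTNPKPQRK
TKRNTNRRPQ
"""

for rid, n in short_fasta_records(example.splitlines()):
    print(rid, n)
```

Against the real file it could be called inside `with open("Newtextfile-03.txt") as handle:` as `short_fasta_records(handle)`. Wrapped sequence lines are summed, so the lengths should match what `len(record.seq)` reports.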
I am working with the database using:
from BioSQL import BioSeqDatabase

server = BioSeqDatabase.open_database(
    driver="MySQLdb",
    user=password.userz,
    password=password.pazzword,
    host="XXXXXXXX",
    db="bio-data-first")

db = server.new_database("test_1", description="Just for testing")
server.commit()

I am using driver="MySQLdb" because I wasn't able to get driver='mysql.connector' to work.
Biopython is version 1.78.

Jesse · Accepted Answer

It's a bug in Biopython 1.78.
What those three sequences have in common is that they're less than 61 characters long. When you print the record, it implicitly calls __str__ on the record, which calls __repr__ on the Seq. DBSeq inherits that method from Seq, which looks like:
    def __repr__(self):
        """Return (truncated) representation of the sequence for debugging."""
        if len(self) > 60:
            # Shows the last three letters as it is often useful to see if
            # there is a stop codon at the end of a sequence.
            # Note total length is 54+3+3=60
            return f"{self.__class__.__name__}('{str(self[:54])}...{str(self[-3:])}')"
        else:
            return f"{self.__class__.__name__}({self._data!r})"

...and then it only tries to access the nonexistent _data for the short sequences.
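The mismatch can be reproduced without a database. The two classes below are simplified, hypothetical stand-ins, not Biopython's real code: `BaseSeq` copies the branching of the `Seq.__repr__` quoted above, and `LazySeq` mimics DBSeq by fetching its letters on demand and never setting `_data`.

```python
class BaseSeq:
    """Simplified stand-in for Bio.Seq.Seq in 1.78: stores its letters in _data."""

    def __init__(self, data):
        self._data = data

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        return BaseSeq(self._data[index])

    def __str__(self):
        return self._data

    def __repr__(self):
        # Same branching as the Seq.__repr__ quoted above.
        if len(self) > 60:
            return f"{self.__class__.__name__}('{str(self[:54])}...{str(self[-3:])}')"
        else:
            return f"{self.__class__.__name__}({self._data!r})"


class LazySeq(BaseSeq):
    """Simplified stand-in for a DBSeq: fetches letters on demand, never sets _data."""

    def __init__(self, fetch, length):
        # Deliberately skips BaseSeq.__init__, so self._data never exists.
        self._fetch = fetch
        self._length = length

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        # Slicing returns a plain BaseSeq built from the fetched string.
        return BaseSeq(self._fetch()[index])

    def __str__(self):
        return self._fetch()


long_seq = LazySeq(lambda: "A" * 80, 80)
short_seq = LazySeq(lambda: "MKV", 3)

print(repr(long_seq))  # works: the long branch only uses slicing and str()
try:
    repr(short_seq)  # the short branch reads self._data and raises AttributeError
except AttributeError as err:
    print("short sequence fails:", err)
```

Only the short sequence reaches the `self._data` branch, which matches the three failing records.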
So there's a mismatch between DBSeq's lack of a _data attribute and its use of Seq.__repr__. I don't see a DBSeq class at all in the latest Biopython, so maybe this is fixed already. But I can't see a tag for 1.79 yet, and I can't get the very latest code to import successfully, so I'm not sure. In the meantime, you should be able to work around it by avoiding anything that calls repr() on the record's sequence, such as print(record) or "%s" % record.
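A minimal sketch of that workaround, assuming (as the long-sequence branch of `__repr__` implies) that `str()` on a 1.78 `DBSeq` works: build the output from `record.id` and `str(record.seq)` instead of formatting the record itself. `describe_record` and `BrokenReprSeq` are hypothetical names; `BrokenReprSeq` only simulates the failing `repr()` so the example runs without MySQL.

```python
from types import SimpleNamespace


def describe_record(key, record, preview=50):
    """Summarise a SeqRecord-like object without ever calling repr(record.seq)."""
    # Assumption: str() on a DBSeq avoids Seq.__repr__ and therefore never touches _data.
    letters = str(record.seq)
    shown = letters if len(letters) <= preview else letters[:preview] + "..."
    return f"Key {key!r} maps to {record.id} ({len(letters)} residues): {shown}"


class BrokenReprSeq:
    """Hypothetical stand-in for a short 1.78 DBSeq: str() works, repr() raises."""

    def __init__(self, letters):
        self._letters = letters

    def __str__(self):
        return self._letters

    def __repr__(self):
        raise AttributeError("'DBSeq' object has no attribute '_data'")


record = SimpleNamespace(id="prot_b", seq=BrokenReprSeq("MSTNPKPQRKTKRNTNRRPQ"))
print(describe_record(2557, record))

# With a real BioSQL database this would be used as:
# for key, record in db.items():
#     print(describe_record(key, record))
```

Formatting the fields by hand sidesteps `SeqRecord.__str__` entirely, so the buggy `repr()` path is never entered.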
