How to obtain a list of proteins sorted by the ~1400 unique protein folds?

Question

The databases CATH and SCOP both have around 1400 unique protein folds recorded from analysis of the PDB.  However, I do not see any method to access this particular data.

A list of each of the 1400 folds (just an id number, and/or a descriptor)?
For each individual fold (of the 1400), a list of PDB IDs for proteins which are known to adopt each individual fold?

akaDrHouse · Answer

It looks like you can download the full database in SQL format or parse-able text files from here:  SCOP Download - Berkeley

The link has a link to the Schema as well:

David · Answer

If there is a simple way provided to do this it is very well hidden. The tedious and stupid way to do 1 (get a list of folds) would seem to involve rolling your own:

Go to http://scop.berkeley.edu/ver=2.07 (or whatever is the latest version).

Click on each of the 12 classes in turn. e.g. (a) all alpha proteins will take you to http://scop.berkeley.edu/sunid=46456 .

Save the source of each page as text.

Write and run your own parser to pull out the sunid () from the http://scop.berkeley.edu/sunid= and the description line if you wish. (This assumes you program.) I think this sunid is the fold id.

If you can than find some database or table that has PDB and sunid values in it, you can write another program to find the answer to 2.
Alternatively… (appended January 2021)

Download dir.cla.scope.2.07-stable.txt (or the latest version)
Save as a text file.
Open in Mircorsoft Excel. (Just dragging onto the app icon formatted it properly on my Mac. Your mileage may vary.)
You can just select the column with the ids, paste into another sheet, and then remove duplicates to get all the different fold ids. (Alternatively, you have about 276,000 entries to do with whatever you wish.)

How to obtain a list of proteins sorted by the ~1400 unique protein folds?

2 Answers

Add your own answers!

Ask a Question