I'm working with Dataset 3 and am trying to compare K50 values for proteins that share a fold. However, I can't see an easy way to identify which proteins belong to which domain (or fold) from the columns in "K50_Dataset3.csv". Do you happen to have a table available that maps the name of each entry in K50_Dataset3.csv to its associated domain? In all, I'm hoping to be able to distinguish (1) which natural proteins share a wild type and (2) which designed proteins share the same base scaffold.
Somewhat, though it's definitely not perfect. I approached it two separate ways:
I assumed that all entries with the same PDB name prior to the mutation information were from the same group. This works well for about 90% of the entries, and clustering with mmseqs confirmed that my initial assumption about the PDB name was good.
I used mmseqs easy-cluster (see here) to cluster at 90% sequence identity, 80% coverage. I assumed that anything that shared a cluster was from the same domain. This is obviously imperfect, but it at least groups similar proteins.
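In case it helps anyone else, here is a rough Python sketch of both approaches. It is not the dataset authors' method, and several details are my assumptions: that entry names look like `<base>.pdb_<mutation>` (so everything up to and including `.pdb` is the group key), and that the sequences have already been written to a FASTA file. The function names and the `marker` parameter are hypothetical helpers. The mmseqs side uses `--min-seq-id 0.9 -c 0.8` for the identity and coverage thresholds and reads the two-column `<prefix>_cluster.tsv` file (representative, member) that `easy-cluster` writes.

```python
import csv
from collections import defaultdict


def base_name(entry_name, marker=".pdb"):
    """Return the part of an entry name before the mutation suffix.

    Assumption: names look like '<base>.pdb_<mutation>'. Change `marker`
    if the real names follow a different pattern.
    """
    head, sep, _ = entry_name.partition(marker)
    # If the marker is absent, fall back to the full name as its own group.
    return head + sep if sep else entry_name


def group_by_base_name(names):
    """Approach 1: group entries that share the same name prefix."""
    groups = defaultdict(list)
    for name in names:
        groups[base_name(name)].append(name)
    return dict(groups)


def easy_cluster_command(fasta, out_prefix, tmp_dir):
    """Approach 2: build the mmseqs command (90% identity, 80% coverage)."""
    return [
        "mmseqs", "easy-cluster", fasta, out_prefix, tmp_dir,
        "--min-seq-id", "0.9",  # minimum sequence identity
        "-c", "0.8",            # minimum alignment coverage
    ]


def read_clusters(tsv_path):
    """Parse '<out_prefix>_cluster.tsv': one 'representative<TAB>member' per line."""
    clusters = defaultdict(list)
    with open(tsv_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) >= 2:  # skip blank or malformed lines
                clusters[row[0]].append(row[1])
    return dict(clusters)
```

To run the clustering, pass the command list to `subprocess.run(easy_cluster_command(...), check=True)` and then call `read_clusters` on the resulting `_cluster.tsv` file. Comparing the two groupings entry by entry is a quick way to find the names where the prefix assumption breaks down.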