Dataset Raw File #1

Open
li-ziang opened this issue May 23, 2024 · 5 comments

@li-ziang

Hi,

I found this work super interesting! I tried to reproduce the results in Figure 3b, but I am unfamiliar with R, so the code is hard for me to run and understand. Could you share the raw dataset as a CSV file, like the files in the ThermoMPNN repo, so that I can use it from Python code such as ThermoMPNN?

For example, the s669 file provided by ThermoMPNN looks like this:

head -n 5 s669.csv
pdbid,chainid,variant,score
1A0F,A,S11A,1.8
1A7V,A,A104H,2.69
1A7V,A,A66H,1.98
1A7V,A,A91H,1.7

Thanks!

@belmaran
Collaborator

Hi, thanks for the interest. For ease of use, I have added a new Supplementary Table 5 to the Zenodo repository with all the variant effect/stability predictions and the aPCA fitness scores. You can download it here: https://zenodo.org/records/11260616/files/Supplementary_Table_5_aPCA_vs_variant_effect_predictors.txt?download=1

@li-ziang
Author

Hi, thanks so much for your prompt reply! This file is exactly what I needed, and I will follow up if I have further questions.

belmaran reopened this Jun 5, 2024
@belmaran
Collaborator

belmaran commented Jun 5, 2024

Here is a new version of the file with more complete EVE, popEVE and Tranception predictions: https://zenodo.org/records/11493742/files/Supplementary_Table_5_aPCA_vs_variant_effect_predictors.txt?download=1
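
For anyone who wants to work with this table from Python rather than R, here is a minimal sketch using only the standard library. It assumes the .txt file is tab-delimited with a header row and that missing predictions appear as `NA`, `NaN`, or empty cells; none of that is confirmed in this thread, so check the file first. The column names in `SAMPLE` and the helper names are hypothetical and only illustrate the parsing.

```python
import csv
import io
import math
import urllib.request

# URL taken from the comment above.
URL = ("https://zenodo.org/records/11493742/files/"
       "Supplementary_Table_5_aPCA_vs_variant_effect_predictors.txt?download=1")


def download(url, path):
    """Save the table to a local file (not called automatically)."""
    urllib.request.urlretrieve(url, path)


def parse_table(text, delimiter="\t"):
    """Parse delimited text into a list of dicts keyed by the header row.

    The tab delimiter is an assumption; pass delimiter="," if the file
    turns out to be comma-separated.
    """
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def to_float(cell):
    """Convert a cell to float; empty cells and NA/NaN markers become NaN."""
    if cell is None or cell.strip() in ("", "NA", "NaN", "nan"):
        return math.nan
    return float(cell)


# Hypothetical two-row sample; the real column names may differ.
SAMPLE = (
    "variant\taPCA_fitness\tpredictor_score\n"
    "A10G\t-0.52\t-1.3\n"
    "L25P\t-1.80\tNA\n"
)

rows = parse_table(SAMPLE)
scores = [to_float(r["predictor_score"]) for r in rows]
```

To use the real file, call `download(URL, "table5.txt")` once, then pass the contents of `open("table5.txt", encoding="utf-8").read()` to `parse_table`.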

@li-ziang
Author

li-ziang commented Jun 6, 2024

That helps a lot, thanks!

@li-ziang
Author

li-ziang commented Jul 5, 2024

Hi,

I tried the new file with the model predictions and attempted to reproduce the results in Fig. 3C. I found that some models' predictions for the fitness score are NaN. Could you explain why these values appear? Currently, I calculate rho by ignoring the NaN values, essentially treating them as if they do not exist. This approach yields results similar to those reported in the paper. However, if a model fails to give a value, wouldn't it be better to penalize the NaNs in some way, for example by setting them to a particularly large value such as 1000?

Thanks!
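
One way to see the trade-off is to compute Spearman's rho both ways on a toy example. The sketch below is a pure-Python illustration under stated assumptions, not the paper's R code: the function names and the toy numbers are made up, and "ignoring NaNs" is implemented as pairwise deletion (dropping any variant where either value is missing). It also returns how many variants were actually used, so coverage can be reported next to rho.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_ignore_nan(x, y):
    """Spearman's rho after dropping pairs with a NaN; returns (rho, n_used)."""
    pairs = [(a, b) for a, b in zip(x, y)
             if not (math.isnan(a) or math.isnan(b))]
    xs, ys = zip(*pairs)
    return pearson(average_ranks(xs), average_ranks(ys)), len(pairs)


# Toy data (made up): two of five predictions are missing.
fitness = [0.1, 0.4, 0.35, 0.8, 0.9]
pred = [0.2, math.nan, 0.3, 0.7, math.nan]

# Option 1: pairwise deletion, as described in the comment above.
rho_drop, n_used = spearman_ignore_nan(fitness, pred)

# Option 2: replace missing predictions with a large sentinel value.
pred_filled = [1000.0 if math.isnan(p) else p for p in pred]
rho_fill, n_all = spearman_ignore_nan(fitness, pred_filled)
```

Because Spearman's rho depends only on ranks, any fill value above the largest real prediction gives the same result as 1000, and whether it acts as a penalty depends on which direction the predictor's scores run. Reporting rho on the covered variants together with `n_used` may therefore be easier to interpret than a sentinel value.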
