The training loss cannot converge #7
Comments
We experienced similar behavior when training the model. That is why we optimize for rmsd_lt2; you should see it increase. Do you?
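For context, rmsds_lt2 in docking evaluations usually means the fraction of predicted ligand poses within 2 Å RMSD of the reference pose, so it should climb toward 1.0 as training improves. A minimal sketch of that metric (illustrative only, not the repository's exact implementation):

```python
import numpy as np

def fraction_rmsd_lt2(rmsds, threshold=2.0):
    """Fraction of predicted poses whose RMSD to the reference pose is
    below `threshold` Angstrom -- the quantity usually reported as rmsds_lt2."""
    rmsds = np.asarray(rmsds, dtype=float)
    return float((rmsds < threshold).mean())

# Toy example: 2 of 5 predicted poses land within 2 A of the reference.
print(fraction_rmsd_lt2([0.8, 1.5, 3.2, 6.7, 4.1]))  # 0.4
```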
Sorry, I just checked the logs and noticed that Val inference rmsds_lt2 consistently remains zero. Additionally, the validation loss shows fluctuating and abnormal values such as:
I followed the command exactly as specified in the README file, so I suspect there might be a configuration issue or perhaps a bug in the code.
It was common for us to have epochs with outliers and very large losses, but values should not be consistently this large. In our run:
I have trained for approximately 100 epochs, and the latest results for Val inference rmsds_lt2 are consistently zero, as shown below:
Below is the complete training log file for your reference.
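One way to tell a few outlier epochs apart from a run that is genuinely not converging is to smooth the per-epoch validation loss before judging the trend. A small sketch with placeholder numbers (not values taken from this log):

```python
import numpy as np

def smoothed_losses(losses, alpha=0.1):
    """Exponential moving average of per-epoch losses, so that occasional
    outlier epochs with huge values do not hide the overall trend."""
    ema, out = None, []
    for loss in losses:
        ema = loss if ema is None else alpha * loss + (1 - alpha) * ema
        out.append(ema)
    return np.array(out)

# Placeholder curve: converging overall despite two outlier epochs.
losses = [12.0, 9.5, 8.1, 250.0, 6.9, 6.2, 180.0, 5.8, 5.5]
print(smoothed_losses(losses).round(2))
```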
Could you try --limit_complexes 100 and setting the train set to the validation set, and see if the problem persists?
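For what it's worth, this is a standard sanity check: restrict training to a small, fixed set of complexes and confirm the model can at least overfit it; if it cannot, the problem is more likely a bug than data size or training time. A toy illustration of the principle in generic PyTorch, not this repository's training loop:

```python
import torch
import torch.nn as nn

# Sanity check: memorize a tiny fixed batch. The loss should drop well
# below its starting value; if it stays flat, suspect a bug in the data
# pipeline, the loss, or the optimizer setup rather than the model size.
torch.manual_seed(0)
x, y = torch.randn(100, 16), torch.randn(100, 1)
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(1000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()

print(f"final loss on the memorized batch: {loss.item():.4f}")
```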
Thank you. I'll try it out and will share the results later.
You have to comment out this line for it to work: DiffDock-Pocket/datasets/pdbbind.py, line 988 (commit 98a1523).
Maybe related: #6 (which has been fixed).
Got it, thanks for the reminder.
I have pulled the newest version of the codebase and trained the scoring model using only 100 complex structures. However, the loss continues to fluctuate and has not converged yet. The training command used is as follows:
The complete training log is provided below:
Hi @fengshikun, were you able to train the model?
The loss still does not converge.
Sorry for not getting back to you sooner. I don't have any concrete results yet, but I think there might be an issue from when I ported parts of our code base and changed things for CUDA. I will see if I can pinpoint it. Any help is much appreciated, as I don't have much time for this project nowadays.
Hello, I've been attempting to train the score model using the command from the README file. However, I've noticed that the loss doesn't seem to converge. Could you please help me investigate which part might be going wrong?