ckpt_best.pth performing the worst #2054

Open
cameron388 opened this issue Sep 30, 2024 · 2 comments

cameron388 commented Sep 30, 2024

💡 Your Question

I've trained a model for 100 epochs on my own dataset.

When testing the model using "prediction = model.predict(processed_image_path, conf=confidence_threshold, fp16=False)  # prediction = model.predict(processed_image_path, conf=confidence_threshold)", I'm finding reproducibly (across different models) that ckpt_best.pth performs significantly worse in terms of recall and specificity.

For example here are some values running _best.pth compared to _latest.pth

_best

Confidence Threshold: 0.50
Precision: 0.8250
Recall: 0.7500
Specificity: 0.7941

Confidence Threshold: 0.70
Precision: 0.9355
Recall: 0.6591
Specificity: 0.9412

Confidence Threshold: 0.75
Precision: 0.9259
Recall: 0.5682
Specificity: 0.9412

Confidence Threshold: 0.80
Precision: 1.0000
Recall: 0.4773
Specificity: 1.0000

Confidence Threshold: 0.85
Precision: 1.0000
Recall: 0.1818
Specificity: 1.0000

_latest

Confidence Threshold: 0.50
Precision: 0.9649
Recall: 0.9910
Specificity: 0.9529

Confidence Threshold: 0.70
Precision: 0.9808
Recall: 0.9189
Specificity: 0.9765

Confidence Threshold: 0.80
Precision: 1.0000
Recall: 0.8288
Specificity: 1.0000

Confidence Threshold: 0.85
Precision: 1.0000
Recall: 0.6937
Specificity: 1.0000

This is a very surprising result, so I'm wondering whether something has gone wrong.
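
For reference, the three figures reported above can be derived from confusion counts as in the sketch below. The post does not say how true negatives are counted for a detector, and it does not give the raw counts, so the function name `detection_metrics` and the counts used here are assumptions: they are illustrative values that happen to reproduce the first `_best` row (threshold 0.50), not data taken from the evaluation.

```python
def detection_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Return precision, recall and specificity from confusion counts."""

    def ratio(numerator: int, denominator: int) -> float:
        # Avoid ZeroDivisionError when a class is absent from the test set.
        return numerator / denominator if denominator else float("nan")

    return {
        "precision": ratio(tp, tp + fp),    # share of predictions that are correct
        "recall": ratio(tp, tp + fn),       # share of positives that are found
        "specificity": ratio(tn, tn + fp),  # share of negatives left unflagged
    }


# Illustrative counts only (assumption): chosen to match the first `_best` row.
metrics = detection_metrics(tp=33, fp=7, fn=11, tn=27)
for name, value in metrics.items():
    print(f"{name.capitalize()}: {value:.4f}")
# Precision: 0.8250
# Recall: 0.7500
# Specificity: 0.7941
```

Running both checkpoints through the same function on the same image set, at the same thresholds, keeps the comparison like for like.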

Versions

No response


yevhen-k commented Oct 3, 2024

I have the same issue. I have YOLO-NAS training set up on the COCO dataset. I trained the model for a few epochs to test the training setup, then continued training from the checkpoint for another 200 epochs. It seems that ckpt_best.pth has not been updated since; only the average and latest checkpoints were updated.
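
One generic way a "best" checkpoint can stop updating is a mismatch in the comparison direction: if the tracked metric is treated as lower-is-better, a rising score never counts as an improvement after the first epoch. The sketch below illustrates only that mechanism. It is not super-gradients code, `BestCheckpointTracker` and `replay` are hypothetical names, the metric values are made up, and nothing in this thread confirms that this is the cause here. super-gradients appears to expose the direction as the `greater_metric_to_watch_is_better` training parameter; treat that name as an assumption to verify against the installed version.

```python
from typing import List, Optional


class BestCheckpointTracker:
    """Decides when a validation metric counts as a new best (hypothetical helper)."""

    def __init__(self, greater_is_better: bool, best_value: Optional[float] = None):
        self.greater_is_better = greater_is_better
        self.best_value = best_value  # None until the first epoch is scored

    def update(self, value: float) -> bool:
        """Return True when `value` should overwrite the stored best checkpoint."""
        if self.best_value is None:
            improved = True  # the first scored epoch always becomes the best
        elif self.greater_is_better:
            improved = value > self.best_value
        else:
            improved = value < self.best_value
        if improved:
            self.best_value = value
        return improved


def replay(history: List[float], greater_is_better: bool) -> List[bool]:
    """Feed a metric history through a fresh tracker and record each save decision."""
    tracker = BestCheckpointTracker(greater_is_better)
    return [tracker.update(value) for value in history]


history = [0.10, 0.25, 0.33]  # made-up validation scores that improve every epoch
saves_when_maximizing = replay(history, greater_is_better=True)
saves_when_minimizing = replay(history, greater_is_better=False)
print(saves_when_maximizing)  # [True, True, True]
print(saves_when_minimizing)  # [True, False, False]
```

With the direction set for minimizing, only the first epoch is ever saved as best, which matches the symptom of a best checkpoint that is written once and never again.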


yevhen-k commented Oct 5, 2024

I use super-gradients==3.7.1. According to the logs, ckpt_best.pth was indeed saved only once, at the end of the first epoch of the first training run.

Inspecting the model checkpoints, I found the following:

best_model        acc: 0.0001     epochs: 1
average_model    acc: 0.3328     epochs: 200
latest_model     acc: 0.3328     epochs: 200
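
A listing like the one above can be produced with a few lines of Python. This is a minimal sketch, not the script used in this comment: it assumes the checkpoint file is a dictionary with top-level `acc` and `epoch` entries (inferred from the field labels in the listing), and `summarize_checkpoint`, `load_checkpoint` and `is_stale` are hypothetical helper names. The demo at the bottom uses plain dictionaries as stand-ins for loaded files, filled with the values reported above.

```python
from typing import Any, Dict, Mapping


def summarize_checkpoint(state: Mapping[str, Any]) -> Dict[str, Any]:
    """Pick out the tracked metric and epoch counter; missing keys become None."""
    return {"acc": state.get("acc"), "epoch": state.get("epoch")}


def load_checkpoint(path: str) -> Mapping[str, Any]:
    """Load a .pth file on CPU. Needs PyTorch; only load files you trust."""
    import torch  # imported lazily so the helpers above work without PyTorch

    # weights_only=False allows non-tensor entries in the checkpoint dictionary.
    return torch.load(path, map_location="cpu", weights_only=False)


def is_stale(best: Mapping[str, Any], latest: Mapping[str, Any]) -> bool:
    """True when 'best' is both older and lower-scoring than 'latest'.

    Assumes a higher `acc` is better, which may not hold for every metric.
    """
    if None in (best.get("acc"), latest.get("acc"), best.get("epoch"), latest.get("epoch")):
        return False  # not enough information to decide
    return best["epoch"] < latest["epoch"] and best["acc"] < latest["acc"]


# Stand-ins for loaded checkpoints, using the values reported in this comment.
best_summary = summarize_checkpoint({"acc": 0.0001, "epoch": 1, "net": {}})
latest_summary = summarize_checkpoint({"acc": 0.3328, "epoch": 200, "net": {}})
print(best_summary, latest_summary, is_stale(best_summary, latest_summary))
```

On real files the call would look like `summarize_checkpoint(load_checkpoint("ckpt_best.pth"))`, with the path adjusted to the experiment's checkpoint directory.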
