Why is there a large size difference between ckpt_best.pth and average_model.pth? #2038

Open
cvgjnh opened this issue Jul 24, 2024 · 1 comment

cvgjnh commented Jul 24, 2024

💡 Your Question

I used the Roboflow notebook to train a model from scratch for 25 epochs using the yolo_nas_s architecture. That number of epochs did not produce great results, but when I looked at the checkpoints I found that ckpt_best.pth was 244.3 MB while average_model.pth was only 73 MB. What causes this discrepancy? Since the only difference between the two checkpoints should be the weights, I would have expected them to be the same size.

Coming from YOLOv5, where the small architecture's checkpoint is only a few megabytes, I was hoping for smaller files, as I'd like to deploy YOLO-NAS on Raspberry Pis with only 512 MB of memory. As I understand it, PTQ and QAT can reduce model size by at most about four times.
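The "four times" figure follows from storage width: an FP32 weight takes 4 bytes and an INT8 weight takes 1. A rough back-of-the-envelope sketch, assuming the 73 MB average_model.pth is almost entirely FP32 weights and taking 1 MB as 10^6 bytes (both are assumptions, not measurements):

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def params_from_size(file_mb, dtype="fp32"):
    """Estimate the parameter count from a file size in MB (1 MB = 1e6 bytes)."""
    return file_mb * 1e6 / BYTES_PER_PARAM[dtype]


def size_in_mb(num_params, dtype):
    """Estimate the weight payload in MB for a given precision."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


# 73 MB is the average_model.pth size reported above; treating it as
# pure FP32 weights is an assumption made for this estimate.
n = params_from_size(73.0)
for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {size_in_mb(n, dtype):.2f} MB")
```

This only estimates the weight payload on disk. Runtime memory on the device also depends on activations and on the inference runtime, which this sketch does not model.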

Versions

No response

@ShpihanVlad

@cvgjnh Hi. From what I understand, ckpt_best.pth stores additional information that can be used to resume training (the last trained epoch, for example), and possibly the EMA weights used during training as well, which is where the larger file size comes from. The averaged model does not include these, hence its smaller size. Unfortunately, I don't know what is safe to remove.
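One way to check this rather than guess is to load each checkpoint and see how many bytes sit under each top-level key. A minimal sketch, assuming the checkpoint is a dictionary whose values are (possibly nested) containers of tensors; the helper names `tensor_bytes`, `size_by_key`, and `report` are hypothetical, and tensors are detected by duck typing so the helpers themselves do not import PyTorch:

```python
from collections.abc import Mapping


def tensor_bytes(obj):
    """Recursively sum the bytes held by tensor-like leaves of obj."""
    # Duck typing: anything exposing numel() and element_size() is treated
    # as a tensor (torch.Tensor provides both methods).
    if hasattr(obj, "numel") and hasattr(obj, "element_size"):
        return obj.numel() * obj.element_size()
    if isinstance(obj, Mapping):
        return sum(tensor_bytes(v) for v in obj.values())
    if isinstance(obj, (list, tuple)):
        return sum(tensor_bytes(v) for v in obj)
    return 0  # ints, floats, strings, None: not counted


def size_by_key(checkpoint):
    """Map each top-level checkpoint key to its tensor payload in bytes."""
    return {key: tensor_bytes(value) for key, value in checkpoint.items()}


def report(checkpoint):
    """Print the top-level keys, largest tensor payload first."""
    ranked = sorted(size_by_key(checkpoint).items(), key=lambda kv: -kv[1])
    for key, n_bytes in ranked:
        print(f"{str(key):<28}{n_bytes / 1e6:10.2f} MB")
```

With PyTorch installed, something like `report(torch.load("ckpt_best.pth", map_location="cpu"))` should show which entries account for the extra size. Recent PyTorch versions may require `weights_only=False` to load a checkpoint that contains non-tensor objects; only do that with files you trust. If the model weights turn out to live under a single key, saving just that entry to a new file would be one way to get a smaller deployment checkpoint, though which keys exist has to be confirmed from the output.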
