Why is there a large size difference between ckpt_best.pth and average_model.pth? #2038

cvgjnh · 2024-07-24T07:48:55Z

💡 Your Question

I used the Roboflow notebook to train a model from scratch for 25 epochs using the yolo_nas_s architecture. While this number of epochs did not produce great results, I looked at the checkpoints and found that ckpt_best.pth was 244.3 MB while average_model.pth was only 73 MB. What causes this discrepancy? Since the only difference between the two checkpoints should be the weights, I would have assumed they would be the same size.

Coming from YOLOv5 where the small architecture checkpoint size is only a few megabytes, I was hoping for the sizes to be smaller as I'd like to deploy YOLO-NAS on Raspberry Pis with only 512 megabytes of memory. To my understanding, using PTQ and QAT can only reduce model size by up to four times.
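To make the "up to four times" figure concrete, here is a rough back-of-the-envelope sketch. It assumes the 73 MB average_model.pth consists almost entirely of 32-bit float weights and that "MB" means 10^6 bytes; neither is confirmed in this thread. The function name `estimate_sizes_mb` is made up for illustration.

```python
def estimate_sizes_mb(fp32_size_mb):
    """Estimate file size at other precisions from an FP32 weights file.

    Assumption: the file is dominated by float32 values (4 bytes each),
    so pickle/metadata overhead is ignored.
    """
    bytes_per_value = {"fp32": 4, "fp16": 2, "int8": 1}
    # Number of stored values implied by the FP32 file size.
    n_values = fp32_size_mb * 1e6 / bytes_per_value["fp32"]
    return {name: n_values * b / 1e6 for name, b in bytes_per_value.items()}


sizes = estimate_sizes_mb(73.0)
for name, mb in sizes.items():
    print(f"{name}: {mb:.2f} MB")
```

Under these assumptions, going from 4-byte floats to 1-byte integers gives at most a 4x reduction, which would put an INT8 version of the 73 MB file at roughly 18 MB.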

Versions

No response

ShpihanVlad · 2024-08-02T10:54:58Z

@cvgjnh Hi, from what I understand, the regular checkpoints such as ckpt_best.pth store additional information that can be used to resume training (the last trained epoch, for example), and possibly the EMA weights used during training as well, which is where the larger file size comes from. The averaged model does not store these, which is why it is smaller. Unfortunately, I don't know what is safe to remove.
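One way to check this guess is to load both files and see how many bytes each top-level entry holds. The sketch below is illustrative only: `nbytes`, `summarize`, and `load_and_summarize` are hypothetical helper names, and it assumes each checkpoint is a plain dictionary as saved by `torch.save`. It makes no claim about which keys the files actually contain; it simply reports whatever is there.

```python
from collections.abc import Mapping


def nbytes(obj):
    """Recursively sum the bytes held by tensor-like objects inside obj."""
    # Tensor-like: anything exposing numel() and element_size(),
    # which is the interface torch.Tensor provides.
    if hasattr(obj, "numel") and hasattr(obj, "element_size"):
        return obj.numel() * obj.element_size()
    if isinstance(obj, Mapping):
        return sum(nbytes(v) for v in obj.values())
    if isinstance(obj, (list, tuple)):
        return sum(nbytes(v) for v in obj)
    return 0  # ints, strings, None, etc. are treated as negligible


def summarize(checkpoint):
    """Return {top_level_key: megabytes}, largest entry first."""
    sizes = {k: nbytes(v) / 1e6 for k, v in checkpoint.items()}
    return dict(sorted(sizes.items(), key=lambda kv: kv[1], reverse=True))


def load_and_summarize(path):
    """Load a checkpoint on CPU and summarize it (requires PyTorch)."""
    import torch  # imported lazily so the helpers above work without it

    checkpoint = torch.load(path, map_location="cpu")
    return summarize(checkpoint)
```

Calling `load_and_summarize("ckpt_best.pth")` and `load_and_summarize("average_model.pth")` and comparing the two results should show which entries account for the extra size. If the difference comes from optimizer state or a second (EMA) copy of the weights, those entries would appear only in the larger file.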
