How to ensure label accuracy in DPA-2 distillation #3623
-
In the usage paradigm of DPA-2, pre-trained models are best used in the finetune+distillation mode. But in the distillation step, how can we ensure that the finetuned pre-trained model provides the student model with sufficiently accurate labels? The picture I have in mind is this: overall, the benefit of finetune+distillation is presumably that it saves time on DFT calculations. But I feel that the time required to obtain a fully finetuned model may be comparable to, or even exceed, the time required for the DFT calculations.
-
Typically, the majority of the computational cost is spent on the labeling stage. By using DPA-2, the savings in the labeling stage are 1-3 orders of magnitude, and the extra cost in the training and exploration stages is usually acceptable.
In general, I cannot say how much you would save by using DPA-2 in a CL (concurrent learning) workflow; that depends on the costs of the training, exploration, and labeling stages. You may test it and report back in this discussion.
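
To check whether the finetuned teacher is accurate enough before distillation, a common sanity check is to evaluate it on a held-out set of DFT-labeled frames and compare the energy/force errors against your target accuracy. Below is a minimal sketch assuming deepmd-kit's Python inference interface (`deepmd.infer.DeepPot`); the file names, array layout, and units are placeholders for illustration.

```python
import numpy as np
from deepmd.infer import DeepPot  # deepmd-kit Python inference interface

# Load the finetuned teacher model (path is a placeholder).
teacher = DeepPot("finetuned_teacher.pb")

# A held-out set of DFT-labeled frames NOT used during finetuning.
# File names and array layout below are assumptions for this sketch:
#   coords: (nframes, natoms*3), cells: (nframes, 9), atom_types: (natoms,)
coords = np.load("holdout_coords.npy")
cells = np.load("holdout_cells.npy")
atom_types = np.load("holdout_types.npy").tolist()
e_dft = np.load("holdout_energies.npy")    # (nframes,), eV
f_dft = np.load("holdout_forces.npy")      # (nframes, natoms*3), eV/A

# Evaluate the teacher on the same frames.
e_pred, f_pred, _ = teacher.eval(coords, cells, atom_types)
e_pred = np.asarray(e_pred).reshape(-1)
f_pred = np.asarray(f_pred).reshape(f_dft.shape)

natoms = len(atom_types)
e_rmse_per_atom = np.sqrt(np.mean((e_pred - e_dft) ** 2)) / natoms
f_rmse = np.sqrt(np.mean((f_pred - f_dft) ** 2))

print(f"energy RMSE/atom: {e_rmse_per_atom:.3e} eV")
print(f"force  RMSE:      {f_rmse:.3e} eV/A")

# Proceed to distillation only if these errors meet your target accuracy;
# otherwise, add more DFT data and continue finetuning the teacher.
```

The same kind of check can be repeated after distillation, comparing the student model against both the teacher and the DFT hold-out set, so the extra error introduced at each step stays visible.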