[Usage] When quantizing Qwen2.5-7B-Instruct, the loss is very high and the model generates many ! #1042
Here is the log from using the wikitext demo to quantize Qwen1.5-7B-Chat into GPTQ-INT4. The loss starts to grow in the middle layers and gets very high in the final layers.
The quantized model generates mostly okay responses, but it keeps generating \n at the end of its answers, like this:
@Qubitium
Since I will eventually evaluate the quantized model on the CMMLU dataset, and I want a higher eval score: First, would it be better to build the calibration data from CMMLU itself, or from a more general dataset (like the previously mentioned c4)? Secondly, if I use CMMLU, it is made up of multiple-choice questions, so each question and especially its options are quite short. If I simply combine all the text into one string to create the calibration data (using the join function), would that result in poor quantization performance? Thirdly, if I should use a more general dataset (say, c4), since I plan to evaluate the quantized model on CMMLU, a Chinese text dataset, should I use only Chinese text from c4, or both Chinese and English text?
I would recommend a mixed English and Chinese dataset. It is good to verify, as you are doing, that the data values are valid and error-free.
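For illustration, a minimal sketch of one way to assemble such a mixed calibration set. The c4 shard name, length cutoff, and row counts are placeholders, and the Chinese source is left as an assumption to be filled in with any general-purpose Chinese corpus:

```python
import random
from datasets import load_dataset

# English general text: one C4 shard, keeping only reasonably long rows
en = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train",
).filter(lambda r: len(r["text"]) >= 512).select(range(512))["text"]

# Chinese general text: hypothetical placeholder. Substitute ~512 natural
# Chinese documents; avoid join()-ing many short CMMLU options into one
# string, since each calibration row should be a natural document on its own.
zh = ["..."]

calibration_dataset = list(en) + zh
random.shuffle(calibration_dataset)
```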
Describe the bug
I'm quantizing Qwen2.5-7B-Instruct to GPTQ-INT4 format, using the wikitext demo, and it shows a very high loss. The situation is much the same when I use AutoGPTQ to quantize this model on wikitext. I don't know whether the cause is Qwen2.5 itself or the calibration data.
Here is the log:
GPU Info
Software Info
I'm using the latest GPTQModel==1.5.4.dev0 installed from source, torch==2.4.1, and transformers==4.47.
I'm also using CUDA 12.0 (I don't know whether CUDA 12.0, instead of more common versions like 12.1 or 12.4, caused this problem).
Additional context
My code is:
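In outline, a minimal sketch of the wikitext-demo quantization flow with GPTQModel; the output path, length cutoff, row count, and batch size below are representative placeholders rather than my exact settings:

```python
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "Qwen/Qwen2.5-7B-Instruct"
quant_path = "Qwen2.5-7B-Instruct-gptq-int4"  # placeholder output path

# wikitext calibration data, as in the demo: keep longer rows only
calibration_dataset = load_dataset(
    "wikitext", "wikitext-2-raw-v1", split="train"
).filter(lambda r: len(r["text"]) >= 512).select(range(1024))["text"]

quant_config = QuantizeConfig(bits=4, group_size=128)

model = GPTQModel.load(model_id, quant_config)
model.quantize(calibration_dataset, batch_size=2)
model.save(quant_path)
```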
When I use the quantized model, it generates many !
Here is the code:
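In outline, a sketch assuming the standard GPTQModel loading API, with the checkpoint path and prompt as placeholders:

```python
from gptqmodel import GPTQModel

# placeholder path to the quantized checkpoint saved above
model = GPTQModel.load("Qwen2.5-7B-Instruct-gptq-int4")

# any short prompt reproduces the runaway "!" output
tokens = model.generate("Tell me about yourself.")[0]
print(model.tokenizer.decode(tokens))
```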
Here is the result: