Quantization doesn't work with Bloomz 176B #14
Comments
Same question.
Unfortunately I need more details than that :/
Yes, it works for BLOOMZ-560m and BLOOMZ-7B1. I get the same problem shown in @agemagician's error message.
Oh, seeing #15, it seems you have already solved this issue? What was the problem, @agemagician?
I cannot run inference on the 176B FP16 model, even though I have 1 TB of RAM, and I get the same error message @agemagician shows in #15. It works for 560M and 7B1.
#15 is about the fp16 model, not the 4-bit model.
Here is where it seems to crash:

    $ g++ -I. -I./examples -g -std=c++11 -fPIC -pthread quantize.cpp ggml.o utils.o -o quantize
    Breakpoint 1, bloom_model_quantize (fname_inp="./models/bloom/ggml-model-bloom-f16.bin", fname_out="./models/bloom/ggml-model-bloomz-f16-q4_0.bin", itype=2) at quantize.cpp:190
    Program received signal SIGABRT, Aborted.
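One classic cause of a quantizer that works on 560M/7B1 but aborts on 176B (an assumption on my part; this thread doesn't confirm it) is a tensor element count held in a 32-bit int: the 176B word embedding alone is 250880 × 14336 ≈ 3.6e9 elements, past INT32_MAX, so a downstream sanity check can fail and call abort(), which shows up as a SIGABRT. A minimal sketch of that failure mode; the names here are illustrative, not actual bloom.cpp symbols:

```cpp
// Sketch of the suspected overflow, not bloom.cpp code: element counts
// that fit comfortably for 560M/7B1 overflow a 32-bit int at 176B scale.
#include <cassert>
#include <cstdint>
#include <cstdio>

int main() {
    // BLOOM-176B word embedding: 250880 (vocab) x 14336 (hidden)
    const int64_t ne0 = 250880;
    const int64_t ne1 = 14336;

    // Wrap the product through 32 bits the way `int nelements = ne0*ne1;`
    // would (routed through uint32_t to keep this example well-defined)
    const int32_t n32 = static_cast<int32_t>(
        static_cast<uint32_t>(ne0) * static_cast<uint32_t>(ne1));

    const int64_t n64 = ne0 * ne1; // correct 64-bit count

    printf("32-bit count: %d\n", n32);               // negative garbage
    printf("64-bit count: %lld\n", (long long) n64); // 3596615680

    // A sanity check like this then fails at 176B scale and abort()s,
    // which is exactly a SIGABRT during quantization
    assert(n32 > 0 && "element count overflowed 32 bits");
    return 0;
}
```

If that is the cause, the fix is to carry element counts and byte sizes in int64_t/size_t throughout the quantization path.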
Quantization for 176B works with this commit.
Can we get a clearer status update? The README doesn't make clear whether 176B quantization works, I am still having a problem with it on bloom.cpp, and I'm not sure where things stand. Any word on whether patches/fixes will make it into bloom.cpp?
Hi @barsuna, thank you very much for making your fork to fix quantising with 176B. I recently quantised BloomZ 176B and Bloom Chat 176B to GPTQ and released them to the HF Hub, and today I wanted to do GGML as well. I hit the issue described in this thread, and your fork enabled me to quantise the models. Unfortunately there appears to be an inference problem. I was wondering if you saw this too, and might have any idea what is wrong? The issue is that it seems to be missing words out, or skipping over words. Here are some examples testing q4_0 with BloomChat 176B (the issue is the same with BloomZ 176B):
The story prompts seem coherent, but then it's as if the output suddenly skips forward in the sentence by a few words. The Paris prompt is half coherent, half not, and again it looks like bits are missing. Is there any chance you might know what is wrong, or could look into fixing it? If so, I will be able to release 176B GGMLs to the HF Hub, and there are quite a few people who would love to try them. Thanks in advance.
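Not a fix, but one way to localize this kind of problem: dump the per-step logits from an fp16 run and a q4_0 run on the same prompt and diff them; the step where the argmax first flips is usually where the "skipped words" start. A minimal sketch, assuming each run wrote its raw float32 logits to a file (the filenames and the dump format are hypothetical, not something bloom.cpp provides out of the box):

```cpp
// Diagnostic sketch: compare per-step logits from an fp16 run and a
// q4_0 run to find where the quantized model first diverges.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <fstream>
#include <vector>

static std::vector<float> load_floats(const char *path) {
    std::ifstream f(path, std::ios::binary | std::ios::ate);
    std::vector<float> v(static_cast<size_t>(f.tellg()) / sizeof(float));
    f.seekg(0);
    f.read(reinterpret_cast<char *>(v.data()), v.size() * sizeof(float));
    return v;
}

int main() {
    const size_t n_vocab = 250880; // BLOOM vocabulary size
    auto a = load_floats("logits_f16.bin");  // hypothetical dump files
    auto b = load_floats("logits_q4_0.bin");
    const size_t n_steps = std::min(a.size(), b.size()) / n_vocab;

    for (size_t t = 0; t < n_steps; ++t) {
        size_t ia = 0, ib = 0;   // argmax token of each run at step t
        double max_abs_diff = 0.0;
        for (size_t i = 0; i < n_vocab; ++i) {
            const float fa = a[t * n_vocab + i];
            const float fb = b[t * n_vocab + i];
            if (fa > a[t * n_vocab + ia]) ia = i;
            if (fb > b[t * n_vocab + ib]) ib = i;
            max_abs_diff = std::max<double>(max_abs_diff, std::fabs(fa - fb));
        }
        // Steps where the argmax flips are where the skipping begins.
        printf("step %zu: argmax f16=%zu q4_0=%zu%s  max|diff|=%.3f\n",
               t, ia, ib, ia == ib ? "" : "  <-- diverges", max_abs_diff);
    }
    return 0;
}
```

A gradual drift in max|diff| would point at ordinary quantization error; an abrupt divergence at one step would point at a bug (e.g. a mis-dequantized tensor) rather than precision loss.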
Hello,
I have successfully converted the bloomz 176B model to fp16.
However, the quantization doesn't work and throws an error:
Any idea how this could be fixed?