Bloomz 176B inference doesn't work #15

Hello,
I have converted the bloomz model successfully, but inference doesn't work. I have enough CPU memory (420 GB). Any idea what the issue is?

Comments
Out of curiosity, and adding a question on top of yours: how much of your 420 GB of RAM did you use for the ggml conversion? I barely managed to convert bloomz-7b1 with 32 GB of RAM, so I wonder how much the 176B model needs.
All of it, plus approximately 30 GB of virtual memory.
It seems you are running out of memory. Most probably I can help reduce the memory usage to about 1/6th (this was successful with the 7b1 model). What is the model size (disk usage) of the 176B model?
The disk size of the model is approximately 360 GB. I don't think it's an out-of-memory problem, as there are 420 GB of main memory plus 50 GB of swap.
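As a point of reference (my own back-of-the-envelope arithmetic, not from this thread), the weight memory alone can be estimated assuming 176e9 parameters, fp16 at 2 bytes per weight, and the early ggml q4_0 layout of 20 bytes per block of 32 weights (16 nibble bytes plus a 4-byte scale):

    # Hedged estimate: weights only, excluding KV cache and scratch buffers.
    echo "fp16: $((176 * 2)) GB"         # ~352 GB of weights
    echo "q4_0: $((176 * 20 / 32)) GB"   # ~110 GB of weights

If those assumptions hold, the fp16 weights alone nearly fill 420 GB of RAM before any inference scratch memory is allocated, which would be consistent with the f16 model failing here while a q4_0 model (see below) runs.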
Same question here, and I have 1000 GB of RAM.
    ./main -m models/bloom/ggml-model-bloom-f16-q4_0.bin -t 96 -p "The most beautiful question is" -n 20
    main: prompt: 'The most beautiful question is'
    sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000
    The most beautiful question is the one you ask yourself.
    main: mem per token = 192093564 bytes
The above was produced with this commit.
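In case it helps anyone reproduce this: the q4_0 file used above is presumably produced with the repo's quantize tool. A minimal sketch, assuming it follows the usual ggml convention of quantize <f16 model> <output> <type>, with type 2 meaning q4_0 (the paths and the type id here are my assumptions, not taken from this thread):

    # Quantize the fp16 ggml file down to q4_0 (roughly 3x smaller on disk and in RAM).
    ./quantize models/bloom/ggml-model-bloom-f16.bin \
               models/bloom/ggml-model-bloom-f16-q4_0.bin 2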
I have a cluster running Scientific Linux with practically unlimited RAM but 4×15 GB of VRAM that I can test things on. If anybody gets a GGML file that is worth testing, let me know.
I'm getting lost in this thread. I just converted the 176B model to GGML (fp16) and am now looking at using bloom.cpp, but I noticed that @barsuma's README appears to indicate that there are still problems. Could we get a status update? It doesn't look like his code is in a pull request, or that this code has been updated to solve the issue, but I am not sure.