Issues running finetuned versions of supported models #67
I've been testing running various finetuned versions of supported models on GKE. However, deployment gets stuck on:

Using the Hugging Face API to retrieve tokenizer config

These are the full logs.

I get this issue with the following models: RLHFlow/ArmoRM-Llama3-8B-v0.1, Trendyol/Trendyol-LLM-7b-base-v0.1, ICTNLP/Llama-2-7b-chat-TruthX.

However, I've also been able to run the following models successfully: UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3, georgesung/llama2_7b_chat_uncensored.

Are there any requirements in terms of which files a finetuned model needs in order to run? Any help debugging the issue would be appreciated.

This is the yaml that I used to deploy.
Comments
Hi @Edwinhr716, we're investigating this and will get back to you!
Any update on this? Why would finetuned versions not be supported?
The error is:

So it looks like you are missing a Hugging Face token. Did you run `huggingface-cli login`?
I don't think it is related to that log; I've had the same log show up in successful deployments of optimum-tpu. Since I'm building the image using Docker, would running that in my terminal work anyway? I already pass the token as an environment variable. I also want to add that I did not change anything in the Dockerfile; all I did to build the image was run the standard build command.
It needs to be run in the same container as tgi-tpu. Try adding that as an init-container.
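(A minimal sketch of one way to check this from inside the serving container, assuming the token is exposed through an environment variable; the HF_TOKEN variable name and the check itself are illustrative assumptions, not something prescribed in this thread.)

```python
# Sketch: verify a Hugging Face token is usable from inside the same
# container that runs tgi-tpu. Assumes the token is passed in via an
# HF_TOKEN environment variable (the variable name is an assumption).
import os

from huggingface_hub import login, whoami

login(token=os.environ["HF_TOKEN"])  # programmatic equivalent of `huggingface-cli login`
print(whoami())  # raises an HTTP error if the token is invalid
```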
@Edwinhr716 could you share the full logs? I think you'd mentioned over chat that you saw a warning like this:
Those are the full logs. That was the initial issue I was facing, but I circumvented it by duplicating the Trendyol repo and adding the Llama 2 tokenizer here: https://huggingface.co/Edwinhr716/Trendyol-LLM-7b-chat-v0.1-duplicate/tree/main. After doing that, I get the issue mentioned here.
Based on the comment in the Slack channel (https://huggingface.slack.com/archives/C06GAFTA5AN/p1721259640899439), it looks like this may be due to a known issue with the TGI container serving fine-tuned models?
I don't think so; I just tested deploying ICTNLP/Llama-2-7b-chat-TruthX using the TGI 2.0.2 release and was able to do it successfully. These are the log messages for the TGI run:

And this is the yaml that I used for the GPU deployment, taken from https://cloud.google.com/kubernetes-engine/docs/tutorials/serve-gemma-gpu-tgi. So it seems like this is an optimum-tpu side issue.
Looked into this with @Edwinhr716. Optimum TPU is failing to load these models for various reasons:

Trendyol/Trendyol-LLM-7b-base-v0.1

For some reason the stdout and stderr logs are not getting redirected back to the console.
Hi @richardsliu, for RLHFlow/ArmoRM-Llama3-8B-v0.1 and Trendyol/Trendyol-LLM-7b-base-v0.1, I think the issue is that our sharding technique currently requires the last dimension to be divisible by the number of TPUs. We might be able to find a workaround; otherwise, for now, a solution could be to pad the weights so that the last dimension aligns to a multiple of the number of accelerators (see optimum-tpu/optimum/tpu/model.py, line 52 at da2d1ad).

Hope it helps.
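(To make the constraint concrete, here is a minimal illustrative sketch; the helper name and the example dimensions are hypothetical and not taken from the optimum-tpu code.)

```python
# Illustrative sketch of the divisibility constraint described above.
# The function name and numbers are hypothetical; the real check lives
# in optimum-tpu's model-loading code (optimum/tpu/model.py).
def can_shard_last_dim(last_dim: int, num_tpu_devices: int) -> bool:
    # Each device receives an equal slice of the last weight dimension,
    # so that dimension must divide evenly by the device count.
    return last_dim % num_tpu_devices == 0

# A stock Llama-2 vocab size of 32000 splits cleanly across 4 TPU chips,
# while a fine-tune that adds a few extra tokens (e.g. 32003) does not:
assert can_shard_last_dim(32000, 4)
assert not can_shard_last_dim(32003, 4)
```

Padding such a dimension up to the next multiple of the device count (32004 in this hypothetical) would satisfy the check, which matches the padding workaround suggested above.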