[ChatQnA] Update the default LLM to llama3-8B on cpu/gpu/hpu #1430
Conversation
Update the default LLM to llama3-8B on cpu/nvgpu/amdgpu/gaudi to avoid the potential model serving issue or the missing chat-template issue when using neural-chat-7b. opea-project#1420 Signed-off-by: Wang, Kai Lawrence <[email protected]>
Dependency Review: ✅ No vulnerabilities or license issues found.
for more information, see https://pre-commit.ci
Signed-off-by: Wang, Kai Lawrence <[email protected]>
ChatQnA for ROCm should be fixed in another PR.
This PR corrects the names of the env variables passed to the TGI and TEI docker containers in the ChatQnA example on the ROCm platform. TGI can parse either `HF_TOKEN` or `HUGGING_FACE_HUB_TOKEN`, while TEI only parses `HF_API_TOKEN`. Fixes the CI issue for ROCm in opea-project#1430. Signed-off-by: Wang, Kai Lawrence <[email protected]>
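A minimal compose sketch of the fix described above, assuming typical ChatQnA service names (the exact service definitions in the ROCm compose file may differ):

```yaml
# Illustrative excerpt only; service names are assumptions.
services:
  tgi-service:
    environment:
      HF_TOKEN: ${HF_TOKEN}        # TGI parses HF_TOKEN (or the legacy HUGGING_FACE_HUB_TOKEN)
  tei-embedding-service:
    environment:
      HF_API_TOKEN: ${HF_TOKEN}    # TEI only parses HF_API_TOKEN, so the variable name must differ here
```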
@eero-t Since the default model has been changed, does that mean the current TGI resource limits in the Kubernetes manifests need to be updated as well?
The current values are just dummy ones, so for getting a better cgroup policy for the pod containers they still fill that role. But for those values to help Kubernetes scheduling and HPA scaling, they would indeed need to match the actual resource usage. This means that changes to the model, its parameters, or the engine version would all require checking that the requested amount of resources still matches actual usage, and updating the values accordingly.
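To make the distinction concrete, here is a hypothetical Kubernetes resource spec for a TGI container; the numbers are placeholders, not measured values for Llama-3-8B:

```yaml
# Hypothetical resource spec; the figures below would need to be replaced
# with measured usage for the new default model before they can usefully
# guide scheduling or HPA decisions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tgi
spec:
  template:
    spec:
      containers:
        - name: tgi
          resources:
            requests:        # what the scheduler reserves for the pod
              cpu: "8"
              memory: 32Gi
            limits:          # hard cap enforced through cgroups
              cpu: "16"
              memory: 64Gi
```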
Description
Update the default LLM to llama3-8B on cpu/nvgpu/amdgpu/gaudi for docker-compose deployment, to avoid the potential model serving issue and the missing chat-template issue seen with neural-chat-7b.
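As a rough sketch of what the change amounts to in the compose files, assuming the model id is threaded through an `LLM_MODEL_ID` variable as in the ChatQnA examples (the service name, image tag, and old model id shown are assumptions):

```yaml
# Illustrative excerpt; service name and image tag are assumptions.
services:
  tgi-service:
    image: ghcr.io/huggingface/text-generation-inference:2.1.0
    environment:
      # Old default (assumed): Intel/neural-chat-7b-v3-3 — slow serving on ICX,
      # and its tokenizer defines no chat template.
      LLM_MODEL_ID: ${LLM_MODEL_ID:-meta-llama/Meta-Llama-3-8B-Instruct}
      HF_TOKEN: ${HF_TOKEN}   # Llama 3 is a gated model, so a HF token is needed
    command: --model-id ${LLM_MODEL_ID:-meta-llama/Meta-Llama-3-8B-Instruct}
```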
Issues
Slow serving issue of neural-chat-7b on ICX: #1420
Request failure of neural-chat-7b due to no chat-template on Gaudi:
"As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one."
Type of change
List the type of change like below. Please delete options that are not relevant.
Dependencies
n/a
Tests
Local tests.