
[ChatQnA] Update the default LLM to llama3-8B on cpu/gpu/hpu #1430

Merged
3 commits merged into opea-project:main on Jan 20, 2025

Conversation

wangkl2 (Collaborator) commented Jan 20, 2025

Description

Update the default LLM to llama3-8B on cpu/nvgpu/amdgpu/gaudi for docker-compose deployment, to avoid the potential model-serving issue and the missing chat-template issue seen with neural-chat-7b.
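
For illustration, a minimal sketch of the kind of compose-level change this involves, assuming the serving container is TGI and that the old and new model IDs are `Intel/neural-chat-7b-v3-3` and `meta-llama/Meta-Llama-3-8B-Instruct`; the service name, image tag, and host-side token variable are placeholders rather than values copied from the actual diff:

```yaml
# Hypothetical excerpt of a ChatQnA compose file (names are illustrative).
services:
  tgi-service:
    image: ghcr.io/huggingface/text-generation-inference:latest
    environment:
      # llama3-8B is gated on Hugging Face, so an access token is still needed.
      HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
    # Old default, replaced by this PR:
    #   command: --model-id Intel/neural-chat-7b-v3-3
    # llama3-8B ships a chat template in its tokenizer config, which avoids the
    # transformers >= 4.44 missing-template error quoted under Issues below.
    command: --model-id meta-llama/Meta-Llama-3-8B-Instruct
```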

Issues

  • Slow serving issue of neural-chat-7b on ICX: #1420
  • Request failure of neural-chat-7b on Gaudi due to a missing chat template: "As of transformers v4.44, default chat templates are no longer allowed, so you must provide a chat template if the tokenizer does not define one."

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)

Dependencies

n/a

Tests

Local tests.

github-actions bot commented Jan 20, 2025

Dependency Review

✅ No vulnerabilities or license issues found.


chensuyue added this to the v1.2 milestone Jan 20, 2025
chensuyue mentioned this pull request Jan 20, 2025
chensuyue (Collaborator)

ChatQnA for ROCm should be fixed in another PR.

chensuyue merged commit 3d3ac59 into opea-project:main on Jan 20, 2025
24 of 25 checks passed
wangkl2 added a commit to wangkl2/GenAIExamples that referenced this pull request Jan 20, 2025
This PR corrects the environment variable names passed to the TGI and TEI docker containers in the ChatQnA example on the ROCm platform. For TGI, either `HF_TOKEN` or `HUGGING_FACE_HUB_TOKEN` is recognized, while TEI expects `HF_API_TOKEN`.

Fix the CI issue for ROCm in opea-project#1430

Signed-off-by: Wang, Kai Lawrence <[email protected]>
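
To illustrate the variable-name fix described in that commit, a hedged compose fragment; only the token variable names (`HF_TOKEN`/`HUGGING_FACE_HUB_TOKEN` for TGI, `HF_API_TOKEN` for TEI) come from the commit message, while the service names, image tags, and the host-side `HUGGINGFACEHUB_API_TOKEN` variable are placeholders:

```yaml
# Each container must receive the token under the name it actually reads:
# TGI accepts HF_TOKEN or HUGGING_FACE_HUB_TOKEN, TEI expects HF_API_TOKEN.
services:
  tgi-service:
    image: ghcr.io/huggingface/text-generation-inference:latest
    environment:
      HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
  tei-embedding-service:
    image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
    environment:
      HF_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
```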
lianhao (Collaborator) commented Jan 21, 2025

@eero-t Since the default model has been changed, does that mean the current tgi resource limits in cpu-values in the ChatQnA helm chart are not suitable any more?

eero-t (Contributor) commented Jan 21, 2025

> @eero-t Since the default model has been changed, does that mean the current tgi resource limits in cpu-values in the ChatQnA helm chart are not suitable any more?

Since the current values are just dummy ones, there to get a better cgroup policy for the pod containers, they still fill that role.

But for those values to help Kubernetes scheduling and HPA scaling, they would indeed need to match actual resource usage. That means changes to the model, its parameters, or the engine version would all require checking that the requested amount of resources still matches actual usage, and updating the values accordingly.
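
As a concrete illustration of that point, a hedged sketch of what such settings might look like in a chart's cpu-values file; the structure and numbers are placeholders, not measured figures for llama3-8B:

```yaml
# Hypothetical tgi resource settings in a ChatQnA cpu-values.yaml.
# Requests/limits only help the scheduler and HPA if they track what the
# chosen model and engine actually consume, so they should be re-measured
# whenever the default model, its parameters, or the engine version changes.
tgi:
  resources:
    requests:
      cpu: "8"
      memory: 32Gi
    limits:
      cpu: "16"
      memory: 64Gi
```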
