[ChatQnA] Update the default LLM to llama3-8B on cpu/gpu/hpu #1430
Conversation
Update the default LLM to llama3-8B on cpu/nvgpu/amdgpu/gaudi to avoid the potential model serving issue or the missing chat-template issue when using neural-chat-7b. opea-project#1420 Signed-off-by: Wang, Kai Lawrence <[email protected]>
Dependency Review: ✅ No vulnerabilities or license issues found.
for more information, see https://pre-commit.ci
Signed-off-by: Wang, Kai Lawrence <[email protected]>
ChatQnA for ROCm should be fixed in another PR.
This PR corrects the names of the env variables passed to the TGI and TEI docker containers in the ChatQnA example on the ROCm platform. TGI can parse either `HF_TOKEN` or `HUGGING_FACE_HUB_TOKEN`, while TEI only parses `HF_API_TOKEN`. Fixes the CI issue for ROCm in opea-project#1430. Signed-off-by: Wang, Kai Lawrence <[email protected]>
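A minimal compose sketch of the fix described above, assuming typical ChatQnA service names (the exact service definitions in the ROCm compose file may differ):

```yaml
# Illustrative excerpt only; service names are assumptions.
services:
  tgi-service:
    environment:
      HF_TOKEN: ${HF_TOKEN}        # TGI parses HF_TOKEN (or the legacy HUGGING_FACE_HUB_TOKEN)
  tei-embedding-service:
    environment:
      HF_API_TOKEN: ${HF_TOKEN}    # TEI only parses HF_API_TOKEN, so the variable name must differ here
```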
@eero-t Since the default model has been changed, does that mean the current TGI resource limits in the Kubernetes manifests need to be updated as well?
The current values are just dummy ones, so for getting a better cgroup policy for the pod containers they still fill that role. But for those values to help Kubernetes scheduling and HPA scaling, they would indeed need to match the actual resource usage. This means that changes to the model, its parameters, or the engine version would all require checking that the requested amount of resources still matches actual usage, and updating the values accordingly.
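To make the distinction concrete, here is a hypothetical Kubernetes resource spec for a TGI container; the numbers are placeholders, not measured values for Llama-3-8B:

```yaml
# Hypothetical resource spec; the figures below would need to be replaced
# with measured usage for the new default model before they can usefully
# guide scheduling or HPA decisions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tgi
spec:
  template:
    spec:
      containers:
        - name: tgi
          resources:
            requests:        # what the scheduler reserves for the pod
              cpu: "8"
              memory: 32Gi
            limits:          # hard cap enforced through cgroups
              cpu: "16"
              memory: 64Gi
```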
Description
Update the default LLM to llama3-8B on cpu/nvgpu/amdgpu/gaudi for docker-compose deployment, to avoid the potential model serving issue and the missing chat-template issue seen with neural-chat-7b.
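As a rough sketch of what the change amounts to in the compose files, assuming the model id is threaded through an `LLM_MODEL_ID` variable as in the ChatQnA examples (the service name, image tag, and old model id shown are assumptions):

```yaml
# Illustrative excerpt; service name and image tag are assumptions.
services:
  tgi-service:
    image: ghcr.io/huggingface/text-generation-inference:2.1.0
    environment:
      # Old default (assumed): Intel/neural-chat-7b-v3-3 — slow serving on ICX,
      # and its tokenizer defines no chat template.
      LLM_MODEL_ID: ${LLM_MODEL_ID:-meta-llama/Meta-Llama-3-8B-Instruct}
      HF_TOKEN: ${HF_TOKEN}   # Llama 3 is a gated model, so a HF token is needed
    command: --model-id ${LLM_MODEL_ID:-meta-llama/Meta-Llama-3-8B-Instruct}
```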
Issues
Slow serving issue of neural-chat-7b on ICX: #1420
Request failure of neural-chat-7b due to no chat-template on Gaudi:
"As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one."
Type of change
List the type of change like below. Please delete options that are not relevant.
Dependencies
n/a
Tests
Local tests.