Adapt to latest vllm changes #632

Closed · wants to merge 1 commit

Conversation

@lianhao (Collaborator) commented Dec 10, 2024

Description

  • Remove --enforce-eager on HPU to improve performance
  • Refactor for the upstream Docker entrypoint changes (sketched below)
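
A minimal sketch of the kind of Helm values change this implies, assuming illustrative key names (vllm, extraCmdArgs) rather than this chart's exact fields:

```yaml
# Gaudi (HPU) vLLM values sketch -- key names are illustrative only.
vllm:
  image:
    repository: opea/vllm-gaudi   # assumed image name for the HPU build
  # --enforce-eager forces eager execution and disables graph capture;
  # dropping it lets vLLM use HPU graphs, which improves throughput.
  extraCmdArgs: []                # was: ["--enforce-eager"]
  # The upstream image's entrypoint now launches the OpenAI-compatible API
  # server itself, so the chart only appends arguments instead of
  # overriding the container command.
```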

Issues

Fixes #631.

Type of change

List the type of change as in the options below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)

Dependencies

List any newly introduced third-party dependencies, if they exist.

Tests

Describe the tests that you ran to verify your changes.

@lianhao requested a review from yongfengdu as a code owner on December 10, 2024 07:08
@lianhao force-pushed the bug631 branch 2 times, most recently from 4aad447 to 6bfddb6 on December 10, 2024 07:20
@eero-t (Contributor) left a comment

Approved, matches changes for "GenAIExamples" in opea-project/GenAIExamples#1210 (and corresponding PR for "GenAIComps" repo).

@poussa ?

@eero-t (Contributor) commented Dec 11, 2024

Investigating the CI failure for the "agent, gaudi, ci-gaudi-values, common" test, I see two bugs:

  • agent values.yaml specifies the (huge) meta-llama/Meta-Llama-3.1-70B-Instruct model for the CPU version of TGI
  • the CI test does not provide an HF token to access it for the sharded Gaudi TGI pods:
    • [pod/agent20241211092531-tgi-6f6b65dc97-cjhjh/model-downloader] Access to model meta-llama/Llama-3.1-70B-Instruct is restricted. You must have access to it and be authenticated to access it. Please log in.

(Besides the size, I think another model would be a nicer default, due to the license used on Meta's models.)
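
As a reference for the second bug above, gated models such as the Llama 3.1 checkpoints need a Hugging Face token at deploy time. A hedged sketch of how it is typically supplied to the charts, assuming a values key along the lines of global.HUGGINGFACEHUB_API_TOKEN (the exact key used in this repo may differ):

```yaml
# CI values override sketch -- key name is an assumption; never commit a real token.
global:
  # Token the model-downloader / TGI / vLLM pods use to pull gated models
  # such as meta-llama/Llama-3.1-70B-Instruct.
  HUGGINGFACEHUB_API_TOKEN: "hf_xxxxxxxxxxxxxxxx"   # placeholder value
```

In CI this would normally come from a repository secret passed via --set rather than from a committed values file.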

@eero-t (Contributor) commented Dec 11, 2024

My vLLM PR includes the same agent and (relevant) vLLM component changes as yours, but strangely that same CI agent test succeeded for it: https://github.com/opea-project/GenAIInfra/actions/runs/12262626198/job/34212355870?pr=610

EDIT: today's push to my PR hit the same issue.

@eero-t (Contributor) left a comment

The --tensor-parallel-size option can be dropped, as 1 is its default value:
https://docs.vllm.ai/en/latest/usage/engine_args.html
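
A sketch of what dropping the redundant flag could look like in a values override, assuming an extraCmdArgs list (illustrative name):

```yaml
# Before: explicitly passing vLLM's default value
extraCmdArgs: ["--tensor-parallel-size", "1"]

# After: tensor parallelism defaults to 1, so the flag can simply be omitted
extraCmdArgs: []
```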

@lianhao (Collaborator, Author) commented Dec 12, 2024

This specific failure is caused by two different bugs: #639 and #641.

@lianhao (Collaborator, Author) commented Dec 16, 2024

We need to wait for PR #642 to land first.

- Remove --enforce-eager on hpu to improve performance
- Refactor for the upstream docker entrypoint changes

Fixes issue opea-project#631.

Signed-off-by: Lianhao Lu <[email protected]>
@eero-t (Contributor) commented Dec 17, 2024

This is included in #610, so reviewing and merging that instead is another option.

@lianhao (Collaborator, Author) commented Dec 18, 2024

Let's use #610 instead.

@lianhao closed this Dec 18, 2024
Successfully merging this pull request may close these issues:

  • [ci-auto] remove "--enforce-eager" for better vLLM perf (#631)