feat(lmeval): Enable Kueue Job Manager in LMEval #363

ruivieira
Member

Enable LMEval's Job Manager for Kueue support

ruivieira added the kind/enhancement and lm-eval labels on Nov 14, 2024
ruivieira self-assigned this on Nov 14, 2024
ruivieira added this to the LM-Eval milestone on Nov 14, 2024
ruivieira linked an issue on Nov 14, 2024 that may be closed by this pull request
Collaborator

yhwang left a comment

/LGTM

openshift-ci bot commented Nov 14, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: yhwang

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

github-actions bot commented Nov 14, 2024

PR image build and manifest generation completed successfully!

📦 PR image: quay.io/trustyai/trustyai-service-operator-ci:1c3989ffb29e6810b7031e10e192274497dd8bf5

📦 LMES driver image: quay.io/trustyai/ta-lmes-driver:1c3989ffb29e6810b7031e10e192274497dd8bf5

📦 LMES job image: quay.io/trustyai/ta-lmes-job:1c3989ffb29e6810b7031e10e192274497dd8bf5

🗂️ CI manifests

devFlags:
  manifests:
    - contextDir: config
      sourcePath: ''
      uri: https://api.github.com/repos/trustyai-explainability/trustyai-service-operator-ci/tarball/operator-1c3989ffb29e6810b7031e10e192274497dd8bf5

Collaborator

yhwang commented Nov 14, 2024

The only concern is that this will implicitly make trustyai-service-operator depend on Kueue. I guess this may need more discussion.

When enabling Kueue for LMES, one extra update to the kueue-manager-config ConfigMap is needed: LMEvalJob must be added to externalFrameworks, like this:

integrations:
  .......
  externalFrameworks:
  - "trustyai.opendatahub.io/lmevaljob"
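
A minimal sketch of that update, assuming the Kueue configuration has already been parsed from the ConfigMap into a plain Python dict. The helper name `add_external_framework` is hypothetical and not part of this PR or of Kueue; only the `integrations.externalFrameworks` key and the framework string come from the comment above.

```python
import copy

# Framework entry quoted in the comment above.
LMEVAL_FRAMEWORK = "trustyai.opendatahub.io/lmevaljob"


def add_external_framework(config: dict, framework: str = LMEVAL_FRAMEWORK) -> dict:
    """Return a copy of a parsed Kueue configuration with `framework`
    listed under integrations.externalFrameworks.

    Hypothetical helper: the input is assumed to be a dict with the same
    shape as the YAML snippet above. Existing entries are preserved and
    the framework is never added twice.
    """
    updated = copy.deepcopy(config)

    # Tolerate a missing or null `integrations` section.
    integrations = updated.get("integrations") or {}
    updated["integrations"] = integrations

    # Tolerate a missing or null `externalFrameworks` list.
    frameworks = integrations.get("externalFrameworks") or []
    integrations["externalFrameworks"] = frameworks

    if framework not in frameworks:
        frameworks.append(framework)
    return updated


if __name__ == "__main__":
    # Illustrative input only; the `frameworks` value is made up for the demo.
    before = {"integrations": {"frameworks": ["batch/job"]}}
    after = add_external_framework(before)
    print(after["integrations"]["externalFrameworks"])
```

Working on a copy keeps the change idempotent and leaves the original configuration untouched, so the result can be compared against the input before anything is written back to the cluster.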

openshift-ci bot commented Nov 14, 2024

@ruivieira: The following test failed. Say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/trustyai-service-operator-e2e | 1c3989f | link | true | /test trustyai-service-operator-e2e |

Member Author

Thanks @yhwang, this is an important point.

I think it's best to close this PR and, for the moment, add the job manager support as an optional overlay on a separate PR.

ruivieira closed this on Nov 14, 2024
Labels: kind/enhancement, lgtm, lm-eval, ok-to-test
Development

Successfully merging this pull request may close these issues.

LMEvalJobs with Kueue suspend enabled start immediately
2 participants