DRA: Improved Quota mechanism for DRA resources #4840

Open
4 tasks
klueska opened this issue Sep 11, 2024 · 1 comment
Labels
sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. wg/device-management Categorizes an issue or PR as relevant to WG Device Management.

Comments

@klueska

klueska commented Sep 11, 2024

Enhancement Description

At present, quota for DRA resources is enforced the same way as for other resources, i.e. at admission time rather than at allocation time.

We propose to change the quota mechanism used for DRA resources to be done at allocation time instead.

Pros:

  • Can limit resource consumption based on what actually gets made available to a user, compared to basing it on what is requested (might be a lower limit).
  • Supports creating more claims and pods than can run at the moment ("batching" - might not be relevant).
  • Can support "one of" (if X exceeds quota, use Y).
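
As a rough illustration of the "one of" case, the sketch below walks a list of alternatives in preference order and picks the first one that still fits the quota. All names here (`Device`, `Quota`, `PickFirstWithinQuota`) are hypothetical and not part of any Kubernetes API; quota is assumed to be expressed in GiB of GPU memory.

```go
package main

import "fmt"

// Device is a hypothetical, simplified stand-in for an allocatable DRA device.
type Device struct {
	Name      string
	MemoryGiB int64
}

// Quota is a hypothetical per-namespace limit on allocated GPU memory.
type Quota struct {
	LimitGiB int64
	UsedGiB  int64
}

// Fits reports whether allocating d would keep usage within the limit.
func (q Quota) Fits(d Device) bool {
	return q.UsedGiB+d.MemoryGiB <= q.LimitGiB
}

// PickFirstWithinQuota walks the alternatives in preference order and
// returns the first device that fits the quota ("if X exceeds quota, use Y").
func PickFirstWithinQuota(q Quota, alternatives []Device) (Device, bool) {
	for _, d := range alternatives {
		if q.Fits(d) {
			return d, true
		}
	}
	return Device{}, false
}

func main() {
	q := Quota{LimitGiB: 100, UsedGiB: 60}
	alternatives := []Device{
		{Name: "gpu-80g", MemoryGiB: 80}, // preferred, but 60+80 exceeds 100
		{Name: "gpu-40g", MemoryGiB: 40}, // fallback, 60+40 fits exactly
	}
	d, ok := PickFirstWithinQuota(q, alternatives)
	fmt.Println(d.Name, ok)
}
```

An admission-time check cannot make this choice, because it does not yet know which alternative the scheduler would pick.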

Cons:

  • All schedulers need to also consider the ResourceQuota when checking devices.
  • Exceeding quota has to be reported as part of scheduling failures. OTOH, users typically also don't create ResourceClaims manually, so there is some indirection with admission checks, too.

We believe the pros outweigh the cons, as this enables use cases such as putting quota on the total amount of GPU memory allocated rather than strictly on the number of devices allocated.

  • One-line enhancement description (can be used as a release note):
    Enforce quota for DRA resources at allocation time instead of admission time

  • Kubernetes Enhancement Proposal:
    TBD

  • Discussion Link:
    dra-evolution: quota mechanism kubernetes-sigs/wg-device-management#24

  • Primary contact (assignee):
    @klueska, @pohly, @johnbelamaric, @thockin

  • Responsible SIGs:
    /sig node
    /sig scheduling

  • Enhancement target (which target equals to which milestone):

    • Alpha release target: 1.32
    • Beta release target: 1.33
    • Stable release target: 1.34
  • Alpha

    • KEP (k/enhancements) update PR(s):
      • TBD
    • Code (k/k) update PR(s):
      • TBD
    • Docs (k/website) update PR(s):
      • TBD
@klueska klueska converted this from a draft issue Sep 11, 2024
@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. labels Sep 11, 2024
@github-project-automation github-project-automation bot moved this to Needs Triage in SIG Scheduling Sep 11, 2024
@haircommander haircommander moved this from Proposed for consideration to Triage in SIG Node 1.32 KEPs planning Sep 17, 2024
@pohly

pohly commented Nov 19, 2024

/wg device-management

@k8s-ci-robot k8s-ci-robot added the wg/device-management Categorizes an issue or PR as relevant to WG Device Management. label Nov 19, 2024