scheduler: client-side throttling, not priority and fairness #745

Open
chaunceyjiang opened this issue Dec 27, 2024 · 1 comment · May be fixed by #769
Labels
kind/bug Something isn't working

Comments

@chaunceyjiang
Contributor

What happened:

I1227 02:28:17.812399       1 event.go:307] "Event occurred" object="u-dad49dd32cdf/test-222-76c478b9c5-sq794" fieldPath="" kind="Pod" apiVersion="v1" type="Warning" reason="FilteringFailed" message="no available node, all node scores do not meet"
I1227 02:28:17.977042       1 request.go:629] Waited for 197.657513ms due to client-side throttling, not priority and fairness, request: PATCH:https://10.233.0.1:443/api/v1/namespaces/u-8c6ddc72f5a0/events/qwen-0-5-7fc4c6c8f9-h5tvm.1814e77ce9f9cd1d
I1227 02:28:18.176980       1 request.go:629] Waited for 196.317212ms due to client-side throttling, not priority and fairness, request: PATCH:https://10.233.0.1:443/api/v1/namespaces/u-8c6ddc72f5a0/events/qwen-0-5-7fc4c6c8f9-rwf2g.1814e77ceb07cb5c
I1227 02:28:18.376885       1 request.go:629] Waited for 196.355669ms due to client-side throttling, not priority and fairness, request: PATCH:https://10.233.0.1:443/api/v1/namespaces/u-8c6ddc72f5a0/events/qwen-0-5-7fc4c6c8f9-nr4lq.1814e77ceb90fa99
I1227 02:28:18.576398       1 request.go:629] Waited for 195.351409ms due to client-side throttling, not priority and fairness, request: PATCH:https://10.233.0.1:443/api/v1/namespaces/u-8c6ddc72f5a0/events/qwen-0-5-7fc4c6c8f9-csw5w.1814e77cec592297
I1227 02:28:18.776450       1 request.go:629] Waited for 196.339784ms due to client-side throttling, not priority and fairness, request: PATCH:https://10.233.0.1:443/api/v1/namespaces/u-8c6ddc72f5a0/events/qwen-0-5-7fc4c6c8f9-9ttks.1814e77ceccba411

What you expected to happen:

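When a large number of pods are pending scheduling at the same time, the scheduler's event PATCH requests are delayed by client-side throttling (about 200 ms per request in the log above). The scheduler is expected to handle such a burst without being throttled.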

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

  • The output of nvidia-smi -a on your host
  • Your docker or containerd configuration file (e.g: /etc/docker/daemon.json)
  • The hami-device-plugin container logs
  • The hami-scheduler container logs
  • The kubelet logs on the node (e.g: sudo journalctl -r -u kubelet)
  • Any relevant kernel output lines from dmesg

Environment:

  • HAMi version:
  • nvidia driver or other AI device driver version:
  • Docker version from docker version
  • Docker command, image and tag used
  • Kernel version from uname -a
  • Others:
chaunceyjiang added the kind/bug label Dec 27, 2024
@chaunceyjiang
Contributor Author

/assign

chaunceyjiang linked a pull request Jan 2, 2025 that will close this issue