What happened:
A container with a privileged security context failed to be scheduled.

What you expected to happen:
The pod should be scheduled.

How to reproduce it (as minimally and precisely as possible):
Install HAMi according to the install steps, then run the following deployment:
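The original manifest is not shown here, so the following is only a sketch of the kind of Deployment the report describes: a privileged container with an `nvidia.com/gpumem` limit. The image, command, replica count, and memory value are placeholders; the `gpu-test` name is taken from the pod name in the Events below.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-test
  template:
    metadata:
      labels:
        app: gpu-test
    spec:
      containers:
      - name: cuda
        image: nvidia/cuda:12.4.0-base-ubuntu22.04   # placeholder image
        command: ["sleep", "infinity"]
        securityContext:
          privileged: true            # the privileged context from the report
        resources:
          limits:
            nvidia.com/gpu: "1"
            nvidia.com/gpumem: "3000" # present in the failing case, absent in the first Events below
```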
Environment:
- `uname -a`:

```
Linux node7vm-1 5.15.0-125-generic #135-Ubuntu SMP Fri Sep 27 13:53:58 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
```

Others:
If I don't request GPU memory:

```
Events:
  Type     Reason                    Age  From               Message
  ----     ------                    ---- ----               -------
  Normal   Scheduled                 19s  default-scheduler  Successfully assigned default/gpu-test-5f9f7d48d9-4wsrp to node7bm-1
  Warning  UnexpectedAdmissionError  20s  kubelet            Allocate failed due to rpc error: code = Unknown desc = no binding pod found on node node7bm-1, which is unexpected
```
If I request GPU memory:

```
Events:
  Type     Reason            Age  From               Message
  ----     ------            ---- ----               -------
  Warning  FailedScheduling  14s  default-scheduler  0/3 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 2 Insufficient nvidia.com/gpumem. preemption: 0/3 nodes are available: 1 Preemption is not helpful for scheduling, 2 No preemption victims found for incoming pod..
```
Privileged Pods have direct access to the host's devices: they share the host's device namespace and can access everything under `/dev`. This effectively bypasses the container's device isolation, so HAMi's admission webhook skips such containers:
```go
// Privileged containers are skipped outright (nil check added for safety;
// the original snippet dereferences SecurityContext unconditionally).
if ctr.SecurityContext != nil && ctr.SecurityContext.Privileged != nil && *ctr.SecurityContext.Privileged {
	klog.Warningf(template+" - Denying admission as container %s is privileged", req.Namespace, req.Name, req.UID, ctr.Name)
	continue
}
```
The code simply skips privileged Pods altogether, which means they fall back to the default scheduler; you can see from the Events you posted that the pod was scheduled by the default-scheduler. That also explains the first failure: the default scheduler bound the pod, so HAMi's device plugin found no binding recorded by its own scheduler ("no binding pod found").

So the reason scheduling fails when `resources.limits` includes `nvidia.com/gpumem` is that the default scheduler doesn't recognize `nvidia.com/gpumem`: it treats it as an ordinary extended resource that no node satisfies, hence the "Insufficient nvidia.com/gpumem" events.
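For contrast, here is a sketch (not from this thread) of the unprivileged variant: without the privileged flag, the webhook handles the pod, so `nvidia.com/gpumem` is accounted for by HAMi's scheduler rather than the default one. The image and memory value are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test-unprivileged
spec:
  containers:
  - name: cuda
    image: nvidia/cuda:12.4.0-base-ubuntu22.04   # placeholder image
    command: ["sleep", "infinity"]
    # no privileged security context, so the HAMi webhook does not skip this pod
    resources:
      limits:
        nvidia.com/gpu: "1"        # one virtual GPU
        nvidia.com/gpumem: "3000"  # device memory in MiB, HAMi's resource name
```

Conversely, since a privileged container sees everything under `/dev` anyway, a per-container `gpumem` limit could not be enforced for it, which is consistent with the webhook skipping such pods.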