
container with privileged context failed to be scheduled #611

Open
gongysh2004 opened this issue Nov 14, 2024 · 1 comment
Labels
kind/bug Something isn't working

Comments

@gongysh2004

gongysh2004 commented Nov 14, 2024

What happened:
A container with a privileged security context fails to be scheduled.
What you expected to happen:
The pod should be scheduled.
How to reproduce it (as minimally and precisely as possible):
Install HAMi according to the installation steps, then apply the following deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-test
  labels:
    app: gpu-test
spec:
  replicas: 3
  selector:
    matchLabels:
      app: gpu-test
  template:
    metadata:
      labels:
        app: gpu-test
    spec:
      containers:
      - name: gpu-test
        securityContext:
          privileged: true
        image: ubuntu:18.04
        resources:
          limits:
            nvidia.com/gpu: 2 # requesting 2 vGPUs
            nvidia.com/gpumem: 10240
        command: ["/bin/sh", "-c"]
        args: ["while true; do cat /mnt/data/test.txt; sleep 5; done"]
        volumeMounts:
        - mountPath: "/mnt/data"
          name: data-volume
      volumes:
      - name: data-volume
        hostPath:
          path: /opt/data
          type: Directory

Anything else we need to know?:

  • The output of nvidia-smi -a on your host
  • Your docker or containerd configuration file (e.g: /etc/docker/daemon.json)
  • The hami-device-plugin container logs
  • The hami-scheduler container logs
  • The kubelet logs on the node (e.g: sudo journalctl -r -u kubelet)
  • Any relevant kernel output lines from dmesg

Environment:

  • HAMi version:
root@node7vm-1:~/test# helm ls -A | grep hami
hami            kube-system     2               2024-11-14 15:18:36.886955318 +0800 CST deployed        hami-2.4.0                      2.4.0      
my-hami-webui   kube-system     4               2024-11-14 17:18:24.678439025 +0800 CST deployed        hami-webui-1.0.3                1.0.3   
  • nvidia driver or other AI device driver version:
root@node7bm-1:~# nvidia-smi 
Thu Nov 14 15:58:33 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.08             Driver Version: 535.161.08   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA L40S                    On  | 00000000:08:00.0 Off |                  Off |
| N/A   27C    P8              22W / 350W |      0MiB / 49140MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA L40S                    On  | 00000000:09:00.0 Off |                  Off |
| N/A   28C    P8              21W / 350W |      0MiB / 49140MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA L40S                    On  | 00000000:0E:00.0 Off |                  Off |
| N/A   26C    P8              19W / 350W |      0MiB / 49140MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA L40S                    On  | 00000000:11:00.0 Off |                  Off |
| N/A   26C    P8              21W / 350W |      0MiB / 49140MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   4  NVIDIA L40S                    On  | 00000000:87:00.0 Off |                  Off |
| N/A   26C    P8              21W / 350W |      0MiB / 49140MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   5  NVIDIA L40S                    On  | 00000000:8D:00.0 Off |                  Off |
| N/A   26C    P8              21W / 350W |      0MiB / 49140MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   6  NVIDIA L40S                    On  | 00000000:90:00.0 Off |                  Off |
| N/A   26C    P8              21W / 350W |      0MiB / 49140MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   7  NVIDIA L40S                    On  | 00000000:91:00.0 Off |                  Off |
| N/A   27C    P8              19W / 350W |      0MiB / 49140MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
  • Docker version from docker version
  • Docker command, image and tag used
  • Kernel version from uname -a
Linux node7vm-1 5.15.0-125-generic #135-Ubuntu SMP Fri Sep 27 13:53:58 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
  • Others:

If I don't request GPU memory (drop the nvidia.com/gpumem limit):

Events:
  Type     Reason                    Age   From               Message
  ----     ------                    ----  ----               -------
  Normal   Scheduled                 19s   default-scheduler  Successfully assigned default/gpu-test-5f9f7d48d9-4wsrp to node7bm-1
  Warning  UnexpectedAdmissionError  20s   kubelet            Allocate failed due to rpc error: code = Unknown desc = no binding pod found on node node7bm-1, which is unexpected

If I do request GPU memory (keep the nvidia.com/gpumem limit):

Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  14s   default-scheduler  0/3 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 2 Insufficient nvidia.com/gpumem. preemption: 0/3 nodes are available: 1 Preemption is not helpful for scheduling, 2 No preemption victims found for incoming pod..
gongysh2004 added the kind/bug (Something isn't working) label Nov 14, 2024
@Nimbus318
Contributor

Privileged Pods have direct access to the host's devices: they share the host's device namespace and can directly access everything under the /dev directory. This essentially bypasses the container's device isolation.
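
For illustration, a minimal sketch of what that means in practice (the pod name and command here are hypothetical, not from the report): with privileged: true the container sees the host's /dev entries, so every /dev/nvidia* device on the host is visible regardless of what a device plugin would have allocated:

apiVersion: v1
kind: Pod
metadata:
  name: privileged-dev-check   # hypothetical name, for illustration only
spec:
  containers:
  - name: main
    image: ubuntu:18.04
    securityContext:
      privileged: true          # shares the host's device namespace
    # listing /dev/nvidia* here shows all of the host's GPU device nodes,
    # not just the ones a device plugin allocated to this container
    command: ["/bin/sh", "-c", "ls -l /dev/nvidia*; sleep infinity"]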

So, in our HAMi webhook:

	if ctr.SecurityContext.Privileged != nil && *ctr.SecurityContext.Privileged {
		klog.Warningf(template+" - Denying admission as container %s is privileged", req.Namespace, req.Name, req.UID, c.Name)
		continue
	}

the code just skips handling privileged Pods altogether, which means they fall back to being scheduled by the default scheduler. You can see from the Events you posted that the pod was scheduled by the default-scheduler.

So the reason scheduling fails when resources.limits includes nvidia.com/gpumem is that the default-scheduler doesn't recognize the nvidia.com/gpumem resource.
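
As a minimal sketch of the implication, assuming the intent is simply to drop the privileged flag so that the HAMi webhook processes the container and nvidia.com/gpumem is handled by the hami-scheduler rather than the default scheduler (pod name and command are illustrative, not prescribed by the maintainers):

apiVersion: v1
kind: Pod
metadata:
  name: gpu-test-unprivileged   # illustrative name
spec:
  containers:
  - name: gpu-test
    image: ubuntu:18.04
    # no privileged securityContext, so the webhook does not skip this container
    resources:
      limits:
        nvidia.com/gpu: 2        # requesting 2 vGPUs
        nvidia.com/gpumem: 10240
    command: ["/bin/sh", "-c", "sleep infinity"]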
