How to pack Skypilot jobs and clusters onto GPU nodes with Kubernetes? #4851

Open
ajayjain opened this issue Feb 28, 2025 · 1 comment

@ajayjain

When using SkyPilot with a Kubernetes cluster, is there a way to pack workloads (possibly from different users) onto as few nodes as possible? Right now, jobs seem to spread across multiple nodes, fragmenting the cluster.

% sky show-gpus --cloud kubernetes
Kubernetes GPUs (context: ...)
GPU         REQUESTABLE_QTY_PER_NODE  TOTAL_GPUS  TOTAL_FREE_GPUS
<gpu_type>  1, 2, 4, 8                ...         ...

Kubernetes per node accelerator availability
NODE_NAME  GPU_NAME    TOTAL_GPUS  FREE_GPUS
...        <gpu_type>  8           7
...        <gpu_type>  8           6

@romilbhardwaj
Collaborator

Hey @ajayjain, we delegate the exact pod placement to the k8s scheduler.

The "right" way to enforce bin packing would be to apply a scheduler config like this on your cluster: https://kubernetes.io/docs/concepts/scheduling-eviction/resource-bin-packing/#enabling-bin-packing-using-mostallocated-strategy
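
For reference, a minimal sketch of such a scheduler config, adapted from the linked Kubernetes page. The file name, the `nvidia.com/gpu` resource name, and the weights are assumptions (they depend on your device plugin and on how strongly GPU usage should dominate the score). The file has to be passed to kube-scheduler via `--config`, which usually needs control-plane access and may not be possible on managed clusters:

```yaml
# kube-scheduler-config.yaml (hypothetical file name)
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            # MostAllocated scores fuller nodes higher, so new pods pack onto them.
            type: MostAllocated
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1
              # Assumed GPU resource name; weighted higher so GPU usage dominates.
              - name: nvidia.com/gpu
                weight: 5
```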

Another (easier) way to achieve a similar packing effect would be to apply weak pod affinity to your submitted pods so they stick together on a best-effort basis:

# task.yaml
run: |
  echo hi

experimental:
  config_overrides:
    kubernetes:
      pod_config:
        metadata:
          labels:
            jobtype: binpacked
        spec:
          affinity:
            podAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
                - weight: 100
                  podAffinityTerm:
                    labelSelector:
                      matchExpressions:
                        - key: jobtype
                          operator: In
                          values:
                            - binpacked
                    topologyKey: kubernetes.io/hostname
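
To pack pods from several users together, the same `pod_config` could probably live in each user's SkyPilot config rather than in every task file. A sketch, assuming the `kubernetes.pod_config` key in `~/.sky/config.yaml` accepts the same pod fields as the task-level override above, and that all users agree on the `jobtype: binpacked` label:

```yaml
# ~/.sky/config.yaml (assumed location of the user-level SkyPilot config)
kubernetes:
  pod_config:
    metadata:
      labels:
        # Shared label; every user must set the same value for pods to attract each other.
        jobtype: binpacked
    spec:
      affinity:
        podAffinity:
          # "preferred" makes this a soft rule: pods still schedule if no packed node fits.
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: jobtype
                      operator: In
                      values:
                        - binpacked
                topologyKey: kubernetes.io/hostname
```

Note that a `podAffinityTerm` without `namespaces` or `namespaceSelector` only matches pods in the pod's own namespace, so this packs across users only if their pods run in the same namespace.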
