7g.79gb does not work as expected. #51
Comments
Just wanted to confirm: if we update the code to use 7g.80gb, as shown in the diff below, and rebuild the images, it seems to work when specifying nvidia.com/mig-7g.80gb in the resources field (and updating allowedGeometries).
Using gpu-operator (helm 23.9.1) and nos (helm 0.1.2).
I have an issue with nvidia.com/mig-7g.79gb. Specifying it causes nos to create the MIG configuration as expected, but the resource seems to be advertised as nvidia.com/mig-7g.80gb, as shown in the log below from nvidia-device-plugin.
Additionally, the labels created on the node look like this:
The problem is that, because we specified nvidia.com/mig-7g.79gb, the pod stays in Pending. Note the config below (all the other NVIDIA examples commented out below work, except 7g.79gb).
I tried adding 7g.80gb to allowedGeometries, but it did not work as expected. I briefly looked at the code and found https://github.com/nebuly-ai/nos/blob/main/pkg/gpu/mig/known_configs.go#L93, so I am not sure whether I missed something or whether there is a way to get the desired behavior.
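For anyone reading along, here is a rough sketch of why the pod would stay in Pending in this situation. Kubernetes matches extended resources by exact name, so a request for nvidia.com/mig-7g.79gb cannot be satisfied by a node that only advertises nvidia.com/mig-7g.80gb. The function name and the node and pod values are hypothetical and only mirror what is described above; this is not nos or scheduler code.

```python
def unsatisfied_requests(requests, allocatable):
    """Return the requested resources that the node cannot satisfy.

    Resource names are compared as exact strings, which is how
    extended resources are matched during scheduling.
    """
    return {
        name: qty
        for name, qty in requests.items()
        if allocatable.get(name, 0) < qty
    }


# Hypothetical values mirroring the report: the device plugin advertises
# the 80gb name, while the pod spec asks for the 79gb name.
node_allocatable = {"nvidia.com/mig-7g.80gb": 1}
pod_requests = {"nvidia.com/mig-7g.79gb": 1}

# The 79gb request has no matching resource on the node.
print(unsatisfied_requests(pod_requests, node_allocatable))

# Requesting the name the node actually advertises leaves nothing unsatisfied.
print(unsatisfied_requests({"nvidia.com/mig-7g.80gb": 1}, node_allocatable))
```

If this picture is right, the fix has to make the name nos plans with and the name the device plugin advertises agree, rather than only changing allowedGeometries.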