Add gpu_mig40 to Greatlakes. #811
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #811 +/- ##
==========================================
- Coverage 69.43% 69.41% -0.03%
==========================================
Files 48 48
Lines 4397 4397
Branches 1065 1065
==========================================
- Hits 3053 3052 -1
- Misses 1136 1137 +1
Partials 208 208
Thanks! I'm wondering how we would support submitting to both gpu_mig40 and gpu with fallback.
The two partitions have different numbers of GPUs per node, and flow uses that number to calculate node occupancy. I'm also not certain how SLURM handles multi-GPU jobs submitted to both partitions: does it assign all ranks to the same partition, or some ranks to one and some to the other? Splitting a job across partitions would cause serious problems for MPI jobs and load-balancing issues for aggregates. We could potentially enforce that the "gpu_mig40,gpu" partition is valid only for single-GPU jobs. Feel free to implement this and test it in a future PR. This might be best done after the ongoing work to refactor directives.
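A minimal sketch of the single-GPU restriction suggested above, for discussion only. The function name `validate_partition` and its arguments are hypothetical and are not part of flow's API; where such a check would actually live depends on the directives refactor.

```python
def validate_partition(partition: str, num_gpus: int) -> None:
    """Reject a comma-separated partition list for multi-GPU jobs.

    Hypothetical helper (not flow's API). `partition` is a string such as
    "gpu" or "gpu_mig40,gpu"; `num_gpus` is the total number of GPUs the
    job requests.
    """
    # Split "gpu_mig40,gpu" into ["gpu_mig40", "gpu"], ignoring stray spaces.
    partitions = [p.strip() for p in partition.split(",") if p.strip()]
    if not partitions:
        raise ValueError("No partition specified.")
    # With more than one candidate partition, the GPUs-per-node count is
    # ambiguous, so only single-GPU jobs are accepted.
    if len(partitions) > 1 and num_gpus > 1:
        raise ValueError(
            f"Partition list '{partition}' is only valid for single-GPU "
            f"jobs, but {num_gpus} GPUs were requested."
        )
```

With this check, `validate_partition("gpu_mig40,gpu", 1)` passes, while requesting two or more GPUs with the same partition list raises a `ValueError`.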
Description
Add the gpu_mig40 partition to UMich Great Lakes.
Motivation and Context
Allow users to submit to this new GPU partition.
Checklist: