Add gpu_mig40 to Greatlakes. #811

Merged
merged 6 commits into from
Feb 6, 2024
Conversation

joaander
Member

@joaander joaander commented Feb 5, 2024

Description

Add the gpu_mig40 partition to UMich Great Lakes.

Motivation and Context

Allow users to submit to this new GPU partition.

Checklist:

@joaander joaander requested review from a team as code owners February 5, 2024 18:53
@joaander joaander requested review from tcmoore3 and Charlottez112 and removed request for a team February 5, 2024 18:53

codecov bot commented Feb 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (1d0f9fd) 69.43% compared to head (7221806) 69.41%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #811      +/-   ##
==========================================
- Coverage   69.43%   69.41%   -0.03%     
==========================================
  Files          48       48              
  Lines        4397     4397              
  Branches     1065     1065              
==========================================
- Hits         3053     3052       -1     
- Misses       1136     1137       +1     
  Partials      208      208              


Member

@cbkerr cbkerr left a comment


Thanks! I'm wondering how we would support submitting to both mig40 and gpu, with fallback from one to the other.

@joaander
Member Author

joaander commented Feb 6, 2024

> Thanks! Wondering how we would support submitting to both mig40 and gpu with fallback

The two partitions have different numbers of GPUs per node, and flow uses that number to calculate node occupancy. I'm also not certain how SLURM handles multi-GPU jobs when two partitions are given: does it assign all ranks to the same partition, or some ranks to one and some to the other? Splitting a job across partitions would cause serious problems for MPI jobs and load-balancing issues for aggregates.
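To make the occupancy concern concrete, here is a rough sketch (not flow's actual implementation) of how the same GPU request maps to different node counts when the GPUs-per-node value differs. The function name `nodes_required`, the partition names, and the per-node counts are all placeholders chosen for illustration; they are not the real Great Lakes values.

```python
import math


def nodes_required(ngpu: int, gpus_per_node: int) -> int:
    """Return the number of nodes needed to hold `ngpu` GPUs.

    Hypothetical helper for illustration only.
    """
    if ngpu < 0:
        raise ValueError("ngpu must be non-negative")
    if gpus_per_node <= 0:
        raise ValueError("gpus_per_node must be positive")
    return math.ceil(ngpu / gpus_per_node)


# Placeholder per-node GPU counts (assumed values, not real hardware specs).
GPUS_PER_NODE = {"partition_a": 2, "partition_b": 8}

request = 4  # GPUs requested by one job
node_counts = {
    name: nodes_required(request, per_node)
    for name, per_node in GPUS_PER_NODE.items()
}
```

With a comma-separated partition list there is no single per-node value to plug in, so the node count cannot be determined before the scheduler picks a partition.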

We could potentially enforce that the "gpu_mig40,gpu" partition is valid only for single-GPU jobs. Feel free to implement and test this in a future PR. It might be best done after the ongoing work to refactor directives is complete.
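A minimal sketch of what such a check might look like, assuming the partition is given as a comma-separated string and the GPU count as an integer. The function `check_partition_request` is hypothetical and is not part of flow's API; how it would hook into flow's directives is left open.

```python
def check_partition_request(partition: str, ngpu: int) -> None:
    """Reject multi-partition requests unless the job uses exactly one GPU.

    Hypothetical validation helper; raises ValueError on an invalid request.
    """
    names = [name.strip() for name in partition.split(",") if name.strip()]
    if not names:
        raise ValueError("At least one partition name is required.")
    if len(names) > 1 and ngpu != 1:
        raise ValueError(
            f"Partition list {','.join(names)!r} is valid only for "
            f"single-GPU jobs, but {ngpu} GPUs were requested."
        )


# A single-GPU job may list both partitions.
check_partition_request("gpu_mig40,gpu", ngpu=1)

# A multi-GPU job must name exactly one partition.
check_partition_request("gpu", ngpu=4)
```

Calling `check_partition_request("gpu_mig40,gpu", ngpu=2)` raises `ValueError`, which sidesteps the question of how ranks would be distributed across partitions.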

@cbkerr cbkerr merged commit 6c1bd8b into main Feb 6, 2024
10 of 11 checks passed
@cbkerr cbkerr deleted the greatlakes-mig40 branch February 6, 2024 20:22
@cbkerr cbkerr added this to the 0.28.0 milestone Feb 16, 2024

2 participants