cryo: Allow GPU nodes to spawn across AZs #3381

Merged: 1 commit, Nov 6, 2023


Member

@yuvipanda yuvipanda commented Nov 6, 2023

We already do this for GCP, now we do it on AWS too.

Fixes #3334

EDIT: confirmed deployments

  • 2i2c-aws-us
  • carbonplan
  • gridsst
  • jupyter-meets-the-earth
  • nasa-cryo
  • nasa-ghg
  • nasa-veda
  • openscapes
  • smithsonian
  • ubc-eoas
  • victor

@yuvipanda yuvipanda requested a review from a team as a code owner November 6, 2023 10:01
@yuvipanda
Member Author

Deployed and tested on cryo

@yuvipanda yuvipanda merged commit ef1a18e into 2i2c-org:master Nov 6, 2023
2 checks passed
@yuvipanda
Member Author

I'm going to terraform apply this across our AWS clusters

@consideRatio
Contributor

I think there may be an issue if we specify a region where the GPU isn't available, but I'm not sure; maybe not. It could also have been resolved by updates to cluster-autoscaler, if it now understands that it shouldn't attempt to scale up in a zone where the relevant GPU isn't available to attach.

@yuvipanda
Member Author

The terraform apply only creates an EFS mount target in every zone, so that nodes can spawn there if needed. Only cryocloud has been updated so far.

Was there a previous issue with cluster-autoscaler around this?
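
For readers following along, the EFS point above can be sketched as a small check: given the zones a cluster has subnets in and the zones that already have an EFS mount target, list the zones still missing one. This is only an illustration in plain Python. The `Subnet` type, the `zones_missing_mount_target` helper, and the zone names are hypothetical and not taken from this repository's terraform code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subnet:
    """A subnet and the availability zone it lives in (hypothetical model)."""
    subnet_id: str
    zone: str


def zones_missing_mount_target(subnets, mount_target_zones):
    """Return, sorted, the zones that have a subnet but no EFS mount target."""
    covered = set(mount_target_zones)
    return sorted({s.zone for s in subnets} - covered)


# Hypothetical example data, not taken from any real cluster.
subnets = [
    Subnet("subnet-a", "us-west-2a"),
    Subnet("subnet-b", "us-west-2b"),
    Subnet("subnet-c", "us-west-2c"),
]
existing = ["us-west-2a"]

# Zones where a node could be scheduled but could not yet reach a
# zone-local mount target.
missing = zones_missing_mount_target(subnets, existing)
print(missing)
```

An empty result would mean every zone with a subnet already has a mount target, which is the state the terraform apply described above is meant to reach.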

@damianavila
Contributor

I'm going to terraform apply this across our AWS clusters

@yuvipanda, can you confirm if this ⬆️ happened?

@consideRatio
Contributor

I did a sweep; this is now consistently applied across all AWS projects!

Development

Successfully merging this pull request may close these issues.

Mitigating AWS running out on GPU nodes in an individual zone
4 participants