[WIP] optimize infer_auto_device_map for multi-GPU allocation #3321

Draft · wants to merge 2 commits into base: main
Conversation

@Nech-C (Contributor) commented Jan 2, 2025

What does this PR do?

This PR continues the work on the issues raised in #3041 and discussed in #3066. When multiple GPUs are present, reserving memory for max_layer_size can cause unnecessary offloading to the CPU or disk. The PR implements the approach proposed by @SunMarc: the device map is first computed under the assumption that no offloading is necessary, and if the resulting map still contains offloaded modules, it is recomputed under the assumption that offloading will occur. A sketch of this two-pass logic follows.
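A minimal sketch of the two-pass idea, written as a standalone wrapper for clarity rather than as the PR's actual internal implementation; `reserve_max_layer` is the flag this PR's commits propose adding to `infer_auto_device_map`:

```python
from accelerate.utils import infer_auto_device_map

def infer_device_map_two_pass(model, max_memory, **kwargs):
    # Pass 1: optimistic. Skip the max_layer_size reservation
    # (reserve_max_layer is the parameter proposed by this PR).
    device_map = infer_auto_device_map(
        model, max_memory=max_memory, reserve_max_layer=False, **kwargs
    )
    # If no module landed on "cpu" or "disk", the optimistic map is valid
    # and nothing was needlessly offloaded.
    if not any(d in ("cpu", "disk") for d in device_map.values()):
        return device_map
    # Pass 2: offloading is unavoidable, so recompute while reserving
    # space for the largest layer (the original, conservative behavior).
    return infer_auto_device_map(
        model, max_memory=max_memory, reserve_max_layer=True, **kwargs
    )
```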

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Nech-C added 2 commits January 2, 2025 12:57
…tion

- This feature is enabled by setting `reserve_max_layer` to `False`; the parameter defaults to `True`, preserving the original behavior (see the usage sketch below).
- When multiple GPUs are present, all modules can often be allocated across them, and reserving space for the largest layer size may cause unnecessary offloading.
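A hypothetical usage example, assuming the `reserve_max_layer` parameter lands on `infer_auto_device_map` as described in these commits; the `transformers` model and memory limits are placeholders for illustration:

```python
from accelerate import infer_auto_device_map
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")

# reserve_max_layer is the flag proposed by this PR. With two GPUs that can
# jointly hold the model, skipping the max_layer_size buffer avoids
# needlessly spilling modules to the CPU.
device_map = infer_auto_device_map(
    model,
    max_memory={0: "6GiB", 1: "6GiB", "cpu": "30GiB"},
    reserve_max_layer=False,
)
```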