[WIP] optimize infer_auto_device_map for multi-GPU allocation #3321

Nech-C · 2025-01-02T20:05:24Z

What does this PR do?

This PR continues the work on the issue raised in #3041 and discussed in #3066. When multiple GPUs are present, reserving memory for max_layer_size can cause unnecessary offloading to the CPU or disk. The PR implements the approach proposed by @SunMarc: the device map is first computed assuming no offloading is necessary, and if the resulting device map contains offloaded modules, it is recomputed assuming offloading will occur.
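The two-pass idea can be illustrated with a toy allocator. This is only a minimal sketch under simplifying assumptions, not the code in this PR: `allocate` and `two_pass_allocate` are hypothetical helpers, layers are plain `(name, size)` pairs placed greedily in order, and anything that does not fit on a GPU goes to `"cpu"`.

```python
from typing import Dict, List, Tuple

Layer = Tuple[str, int]  # (layer name, size in arbitrary memory units)


def allocate(layers: List[Layer], gpu_memory: List[int], reserve_max_layer: bool) -> Dict[str, str]:
    """Hypothetical greedy allocator: fill GPUs in order, spill the rest to CPU.

    When reserve_max_layer is True, each GPU keeps room for the largest layer,
    mimicking the space reserved for loading offloaded layers.
    """
    reserve = max(size for _, size in layers) if reserve_max_layer else 0
    device_map: Dict[str, str] = {}
    gpu, used = 0, 0
    for name, size in layers:
        # Move to the next GPU while the layer does not fit in the usable budget.
        while gpu < len(gpu_memory) and used + size > gpu_memory[gpu] - reserve:
            gpu, used = gpu + 1, 0
        if gpu < len(gpu_memory):
            device_map[name] = f"gpu:{gpu}"
            used += size
        else:
            device_map[name] = "cpu"  # no GPU left: offload
    return device_map


def two_pass_allocate(layers: List[Layer], gpu_memory: List[int]) -> Dict[str, str]:
    """First assume no offloading; recompute with the reserve only if needed."""
    optimistic = allocate(layers, gpu_memory, reserve_max_layer=False)
    if "cpu" not in optimistic.values():
        return optimistic  # everything fits on GPUs, so no reserve is required
    return allocate(layers, gpu_memory, reserve_max_layer=True)


layers = [("a", 4), ("b", 4), ("c", 4), ("d", 4)]
always_reserve = allocate(layers, [8, 8], reserve_max_layer=True)
two_pass = two_pass_allocate(layers, [8, 8])
```

In this toy setup, always reserving space for the largest layer pushes two of the four layers to the CPU even though two 8-unit GPUs can hold all of them, whereas the two-pass version keeps every layer on a GPU and only falls back to the reserved layout when offloading is unavoidable.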

Before submitting

- This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

…tion
- This feature can be enabled by setting `reserve_max_layer` to `False`. By default, the parameter is set to `True`, preserving the original behavior.
- When multiple GPUs are present, all modules can be allocated across them. However, reserving space for the largest layer size may cause unnecessary offloading.

…c-optim

Nech-C added 2 commits January 2, 2025 12:57

Merge branch 'main' into feature/infer-auto-device-map-multi-gpu-allo…

f4de3c0

…c-optim
