Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gpu assignment scaling #46

Merged
merged 9 commits into from
Mar 11, 2024
Merged

gpu assignment scaling #46

merged 9 commits into from
Mar 11, 2024

Conversation

Waino
Copy link
Collaborator

@Waino Waino commented Jan 22, 2024

  • Instead of considering all swaps of all gpu slots, make some educated guesses interleaved with random subsetting

  • Parameter --time_budget_s allows setting a fixed time budget for the GPU assignment

  • Fix a bug in the extra_empty_penalty: apply extra_empty_penalty for each empty GPU

    If the heuristic only penalizes for the existence of empty GPUs, then
    reducing the number of empty GPUs by one will not be visible in the loss
    unless it is the last one. The penalty should be applied for each empty
    GPU individually.

  • Some misc improvements to config-config, e.g. use tqdm for progress indicator instead of spewing garbage all over stdout.

@Waino Waino requested a review from TimotheeMickus January 22, 2024 08:29
@TimotheeMickus
Copy link
Collaborator

TimotheeMickus commented Jan 22, 2024

How about a unit test for the benchmarking? just to make sure the process doesn't take inordinate amounts of time for small to mid-size specs, and doesn't result in final penalty that are too crazy.

@TimotheeMickus
Copy link
Collaborator

Tagging @shaoxiongji for documentation purposes; this PR would add two flags to the config-config. The behavior, if I unerstand correctly, is as follows:

  • --time_budget_s controls the maximum allowed runtime before forcing the gpu assignment to stop
  • --log_name dumps intermediate info into a file, rather than to stdout as we had previously

Copy link
Collaborator

@TimotheeMickus TimotheeMickus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but please fix the linting before merging

Waino added 9 commits March 4, 2024 11:53
If the heuristic only penalizes for the existence of empty GPUs, then
reducing the number of empty GPUs by one will not be visible in the loss
unless it is the last one. The penalty should be applied for each empty
GPU individually.
Start by assigning a ready task to the first slot of each GPU
In this optmization iteration, one task per GPU is selected as a
candidate. The selected slot is the one that is locally optimal to get
rid of: i.e. not considering the destination of the swap.
It flickers past too fast anyhow.
@Waino Waino force-pushed the feat/gpu_assignment_scaling branch from fbb3cd5 to 735b096 Compare March 4, 2024 09:55
@TimotheeMickus
Copy link
Collaborator

Still LGTM, can merge

@Waino Waino merged commit 1ba7af9 into main Mar 11, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants